[Bio-Linux] Blasting Multiple Fasta Files
Clifford Beall
cliffbeall at gmail.com
Tue May 5 11:17:10 EDT 2015
I have a bash script, written by a former colleague, that splits the query file, generates the blast commands, and runs them in parallel through xargs.
It speeds up the process a lot, depending on how many cores you have.
It would need some hacking for your use case: the splitting is somewhat idiosyncratic, it runs a nucleotide blast rather than blastx, and it post-processes the blast results, which you would not need.
So you might be better off starting from scratch, but let me know if you want to take a look at it.
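
For anyone starting from scratch, here is a minimal Python sketch of the same split-then-parallelize idea. It is not the bash/xargs script mentioned above, just an illustration under stated assumptions: blastx from BLAST+ is on the PATH, a protein database has already been built with makeblastdb, and the function names, chunk file names, and merged output name are all made up for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(fasta_path, out_dir, seqs_per_chunk=1000):
    """Write consecutive groups of FASTA records to numbered chunk files.

    Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks, handle, count = [], None, 0
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                # Start a new chunk file every seqs_per_chunk records.
                if count % seqs_per_chunk == 0:
                    if handle:
                        handle.close()
                    path = out_dir / f"chunk_{len(chunks):04d}.fasta"
                    chunks.append(path)
                    handle = open(path, "w")
                count += 1
            if handle:
                handle.write(line)
    if handle:
        handle.close()
    return chunks


def blastx_command(chunk, db, threads=1):
    """Build a blastx command that writes tabular hits next to the chunk."""
    out = chunk.with_suffix(".tsv")
    return ["blastx", "-query", str(chunk), "-db", db,
            "-outfmt", "6", "-num_threads", str(threads), "-out", str(out)]


def run_all(chunks, db, jobs=4, merged="all_hits.tsv"):
    """Run one blastx process per chunk, `jobs` at a time, then merge results."""
    commands = [blastx_command(chunk, db) for chunk in chunks]
    # Threads are enough here: each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    # Concatenate the per-chunk tables in chunk order.
    with open(merged, "w") as out:
        for chunk in chunks:
            out.write(chunk.with_suffix(".tsv").read_text())
```

A call such as run_all(split_fasta("contigs.fasta", "chunks"), "viral_refseq") would split, search, and merge in one go; both names are placeholders. Set jobs to roughly the number of cores you have, and keep -num_threads at 1 so the parallel jobs do not compete with each other.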
Clifford Beall, PhD, MSc
cliffbeall at gmail.com
beall.3 at osu.edu
Research Assistant Professor
Division of Biosciences
Ohio State U. College of Dentistry
>
> Message: 4
> Date: Tue, 5 May 2015 16:54:59 +0200
> From: Andreas Leimbach <aleimba at gwdg.de>
> To: Bio-Linux help and discussion <bio-linux at nebclists.nerc.ac.uk>
> Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files
> Message-ID: <5548D9C3.7050501 at gwdg.de>
> Content-Type: text/plain; charset="windows-1252"
>
> Hey,
>
> blast+ is not parallelized all that well. Thus, you might want to try
> GNU parallel to speed up your calculations somewhat, depending on your
> machine. Here are some links:
>
> https://www.biostars.org/p/63816/
> https://www.biostars.org/p/76009/
>
> Cheers,
> Andreas
>
>
> Andreas Leimbach
> Universität Münster
> Institut für Hygiene
> Mendelstr. 7
> D-48149 Münster
> Germany
>
> Tel.: +49 (0)551 39 33843
> E-Mail: aleimba at gwdg.de
>
> On 05.05.2015 16:31, Zain A Alvi wrote:
>> Hi Marty,
>>
>> I apologize for the confusion. I am splitting a fasta file that contains approximately 100,000 sequences into 100 fasta files that contain 1,000 sequences each. I am hoping this will expedite the BLASTX process.
>>
>>
>> Kind regards,
>>
>>
>> Zain
>>
>> ________________________________
>> From: Martin Gollery <mgollery at unr.edu>
>> Sent: Tuesday, May 5, 2015 10:23 AM
>> To: Bio-Linux help and discussion
>> Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files
>>
>> Running a million BLASTX jobs on one sequence each is not going to save you time. It is better to run one BLASTX job on a million sequences.
>>
>> -Marty
>>
>>
>>
>> On Tue, May 5, 2015 at 7:00 AM, Zain A Alvi <zain.alvi at student.shu.edu> wrote:
>>
>> Dear Sir or Madam,
>>
>>
>> I hope everything is well. I have downloaded all the viral protein sequences from the NCBI RefSeq database using the script from their e-book. I have de novo assembled some viral genomes, and I know BLASTX takes a long time if the fasta file is large. I have been able to split the large fasta file into new fasta files, each holding a user-specified number of contigs.
>>
>>
>> I was wondering whether there is a method to run BLASTX automatically on each of the fasta files, one at a time, so that it completes in a "shorter" amount of time than BLASTing the whole large de novo assembled fasta file. Then I was hoping to concatenate all the results into one file.
>>
>>
>> Sincerely,
>>
>>
>> Zain
>>
>> _______________________________________________
>> Bio-Linux mailing list
>> Bio-Linux at nebclists.nerc.ac.uk
>> http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux
>>
>>
>>
>>
>> --
>> Martin Gollery
>> Senior Bioinformatics Scientist
>> Tahoe Informatics
>> www.bioinformaticist.biz
>> www.hiddenmarkovmodels.com
>>
>>
>>
>>
>>
>
>
> End of Bio-Linux Digest, Vol 80, Issue 3