[Bioclusters] mpiblast

Joe Landman landman at scalableinformatics.com
Wed Mar 15 22:03:03 EST 2006


Hi Rodney:

Rodney Dyer wrote:
> On Wed, 2006-03-15 at 08:17 -0500, Chuming Chen wrote:

[...]

>>It took about 17 seconds per sequence when I run 721  sequences  (542K) 
>>against 22G database.
>>But when I run 88532 sequences (47M) against the same database, it  took 
>>about 1 minute.
>>
>>Can the performance be improved if the query sequence is in a  
>>relatively smaller size?
>>
>>Thank you for your kind comments and suggestions.

[...]

> On some level, I have to tell my students that we need to place this in
> perspective and consider that that you ran 88,532 sequences against a
> 22G database and it _ONLY_ took you about a minute. 

I think Chuming meant 17 seconds per sequence for the first run and 1 
minute per sequence for the second (please correct me if I misread 
this).  On that reading, the total execution times are 
17 seconds/sequence * 721 sequences = 12257 seconds, or about 1/7th of a 
day, and 60 seconds/sequence * 88532 sequences = 5.3 x 10**6 seconds, or 
about 61 days.  If the second figure was instead 1 minute for the entire 
run, it is worth checking your results, as I am willing to bet that 
there is an error message lurking in there somewhere ...
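
The arithmetic above can be reproduced with a short sketch.  It assumes 
the per-sequence reading (17 seconds and 60 seconds per sequence); the 
function and variable names are made up for illustration and do not come 
from mpiblast.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_runtime(seconds_per_sequence, n_sequences):
    """Return (total seconds, total days) at a fixed per-sequence cost."""
    total_seconds = seconds_per_sequence * n_sequences
    return total_seconds, total_seconds / SECONDS_PER_DAY


# Assumed reading: 17 s/sequence for 721 sequences,
# 60 s/sequence for 88532 sequences.
small_seconds, small_days = total_runtime(17, 721)
large_seconds, large_days = total_runtime(60, 88532)

print(f"small run: {small_seconds} s = {small_days:.2f} days")
print(f"large run: {large_seconds} s = {large_days:.1f} days")
```

This prints 12257 s (0.14 days) for the small run and 5311920 s (61.5 
days) for the large one.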

> So if you scripted
> it, you could possibly get run 42,495,360 sequences in an average work
> day and 212,476,800 in an average work week and somewhere in the
> neighborhood of 11,048,793,600 for a given year.  Not bad for some small
> perl code some data mining.

Back in the SGI-GenomeCluster days (2000) and later in the MSC.Life days 
(2001-2002), we were doing things like this on small clusters (with much 
smaller databases on slower processors).  We would blast 25k sequences 
on 16 CPUs in about 1 hour against a far smaller nr.  Average sequence 
length was about 1k.  FWIW it was in Perl.
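
For a rough sense of scale, those figures can be turned into throughput 
numbers.  This is only a sketch: it takes "25k" as exactly 25000 
sequences and "about 1 hour" as exactly 3600 seconds, and it assumes all 
16 CPUs were busy for the whole run.  The variable names are 
illustrative.

```python
# Assumed inputs, rounded from the description above.
n_sequences = 25_000
n_cpus = 16
wall_seconds = 3600

# Aggregate throughput across the whole cluster.
sequences_per_second = n_sequences / wall_seconds

# CPU time spent per sequence, assuming every CPU was busy throughout.
cpu_seconds_per_sequence = n_cpus * wall_seconds / n_sequences

print(f"{sequences_per_second:.2f} sequences/second across the cluster")
print(f"{cpu_seconds_per_sequence:.2f} CPU-seconds per sequence")
```

That works out to roughly 7 sequences per second in aggregate, or about 
2.3 CPU-seconds per sequence.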

What's nice about mpiblast is that you can do it against nt and other 
hulking monsters of databases.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615

