[Bioclusters] BLAST job time estimates

Joe Landman bioclusters@bioinformatics.org
Tue, 08 Jun 2004 10:28:39 -0400

On Tue, 2004-06-08 at 10:01, Micha Bayer wrote:

> No, the job works fine. The output is fine and strace does not generate
> any errors.


> I get lots of page faults when I run queries against nr and nt.


> > > I plan to run the queries against the standard nr and nt databases and
> > > perhaps whole chromosome dbs as well. nt is currently about 2.6 gb, nr
> > > about 600 mb.
> > 
> What is it you count for the database size? Do you count index sizes?

You are more concerned with the size of the index, as this is what is
mmap'ed in.

> The nr database from ftp.ncbi.nlm.nih.gov/blast/db/ is currently 588348k
> compressed (2nd June version), this uncompresses into a 1.4 gb tar file
> which untars into 7 index files of about 1.6 gb altogether.

Egad... turn your back for a few weeks and the thing almost doubles in size.

Ok, I see part of a problem here.  More in a moment.

> > nr, when I last downloaded it on May 20th, was 906.8 x 10**6 bytes (~907 MB).
> > When you uncompress nt, it is much larger.  If you have 1 GB ram, you
> > want to target about 1/3 to 1/2 GB for the index size.  For nr and nt,
> > try using 
> > 
> > 	-v 300 
> > 
> > on the formatdb command line.  Should give you 3 nr segments, and many
> > nt segments.
> I must try this. How do you refer to the segments when you call BLAST?
> Does the syntax change or do you simply do separate runs against each
> segment in turn?

Ok.  First, I would recommend you get the original FASTA formatted
db's.  Go to ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ and pick up
nr.gz .  Do your own formatting.  Use the -v 300 option I indicated.  If
this isn't noticeably faster (I presume your disk light is blinking
nearly continuously with nt), then try -v 150 or smaller.
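
To make the volume arithmetic concrete, here is a rough sketch.  It assumes
formatdb's -v value is given in millions of letters (residues or bases), and
the letter count used in the usage note is a made-up figure, not a measured
one.  The helper names are hypothetical, and the command is only built as an
argument list, not executed.

```python
def estimate_volumes(total_letters, volume_millions):
    """Estimate how many volumes a database would be split into.

    Assumption: the -v value is in millions of letters, so each
    volume holds at most volume_millions * 1,000,000 letters.
    """
    if volume_millions <= 0:
        raise ValueError("volume size must be positive")
    per_volume = volume_millions * 1_000_000
    # Integer ceiling division; always report at least one volume.
    return max(1, -(-total_letters // per_volume))


def formatdb_command(fasta_path, is_protein, volume_millions):
    """Build (but do not run) a formatdb argument list.

    -i is the input FASTA file, -p is T for protein or F for
    nucleotide, -v is the volume size.
    """
    return [
        "formatdb",
        "-i", fasta_path,
        "-p", "T" if is_protein else "F",
        "-v", str(volume_millions),
    ]
```

For example, estimate_volumes(850_000_000, 300) gives 3 under these
assumptions, and formatdb_command("nr", True, 300) builds the argument list
for "formatdb -i nr -p T -v 300".  Dropping to -v 150 roughly doubles the
number of volumes and halves the size of each one.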

The nice thing about the volumes is that BLAST knows how to handle them
"automagically".  formatdb creates the database_name.[np]al files with some
metadata about the segments.  BLAST then handles this correctly
(provided you don't get the dreaded "(null).01 ...").
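
As an illustration of why the BLAST syntax does not change, here is a
minimal sketch.  It assumes the alias file is plain text with a DBLIST line
naming the volumes; the sample contents and the helper names are
hypothetical, and the blastall command is only built, not run.

```python
def parse_alias_volumes(alias_text):
    """Return the volume names from the DBLIST line of an alias file.

    Assumption: the alias file holds 'KEY value...' lines, with '#'
    marking comments, and DBLIST listing the volumes.
    """
    for line in alias_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if parts[0] == "DBLIST":
            return parts[1].split() if len(parts) > 1 else []
    return []


def blastall_command(program, db_name, query_path):
    """Build (but do not run) a blastall argument list.

    The database is named by its base name; the volumes are not
    listed individually on the command line.
    """
    return ["blastall", "-p", program, "-d", db_name, "-i", query_path]


# Hypothetical contents of nr.pal after formatting with three volumes.
SAMPLE_ALIAS = """#
# sample alias file (made up for illustration)
#
TITLE nr
DBLIST nr.00 nr.01 nr.02
"""
```

With the sample above, parse_alias_volumes(SAMPLE_ALIAS) returns the three
volume names, while blastall_command("blastp", "nr", "query.fa") still refers
only to "nr".  In other words, one run against the base name covers every
segment.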


> cheers
> Micha
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615