[Bioclusters] Memory Usage for Blast - question

Lucas Carey lcarey at odd.bio.sunysb.edu
Thu Mar 10 15:20:03 EST 2005


On Thursday, March 10, 2005 at 12:48 -0600, Dinanath Sulakhe wrote:
> Thank You everyone for the suggestions ..
> 
> I have decided to use -b 100 option with blastall, and removing -F F option 
> as we currently don't need that.
> and -v 200 option for formatdb.
> 
> We are using the  NR (non-redundant database from NCBI) as the DB for blast 
> which is currently about 1 GB with ~2.3 Million sequences in it, so what 
> number would be appropriate for "-v" option in formatdb ?
The lowest value at which you no longer run out of memory.
> 
> Does fragmenting the DB affect the speed of blast computations?
This is from a benchmark I did in November 2002. I don't remember which query/db this was; most likely blastn, or possibly blastp.
#frag 'wall time'
1 787.29
2 791.69
4 794.67
5 798.521
8 794.01
10 805.812
15 809.324
16 803.7
20 814.032
25 821.86
30 827.643
31 821.38
35 838.594
40 856.84
45 879.683
50 910.147
55 933.658
60 949.782
63 940.81
65 963.216
69 987.196
74 985.41
80 996.06
83 1008.153
86 1014.527
90 1026.124
94 1042.814
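
To put the numbers above in relative terms, here is a minimal Python sketch that computes the slowdown against the single-fragment run. The values are copied from the table; the units of 'wall time' are not recorded, so only ratios are computed. The name `slowdown` is made up for illustration.

```python
# Wall times copied from the benchmark table above: {fragments: wall time}.
# Units are not recorded in the original, so only ratios are meaningful here.
wall_time = {
    1: 787.29, 2: 791.69, 4: 794.67, 5: 798.521, 8: 794.01,
    10: 805.812, 15: 809.324, 16: 803.7, 20: 814.032, 25: 821.86,
    30: 827.643, 31: 821.38, 35: 838.594, 40: 856.84, 45: 879.683,
    50: 910.147, 55: 933.658, 60: 949.782, 63: 940.81, 65: 963.216,
    69: 987.196, 74: 985.41, 80: 996.06, 83: 1008.153, 86: 1014.527,
    90: 1026.124, 94: 1042.814,
}

def slowdown(frags):
    """Percent increase in wall time relative to the single-fragment run."""
    return 100.0 * (wall_time[frags] / wall_time[1] - 1.0)

# Print a few representative points from the table.
for n in (10, 30, 50, 94):
    print(f"{n:3d} fragments: +{slowdown(n):.1f}%")
```

On these numbers the penalty is about 5% at 30 fragments and about 32% at 94, so for this one benchmark a moderate amount of fragmenting was cheap.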

Splitting up the queries involves much less overhead. I put a figure up here, but I no longer know where the original data is:
http://flyex.ams.sunysb.edu/~lcarey/nquery_files.gif
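
For anyone who wants to try query splitting, here is a rough Python sketch of one way to do it, assuming the queries are in FASTA format. The function names `read_fasta` and `split_queries` are hypothetical, and the round-robin assignment is just one simple choice, not necessarily what was used for the figure. Each chunk would then be written to its own file and run against the whole database.

```python
def read_fasta(lines):
    """Yield each FASTA record as one string (header line plus sequence lines)."""
    record = []
    for line in lines:
        # A new '>' header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)

def split_queries(lines, n_chunks):
    """Distribute FASTA records round-robin into n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(read_fasta(lines)):
        chunks[i % n_chunks].append(rec)
    return chunks

# Tiny made-up input, standing in for the lines of a real query file.
demo = [">q1\n", "ACGT\n", ">q2\n", "GGCC\n", "TTAA\n", ">q3\n", "ATAT\n"]
for k, chunk in enumerate(split_queries(demo, 2)):
    print(f"chunk {k}: {len(chunk)} record(s)")
```

Round-robin keeps the number of sequences per chunk even, but not necessarily the total sequence length, so chunks may still take different amounts of time.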
-Lucas

