[Bioclusters] mpiformatdb problem
Susan Chacko
bioclusters@bioinformatics.org
Thu, 4 Mar 2004 14:18:23 -0500
Has anyone successfully built the human genome db with mpiformatdb? Is
there some special gotcha because there are very few, very large
sequences (25 sequences in 3 Gb)?
I'm using mpiBLAST 1.2.1, with the latest NCBI Toolkit (4 Feb 2004).
Other nucleotide dbs build OK with mpiformatdb, but when I try to build
the genome in 25 pieces (one per sequence), I consistently don't get
the 00 piece. That is, the directory contains #.nsq, #.nin and #.nhr for
every piece except 00, where I only see chr_all.fa.00.nin.
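A quick way to list which fragments are incomplete is a small directory check like the one below. It is a hypothetical helper, not part of mpiBLAST, and it assumes the fragments are named <db>.NN.<ext> with a zero-padded NN running from 00 to N-1, which is inferred from names like chr_all.fa.00.nin rather than taken from the mpiBLAST docs.

```python
import os
import re
from collections import defaultdict

# Files formatdb writes per nucleotide volume: .nsq (sequence),
# .nin (index) and .nhr (headers).
EXPECTED_EXTS = {"nsq", "nin", "nhr"}


def missing_fragment_files(directory, db_name, n_fragments):
    """Return {fragment_id: sorted list of missing extensions}.

    Assumption (hypothetical naming convention): fragments are named
    <db_name>.NN.<ext> with NN zero-padded, from 00 to n_fragments - 1.
    """
    pattern = re.compile(re.escape(db_name) + r"\.(\d{2})\.(nsq|nin|nhr)$")
    found = defaultdict(set)  # fragment id -> extensions present on disk
    for name in os.listdir(directory):
        m = pattern.match(name)
        if m:
            found[m.group(1)].add(m.group(2))
    missing = {}
    for i in range(n_fragments):
        frag = f"{i:02d}"
        absent = EXPECTED_EXTS - found[frag]
        if absent:
            missing[frag] = sorted(absent)
    return missing
```

For the situation described above, calling missing_fragment_files("/data/susanc/mpiblast", "chr_all.fa", 25) should report only fragment "00" as missing its .nhr and .nsq files.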
I've tried:
- applying the patch (patch-NCBIToolbox_Nov14_2003), just in case,
though the docs imply that it only matters for more than 100 fragments, so
I can't see why it would help in this situation. Only two hunks of the
patch applied, and I still didn't get the missing 00 files.
- using an older version of the NCBI Toolkit (Oct 2003).
mpiformatdb command:
mpiformatdb -f ~/mpiblast.conf -N 25 -i chr_all.fa -p F
The formatdb.log says:
Version 2.2.8 [Jan-05-2004]
Started database file "/fdb/genome/human-aug2003/chr_all.fa"
Closing volume /data/susanc/mpiblast//chr_all.fa with 0 sequences, 0 letters(.nsq file = 61580340 bytes; .nhr file = 0 bytes)
FDBFinish: Empty nucleotide database...
Version 2.2.8 [Jan-05-2004]
Started database file "/fdb/genome/human-aug2003/chr_all.fa"
Closing volume /data/susanc/mpiblast//chr_all.fa.01 with 1 sequences, 246,127,941 letters(.nsq file = 122496350 bytes; .nhr file = 67 bytes)
Formatted 1 sequences in volume 1
...
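To spot empty volumes in a long formatdb.log without reading it by eye, the "Closing volume" lines can be parsed. This is a sketch, assuming the log on disk is not line-wrapped and that the line wording matches this log exactly; other formatdb versions may word it differently.

```python
import re

# Pattern copied from the "Closing volume" lines in this formatdb 2.2.8 log.
CLOSING_RE = re.compile(
    r"Closing volume (?P<path>\S+) with (?P<seqs>[\d,]+) sequences, "
    r"(?P<letters>[\d,]+) letters\(\.nsq file = (?P<nsq>[\d,]+) bytes; "
    r"\.nhr file = (?P<nhr>[\d,]+) bytes\)"
)


def empty_volumes(log_text):
    """Return the paths of volumes that formatdb closed with 0 sequences."""
    empties = []
    for m in CLOSING_RE.finditer(log_text):
        # Counts may contain thousands separators, e.g. 246,127,941.
        if int(m.group("seqs").replace(",", "")) == 0:
            empties.append(m.group("path"))
    return empties
```

Applied to the log above, this would flag the first volume, which formatdb wrote as chr_all.fa rather than chr_all.fa.00.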
We're new to mpiBLAST (testing it out at a user's request), so all
suggestions are appreciated.
Susan Chacko.
----------------------------------------------------------------------------------------------
Susan Chacko
Helix Systems
12B/2N207                          Ph: 301-435-2982
National Institutes of Health      Fax: 301-402-2190
Bethesda, MD 20814                 Email: susanc@nih.gov