[Bioclusters] Request for discussions-How to build a biocluster Part 5 (BLAST/DB management)
Sylvain Foisy
bioclusters@bioinformatics.org
Thu, 2 May 2002 13:57:15 -0400
Hi,
A reminder: this is coming from a total newbie at this BioCluster stuff.
It is also meant to serve as the seed of a tutorial/history-of-building site
for our creation. I am a total newbie in UNIX administration and
installation, which is why we will get a system administrator to help us
out. But I still have to figure out the right questions to ask!!
BLAST
OK, which version of BLAST should we use: NCBI or WU? I have used both
and, quite frankly, for most uses they are pretty much equal, although WU
seems to be faster. Does either one have a particular feature that could
be helpful to specific users?
Also, can BLAST be part of a system image that could be installed from
the head node onto any node? Or can it be installed on each node's local
disk and then be accessed by the system in memory?
THE GENBANK DATABASE
BLAST without the data, what for? OK, what should be downloaded: the
GenBank database in its own format or the FASTA-formatted version that is
found in the BLAST folder at NCBI? Either way, it is a lot of data.
The idea would be for a user to get the whole GenBank record for a
particular sequence. However, I think that it could be done either way
with scripts.
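If the GenBank flat files are kept locally, one way a script could hand a user the whole record is to index each record's starting byte offset once and then seek straight to it. The sketch below assumes the usual flat-file layout (each record starts with a `LOCUS` line and ends with a `//` line) and keys the index on the LOCUS name; a real setup might key on the ACCESSION or VERSION line instead. The function names, the two sample records and the file path in the comment are hypothetical, for illustration only.

```python
import io
from typing import BinaryIO, Dict


def index_genbank(handle: BinaryIO) -> Dict[str, int]:
    """Map each record's LOCUS name to the byte offset where that record starts.

    Assumes every record begins with a line starting with 'LOCUS' and ends
    with a line containing only '//'.
    """
    index: Dict[str, int] = {}
    handle.seek(0)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:  # end of file
            break
        if line.startswith(b"LOCUS"):
            # The second whitespace-separated field is the locus name.
            name = line.split()[1].decode("ascii")
            index[name] = offset
    return index


def fetch_record(handle: BinaryIO, index: Dict[str, int], name: str) -> str:
    """Seek straight to one record and return it, up to and including '//'."""
    handle.seek(index[name])
    lines = []
    for raw in iter(handle.readline, b""):
        lines.append(raw.decode("latin-1"))
        if raw.rstrip() == b"//":
            break
    return "".join(lines)


# Two made-up, heavily abbreviated records used only for illustration.
SAMPLE = (
    b"LOCUS       SEQ1    10 bp    DNA     linear   PLN 01-JAN-2002\n"
    b"DEFINITION  hypothetical test record one.\n"
    b"ORIGIN\n"
    b"        1 acgtacgtac\n"
    b"//\n"
    b"LOCUS       SEQ2     8 bp    DNA     linear   PLN 01-JAN-2002\n"
    b"DEFINITION  hypothetical test record two.\n"
    b"ORIGIN\n"
    b"        1 ggggcccc\n"
    b"//\n"
)

if __name__ == "__main__":
    # With a real release file this would be open("path/to/release.seq", "rb")
    # (hypothetical path).
    with io.BytesIO(SAMPLE) as fh:
        idx = index_genbank(fh)
        print(idx)
        print(fetch_record(fh, idx, "SEQ2"))
```

The index only needs rebuilding when a release or update file changes, and each lookup is then a single seek rather than a scan of the whole file.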
How should the local database be administered? Reading the archive, I
think that the consensus is that the DB has to be split into n pieces
(n = number of nodes), with each piece sent to a particular node and
processed there with formatdb. Or do I have it all wrong? I would worry
that the nodes receiving the human sequences or the EST sequences would
be working very hard while the ones with the vector sequences sit idle.
Is it feasible to divide the DB so as to spread the load evenly over the
nodes?
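One possible way to avoid the "busy human/EST node, idle vector node" problem is to split by size rather than by GenBank division, so every shard gets a mix of everything. The sketch below assumes total residue count is a reasonable proxy for search work (an assumption, not a measured fact) and uses a greedy longest-first heuristic: sort sequences by length and give each one to whichever shard currently holds the fewest residues. All names here are hypothetical; each shard's FASTA text would then be copied to its node and run through formatdb there.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, str]  # (FASTA header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition_by_residues(records: List[Record], n: int) -> List[List[Record]]:
    """Split records into n shards of roughly equal total sequence length.

    Greedy 'longest first' heuristic: biggest sequences are placed first,
    each going to the shard with the fewest residues so far.
    """
    if n < 1:
        raise ValueError("need at least one shard")
    shards: List[List[Record]] = [[] for _ in range(n)]
    heap = [(0, i) for i in range(n)]  # (residues so far, shard index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)  # currently lightest shard
        shards[i].append(rec)
        heapq.heappush(heap, (total + len(rec[1]), i))
    return shards


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialise records back to FASTA, wrapping sequence lines at `width`."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = ">a\nACGTACGTAC\n>b\nACGTACGT\n>c\nACGTA\n>d\nACGT\n>e\nACG\n"
    for k, shard in enumerate(partition_by_residues(parse_fasta(demo), 2)):
        print(f"node{k}:", sum(len(s) for _, s in shard), "residues")
```

One caveat worth raising with the system administrator: BLAST statistics depend on database size, so E-values computed against a single shard reflect that shard rather than the whole database, and per-node results would likely need the full database size taken into account before they are merged.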
How should the daily updates be performed? The same concern applies:
if the same node(s) get the daily updates, users coming in with daily
jobs will push those nodes hard.
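A minimal sketch of one alternative, assuming each node keeps its own slice of each day's update file next to its main piece of the DB (a hypothetical arrangement): deal the day's new records across all n nodes in turn, so a job that searches only the newest sequences is spread over the whole cluster. The starting node rotates with the date so the leftover records do not always land on node 0. This balances record counts, not residues; a residue-balanced split of the update file is an alternative if sequence lengths vary a lot. Function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Record = Tuple[str, str]  # (FASTA header, sequence)


def stripe_updates(updates: List[Record], n: int, start: int = 0) -> List[List[Record]]:
    """Deal one day's update records across n nodes like cards.

    Record k goes to node (start + k) % n, so every node receives either
    floor(len/n) or ceil(len/n) of the day's records.
    """
    if n < 1:
        raise ValueError("need at least one node")
    batches: List[List[Record]] = [[] for _ in range(n)]
    for k, rec in enumerate(updates):
        batches[(start + k) % n].append(rec)
    return batches


def daily_start(day: date, n: int) -> int:
    """Rotate the starting node by one each day."""
    return day.toordinal() % n
```

For example, `stripe_updates(todays_records, 8, start=daily_start(date.today(), 8))` would produce eight per-node batches, where `todays_records` stands for the parsed daily update file.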
Am I missing something?
This is open for helpful and constructive discussion
Sylvain
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Sylvain Foisy, Ph. D.
Manager
BIONEQ - Le Reseau quebecois de bioinformatique
Genome-Quebec
Tel.: (514) 343-6111 poste 5188
E-mail: foisys@medcn.umontreal.ca
++++++++++++++++++++++++++++++++++++++++++++++++++++++++