The December 2001 version of "formatdb" will split your target databases into chunks of arbitrary size for you, via the "-v <max_size_of_a_chunk>" flag. I believe it was intended to get around file-size limits on older OSs with larger datasets, but it also works nicely for my group to keep things under the RAM/CPU performance transition point.

-Chris Dwan
 cdwan-at-ccgb.umn.edu
 CCGB - University of Minnesota

Eric Engelhard writes:

> I agree with jfreeman that this howto is a good place to start, but you
> may not want to bother with RedHat 5.2. I set a personal speed record
> building out a small (8-node) cluster last week using RedHat 7.2. I used
> the kickstart GUI to configure boot disks for the slave nodes. This is an
> embarrassingly parallel blast cluster (NFS, postgres, NCBI blastall,
> rexec/rsh, and perl).
>
> Performance hint: what you really want with this kind of cluster is a
> high enough local-RAM-to-refdb ratio to prevent disk I/O churning. If
> you can run a whole batch with only an initial read, then the next
> bottleneck will be the CPU/bus speed, which is a fairly high bar. I
> haven't stress-tested this little cluster, but my work cluster (18
> nodes, 2GB RAM/node) cuts through >1500 queries/minute against nr.
>
> In addition to BLAST, this type of system is also ideal for standalone
> InterPro.
>
> I split the reference databases with this (babyperl freebee :-) ):
>
> #!/usr/bin/perl
> #
> # refdb_splitter.pl - Splits a ref fasta db into $N gzipped chunks
> #                     for distribution to cluster
> #
> # Usage: zcat ref_fasta_db(.Z or .gz) | CMGD_splitter.pl
> #
>
> $N = 8;   # your number of nodes here (or the node number itself, if you
>           # want to run an iteration of this script on each node and
>           # parallelize the splitter)
> $fasta = "";
> $i = 0;
> $split = 1;
> while ($line = <STDIN>) {
>     if ($line =~ /^>/) { $i++; }        # count FASTA headers seen
>     if ($i == 2) {                      # one complete record is buffered
>         if ($split > $N) { $split = 1; }   # or "if ($split % $N == 0) {" for running at each node
>         open(PIPE, "| gzip >> ref_fasta_db_$split.gz");
>         print PIPE $fasta;
>         close PIPE;
>         $fasta = "";
>         $split++;
>         $i = 1;
>         # }   # decomment for parallel version
>     }
>     $fasta = $fasta . $line;
> }
> # Flush the final record, which the loop above never writes out.
> if ($fasta ne "") {
>     if ($split > $N) { $split = 1; }
>     open(PIPE, "| gzip >> ref_fasta_db_$split.gz");
>     print PIPE $fasta;
>     close PIPE;
> }
>
> --
> Eric Engelhard - www.cvbig.org - www.sagresdiscovery.com
>
> jfreeman wrote:
> >
> > Start here:
> > http://www.beowulf-underground.org/doc_project/BIAA-HOWTO/Beowulf-Installation-and-Administration-HOWTO-5.html
> >
> > Once you have a small two-node master/slave cluster running, with the
> > slave node booting through tftpboot, you are ready for the next level
> > of complexity...
> >
> > Danny Navarro wrote:
> > >
> > > Hi all,
> > >
> > > I would like to set up a Linux cluster with some PCs to run BLAST
> > > searches against the human EST database. First I will try to run
> > > BLAST locally on the master node, but I would also like to make a
> > > BLAST server available to the intranet.
> > >
> > > I have a lot to learn about Linux clusters, and right now I don't
> > > know exactly how to start. Should I use Beowulf or Mosix, or are
> > > there other, better alternatives? What do you think is the best
> > > system for this task?
> > >
> > > Thanks
> > >
> > > _______________________________________________
> > > Bioclusters maillist - Bioclusters@bioinformatics.org
> > > http://bioinformatics.org/mailman/listinfo/bioclusters
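
For readers who don't speak Perl, the round-robin record distribution done by Eric's splitter can be sketched in Python roughly as follows. This is a minimal illustration, not code from the thread: the function name, sample data, and in-memory chunk strings are all made up for the example, and unlike the quoted script it also keeps the final FASTA record instead of dropping it.

```python
def split_fasta(lines, n):
    """Distribute FASTA records round-robin across n chunks.

    `lines` is an iterable of text lines (as from a FASTA stream);
    chunk i collects every (i mod n)-th record as one string.
    """
    chunks = [""] * n
    record = []        # lines of the record currently being buffered
    idx = 0            # number of complete records emitted so far
    for line in lines:
        # A new ">" header means the buffered record is complete.
        if line.startswith(">") and record:
            chunks[idx % n] += "".join(record)
            idx += 1
            record = []
        record.append(line)
    if record:         # flush the last record (the Perl loop alone misses it)
        chunks[idx % n] += "".join(record)
    return chunks
```

A real splitter for nr-sized databases would stream each record to a per-chunk gzip file rather than hold chunks in memory, as the Perl version does with its `| gzip >> ...` pipes.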