[Bioclusters] Re: blast and nfs

Joe Landman bioclusters@bioinformatics.org
Thu, 24 Apr 2003 11:43:48 -0400 (EDT)


Hi Ognen:

  The price knee is at about 60GB.  40GB drives are in the $60-70 US 
range and 60GB drives are around $80 US.  I would recommend software 
RAID0 with IDE drives (not hardware RAID).

  Joe
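
For a rough sense of that knee, here is a quick Python back-of-the-envelope
using the midpoints of the price ranges quoted above (actual street prices
will obviously vary):

  # Cost per GB from the prices above (midpoints of the quoted ranges).
  drives = {40: 65.0,   # 40 GB drive, ~$60-70 US
            60: 80.0}   # 60 GB drive, ~$80 US

  for size_gb, price_usd in sorted(drives.items()):
      print("%2d GB: $%.2f/GB" % (size_gb, price_usd / size_gb))

  # 40 GB: $1.62/GB
  # 60 GB: $1.33/GB   <- the knee: the bigger drive is cheaper per GB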

On Thu, 24 Apr 2003, Duzlevski, Ognen wrote:

> Hi Bruce,
> 
> given the (ever-increasing) sizes of the databases bioinformatics software is run against, what size of local space would you recommend?
> 
> Ognen
> 
> >-----Original Message-----
> >From: Bruce O'Neel [mailto:bruce.oneel@obs.unige.ch]
> >Sent: Thursday, April 24, 2003 9:55 AM
> >To: bioclusters
> >Subject: [Bioclusters] Re: blast and nfs
> >
> >
> >Hi,
> >
> >I thought that I'd emphasize a few things that Chris and Joseph have
> >already said.
> >
> >Except for a few small subfields, scientific computing tends to be I/O
> >bound.  As already pointed out, feeding a lot of data through what is
> >basically a fast serial connection is a bad idea.  If you use 100
> >megabit ethernet you max out somewhere around 40 megabits or so
> >because you can't use the full channel bandwidth.  That works out to
> >roughly 4-5 megabytes per second, which most of you will recognize is
> >way below what even a single cheap hard disk can sustain.  Things only
> >improve by a factor of 10 or so if you use gigabit ethernet, so that
> >doesn't really save you there either.
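
The same arithmetic as a small Python sketch.  The ~40% usable-bandwidth
figure and the factor of 10 for gigabit come from the paragraph above; the
local-disk figure is only an assumed ballpark for a single IDE drive of
that era, not a measurement:

  # Usable NFS throughput vs. one local disk (rough, assumed numbers).
  def usable_mb_per_s(link_mbit, efficiency=0.4):
      # convert a link rate in megabits/s into usable megabytes/s
      return link_mbit * efficiency / 8

  fast_ethernet = usable_mb_per_s(100)     # ~5 MB/s
  gig_ethernet  = usable_mb_per_s(1000)    # ~50 MB/s
  local_ide     = 35.0                     # assumed sustained MB/s, one IDE disk

  print("100 Mbit NFS: ~%d MB/s" % fast_ethernet)
  print("1 Gbit NFS:   ~%d MB/s" % gig_ethernet)
  print("local IDE:    ~%d MB/s" % local_ide)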
> >
> >That, combined with how hard modern OSes work to cache disk data, and
> >with how cheap IDE hard disks are, means that it is almost always a
> >win to put your data locally.  Disk striping helps even more, but it
> >may not always be necessary and should be tested.
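
One way to test it, rather than guess, is a crude sequential-read timer
like the sketch below.  The paths are placeholders, and the file should be
larger than RAM so the page cache doesn't flatter the numbers:

  # Crude sequential-read benchmark: run it against a copy of the same
  # big file on local disk and on an NFS mount, then compare.
  import sys, time

  def read_mb_per_s(path, block=1024 * 1024):
      total = 0
      start = time.time()
      f = open(path, "rb")
      while True:
          chunk = f.read(block)
          if not chunk:
              break
          total += len(chunk)
      f.close()
      return (total / 1e6) / (time.time() - start)

  if __name__ == "__main__":
      for path in sys.argv[1:]:     # e.g. /scratch/nr.fasta /nfs/db/nr.fasta
          print("%s: %.1f MB/s" % (path, read_mb_per_s(path)))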
> >
> >NFS is good for things like login directories where you read small
> >files once or twice and for source code repositories where you don't
> >keep re-reading the files.
> >
> >NFS is very bad for big files since (basically) every 8k bytes or so
> >requires the file to be reopened on the server, a seek, an 8k read,
> >and a close, over and over again for the whole file.
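
To put a rough number on that overhead: the 8k-per-request figure is the
one above, while the per-request latency is an assumed ~1 ms for a quiet
LAN, not a measurement.

  # How many NFS read requests does one big database file turn into?
  db_size_bytes = 1 * 1024**3     # a hypothetical 1 GB database file
  request_bytes = 8 * 1024        # ~8k per NFS read, as described above
  rtt_seconds   = 0.001           # assumed ~1 ms per request on a quiet LAN

  requests = db_size_bytes // request_bytes
  print("%d requests, ~%d s spent on round trips alone"
        % (requests, requests * rtt_seconds))
  # 131072 requests, ~131 s spent on round trips alone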
> >
> >To make things worse, some labs then take the incremental approach to
> >NFS: as each new system is added, its spare disk space is dedicated to
> >something and then mounted on all the other systems.  This is very bad
> >because most work then requires ALL systems to be up and functioning.
> >Plus you end up with NFS traffic all over your network.  It does keep
> >your switch busy though :-)
> >
> >Far better is to have a central NFS server for all of your home
> >directories, and then have your central archives
> >mirrored/rsynced/whatever to your different compute nodes.
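
A minimal sketch of that mirroring step, wrapped in Python around rsync.
The node names and paths are placeholders for whatever a given cluster
actually uses:

  # Push the central database copy out to each compute node with rsync.
  import subprocess

  NODES  = ["node01", "node02", "node03"]   # hypothetical node names
  SOURCE = "/archive/blastdb/"              # central copy (trailing slash matters)
  DEST   = "/scratch/blastdb/"              # local scratch on each node

  for node in NODES:
      # -a preserves times/permissions, --delete drops stale files on the node
      subprocess.call(["rsync", "-a", "--delete", SOURCE,
                       "%s:%s" % (node, DEST)])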
> >
> >Of course, your mileage may vary since each lab is different.
> >
> >cheers
> >
> >bruce
> >
> >-- 
> >.. there is no area or function that someone can't try to put together
> >with bubble gum and bailing wire. -- Strata Chalup
> >
> >Bruce O'Neel                       phone:  +41 22 950 91 57
> >INTEGRAL Science Data Centre               +41 22 950 91 00 (switchb.)
> >Chemin d'Ecogia 16                 fax:    +41 22 950 91 35
> >CH-1290 VERSOIX                    e-mail: Bruce.Oneel@obs.unige.ch
> >Switzerland                        WWW:    http://isdc.unige.ch/
> >
> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters
>