[Bioclusters] Re: blast and nfs

Duzlevski, Ognen bioclusters@bioinformatics.org
Thu, 24 Apr 2003 10:01:56 -0500


Hi Bruce,

given the (ever increasing) sizes of databases bioinformatics software
is run against - what size of local space would you recommend?

Ognen

>-----Original Message-----
>From: Bruce O'Neel [mailto:bruce.oneel@obs.unige.ch]
>Sent: Thursday, April 24, 2003 9:55 AM
>To: bioclusters
>Subject: [Bioclusters] Re: blast and nfs
>
>
>Hi,
>
>I thought that I'd emphasize a few things that Chris and Joseph have
>already said.
>
>Except for a few small subfields, scientific computing tends to be i/o
>bound.  As already pointed out, feeding a lot of data through what is
>basically a fast serial connection is a bad idea.  If you use 100
>megabit ethernet you max out somewhere around 40 megabits or so
>because you can't use the full channel bandwidth.  This is somewhere
>around 5 megabytes per second, which most of you will recognize
>is way below the low end of one hard disk.  Things only improve by a
>factor of 10 or so if you use gigabit ethernet so that doesn't really
>save you there either.
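
As a rough check of the arithmetic above, here is a minimal Python sketch. The 40% usable fraction is the estimate given in the text, not a measured value, and the function name is purely illustrative. Under that assumption 100 megabit ethernet works out to about 5 megabytes per second and gigabit to about 50.

```python
def usable_megabytes_per_second(link_megabits, usable_fraction=0.4):
    """Estimate usable throughput in MB/s for a nominal link speed in Mbit/s.

    usable_fraction is an assumption taken from the text (roughly 40 of
    100 megabits); real networks will differ.
    """
    usable_megabits = link_megabits * usable_fraction
    return usable_megabits / 8  # 8 bits per byte


if __name__ == "__main__":
    for name, megabits in [("100 megabit", 100), ("gigabit", 1000)]:
        rate = usable_megabytes_per_second(megabits)
        print(f"{name} ethernet: about {rate:.0f} MB/s usable")
```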
>
>That, combined with the hard work modern OSes do to cache disks well,
>and then combined with cheap IDE hard disks, means that it is almost
>always a win to keep your data local.  Using disk striping helps even more but
>may not always be necessary and should be tested.
>
>NFS is good for things like login directories where you read small
>files once or twice and for source code repositories where you don't
>keep re-reading the files.
>
>NFS is very bad for big files since (basically) every 8k bytes or so
>requires the file to be reopened on the server, then you have to seek,
>then 8k bytes is read, and then the file is closed again.
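
To get a feel for how many of those round trips a large file implies, here is a small Python sketch. It assumes a fixed 8k transfer size as described above, and the 1 GiB database size is a made-up example, not a figure from this thread.

```python
def nfs_read_requests(file_size_bytes, transfer_size_bytes=8 * 1024):
    """Number of read requests needed to fetch a file in fixed-size chunks.

    Uses integer ceiling division so a partial final chunk still counts
    as one request.
    """
    return (file_size_bytes + transfer_size_bytes - 1) // transfer_size_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3  # hypothetical database size
    print(f"{nfs_read_requests(one_gib)} requests to read 1 GiB in 8k chunks")
```

Every full scan of the database pays that cost again unless the data is cached or stored locally.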
>
>To make things worse some labs then do the incremental approach to
>NFS, where as you add each system the spare disk space on that system
>is dedicated to something, and then mounted on all other systems.
>This is very bad since then for most work to happen ALL systems have
>to be up and functioning.  Plus you end up with NFS traffic all over
>your network.  It does keep your switch busy though :-)
>
>Far better is to have a central NFS server for all of your home
>directories, and then have your central archives
>mirrored/rsynced/whatever to your different compute nodes.
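
One possible way to script the mirroring step, sketched in Python around rsync. The node names and paths are hypothetical placeholders, and the sketch assumes rsync over ssh is already set up between the archive host and the compute nodes. It defaults to a dry run that only prints the commands.

```python
import subprocess


def build_rsync_command(source_dir, node, dest_dir):
    """Build an rsync command that mirrors source_dir onto one node.

    -a keeps permissions and timestamps; --delete removes files on the
    node that no longer exist in the source. The trailing slash makes
    rsync copy the directory contents rather than the directory itself.
    """
    source = source_dir.rstrip("/") + "/"
    return ["rsync", "-a", "--delete", source, f"{node}:{dest_dir}"]


def mirror_to_nodes(source_dir, nodes, dest_dir, dry_run=True):
    """Mirror source_dir to every node; only print commands when dry_run."""
    commands = [build_rsync_command(source_dir, n, dest_dir) for n in nodes]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if rsync fails
    return commands


if __name__ == "__main__":
    # Hypothetical archive path and node names, for illustration only.
    mirror_to_nodes("/data/blastdb", ["node01", "node02"], "/scratch/blastdb")
```

Running something like this from cron after each database update keeps the local copies current without any node depending on another node being up.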
>
>Of course, your mileage may vary since each lab is different.
>
>cheers
>
>bruce
>
>-- 
>.. there is no area or function that someone can't try to put together
>with bubble gum and bailing wire. -- Strata Chalup
>
>Bruce O'Neel                       phone:  +41 22 950 91 57
>INTEGRAL Science Data Centre               +41 22 950 91 00 (switchb.)
>Chemin d'Ecogia 16                 fax:    +41 22 950 91 35
>CH-1290 VERSOIX                    e-mail: Bruce.Oneel@obs.unige.ch
>Switzerland                        WWW:    http://isdc.unige.ch/
>
>_______________________________________________
>Bioclusters maillist  -  Bioclusters@bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters
>