Hi Bruce,

given the (ever increasing) sizes of databases bioinformatics software
is run against - what size of local space would you recommend?

Ognen

>-----Original Message-----
>From: Bruce O'Neel [mailto:bruce.oneel@obs.unige.ch]
>Sent: Thursday, April 24, 2003 9:55 AM
>To: bioclusters
>Subject: [Bioclusters] Re: blast and nfs
>
>
>Hi,
>
>I thought that I'd emphasize a few things that Chris and Joseph have
>already said.
>
>Except for a few small subfields, scientific computing tends to be i/o
>bound.  As already pointed out, feeding a lot of data through what is
>basically a fast serial connection is a bad idea.  If you use 100
>megabit ethernet you max out somewhere around 40 megabits or so
>because you can't use the full channel bandwidth.  This is somewhere
>around 4 or so megabytes per second, which most of you will recognize
>is way below the low end of one hard disk.  Things only improve by a
>factor of 10 or so if you use gigabit ethernet so that doesn't really
>save you there either.
>
>That, combined with modern OSs' hard work to cache disks well, and then
>combined with cheap IDE hard disks, means that it almost always is a
>win to put your data locally.  Using disk striping helps even more but
>may not always be necessary and should be tested.
>
>NFS is good for things like login directories where you read small
>files once or twice and for source code repositories where you don't
>keep re-reading the files.
>
>NFS is very bad for big files since (basically) every 8k bytes or so
>requires the file to be reopened on the server, then you have to seek,
>then 8k bytes is read, and then closed again.
>
>To make things worse some labs then take the incremental approach to
>NFS, where as you add each system the spare disk space on that system
>is dedicated to something, and then mounted on all other systems.
>This is very bad since then for most work to happen ALL systems have
>to be up and functioning.  Plus you end up with NFS traffic all over
>your network.
>It does keep your switch busy though :-)
>
>Far better is to have a central NFS server for all of your home
>directories, and then have your central archives
>mirrored/rsynced/whatever to your different compute nodes.
>
>Of course, your mileage may vary since each lab is different.
>
>cheers
>
>bruce
>
>-- 
>.. there is no area or function that someone can't try to put together
>with bubble gum and bailing wire. -- Strata Chalup
>
>Bruce O'Neel                       phone:  +41 22 950 91 57
>INTEGRAL Science Data Centre               +41 22 950 91 00 (switchb.)
>Chemin d'Ecogia 16                 fax:    +41 22 950 91 35
>CH-1290 VERSOIX                    e-mail: Bruce.Oneel@obs.unige.ch
>Switzerland                        WWW:    http://isdc.unige.ch/
>
>_______________________________________________
>Bioclusters maillist  -  Bioclusters@bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters
>
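
As a rough check on the bandwidth figures in the quoted message, here is
a minimal Python sketch of the arithmetic. Only the 40 megabit figure
comes from the message. The 10 GB database size and the 30 MB/s local
disk rate are made-up placeholders, and the function names are mine, so
substitute your own measurements. Note that 40 megabits per second works
out to 5 megabytes per second before any protocol overhead, which is in
line with the "4 or so" quoted above.

```python
# Back-of-the-envelope transfer times: network-fed data vs. local disk.
# Every throughput and size below is an illustrative assumption,
# not a measurement from any real cluster.

def megabits_to_megabytes(mbit_per_s: float) -> float:
    """Convert a rate in megabits/s to megabytes/s (8 bits per byte)."""
    return mbit_per_s / 8.0


def seconds_to_read(size_gb: float, mbyte_per_s: float) -> float:
    """Time to stream size_gb gigabytes (1 GB = 1000 MB) at a given rate."""
    return size_gb * 1000.0 / mbyte_per_s


# Usable rate on 100 megabit ethernet, as estimated in the message.
EFFECTIVE_100MBIT = 40.0
# The message says gigabit improves things by a factor of 10 or so.
EFFECTIVE_GIGABIT = EFFECTIVE_100MBIT * 10.0
# Hypothetical database size and local IDE disk rate (placeholders).
DB_SIZE_GB = 10.0
LOCAL_DISK_MB_S = 30.0

if __name__ == "__main__":
    rates = {
        "100 Mbit ethernet": megabits_to_megabytes(EFFECTIVE_100MBIT),
        "gigabit ethernet": megabits_to_megabytes(EFFECTIVE_GIGABIT),
        "local disk (assumed)": LOCAL_DISK_MB_S,
    }
    for name, rate in rates.items():
        minutes = seconds_to_read(DB_SIZE_GB, rate) / 60.0
        print(f"{name}: {rate:.1f} MB/s, {minutes:.1f} min for {DB_SIZE_GB:.0f} GB")
```

Keep in mind that the network rate is shared by every node reading from
the same server at once, while each node has its local disk to itself,
so the gap widens as the cluster grows.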