Thank you all for the very informative replies. I will contact some of you
off list for more info on possible solutions. Just to clarify, I meant 32
terabytes of raw space, so we expect usable capacity to drop by around 50%
when utilizing RAID 10.

-Bonnie

On Oct 1, 2007, at 4:28 AM, Guy Coates wrote:

> Bonnie Hurwitz wrote:
>> Hi all,
>>
>> I was just wondering if anyone has a recommendation for mid-range
>> storage. We need to purchase a storage server for our cluster to act as
>> a MySQL database server that has around 32 terabytes of disk space
>> utilizing RAID 10. Also, we are looking for fast disks and a 10-20Gb
>> card since this is meant for a database server and we want to try to
>> minimize resource contention from writing to the db server from the
>> nodes. We currently have 500 nodes on our cluster.
>>
>> What are people currently using for similar database servers? What has
>> the performance been like when writing to the databases from
>> compute nodes?
>
> It is quite easy to overload a well tuned, beefy mysql server from a small
> compute farm. There are several things you can do to increase the server
> performance.
>
> 1) Use innodb rather than myisam tables. myisam tables have some really
> nasty performance bottlenecks. Updates and deletes require an exclusive
> table-level lock, so if you have lots of jobs trying to update a database
> at the same time, performance will be abysmal. innodb does not suffer from
> these issues, so you should use it.
>
> innodb is also a more robust data format, so when you crash your database,
> you don't have to wait an age whilst you myisamchk all your tables.
>
> 2) Bump up the various mysql buffer sizes; key_buffer_size /
> innodb_buffer_pool_size are the important ones, but be aware that you
> can't set key_buffer > 4GB, otherwise you'll crash the database. (See the
> links below for a full explanation of what these do.)
>
> 3) Think about throttling your jobs.
> Here at Sanger, we feed database load information into our queuing
> system. We use a rough metric of
> load=(number_of_connections + (number_of_queries*10)).
>
> The queuing system can then use this load information to throttle job
> execution on the cluster and prevent the database from being overwhelmed.
>
> There is a good selection of mysql performance tips here:
>
> http://www.mysqlperformanceblog.com/
>
> e.g.
>
> http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/
>
> Cheers,
>
> Guy
>
> --
> Dr. Guy Coates, Informatics System Group
> The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
> Tel: +44 (0)1223 834244 x 6925
> Fax: +44 (0)1223 496802
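
For anyone wanting to try the load metric Guy describes, here is a minimal
sketch of it in Python. Only the formula itself comes from the thread; the
function names, the max_load threshold, and the idea of a simple yes/no
dispatch check are assumptions made for illustration. How the two counts are
gathered, and how the result is fed to a particular queuing system, is not
stated in the thread and is left out.

```python
def db_load(number_of_connections: int, number_of_queries: int) -> int:
    """Rough load metric quoted in the thread:
    load = number_of_connections + (number_of_queries * 10)."""
    return number_of_connections + number_of_queries * 10


def may_dispatch(load: int, max_load: int = 500) -> bool:
    """Hypothetical throttle check: allow new jobs only while the
    database load is below max_load (an assumed, site-specific limit)."""
    return load < max_load


if __name__ == "__main__":
    # Made-up sample counts, purely for illustration.
    load = db_load(number_of_connections=40, number_of_queries=12)
    print(load, may_dispatch(load))
```

The weighting of 10 makes each query count for much more than an idle
connection, so a handful of active jobs can trip the throttle well before the
connection count alone would.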