[Bioclusters] mid range storage solutions

Mon Oct 1 15:45:18 EDT 2007

Thank you all for the very informative replies.  I will contact some  
of you off list for more info on possible solutions.  Just to  
clarify, I meant 32 terabytes of raw space so we expect this to  
decrease by around 50% when utilizing Raid10.

-Bonnie

On Oct 1, 2007, at 4:28 AM, Guy Coates wrote:

> Bonnie Hurwitz wrote:
>> Hi all,
>>
>> I was just wondering if anyone has a recommendation for mid-range
>> storage.  We need to purchase a storage server for our cluster to  
>> act as
>> a mySQL database server that has around 32 terabytes of disk space
>> utilizing Raid10.  Also, we are looking for fast disks and a 10-20Gb
>> card since this is meant for a database server and we want to try to
>> minimize resource contention from writing to the db server from the
>> nodes.  We currently have 500 nodes on our cluster.
>>
>> What are people currently using for similar database servers?   
>> What has
>> the performance been like when writing to the databases from  
>> compute nodes?
>
> It is quite easy to overload a well tuned, beefy  mysql server from  
> a small
> compute farm. There are several things you can do to increase the  
> server
> performance.
>
>
> 1) Use innodb rather than myisam tables.  myisam tables have some  
> really nasty
> performance bottlenecks. update and deletes require an exclusive  
> table level
> lock, so if you have lots of jobs trying to update a database at  
> the same time,
> performance will be abysmal. innodb does not suffer from these  
> issues, so you
> should use it.
>
> innodb is also a more robust data format, so when you crash your  
> database, you
> don't have to wait an age whilst your myisamchk all your tables.
>
>
> 2) Bump up the various mysql  buffer sizes; key_buffer_size /
> innodb_buffer_pool_size are the important ones, but be aware that  
> you can't set
> key_buffer > 4GB, otherwise you'll crash the database. (See the  
> links below for
> a full explanation of what these do.)
>
> 3) Think about throttling your jobs. Here at Sanger, we feed  
> database load
> information into our queuing system.  We use a rough metric  of
> load=(number_of_connections  + (number_of_queries*10)).
>
> The queuing system can then use this load information to throttle  
> job execution
> on the cluster and prevent the database from being overwhelmed.
>
>
> There is a good selection of mysql performance tips here:
>
> http://www.mysqlperformanceblog.com/
>
> eg
>
> http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in- 
> mysql-server-after-installation/
>
>
> Cheers,
>
> Guy
>
> -- 
> Dr. Guy Coates,  Informatics System Group
> The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
> Tel: +44 (0)1223 834244 x 6925
> Fax: +44 (0)1223 496802
>
>
> -- 
>  The Wellcome Trust Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters