[Bioclusters] mid range storage solutions
Bonnie Hurwitz
hurwitz at cshl.edu
Mon Oct 1 15:45:18 EDT 2007
Thank you all for the very informative replies. I will contact some
of you off-list for more info on possible solutions. Just to
clarify, I meant 32 terabytes of raw space, so we expect the usable
capacity to be around 50% of that once RAID 10 is applied.
-Bonnie
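
The RAID 10 arithmetic above can be sketched as follows. This is a
rough illustration only: it assumes simple two-way mirroring and ignores
filesystem overhead and hot spares, and the function name is
hypothetical.

```python
def raid10_usable_tb(raw_tb: float) -> float:
    """Usable capacity under RAID 10 with two-way mirroring.

    Every block is stored twice, so half of the raw space holds copies.
    """
    return raw_tb / 2


print(raid10_usable_tb(32))  # prints 16.0
```
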
On Oct 1, 2007, at 4:28 AM, Guy Coates wrote:
> Bonnie Hurwitz wrote:
>> Hi all,
>>
>> I was just wondering if anyone has a recommendation for mid-range
>> storage. We need to purchase a storage server for our cluster to act
>> as a MySQL database server with around 32 terabytes of disk space
>> using RAID 10. We are also looking for fast disks and a 10-20 Gb
>> card, since this is meant to be a database server and we want to
>> minimize resource contention when the nodes write to it. We
>> currently have 500 nodes on our cluster.
>>
>> What are people currently using for similar database servers? What
>> has the performance been like when writing to the databases from
>> compute nodes?
>
> It is quite easy to overload a well-tuned, beefy MySQL server from a
> small compute farm. There are several things you can do to increase
> the server performance.
>
>
> 1) Use InnoDB rather than MyISAM tables. MyISAM tables have some
> really nasty performance bottlenecks. Updates and deletes require an
> exclusive table-level lock, so if you have lots of jobs trying to
> update a database at the same time, performance will be abysmal.
> InnoDB locks at the row level and does not suffer from these issues,
> so you should use it.
>
> InnoDB is also a more robust data format, so when you crash your
> database, you don't have to wait an age whilst you run myisamchk on
> all your tables.
>
>
> 2) Bump up the various MySQL buffer sizes; key_buffer_size and
> innodb_buffer_pool_size are the important ones, but be aware that you
> can't set key_buffer_size > 4GB, otherwise you'll crash the database.
> (See the links below for a full explanation of what these do.)
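
A minimal sketch of the buffer-size advice above, written as a small
helper that renders a [mysqld] fragment and refuses a key buffer above
the 4GB ceiling mentioned. The helper name and the example sizes are
hypothetical, and the ceiling is treated here as 4 GiB, which is an
assumption about the exact unit; appropriate values depend on the
server's RAM and MySQL version.

```python
GIB = 1024 ** 3

# Ceiling quoted in the message above, assumed here to mean 4 GiB.
KEY_BUFFER_LIMIT = 4 * GIB


def render_mysqld_settings(key_buffer_size: int,
                           innodb_buffer_pool_size: int) -> str:
    """Return a [mysqld] fragment, rejecting a key buffer above the limit."""
    if key_buffer_size > KEY_BUFFER_LIMIT:
        raise ValueError("key_buffer_size must not exceed 4 GiB")
    return "\n".join([
        "[mysqld]",
        f"key_buffer_size = {key_buffer_size}",
        f"innodb_buffer_pool_size = {innodb_buffer_pool_size}",
    ])


# Illustrative sizes only (values are in bytes).
print(render_mysqld_settings(2 * GIB, 16 * GIB))
```
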
>
> 3) Think about throttling your jobs. Here at Sanger, we feed database
> load information into our queuing system. We use a rough metric of
> load=(number_of_connections + (number_of_queries*10)).
>
> The queuing system can then use this load information to throttle job
> execution on the cluster and prevent the database from being
> overwhelmed.
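
The load metric above can be sketched like this. Only the formula comes
from the message; how the counts are collected and how the queuing
system consumes the value are not described there, so the threshold and
the should_hold_jobs helper are hypothetical.

```python
def db_load(number_of_connections: int, number_of_queries: int) -> int:
    """Rough load metric: running queries weigh ten times a connection."""
    return number_of_connections + number_of_queries * 10


def should_hold_jobs(load: int, threshold: int) -> bool:
    """Hypothetical throttle rule: hold new jobs once load passes a limit."""
    return load > threshold


# Example figures are made up for illustration.
load = db_load(number_of_connections=120, number_of_queries=15)
print(load, should_hold_jobs(load, threshold=250))  # prints: 270 True
```

A scheduler hook could poll this value periodically and stop
dispatching database-heavy jobs while it stays above the threshold.
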
>
>
> There is a good selection of mysql performance tips here:
>
> http://www.mysqlperformanceblog.com/
>
> eg
>
> http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/
>
>
> Cheers,
>
> Guy
>
> --
> Dr. Guy Coates, Informatics System Group
> The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
> Tel: +44 (0)1223 834244 x 6925
> Fax: +44 (0)1223 496802
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Bioclusters maillist - Bioclusters at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bioclusters