[Bioclusters] Business-ish stuff: monitoring cluster usage, and how to pay for it all

Tim White bioclusters@bioinformatics.org
Wed, 4 Sep 2002 15:07:34 +1200


Thanks for your quick reply Joseph.

----- Original Message -----
From: "Joseph Landman" <landman@scalableinformatics.com>
To: "biocluster" <bioclusters@bioinformatics.org>
Sent: Wednesday, September 04, 2002 2:32 PM
Subject: Re: [Bioclusters] Business-ish stuff: monitoring cluster usage,
and how to pay for it all


> On Tue, 2002-09-03 at 22:22, Tim White wrote:
>
> > 1.  What charging scheme do you use?  Options range from a one-off
> > "lifetime membership" charge for a whole company or university to
> > charging by the wallclock or CPU minute.
>
> Most charging metrics sum up the usage of the entire process, so you
> charge by the process CPU-memory integral.  However, chargeback kills
> usage, especially when cycles are so inexpensive to purchase.
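
For illustration, a charge based on the CPU-memory integral mentioned above
could be computed roughly as follows.  This is only a sketch: the sample
layout, the function names and the rate are invented for the example, and a
real accounting package would define its own units and record format.

```python
from typing import Iterable, Tuple

# Each sample is (cpu_seconds_used_in_interval, resident_memory_mb).
# This layout is an assumption for the example, not any package's real format.
Sample = Tuple[float, float]


def cpu_memory_integral(samples: Iterable[Sample]) -> float:
    """Sum CPU time weighted by the memory held, in MB-seconds."""
    return sum(cpu_s * mem_mb for cpu_s, mem_mb in samples)


def charge(samples: Iterable[Sample], rate_per_mb_second: float) -> float:
    """Charge for one process at a hypothetical flat rate."""
    return cpu_memory_integral(samples) * rate_per_mb_second


# Three accounting intervals for one process.
job = [(60.0, 100.0), (60.0, 250.0), (30.0, 400.0)]
total = cpu_memory_integral(job)  # 6000 + 15000 + 12000 MB-seconds
```

A flat annual fee, as discussed below, would avoid this per-process
bookkeeping, though the same integral could still be logged for reporting.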

We are very wary of the spectre of the woefully underutilised supercomputer.
Unfortunately we have to have some scheme for recouping expenses here, and
it's likely that we will go with the strategy of having users buy the right
to use the cluster for a fixed fee, just once or maybe once a year.  This
seems more in line with the main purpose of the computer, which is
conducting research.  We want to encourage university departments etc. to
budget for use of this cluster by their members.  I'd like to know if this
scheme is typical of other similar setups, and if so, how well it has
worked.

>
> > 2.  How much interest do you have from the commercial sector for using
> > up unused clock cycles?  Is this a useful approach for meeting costs?
>
> Many commercial organizations want to keep their data in-house for legal
> reasons.

Good point (unfortunately).

>
> > 3.  How do you prioritise these users fairly?
>
> Define "fair" and its context.  Is fair defined as everyone getting an
> equal fraction of the machine?  Or what they paid for?  Or a randomly
> selected ticket from a queue?
>
> There are Unix-based methods to set nice levels for various users.  For
> clusterwide versions of this, you can use the scheduler's tools, and
> create groups of users within the job scheduler.
>
> > 4.  Do you have a way of deciding how many nodes should be allocated to
> > a particular batch task, based on the number and size of other batch
> > requests that have occurred or are likely to occur?
>
> Depends upon the code, the usage patterns, and the definition of
> "fair".  It is best to set a policy and reexamine if users start
> yelling.
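
As a concrete picture of the per-user nice levels mentioned above, here is a
minimal single-node sketch.  The group names, the nice increments and the
helper names are assumptions made for the example; a clusterwide policy would
normally be configured in the scheduler rather than in a wrapper like this.

```python
import os
import subprocess

# Hypothetical mapping from user group to nice increment; a higher value
# means lower scheduling priority. Group names and values are illustrative.
NICE_BY_GROUP = {"high": 0, "low": 10}


def nice_for_group(group: str) -> int:
    """Return the nice increment for a group; unknown groups get the lowest priority."""
    return NICE_BY_GROUP.get(group, max(NICE_BY_GROUP.values()))


def run_for_group(cmd, group):
    """Run cmd with its niceness raised according to the user's group (Unix only)."""
    increment = nice_for_group(group)
    return subprocess.run(
        cmd,
        # os.nice() runs in the child just before exec, so only the job is
        # deprioritised, not the launching process.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )
```

Calling run_for_group() with a job's command line and "low" would start that
job at a lower priority than one started with "high"; unprivileged users can
raise a nice value but not lower it.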


At this point, no-one here has a firm idea of what "fair" is either.  I'm
hoping to hear about policies that people running other clusters have
used -- particularly clusters with at least two groups of users,
high-priority and low-priority -- and what degree of success (=1/yelling)
they have had.  Hopefully this will shorten the trial-and-error process a
bit.  Certainly we will not be providing users with guarantees of uptime or
throughput, although there is a reasonable expectation for a decent amount
of both of course.

There is a lot of uncertainty surrounding exactly what kind of work will be
done on the cluster; however, it seems likely that there will be many large
(>24 hour) batch jobs, such as phylogenetic tree construction, multiple
sequence alignment, protein folding and all kinds of numerical simulations
(by non-bioinf users from engineering and physics), some of which will
specifically require the use of all 128 nodes for full efficiency.  We will
also definitely be running a distributed BLAST server.

>
> > 5.  Are there particular usage patterns you have discovered (e.g. length
> > and frequency of batch jobs, number of nodes requested or allocated
> > etc.), which are important to take into account?
> > 6.  (More technical)  Is there any software you would recommend for
> > collecting this information automatically?
>
> There used to be job accounting packages independent of schedulers.  The
> LSF product has this capability (among many others).  I do not know the
> PowerCloud tool, but it may have this capability as well.
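
To make the kind of information in questions 5 and 6 concrete, the sketch
below summarises a list of job records per user.  The record layout, the user
names and the summarise() helper are hypothetical; they do not reflect the
log format of LSF or of any other package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical job records: (user, nodes_used, wallclock_hours).
jobs = [
    ("alice", 16, 30.0),
    ("alice", 128, 48.0),
    ("bob", 4, 2.5),
]


def summarise(records):
    """Per-user job count, mean wallclock hours and total node-hours."""
    by_user = defaultdict(list)
    for user, nodes, hours in records:
        by_user[user].append((nodes, hours))
    return {
        user: {
            "jobs": len(runs),
            "mean_hours": mean(h for _, h in runs),
            "node_hours": sum(n * h for n, h in runs),
        }
        for user, runs in by_user.items()
    }


report = summarise(jobs)
```

Collected over a trial period, a summary like this would show how long jobs
typically run and how many nodes they occupy, per user or per department.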

Thanks, I have seen LSF mentioned elsewhere and will look into it.

>
>
> > At the moment we are planning to allow an initial 6-month period of free
> > access to any user, to determine the level of interest in using such a
> > system, the kinds of usage patterns and to build up an idea of how to
> > manage the system as we go along, but it would be really beneficial to
> > hear from others who have been there.
> >
> > Please let me know if there are any further details you need to know.  I
> > look forward to your comments!
> >
> > Thanks in advance,
> >
> > Tim White
> >
> >
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters
> --
> Joseph Landman, Ph.D
> Scalable Informatics LLC
> email: landman@scalableinformatics.com
>   web: http://scalableinformatics.com
> phone: +1 734 612 4615
>

Thanks again Joseph,

Regards,
Tim White
