[Biodevelopers] Re: [Bioclusters] Nightly updated BLAST databases
Joseph Landman
landman at scalableinformatics.com
Mon Dec 16 21:23:42 EST 2002
On Mon, 2002-12-16 at 20:51, Jeremy Mann wrote:
> > (4) if all looks good and there are no blast jobs running then change
> > the symbolic link(s) so that your newly built databases in volume B are
> > the ones that people use when a search is fired off. As before the
> > methods for figuring out 'are there any searches running' can be as
> > complex or as simple as the production environment demands
>
> If you don't mind me asking, how do you do this? How do I control when and
> if the BLAST jobs are running?
Ahh... Are you using a queuing system? If not, then you can write a really
simple wrapper around blast that only starts a job once a semaphore is
cleared. While the semaphore is set, it sleeps for 1 second and then
checks the semaphore again. It gives up after N seconds. It would look
something like this (call it blastall, and place it in the users' path
before the real blastall, which you can take out of their path):
#!/usr/bin/perl -w
use strict;
use constant max_count => 86400; # 86400 seconds in a day ...
use constant true => (1==1);
use constant false => (1==0);
my $semaphore = "/path/to/semaphore";
my $blastall = "/path/to/real/blastall";
my $count = 0;
my $ready_to_run = false;
my $time_elapsed = false;
# The semaphore is "set" while the file exists. Poll once a second
# until it is cleared or until max_count seconds have passed.
$ready_to_run = !(-e $semaphore);
until ($ready_to_run || $time_elapsed) {
    sleep(1); # take a nap
    $count++;
    $ready_to_run = !(-e $semaphore);
    $time_elapsed = ($count >= max_count);
}
if ( !$ready_to_run )
{
    die "Run timed out. Please look into why the semaphore file $semaphore has not been removed\n";
}
# Semaphore is clear: run the real blastall and pass its output through.
# The list form of open() hands @ARGV straight to blastall without a
# shell in between, so arguments containing spaces survive intact.
my ($output_handle, $line);
open($output_handle, "-|", $blastall, @ARGV)
    or die "cannot run the command $blastall @ARGV. Please investigate\n";
while ($line = <$output_handle>) { print $line; }
close($output_handle);
exit($? >> 8); # hand blastall's exit status back to the caller
Then when you want to stop the next batch of runs, simply
    touch /path/to/semaphore
and wait for the machine to quiesce. Once it is quiet, do the database
monte (switch the symbolic links over to the new databases), and then
remove the semaphore:
    rm -f /path/to/semaphore
If you are already running a job queuing system, you can pause the queue
after the current set of runs, and then do the db monte.
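For the no-queue case, the stop / wait / switch / release sequence could be scripted roughly like this. Treat it as a sketch under assumptions, not a finished tool: the names swap_blast_db and blast_jobs_running are made up for illustration, I am assuming the live databases are reached through a single symbolic link, and I am assuming pgrep is available and that the real binary is started by its full path (as the wrapper above does), so that wrappers waiting on the semaphore are not counted as running searches.

```shell
#!/bin/sh
# Sketch only: function names and arguments are hypothetical.

REAL_BLASTALL=${REAL_BLASTALL:-/path/to/real/blastall}

# Succeeds while a process whose command line starts with the real
# blastall path exists. Replace with whatever check fits your site.
blast_jobs_running() {
    pgrep -f "^$REAL_BLASTALL" >/dev/null 2>&1
}

# usage: swap_blast_db SEMAPHORE LINK NEW_DB_DIR [MAX_WAIT_SECONDS]
swap_blast_db() {
    semaphore=$1 link=$2 new_db=$3 max_wait=${4:-3600}

    [ -d "$new_db" ] || { echo "no such directory: $new_db" >&2; return 1; }

    touch "$semaphore" || return 1      # hold back new searches

    waited=0
    while blast_jobs_running; do        # wait for the machine to quiesce
        if [ "$waited" -ge "$max_wait" ]; then
            echo "searches still running after ${max_wait}s; link left alone" >&2
            rm -f "$semaphore"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Drop the old link and point it at the newly built databases.
    rm -f "$link" && ln -s "$new_db" "$link"
    status=$?

    rm -f "$semaphore"                  # let searches start again
    return "$status"
}
```

You would call it with something like

    swap_blast_db /path/to/semaphore /path/to/db /path/to/volumeB

where the last two paths are placeholders for your own layout. Note that it gives up and leaves the old link in place if searches are still running after the wait limit, which seems the safer failure mode.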
> I would think there would have to be some
> sort of manual control. Here is what I think I need to do:
>
> 1. Run rsync from crontab (already done)
> 2. Custom script to see if rsync is still running. If so, stop, if not run
> 2nd script, after an hour checks if rsync is still running. I am
> confused as to how to pull this off. If I run it from crontab, I would
> need to add some sort of check to see if 1st script is running, if so,
> don't run again until next day.
> 3. 3rd script runs uncompress | formatdb into another directory. I got
> this one in place.
> 4. 4th script resymlinks db/ from blast/ directory. Need to add a few if
> statements to see if 3rd script is still running and check for existing
> blast jobs.
View the process as a pipeline. You need to check for errors at every
step of the way. You can use rsync, or the Perl/Python modules that do
rsync. Check the exit codes. Don't go on to the next pipeline step
until the previous one has finished successfully.
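One way to get both properties (no overlapping cron runs, and no step starting unless the one before it succeeded) is a single driver script called from crontab instead of four independent ones. A minimal sketch follows; run_pipeline and the lock directory are names I made up, and each step is assumed to be a command that exits non-zero on failure, which rsync does.

```shell
#!/bin/sh
# Sketch only: run_pipeline and the lock directory are hypothetical names.

# usage: run_pipeline LOCKDIR STEP [STEP ...]
# Each STEP is a command line, e.g. the path of one of your scripts.
run_pipeline() {
    lockdir=$1
    shift

    # mkdir either creates the directory or fails, in one step, so two
    # overlapping runs cannot both get past this point.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still holds $lockdir; skipping this run" >&2
        return 75                       # arbitrary "try again later" code
    fi

    status=0
    for step in "$@"; do
        echo "starting: $step" >&2
        sh -c "$step"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "FAILED ($status): $step; later steps skipped" >&2
            break
        fi
    done

    rmdir "$lockdir"                    # release the lock for the next run
    return "$status"
}
```

The crontab entry would then run one script that ends with something like

    run_pipeline /path/to/update.lock /path/to/step1 /path/to/step2 /path/to/step3 /path/to/step4

with the paths standing in for your rsync, formatdb and relink scripts. If the driver is killed partway through, the lock directory stays behind and has to be removed by hand; that is deliberate, since it forces someone to look at what went wrong.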
--
Joseph Landman <landman at scalableinformatics.com>