[Bioclusters] Re: Problem starting sge_schedd in StartupItems in OS X
Chris Dwan
cdwan at bioteam.net
Mon Apr 25 17:01:01 EDT 2005
My gut reaction is to think it has something to do with NFS shares
(SGE_CELL in particular) not being quite ready for use when SGE tries
to start. I have no evidence for why this would be the case, or why it
persists even when we require that SGE start after the NFS daemons
start and the remote filesystems are mounted. Nothing more than a gut
feeling.
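
One way to test that hunch is to have the startup path wait until the shared directory is actually visible before launching anything. The following is only a rough sketch: wait_for_dir() is a made-up helper name, not part of SGE, and the path, retry count and delay passed to it are whatever fits the site. Note that stat() can succeed on a bare mount point, so it is safer to point it at a directory that exists only on the share.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: poll until `path` is a directory that stat() can
 * see, or until `max_tries` attempts have failed.
 * Returns 0 on success, -1 if the directory never appeared.
 * `delay_sec` is the pause between attempts. */
int wait_for_dir(const char *path, int max_tries, unsigned delay_sec)
{
    struct stat sb;
    int i;

    for (i = 0; i < max_tries; i++) {
        if (stat(path, &sb) == 0 && S_ISDIR(sb.st_mode))
            return 0;               /* directory is visible */
        if (i + 1 < max_tries)
            sleep(delay_sec);       /* not there yet; wait and retry */
    }
    return -1;
}
```

If a call such as wait_for_dir(cell_dir, 30, 2) still returns 0 on the reboots where SGE fails to start, the NFS theory becomes less likely.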
The solutions that have worked for me are:
----------------------------------------------------------
* Cron job to check on the daemons every three minutes or so and
restart them as needed (a hack, but a functional one)
* A startup script which is truly local to the node (I tend to put it
in /etc), submitted to "at" from the startup item. This falls in the
category of "delay the startup", but lets the startup script finish and
actually starts the processes at a later time.
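
For the cron approach, the "check on the daemons" step can be sketched roughly as below. It assumes the daemon records its process ID in a pid file; where that file lives depends on the installation, so the path is left to the caller. read_pidfile() and pid_is_alive() are illustrative names, not SGE functions.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Read a process ID from a pid file. Returns -1 if the file is missing
 * or does not start with a positive integer. */
pid_t read_pidfile(const char *path)
{
    FILE *fp = fopen(path, "r");
    long pid = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(fp);
    return (pid_t)pid;
}

/* Return 1 if a process with this pid exists, 0 otherwise.
 * Signal 0 only performs the existence check and delivers nothing;
 * EPERM means the process exists but belongs to another user. */
int pid_is_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;
    return kill(pid, 0) == 0 || errno == EPERM;
}
```

A small program run from cron every few minutes could call pid_is_alive(read_pidfile(path)) and run the local startup script only when the result is 0. A stale pid file whose number has been reused by another process would fool this check, so treat it as a first approximation.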
-Chris Dwan
On Apr 25, 2005, at 3:49 PM, Rayson Ho wrote:
> Seems to be a bug in OS X. Could those who have OS X try running this from a
> StartupItem:
>
> ====================================================================
>
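> #include <sys/types.h>
> #include <sys/stat.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <stdio.h>   /* sprintf */
>
> int main(void)
> {
>     int fd, ret, size;
>     char s[1024];
>     struct stat buf;
>
>     /* O_CREAT needs a mode argument. open() returns the lowest free
>        descriptor, so a value below 3 means 0, 1 or 2 was closed. */
>     fd = open("/tmp/log.txt", O_CREAT|O_WRONLY|O_TRUNC, 0644);
>     if (fd < 0)
>         return 1;
>
>     size = sprintf(s, "%d\n", fd);
>     write(fd, s, size);
>
>     /* fstat() returns -1 when the descriptor is not open */
>     if ((ret = fstat(0, &buf)) != 0)
>     {
>         size = sprintf(s, "fail: %d\n", ret);
>         write(fd, s, size);
>     }
>
>     if ((ret = fstat(1, &buf)) != 0)
>     {
>         size = sprintf(s, "fail2: %d\n", ret);
>         write(fd, s, size);
>     }
>
>     if ((ret = fstat(2, &buf)) != 0)
>     {
>         size = sprintf(s, "fail3: %d\n", ret);
>         write(fd, s, size);
>     }
>
>     close(fd);
>     return 0;
> }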
>
> =====================================================================
>
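> It logs the descriptor number it was given to /tmp/log.txt, followed by
> a "fail" line for each of stdin, stdout and stderr on which fstat() fails.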
>
> Rayson
>
>
>
> --- Barry J Mcinnes <Barry.J.Mcinnes at noaa.gov> wrote:
>> I spent a lot of time spinning my wheels on this. I started with the
>> standard startup script, massaged it, renamed it, put diagnostic lines
>> in it, and finally put delays (sleep) at the start of it, which
>> eventually made it work 4 out of 5 times on reboot. When it failed, an
>> immediate startup by hand would always work.
>> In the end I stopped trying to start SGE via StartupItems, and now run
>> a cron job which starts the SGE process if it is not running. No more
>> problems; it's always running on the client.
>> FWIW, I even tried the PBS startup script in test mode, which never
>> fails, so I still do not know why SGE startup fails randomly.
>>
>> Barry
>> barry.j.mcinnes at noaa.gov
>>
>> _______________________________________________
>> Bioclusters maillist - Bioclusters at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
>>
>