[Bioclusters] Re : Problem starting sge_schedd in Startupitems in OS X

Chris Dwan cdwan at bioteam.net
Mon Apr 25 17:01:01 EDT 2005


My gut reaction is that it has something to do with NFS shares 
(SGE_CELL in particular) not being quite ready for use when SGE tries 
to start.  I have no evidence for why this would be the case, or for 
why the problem persists even when we require that SGE start after the 
NFS daemons start and the remote filesystems are mounted.  Nothing more 
than a gut feeling.
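
One way to test that hunch is to have the startup path wait until a 
directory on the share is actually visible before launching SGE.  The 
following is only a sketch under that assumption, not something from the 
SGE distribution: the function name wait_for_dir and its parameters are 
made up for illustration.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Poll until `path` exists and is a directory, or until roughly
 * `timeout_secs` seconds have passed.
 * Returns 0 once the directory is visible, -1 on timeout.
 *
 * Pick a directory that lives on the share itself rather than the
 * mount point: an unmounted mount point still exists as an empty
 * local directory, so it would look "ready" too early.
 */
int wait_for_dir(const char *path, int timeout_secs, int interval_secs)
{
    struct stat sb;
    int waited = 0;

    assert(path != NULL && interval_secs > 0);

    for (;;) {
        if (stat(path, &sb) == 0 && S_ISDIR(sb.st_mode))
            return 0;               /* directory is visible */
        if (waited >= timeout_secs)
            return -1;              /* gave up waiting */
        sleep((unsigned int)interval_secs);
        waited += interval_secs;
    }
}
```

A caller might use something like wait_for_dir("/path/to/cell/common", 
120, 5), where the path is a placeholder for wherever the cell directory 
is mounted.  Note that stat() on a hard-mounted NFS path can itself block 
if the server is unreachable, so this only helps with the "not mounted 
yet" case.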

The solutions that have worked for me are:
----------------------------------------------------------
* Cron job to check on the daemons every three minutes or so and 
restart them as needed (a hack, but a functional one)

* A startup script which is truly local to the node (I tend to put it 
in /etc), submitted to "at" from the startup item.  This falls in the 
category of "delay the startup", but lets the startup script finish and 
actually starts the processes at a later time.
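
The "check on the daemons" half of the cron approach can be sketched as 
a liveness test against a pid file.  This is an illustration only, 
written in C to match the test program quoted below; the function name 
daemon_alive is hypothetical, and the location of the pid file depends 
on the local SGE spool layout, so it is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Return 1 if the process whose pid is stored in `pidfile` is alive,
 * 0 if it is not (or if the file is missing or unreadable).
 */
int daemon_alive(const char *pidfile)
{
    FILE *fp;
    long pid = 0;
    int got;

    assert(pidfile != NULL);

    fp = fopen(pidfile, "r");
    if (fp == NULL)
        return 0;                   /* no pid file: treat as not running */
    got = fscanf(fp, "%ld", &pid);
    fclose(fp);
    if (got != 1 || pid <= 0)
        return 0;                   /* file did not contain a usable pid */

    /* Signal 0 delivers nothing; it only checks that the pid exists.
       EPERM means the process exists but belongs to another user. */
    if (kill((pid_t)pid, 0) == 0 || errno == EPERM)
        return 1;
    return 0;
}
```

A job run from cron every few minutes would call this and invoke the 
local startup script when it returns 0.  A pid can be reused by an 
unrelated process after a crash, so a stricter check would also compare 
the process name.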

-Chris Dwan

On Apr 25, 2005, at 3:49 PM, Rayson Ho wrote:

> Seems to be a bug in OSX. Can those who have OSX try to run this from a
> StartupItem:
>
> ====================================================================
>
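> #include <sys/types.h>
> #include <sys/stat.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
>
> int main(void)
> {
>  int fd, ret, size;
>  char s[1024];
>  struct stat buf;
>
>  /* O_CREAT needs an explicit mode argument */
>  fd = open("/tmp/log.txt", O_CREAT|O_WRONLY|O_TRUNC, 0644);
>  if (fd < 0)
>   return 1;
>
>  /* log the descriptor number: 0, 1, or 2 here means that
>     standard descriptor was already closed at startup */
>  size = sprintf(s, "%d\n", fd);
>  write(fd, s, size);
>
>  /* fstat fails with EBADF on a descriptor that is not open */
>  if ((ret = fstat(0, &buf)) != 0)
>  {
>   size = sprintf(s, "fail: %d (errno %d)\n", ret, errno);
>   write(fd, s, size);
>  }
>
>  if ((ret = fstat(1, &buf)) != 0)
>  {
>   size = sprintf(s, "fail2: %d (errno %d)\n", ret, errno);
>   write(fd, s, size);
>  }
>
>  if ((ret = fstat(2, &buf)) != 0)
>  {
>   size = sprintf(s, "fail3: %d (errno %d)\n", ret, errno);
>   write(fd, s, size);
>  }
>
>  close(fd);
>  return 0;
> }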
>
> =====================================================================
>
> It always logs the descriptor number to /tmp/log.txt, and adds a
> "fail" line for each fstat that fails.
>
> Rayson
>
>
>
> --- Barry J Mcinnes <Barry.J.Mcinnes at noaa.gov> wrote:
>> I spent a lot of time spinning wheels on this. I started with the
>> standard Startup script, massaged it, renamed it, put diagnostic
>> lines in it, and finally put delays (sleep) at the start of it, which
>> eventually made it work 4 out of 5 times on reboot. When it failed, an
>> immediate startup by hand would always work.
>> In the end I stopped trying to use SGE via StartupItems, and now run
>> a cron job that starts the sge process if it is not running -> no
>> more problems, it's always running on the client.
>> FWIW, I did even try the PBS startup script in test mode, which never
>> fails, so I still do not know why SGE startup fails randomly.
>>
>> Barry
>> barry.j.mcinnes at noaa.gov
>>
>> _______________________________________________
>> Bioclusters maillist  -  Bioclusters at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
>>
>


