My gut reaction is that it has something to do with NFS shares
(SGE_CELL in particular) not being quite ready for use when SGE tries
to start. I have no evidence for why this would be the case, or why it
persists even when we require that SGE start after the NFS daemons
start and the remote filesystems are mounted. Nothing more than a gut
feeling.

The solutions that have worked for me are:
----------------------------------------------------------
* Cron job to check on the daemons every three minutes or so and
  restart them as needed (a hack, but a functional one)
* A startup script which is truly local to the node (I tend to put it
  in /etc), submitted to "at" from the startup item. This falls in the
  category of "delay the startup", but lets the startup script finish
  and actually starts the processes at a later time.

-Chris Dwan

On Apr 25, 2005, at 3:49 PM, Rayson Ho wrote:

> Seems to be a bug in OSX. Can those who have OSX try to run this from
> a StartupItem:
>
> ====================================================================
>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> int main(void)
> {
>     int fd, ret, size;
>     char s[1024];
>     struct stat buf;
>
>     /* O_CREAT requires a mode argument */
>     fd = open("/tmp/log.txt", O_CREAT|O_WRONLY|O_TRUNC, 0644);
>
>     /* log the descriptor number that open() handed back */
>     size = sprintf(s, "%d\n", fd);
>     write(fd, s, size);
>
>     /* check whether stdin, stdout and stderr are valid descriptors */
>     if ((ret = fstat(0, &buf)))
>     {
>         size = sprintf(s, "fail: %d\n", ret);
>         write(fd, s, size);
>     }
>
>     if ((ret = fstat(1, &buf)))
>     {
>         size = sprintf(s, "fail2: %d\n", ret);
>         write(fd, s, size);
>     }
>
>     if ((ret = fstat(2, &buf)))
>     {
>         size = sprintf(s, "fail3: %d\n", ret);
>         write(fd, s, size);
>     }
>
>     return 0;
> }
>
> =====================================================================
>
> It logs the descriptor returned by open() to /tmp/log.txt, plus a
> "fail" line for each of fds 0, 1 and 2 that fstat() rejects.
>
> Rayson
>
>
>
> --- Barry J Mcinnes <Barry.J.Mcinnes at noaa.gov> wrote:
>> I spent a lot of time spinning wheels on this.
>> I started with the standard Startup script, massaged it, renamed it,
>> put diagnostic lines in it, and finally put delays (sleep) at the
>> start of it, which eventually made it work 4 out of 5 times on
>> reboot. When it failed, an immediate startup by hand would always
>> work.
>> In the end I stopped trying to use SGE via StartupItems, and now run
>> a cron job which starts the sge process if it is not running -> no
>> more problems, it's always running on the client.
>> FWIW, I did even try the PBS startup script in test mode, which never
>> fails, so I still do not know why SGE startup fails randomly.
>>
>> Barry
>> barry.j.mcinnes at noaa.gov
>>
>> _______________________________________________
>> Bioclusters maillist - Bioclusters at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters