[Bioclusters] Xserve/iNquiry mpi/ssh issues
Chris Dagdigian
dag at sonsorol.org
Thu Feb 10 07:07:29 EST 2005
Hi Dan,
The root user's home directory is not shared cluster-wide on an iNquiry
cluster. If you are running your MPI example out of /private/var/root/,
that is probably why it is failing.
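A quick way to rule this out before launching mpirun is to check which
directory you are in. What follows is only a sketch: the helper name
in_root_home is made up for illustration, and ROOT_HOME is assumed from
the path mentioned above.

```shell
# Assumption: root's home directory is /private/var/root, as in the
# path mentioned above. Adjust ROOT_HOME if your system differs.
ROOT_HOME="/private/var/root"

# Hypothetical helper: succeeds (exit 0) if the given directory is
# root's home directory or lies anywhere beneath it.
in_root_home() {
    case "$1" in
        "$ROOT_HOME"|"$ROOT_HOME"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# pwd -P resolves symlinks, so /var/root is reported as /private/var/root.
if in_root_home "$(pwd -P)"; then
    echo "warning: $(pwd -P) is not shared with the compute nodes" >&2
fi
```

If the warning fires, cd to a directory that is exported to the nodes
and try the run again from there.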
Getting passwordless SSH to work for non-root users is generally a very
simple operation. Drop a line to support at bioteam.net and you'll get a
faster response than from the iNquiry users mailing lists.
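For reference, the usual OpenSSH recipe looks roughly like the sketch
below. It assumes the user's home directory is shared with the nodes, so
the same authorized_keys file is seen everywhere. The helper name
fix_ssh_perms is made up for illustration, and this may not be the whole
story on an iNquiry cluster.

```shell
# Hypothetical helper: create ~/.ssh and authorized_keys if missing and
# tighten the permissions that sshd checks when StrictModes is enabled.
# $1 is the user's home directory (defaults to $HOME).
fix_ssh_perms() {
    home_dir="${1:-$HOME}"
    mkdir -p "$home_dir/.ssh"
    touch "$home_dir/.ssh/authorized_keys"
    chmod go-w "$home_dir"                      # home must not be group/world-writable
    chmod 700 "$home_dir/.ssh"                  # only the owner may enter .ssh
    chmod 600 "$home_dir/.ssh/authorized_keys"  # only the owner may read/write keys
}

# Typical use, run as the non-root user on the head node (not executed here):
#   fix_ssh_perms
#   ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
#   cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
#   ssh node01.cluster.private hostname
```

If the last command still prompts for a password, running ssh -v against
the same node usually shows which authentication step is being refused.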
-Chris
Dan Swan wrote:
> Hi all,
>
> I originally posted this to the iNquiry list, but I'm not sure people
> are using it that heavily. I'm new to Xserve/iNquiry; my background is
> Linux/Condor.
>
> I wish to run the MPI version of MrBayes. I followed the MPICH
> install instructions from:
>
> http://bioteam.net/faq/index.php?sid=&lang=en&action=artikel&cat=4&id=33&artlang
>
>
> which are very straightforward. The problem is that when I execute the
> cpi test program with mpirun and -np > 1, mpirun hangs for about 5
> minutes before spitting back:
>
> -su-2.05b# /usr/local/mpich-1.2.6/ch_p4/bin/mpirun -np 2 /usr/local/mpich-1.2.6/ch_p4/examples/cpi
> p0_3247: p4_error: Timeout in making connection to remote process on node01.cluster.private: 0
> Killed by signal 2.
>
> my machines.* file looks like this:
>
> node01.cluster.private
> node02.cluster.private
> node03.cluster.private
> node04.cluster.private
> node05.cluster.private
> node06.cluster.private
> node07.cluster.private
> node08.cluster.private
> node09.cluster.private
> node10.cluster.private
> node11.cluster.private:2
> node12.cluster.private:2
> node13.cluster.private:2
> node14.cluster.private:2
> node15.cluster.private
> node16.cluster.private:2
>
> If I start mpirun and log into node01 a ps -ax shows:
>
> 1162 ?? Ss 0:00.01 /usr/local/mpich-1.2.6/ch_p4/examples/cpi
> biocluster
>
> so the job appears to arrive (at least on one node). I'm a bit stumped.
>
> Googling for that error, I find people suggesting it might have to do
> with ssh requiring passwordless authentication (although I am currently
> doing this as root and am not sure this is the issue yet). But I started
> looking into it and turned up another problem :/
>
> Passwordless ssh works fine for the root user from the head node to the
> cluster nodes, but not for any other user on the machine. And no matter
> how I try, I cannot get passwordless authentication working for a non-root
> user on the cluster.
>
> Is there some Mac OS X voodoo I am not aware of? I cannot ssh to the
> cluster nodes as a non-root user, so I assume there is some "deeper"
> authentication issue here? (passwordless ssh is normally trivial to set
> up). User home directories are being exported from the head node to the
> nodes, although a root user on the node seems unable to ls -l any user's
> ~/.ssh/. Any hints gratefully appreciated.
>
> Cheers!
>
> Dan
>
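On the machines file quoted above: in the MPICH ch_p4 format a
"host:n" entry means n processes may be placed on that host. A small
sketch for totalling the available slots follows. The function name
count_slots is made up for illustration, and the file path passed to it
is whatever your machines.* file is actually called.

```shell
# Hypothetical helper: sum the process slots in an MPICH ch_p4 machines
# file. A plain "host" line counts as 1, "host:n" counts as n.
# Blank lines and lines starting with # are skipped.
count_slots() {
    awk -F: '
        /^[[:space:]]*(#|$)/ { next }
        { total += (NF > 1 ? $2 : 1) }
        END { print total + 0 }
    ' "$1"
}
```

Call it as count_slots followed by the path to your machines file. The
result is a reasonable upper bound for -np if you want one process per
slot.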
--
Chris Dagdigian, <dag at sonsorol.org>
BioTeam - Independent life science IT & informatics consulting
Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E iChat/AIM: bioteamdag Web: http://bioteam.net