[Bioclusters] Xserve/iNquiry mpi/ssh issues

Chris Dagdigian dag at sonsorol.org
Thu Feb 10 07:07:29 EST 2005


Hi Dan,

The root user's home directory is not shared cluster-wide on an iNquiry 
cluster. If you are running your MPI example out of /private/var/root/, 
that is probably why it is failing.
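
A small guard can catch this before a job is launched. The following is 
only a minimal sketch, assuming the launch directory is what matters here; 
the function names in_root_home and check_launch_dir are hypothetical, and 
ROOT_HOME is simply the path mentioned above.

```shell
#!/bin/sh
# Sketch: warn before launching an MPI job from root's unshared home directory.
# ROOT_HOME is the path named in this thread; adjust it for your system.
ROOT_HOME="/private/var/root"

# Returns 0 (true) if the given directory is ROOT_HOME or lies beneath it.
in_root_home() {
    case "$1" in
        "$ROOT_HOME"|"$ROOT_HOME"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a warning and returns non-zero when the directory is under ROOT_HOME.
check_launch_dir() {
    if in_root_home "$1"; then
        echo "warning: $1 is under $ROOT_HOME, which is not shared with the nodes" >&2
        return 1
    fi
    return 0
}

# Example use before calling mpirun (hypothetical invocation):
#   check_launch_dir "$(pwd)" && mpirun -np 2 ./cpi
```

The check is purely a string comparison on the path, so it says nothing 
about whether a different directory really is visible on the nodes.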

Getting passwordless SSH to work for non-root users is generally a very 
simple operation. Drop a line to support at bioteam.net and you'll get a 
faster response than on the iNquiry users mailing lists.
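
For reference, the usual OpenSSH recipe looks roughly like the sketch 
below. It assumes plain OpenSSH key authentication and that the user's 
home directory is exported to the nodes, so a single authorized_keys file 
is seen everywhere. The helper name setup_ssh_keys is hypothetical, and an 
iNquiry cluster may need additional steps that this does not cover.

```shell
#!/bin/sh
# Sketch: standard OpenSSH key setup for one user whose home directory is
# shared across the cluster. setup_ssh_keys is a hypothetical helper name.
setup_ssh_keys() {
    ssh_dir="${1:-$HOME/.ssh}"
    key="$ssh_dir/id_rsa"

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir"

    # Create a key pair with an empty passphrase unless one already exists.
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -N "" -f "$key" || return 1
    fi

    # Authorize the public key once, so repeated runs do not add duplicates.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$key.pub")" "$ssh_dir/authorized_keys"; then
        cat "$key.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Run as the non-root user on the head node, then test with something like:
#   setup_ssh_keys
#   ssh -o BatchMode=yes node01.cluster.private true && echo "key login works"
```

BatchMode=yes makes ssh fail instead of prompting for a password, which 
is the behaviour mpirun needs when it starts remote processes over ssh.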

-Chris




Dan Swan wrote:
> Hi all,
> 
> I originally posted this to the iNquiry list, but I'm not sure people 
> are using it that heavily.  I'm new to Xserve/iNquiry; my background is 
> Linux/Condor.
> 
> I wish to run the MPI version of MrBayes.  I followed the MPICH
> install instructions from:
> 
> http://bioteam.net/faq/index.php?sid=&lang=en&action=artikel&cat=4&id=33&artlang
> 
> 
> which are very straightforward.  The problem is that when I execute the 
> cpi test program with mpirun and -np > 1, mpirun hangs for about 5 minutes 
> before spitting back:
> 
> -su-2.05b# /usr/local/mpich-1.2.6/ch_p4/bin/mpirun -np 2 /usr/local/mpich-1.2.6/ch_p4/examples/cpi
> p0_3247:  p4_error: Timeout in making connection to remote process on node01.cluster.private: 0
> Killed by signal 2.
> 
> my machines.* file looks like this:
> 
> node01.cluster.private
> node02.cluster.private
> node03.cluster.private
> node04.cluster.private
> node05.cluster.private
> node06.cluster.private
> node07.cluster.private
> node08.cluster.private
> node09.cluster.private
> node10.cluster.private
> node11.cluster.private:2
> node12.cluster.private:2
> node13.cluster.private:2
> node14.cluster.private:2
> node15.cluster.private
> node16.cluster.private:2
> 
> If I start mpirun and log into node01 a ps -ax shows:
> 
> 1162  ??  Ss     0:00.01 /usr/local/mpich-1.2.6/ch_p4/examples/cpi biocluster
> 
> so the job appears to arrive (at least on one node).  I'm a bit stumped.
> 
> Googling that error turns up suggestions that it may be related to ssh
> requiring passwordless authentication (although I am currently doing this 
> as root and am not sure this is the issue yet).  But when I started looking 
> into it, I turned up another problem :/
> 
> Passwordless ssh works fine for the root user from the head node to the 
> cluster nodes, but not for any other user on the machine.  And no matter 
> how I try, I cannot get passwordless authentication working for a non-root 
> user on the cluster.
> 
> Is there some Mac OS X voodoo I am not aware of?  I cannot ssh to the
> cluster nodes as a non-root user, so I assume there is some "deeper"
> authentication issue here (passwordless ssh is normally trivial to set
> up).  User home directories are being exported from the head node to the 
> nodes, although a root user on the node seems unable to ls -l any user's 
> ~/.ssh/.  Any hints gratefully appreciated.
> 
> Cheers!
> 
> Dan
> 
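
To see which nodes in a machines file accept non-interactive ssh for a 
given user, something like the following sketch may help. It assumes the 
"host" and "host:N" line format shown in the quoted machines file; 
list_hosts, check_hosts and the SSH_CMD variable are hypothetical names, 
and BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Sketch: report which hosts in an MPICH machines file accept non-interactive SSH.
# SSH_CMD can be overridden (for example with a stub) when testing.
SSH_CMD="${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5}"

# Print one hostname per line, dropping blank lines, anything after '#',
# and any ":N" processor-count suffix.
list_hosts() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' -e 's/:.*//' -e '/^$/d' "$1"
}

# Run a trivial remote command on every host; return non-zero if any host fails.
check_hosts() {
    failed=0
    for host in $(list_hosts "$1"); do
        # -n stops ssh from reading stdin; BatchMode makes it fail rather than prompt.
        if $SSH_CMD -n "$host" true >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}

# Example use, run as the user who will launch mpirun (path is a placeholder):
#   check_hosts /path/to/machines.file
```

Any host reported as FAIL would have prompted for a password or timed 
out, which is worth ruling out before digging further into p4_error.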

-- 
Chris Dagdigian, <dag at sonsorol.org>
BioTeam  - Independent life science IT & informatics consulting
Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E iChat/AIM: bioteamdag  Web: http://bioteam.net

