[Bioclusters] Xserve/iNquiry mpi/ssh issues

Dan Swan dswan at bioinformatics.org
Thu Feb 10 05:48:21 EST 2005


Hi all,

I originally posted this to the iNquiry list, but I'm not sure people 
are using it that heavily.  I'm new to Xserve/iNquiry; my background is 
Linux/Condor.

I wish to run the MPI version of MrBayes.  I followed the MPICH
install instructions from:

http://bioteam.net/faq/index.php?sid=&lang=en&action=artikel&cat=4&id=33&artlang


which are very straightforward.  The problem is that when I execute the 
cpi test program with mpirun and -np > 1, mpirun hangs for about 5 minutes 
before spitting back:

-su-2.05b# /usr/local/mpich-1.2.6/ch_p4/bin/mpirun -np 2 /usr/local/mpich-1.2.6/ch_p4/examples/cpi
p0_3247:  p4_error: Timeout in making connection to remote process on node01.cluster.private: 0
Killed by signal 2.

My machines.* file looks like this:

node01.cluster.private
node02.cluster.private
node03.cluster.private
node04.cluster.private
node05.cluster.private
node06.cluster.private
node07.cluster.private
node08.cluster.private
node09.cluster.private
node10.cluster.private
node11.cluster.private:2
node12.cluster.private:2
node13.cluster.private:2
node14.cluster.private:2
node15.cluster.private
node16.cluster.private:2
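
As a sanity check on the list above, here is a minimal sketch that tallies
the entries.  It assumes the usual MPICH ch_p4 convention that "host:n"
allows n processes on that host and a bare hostname allows one; the name
parse_machines is purely illustrative and not part of MPICH.

```python
def parse_machines(text):
    """Return a list of (hostname, slots) pairs from machines-file text."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # "host:2" -> ("host", 2); a bare "host" counts as one slot
        host, sep, count = line.partition(":")
        entries.append((host, int(count) if sep else 1))
    return entries


# Rebuild the same 16 entries as listed above (":2" on nodes 11-14 and 16).
MACHINES = "\n".join(
    "node%02d.cluster.private%s" % (n, ":2" if n in (11, 12, 13, 14, 16) else "")
    for n in range(1, 17)
)

entries = parse_machines(MACHINES)
total_slots = sum(slots for _, slots in entries)
print(len(entries), "hosts,", total_slots, "process slots")
```

With the file as listed this reports 16 hosts and 21 process slots; node15
is the only host from node11 onwards without a ":2".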

If I start mpirun and log into node01, a ps -ax shows:

1162  ??  Ss     0:00.01 /usr/local/mpich-1.2.6/ch_p4/examples/cpi biocluster

so the job appears to arrive (at least on one node).  I'm a bit stumped.

Googling for that error, I find people suggesting it is to do with ssh 
needing passwordless authentication (although I am currently running this 
as root, so I am not sure this is the issue yet).  But I started looking 
into it and that threw up another problem :/

Passwordless ssh works fine for the root user from the head node to the 
cluster nodes, but not for any other user on the machine.  And no matter 
how I try, I cannot get passwordless authentication working for a non-root 
user on the cluster.
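
For anyone wanting to reproduce the symptom, this is roughly the check
involved, as a minimal sketch.  It relies on the standard OpenSSH options
BatchMode and ConnectTimeout so that ssh fails outright instead of
prompting; the helper names (ssh_check_command, passwordless_ok, report)
are hypothetical.

```python
import subprocess


def ssh_check_command(host, user=None, timeout=5):
    """Build an ssh command that fails instead of prompting for a password."""
    target = "%s@%s" % (user, host) if user else host
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; fail if a password is needed
        "-o", "ConnectTimeout=%d" % timeout,  # give up quickly on unreachable hosts
        target,
        "true",                               # trivial remote command
    ]


def passwordless_ok(host, user=None):
    """Return True if ssh to host succeeds without any interactive prompt."""
    result = subprocess.run(
        ssh_check_command(host, user),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def report(hosts, user=None):
    """Print one ok/FAILED line per host."""
    for host in hosts:
        print(host, "ok" if passwordless_ok(host, user) else "FAILED")
```

Run from the head node as the user in question, for example
report(["node%02d.cluster.private" % n for n in range(1, 17)]), this
would show which nodes accept key-based logins for that user.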

Is there some Mac OS X voodoo I am not aware of?  I cannot ssh to the
cluster nodes as a non-root user, so I assume there is some "deeper"
authentication issue here (passwordless ssh is normally trivial to set
up).  User home directories are being exported from the head node to the 
nodes, although a root user on the node seems unable to ls -l any user's 
~/.ssh/.  Any hints gratefully received.
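
One thing that can be checked from the user's side is file permissions:
with sshd's StrictModes option enabled, key authentication is refused if
the home directory, ~/.ssh or authorized_keys is writable by group or
others.  Whether that is what is happening on this cluster is only a
guess.  A minimal sketch of such a check follows; the function name
loose_paths is hypothetical.

```python
import os
import stat


def loose_paths(home):
    """Return paths under home that are group- or world-writable."""
    candidates = [
        home,
        os.path.join(home, ".ssh"),
        os.path.join(home, ".ssh", "authorized_keys"),
    ]
    bad = []
    for path in candidates:
        try:
            mode = os.stat(path).st_mode
        except OSError:
            continue  # missing or unreadable path; nothing to report here
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            bad.append(path)
    return bad
```

Calling loose_paths(os.path.expanduser("~")) as the affected user on a
node lists anything that would need tightening, for example with
chmod go-w.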

Cheers!

Dan

-- 
Dr Daniel Swan - Bioinformatics Support Unit
924 Claremont Tower, University of Newcastle, Newcastle, NE1 7RU
Tel: +44 (0)191 222 7856
http://www.ncl.ac.uk/bioinformatics/support || d.c.swan at ncl.ac.uk


