[Bioclusters] Xserve/iNquiry mpi/ssh issues
Dan Swan
dswan at bioinformatics.org
Thu Feb 10 05:48:21 EST 2005
Hi all,
I originally posted this to the iNquiry list, but I'm not sure people
are using it that heavily. I'm new to Xserve/iNquiry; my background is
Linux/Condor.
I wish to run the MPI version of MrBayes. I followed the MPICH
install instructions from:
http://bioteam.net/faq/index.php?sid=&lang=en&action=artikel&cat=4&id=33&artlang
which are very straightforward. The problem is that when I run the cpi
test program under mpirun with -np greater than 1, mpirun hangs for about
5 minutes before spitting back:
-su-2.05b# /usr/local/mpich-1.2.6/ch_p4/bin/mpirun -np 2 /usr/local/mpich-1.2.6/ch_p4/examples/cpi
p0_3247: p4_error: Timeout in making connection to remote process on node01.cluster.private: 0
Killed by signal 2.
My machines.* file looks like this:
node01.cluster.private
node02.cluster.private
node03.cluster.private
node04.cluster.private
node05.cluster.private
node06.cluster.private
node07.cluster.private
node08.cluster.private
node09.cluster.private
node10.cluster.private
node11.cluster.private:2
node12.cluster.private:2
node13.cluster.private:2
node14.cluster.private:2
node15.cluster.private
node16.cluster.private:2
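As a sanity check on a file like this, the entries can be tallied with a
few lines of shell. This is only a sketch: it assumes the usual MPICH ch_p4
convention that a bare hostname means one process and "hostname:N" means N
processes on that host, and count_slots is a hypothetical helper name, not
part of MPICH or iNquiry.

```shell
# count_slots FILE: sum the process slots in an MPICH-style machines file.
# Assumption: each line is "hostname" (1 slot) or "hostname:N" (N slots).
# Blank lines and lines starting with '#' are skipped.
count_slots() {
    awk -F: '
        /^[ \t]*(#|$)/ { next }               # ignore comments and blank lines
        { total += (NF > 1 ? $2 : 1) }        # use ":N" if present, else 1
        END { print total + 0 }               # print 0 for an empty file
    ' "$1"
}
```

For the sixteen entries listed above (five of them marked :2) this prints
21, which is the upper bound on -np before hosts start being reused.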
If I start mpirun and then log into node01, ps -ax shows:
1162 ?? Ss 0:00.01 /usr/local/mpich-1.2.6/ch_p4/examples/cpi biocluster
so the job appears to arrive (at least on one node). I'm a bit stumped.
Googling for that error turns up suggestions that it might be to do with
ssh requiring passwordless authentication (although I am currently running
this as root and am not sure this is the issue yet). But when I started
looking into it, that threw up another problem :/
Passwordless ssh works fine for the root user from the head node to the
cluster nodes, but not for any other user on the machine. And no matter
how I try, I cannot get passwordless authentication working for a non-root
user on the cluster.
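To see which nodes accept a key-based login for a given user, a loop along
these lines could be run from the head node as that user. It is a rough
sketch rather than a tested recipe for this cluster: check_nodes is a
hypothetical helper, the machines file path is whatever was installed with
MPICH, and the SSH variable exists only so a different client command can
be substituted. BatchMode=yes makes ssh fail instead of prompting for a
password, so a node that would prompt shows up as FAILED.

```shell
# check_nodes FILE: attempt a non-interactive ssh to each host in a
# machines file and report whether the login succeeded.
# Assumption: lines look like "hostname" or "hostname:N".
check_nodes() {
    # Keep only the hostname part; skip comments and blank lines.
    for host in $(awk -F: '!/^[ \t]*(#|$)/ { print $1 }' "$1"); do
        # BatchMode=yes: never prompt for a password, just fail.
        # ConnectTimeout=5: give up on unreachable hosts after 5 seconds.
        if ${SSH:-ssh} -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host ok"
        else
            echo "$host FAILED"
        fi
    done
}
```

Running it once as root and once as an ordinary user should show whether
the failure is the same on every node or limited to some of them.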
Is there some Mac OS X voodoo I am not aware of? I cannot ssh to the
cluster nodes as a non-root user, so I assume there is some "deeper"
authentication issue here? (Passwordless ssh is normally trivial to set
up.) User home directories are exported from the head node to the
nodes, although the root user on a node seems unable to ls -l any user's
~/.ssh/. Any hints gratefully appreciated.
Cheers!
Dan
--
Dr Daniel Swan - Bioinformatics Support Unit
924 Claremont Tower, University of Newcastle, Newcastle, NE1 7RU
Tel: +44 (0)191 222 7856
http://www.ncl.ac.uk/bioinformatics/support || d.c.swan at ncl.ac.uk