[Bioclusters] RSH fails on seaguest cluster

Michael Edwards miedward at gmail.com
Wed May 14 15:38:32 EDT 2008


Did your switch break? Try swapping it out for another one, especially if
all the broken nodes are on one switch (and, to a lesser extent, if the nodes
that still work are not on it).
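The switch check above amounts to grouping the failing nodes by the switch they are cabled to. The sketch below makes that concrete. It assumes Python is available on the master, that rshd listens on the standard TCP port 514, and that you supply a wiring map (the `switch_of` dictionary is hypothetical; nothing in the thread says which nodes share a switch).

```python
import socket
from collections import defaultdict

RSH_PORT = 514  # rshd normally listens on TCP 514 (the "shell" service)


def probe(host, port=RSH_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name lookup failed
        return False


def failures_by_switch(results, switch_of):
    """Group unreachable nodes by the switch they are cabled to.

    results:   node name -> True if reachable
    switch_of: node name -> switch name (a hypothetical wiring map you fill in)
    """
    groups = defaultdict(list)
    for node, ok in results.items():
        if not ok:
            groups[switch_of.get(node, "unknown")].append(node)
    return dict(groups)


def single_switch_suspect(results, switch_of):
    """Return the switch to swap first, or None if the evidence is mixed.

    A switch is suspect when every failing node hangs off it and
    no node that still works is on it.
    """
    groups = failures_by_switch(results, switch_of)
    if len(groups) != 1:
        return None
    (switch,) = groups
    if any(ok and switch_of.get(n) == switch for n, ok in results.items()):
        return None
    return switch
```

On the master, something like `results = {n: probe(n) for n in switch_of}` fills in the reachability data. A `None` from `single_switch_suspect` does not rule the switch out; it only means the failures do not line up neatly with one switch.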

On Tue, May 13, 2008 at 5:48 PM, Jeff Thomas <Jeff at cvm.msstate.edu> wrote:

> Hello All,
> We have a cluster with a Windows 2003 master node and 8 Linux (Fedora 4)
> slave nodes. Everything was working properly, but now rsh fails to connect
> to nodes 1-7.
>
> pvm> add node1
> add node1
> 0 successful
>                    HOST     DTID
>                   node1 Can't start pvmd
>
> Auto-Diagnosing Failed Hosts...
> node1...
> Verifying Local Path to "rsh"...
>
> Error - File /usr/ucb/rsh Not Found!
> Determine the path to the "rsh" command on your
> system, and edit PVM_ROOT\conf\WIN32.def
> to adjust the path for the -DRSHCOMMAND=\"\"
> flag.  Then recompile PVM and your applications.
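PVM's auto-diagnosis above reports that it looked for `/usr/ucb/rsh` and asks for the real path to go into the `-DRSHCOMMAND` flag in `PVM_ROOT\conf\WIN32.def`. Here is a small sketch of locating an rsh binary and formatting the flag in the same shape as the message. It assumes Python is available on the master. The candidate list (including the Windows `system32` location) is a guess rather than something the thread confirms, and whether backslashes in a Windows path need extra escaping inside `WIN32.def` is left unverified.

```python
import os
import shutil

# Assumed places to look; /usr/ucb/rsh is the path PVM reported as missing.
DEFAULT_CANDIDATES = [
    "/usr/ucb/rsh",
    "/usr/bin/rsh",
    r"C:\WINDOWS\system32\rsh.exe",
]


def find_rsh(candidates=DEFAULT_CANDIDATES, search_path=True):
    """Return the first rsh executable found, or None.

    PATH is consulted first (via shutil.which) unless search_path is False.
    """
    if search_path:
        found = shutil.which("rsh")
        if found:
            return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def rshcommand_flag(path):
    """Wrap path in backslash-escaped quotes, mirroring PVM's hint."""
    return '-DRSHCOMMAND=\\"' + path + '\\"'
```

After editing the flag, PVM and the applications still have to be recompiled, as the message says.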
>
>
> I have restarted the entire cluster and cleaned the pvm*.* files in /tmp on
> each node multiple times, and I have also restarted the BsdRshd service.
>
> I cannot rsh from a slave to the master:
>
> [root at node1 ~]# rsh master "c:\cluster\wrshd\id.exe"
> connect to address 192.168.66.250: Connection refused
> Trying krb4 rsh...
> connect to address 192.168.66.250: Connection refused
> trying normal rsh (/usr/bin/rsh)
> Access denied.
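The master's address answers the first attempts above with "Connection refused". That usually means the packet reached the host and the host replied that nothing was listening on that port. A silent timeout would point more toward the network path, such as a dead switch. The following is a minimal sketch for telling these cases apart from any node, assuming Python is installed there. Port 514 (the standard rsh port) is only a default, and a firewall that rejects rather than drops connections can blur the distinction.

```python
import socket


def classify(host, port=514, timeout=3.0):
    """Classify one TCP connect attempt to host:port.

    'open'        something accepted the connection
    'refused'     host replied but nothing listens there (e.g. rshd not running)
    'timeout'     no reply at all; suspect cabling, switch, or a dropping firewall
    'unreachable' name lookup or routing failed
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "unreachable"
```

For example, `classify("192.168.66.250")` run from node1 would show whether the master is refusing rsh or not answering at all.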
>
> WRSHD in debug mode yields this:
>
> C:\Cluster\wrshd>rshd -d
> (/5/9 10:48:58) Checking WinSockets Version... (/5/9 10:48:58) done.
> (/5/9 10:48:58) Loading Equivalence List...(/5/9 10:48:58) Getting Information from Trustbase
> (/5/9 10:48:58) done.
> (/5/9 10:48:58) Binding main socket.
> (/5/9 10:48:58) cannot bind to the rshd daemon port.
> Debugging BsdRshd
> In StartServiceCtrlDispatcher
> Error number: 1063
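The `rshd -d` run stops with "cannot bind to the rshd daemon port", which typically means another process already holds that port. One possibility, though the thread does not confirm it, is the installed BsdRshd service itself, if it was running when the debug copy started. Windows error 1063 is ERROR_FAILED_SERVICE_CONTROLLER_CONNECT, which StartServiceCtrlDispatcher returns when a service executable is launched from a console instead of by the service manager. The sketch below just tries to bind a port once and reports the outcome. Comparing the result for port 514 with the service stopped and started is a hypothetical way to see who owns the port.

```python
import socket


def check_bind(port, host="0.0.0.0"):
    """Try to bind host:port once and report what happened."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return "free"
    except PermissionError:
        # e.g. ports below 1024 without root on Unix-like systems
        return "no permission"
    except OSError as exc:
        # most often "address already in use": another process holds the port
        return f"unavailable: {exc}"
    finally:
        s.close()  # release the port immediately either way
```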
>
> The pvml log file:
>
> [t80040000] master (192.168.66.250:1036) WIN32 3.4.3
> [t80040000] ready Fri May 09 10:46:05 2008
> [t80040000] netinput() bogus pkt from 192.168.66.1:32774
> [t80040000] netinput() bogus pkt from 192.168.66.2:32771
> [t80040000] netinput() bogus pkt from 192.168.66.3:32771
> [t80040000] netinput() bogus pkt from 192.168.66.5:32771
> [t80040000] netinput() bogus pkt from 192.168.66.6:32771
> [t80040000] netinput() bogus pkt from 192.168.66.7:32771
> [t80040000] netinput() bogus pkt from 192.168.66.8:32770
> [t80040000] startack() host node1 expected version, got "PvmCantStart"
> [t80040000] startack() host node2 expected version, got "PvmCantStart"
> [t80040000] startack() host node3 expected version, got "PvmCantStart"
> [t80040000] startack() host node4 expected version, got ""
> [t80040000] startack() host node5 expected version, got "PvmCantStart"
> [t80040000] startack() host node6 expected version, got "PvmCantStart"
> [t80040000] startack() host node7 expected version, got "PvmCantStart"
> [t80040000] startack() host node8 expected version, got "PvmCantStart"
> [t80040000] netinput() bogus pkt from 192.168.66.1:32775
> [t80040000] startack() host node1 expected version, got "PvmCantStart"
> [t80040000] netinput() bogus pkt from 192.168.66.8:32771
> [t80040000] startack(
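The pvml excerpt is easier to read summarized per host. Every listed node replies "PvmCantStart" (node4 replies with an empty string), and the master rejects packets from most node addresses as "bogus". That packets arrive at all may suggest the network path is at least partly up, though that is a reading of the log, not a diagnosis. Below is a small parser for lines in exactly the format shown. The regular expressions were written against these sample lines only and may need loosening for other PVM versions.

```python
import re
from collections import Counter

# Patterns written against the pvml lines quoted in this thread.
BOGUS_RE = re.compile(r"netinput\(\) bogus pkt from ([\d.]+):(\d+)")
STARTACK_RE = re.compile(r'startack\(\) host (\S+) expected version, got "([^"]*)"')


def summarize(log_lines):
    """Return (bogus packet count per source IP, last startack reply per host)."""
    bogus = Counter()
    startack = {}
    for line in log_lines:
        m = BOGUS_RE.search(line)
        if m:
            bogus[m.group(1)] += 1
            continue
        m = STARTACK_RE.search(line)
        if m:
            startack[m.group(1)] = m.group(2)  # a later reply overwrites an earlier one
    return bogus, startack
```

Truncated lines such as the final `startack(` match neither pattern and are skipped.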
>
> I know it must be something simple because it was working fine before
> this. Any suggestions would be greatly appreciated.
>
> Thanks
> Jeff Thomas
>
>
> _______________________________________________
> Bioclusters maillist  -  Bioclusters at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/bioclusters
>

