[BioBrew Users] disabling non-SGE access to the cluster

Glen Otero gotero at linuxprophet.com
Fri Aug 29 22:37:31 EDT 2003


On Friday, August 29, 2003, at 03:53  PM, Bill Barnard wrote:

> My cluster is working okay. I've tested submitting small jobs via SGE,
> which seems to work fine. I submitted a few small HPL jobs via SGE,
> which worked fine. Large HPL jobs still end up with a zombie process
> using SGE. (Will troubleshoot that later...)

Zombie processes suck because even if you kill the processes on the 
frontend, they will still be running on the compute nodes.  You have to 
kill them individually on each of the nodes.  Here's an easy way to 
clean up all the nodes:

% cluster-fork skill -KILL -u <username>

Do this for any users that have processes, and if you do it as yourself 
(not root) it will probably give you a disconnection message from each 
of the nodes, but don't worry about that.  After that, if you run 'ps' 
you shouldn't see any user processes out there.
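If several users have leftover processes, the cleanup can be scripted.
A minimal sketch (the usernames are placeholders, and the loop only
prints the commands as a dry run; drop the 'echo' to actually run them):

```shell
# Print the cleanup command for each user with leftover processes.
# "alice" and "bob" are placeholder usernames; remove the echo to
# actually fire the kills across all the compute nodes.
for u in alice bob; do
  echo "cluster-fork skill -KILL -u $u"
done
```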

WRT the HPL zombie processes: if the compute nodes are not Pentium 4 
processors, you might see zombie process behavior. The hpl binaries 
were optimized for the Pentium 4 and use instructions (SSE2) that are 
not available on the Pentium III or Athlon. The solution is to 
recompile the ATLAS library, install it, and rebuild hpl against it. 
It is easiest to just download the prebuilt Atlas libraries from 
netlib:

http://www.netlib.org/atlas/archives/linux/
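You can quickly check whether a node's CPU supports SSE2 at all by
grepping the kernel's CPU flags (run this on a compute node):

```shell
# Print "sse2" if the CPU advertises SSE2 support; no output means the
# stock Pentium 4 hpl/ATLAS binaries will not run on this node.
grep -o sse2 /proc/cpuinfo | head -1
```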

But if you want to rebuild atlas and hpl from scratch, you should start 
by checking out a Rocks CVS source tree.

# cvs -d:pserver:anonymous@cvs.rocksclusters.org:/home/cvs/CVSROOT/ \
checkout -r ROCKS_2_3_2_i386 rocks-src

Make sure to get the 2_3_2 version (the -r ROCKS_2_3_2_i386 tag above) and not the HEAD.

Rebuild and install ATLAS:

	# cd rocks/src/contrib/atlas
	# make rpm
	# rpm -Uvh --force /usr/src/redhat/RPMS/i386/atlas*rpm

Rebuild HPL (no need to install it on the frontend if you don't run hpl 
on the frontend):

	# cd rocks/src/contrib/hpl
	# make rpm

Rebuild your distribution:

	# cd /home/install
	# rocks-dist dist

Reinstall your compute nodes:

	# shoot-node compute-0-0 compute-0-1 ...

The new hpl package will be bound into the new distribution 
(rocks-dist knows to look in /usr/src/redhat/RPMS for new packages). 
Then you should be able to run linpack on your cluster.
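Once the nodes come back, a minimal SGE job script for linpack might
look like the sketch below. The parallel environment name, slot count,
and xhpl path are all assumptions here; adjust them to your cluster.

```shell
#!/bin/sh
# hpl.sh - minimal SGE job script for HPL. The "mpi" parallel
# environment, the 4 slots, and the /opt/hpl/bin/xhpl path are
# placeholders; adjust to your site. Submit with: qsub hpl.sh
#$ -cwd
#$ -pe mpi 4
mpirun -np $NSLOTS /opt/hpl/bin/xhpl
```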
************

Here is what one user did to build rpms for the Pentium III:

I had the same problem with hpl and linpack on Rocks 2.3.2.  You can get
the source rpm's at this location.

ftp://ftp.harddata.com/pub/rocks/athlon/SRPMS/

These binaries are compiled for the Athlon.  They will not work on the
PIII.  What you need to do for each source rpm file is rebuild it for
the i386 target:

	rpmbuild --rebuild --target=i386 atlas.....
	rpmbuild --rebuild --target=i386 hpl.....

(Replace atlas..... and hpl..... with the complete source rpm
filenames.)

If I remember correctly, the atlas rebuild went into a loop on a
question about a Fortran compiler.  If that happens you need to edit
the spec file (specification file).  To get to this file you must
extract the files from the source rpm.  To do that:

1.  "rpm -ivh atlas....."

2.  Change into "/usr/src/redhat/SPECS"

3.  vi the atlas.spec file.  The section you want to edit is the
Pentium III section (shown below):

	#Pentium III
	#
	export PATH=/opt/gcc32/bin:$PATH
	echo "0
	y
	y
	n
	y
	y <---- This was the line that gave me trouble; I had to remove it completely.
	linux
	0
	/opt/gcc32/bin/g77
	-0
	y
	" | make
	else

4.  Save the file and exit vi.

5.  Do an "rpm -ba atlas.spec" (or "rpmbuild -ba atlas.spec" on newer
rpm versions) from the "SPECS" directory.  This will create a new rpm
file.

6.  Wait for compile to complete.  (Elevator music playing)

7.  Change into the "/usr/src/redhat/RPMS/i386" directory and retrieve
your new RPM file for the PIII.

8.  Install the new rpm on the frontend and all compute nodes.  You
will need to reinstall hpl on all the nodes as well.
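Pushing the rebuilt rpm out by hand might look like the loop below.
This is only a sketch: the node names and rpm path are placeholders,
and the commands are printed as a dry run; drop the 'echo's to execute
them (or use shoot-node to reinstall the nodes instead).

```shell
# Dry run: print the copy/install commands for each compute node.
# compute-0-0/compute-0-1 and the rpm path are placeholders.
for node in compute-0-0 compute-0-1; do
  echo "scp /usr/src/redhat/RPMS/i386/atlas-*.rpm $node:/tmp/"
  echo "ssh $node rpm -Uvh --force '/tmp/atlas-*.rpm'"
done
```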
**********

HTH!

Glen

>
> Before I open the cluster for use I want to set it up so all jobs are
> submitted via SGE/qsub. I can currently submit mpirun directly, so I 
> can
> clearly bypass SGE. Has anyone done this yet? (Not to say that I'm 
> lazy,
> but of course I am lazy...)
>
> Thanks,
>
> Bill
> -- 
> Bill Barnard <bill at barnard-engineering.com>
>
Glen Otero, Ph.D.
Linux Prophet
619.917.1772
