
The MBG cluster for experienced users

Beware: incomplete work in progress …

Prelude

The basic differences from a simple bunch of networked PCs are:

LAN and Accessibility

The cluster uses an isolated LAN with IPs ranging from 190.100.100.100 to 190.100.100.XXX. DNS for the LAN is provided by the head node, which is the 190.100.100.100 machine (server.cluster.mbg.gr). The other nodes are pc01.cluster.mbg.gr, …, with aliases pc1, pc2, … The head node also serves as an HTTP proxy server (running squid), allowing the other nodes to access the outside network (HTTP only).
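As a minimal sketch of what this means in practice on a compute node, the usual http_proxy environment variable can be pointed at the head node. The helper name proxy_url is hypothetical, and port 3128 is only squid's default: the port actually used on this cluster is not stated here, so treat it as an assumption.

```shell
# Build the proxy URL for the head node.
# ASSUMPTION: squid listens on its default port 3128; the real port may differ.
proxy_url() {
    host="${1:-server.cluster.mbg.gr}"
    port="${2:-3128}"
    printf 'http://%s:%s\n' "$host" "$port"
}

# Export it so that tools honouring http_proxy (wget, curl, ...) go through squid.
http_proxy="$(proxy_url)"
export http_proxy
```

Only HTTP traffic is relayed, so anything else (ssh, ftp, ...) started from a node will still not reach the outside world.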

The head node (server from now on) has two network interfaces. One is the internal (cluster) interface. The second connects the server (and the server only) to the outside world. Its address is dynamic, of the form dhcp-XXX.mbg.duth.gr. This DHCP-configured interface is firewalled with respect to foreign addresses, with the following exceptions:

Filesystem things

Each node (plus the server, of course) has its own local (ext3-formatted) disk, complete with a full copy of the operating system plus programs (mostly in /usr/local). To be able to run parallel jobs, a common filesystem is needed, and this is provided by openMosix's MFS (mosix filesystem). The mount point for the cluster-wide filesystem is /work. To make this clear: commands like cd /home/myself or cd /usr/local/bin will take you to the corresponding directory on the local filesystem of the node you are currently logged in to. On the other hand, all files and directories that reside inside /work are visible from all nodes.

Now the crucial question: what is the physical location of /work? (In other words, where does the /work stuff get written to?) The answer is that /work physically lives in the server's /tmp directory; /work is a handle pointing to that directory. If you are logged in to the server, then doing an ls in /work and in /tmp will show you the same files. But if you are logged in to, say, pc08, the command cd /tmp ; ls will show you the contents of the local (pc08's) /tmp directory.
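The local-versus-shared rule can be summarised in a few lines of shell. This is only an illustrative sketch: fs_scope is a hypothetical helper, not something installed on the cluster, and it expects absolute paths.

```shell
# Classify an absolute path as cluster-wide or node-local.
# Everything under /work is shared by all nodes (and is stored in the
# server's /tmp); every other path belongs to the node you are logged in to.
fs_scope() {
    case "$1" in
        /work|/work/*) echo "shared" ;;
        *)             echo "local" ;;
    esac
}
```

For example, fs_scope /work/mystuff prints shared, while fs_scope /tmp prints local on every node, the server included.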

Based on the discussion above, you could in principle do a

cd /work
mkdir mystuff
cd mystuff
cp /home/myself/myprogram ./
./myprogram < input > output
and run your programs (even the stand-alone ones) from the common filesystem. This is not a good idea, for two reasons: The first is that all input and output …

Approaching the cluster

After logging in to the server, or (physically) to one of the cluster's nodes, open a unix shell and type mosmon. You should get a graphical view of how many machines are alive and whether they are busy or idle. If, for example, you see something like

[Figure: mosmon output example]

then you will know things are quite busy. Things to note:

Given the number of nodes used in this example (17), this is probably a parallel job submitted via SGE. You can check this by typing 'q' (to quit mosmon) and then qstat. If you prefer GUIs, type 'qmon' and then click 'Job control' followed by 'Running Jobs'. Exit with 'Done' and then 'Exit'.
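If you only want a quick number rather than the full listing, the qstat output can be filtered. The sketch below assumes the default SGE listing format (two header lines, then one row per job with the state in the fifth column, "r" meaning running); count_running is a hypothetical helper name.

```shell
# Count running jobs in qstat-style output read from standard input.
# ASSUMPTION: default SGE layout, i.e. two header lines followed by one row
# per job, with the job state ("r" = running, "qw" = queued) in column 5.
count_running() {
    awk 'NR > 2 && $5 == "r" { n++ } END { print n + 0 }'
}

# On the cluster this would be used as:
#   qstat | count_running
```

Depending on the SGE version, plain qstat may list only your own jobs; in that case qstat -u '*' is probably what you want to pipe into the function.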

Being nice

Parallel jobs are rather sensitive to disturbances of any of the participating nodes. For this reason, when the cluster is busy running a parallel job it is important to treat the machines gently:

The real thing: submitting a job

For the discussion that follows I will assume that your program is properly installed and runs from the command line. To keep things simple, I will divide the problem into program-specific types:


An interactive single-processor job

From the mosmon output identify a node that is alive but idle. Copy your files to