
My job on mpi_slow doesn't start

If you submitted a job to the mpi_slow (parallel) queue and it does not start, check the contents of the job's error file, myjob.e_ID. If you see something like:

# cat Test.e394
Charmrun> error 8 attaching to node:
Timeout waiting for node-program to connect

then the problem is that SGE chose node 15 (pc15) to serve as the master queue. Unfortunately, node 15 was installed with a minimal set of packages and cannot be used as a master queue. To avoid pc15, add the following line to your SGE qsub script:

#$ -masterq pc01.q,pc02.q,pc03.q,pc04.q,pc05.q,pc06.q,pc07.q,pc08.q,pc09.q,pc10.q,pc11.q,pc12.q,pc13.q,pc14.q,pc16.q
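If you would rather not type the queue list by hand, it can be generated. The following is only a minimal sketch: it assumes the nodes are numbered 1 to 16 and that their queues follow the pcNN.q naming seen in the line above. The variable name masterq is illustrative and not part of SGE.

```shell
# Build the -masterq list for pc01.q .. pc16.q, skipping pc15.q.
# Assumption: 16 nodes with queues named pcNN.q, as in the line above.
masterq=""
for i in $(seq 1 16); do
  # pc15 cannot act as a master queue, so leave it out
  [ "$i" -eq 15 ] && continue
  q=$(printf 'pc%02d.q' "$i")
  # append with a comma separator, except before the first entry
  masterq="${masterq:+$masterq,}$q"
done

# Print the directive so it can be pasted into the qsub script
echo "#\$ -masterq $masterq"
```

The printed line can be copied into the job script as is. If nodes are added or removed later, only the range and the skipped number need to change.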