The version of Qs used is dated 10 September 2004. The parallel version was compiled with
mpicc -DMPI -static -I/usr/local/include -O3 -unroll -tpp6 -xK -wp_ipo Qs_working.c -lsrfftw_intel -lsfftw_intel -limf -lm
where mpicc resolves to
icc -O2 -tpp6 -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1 -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -L/usr/local/mpich-ssh/lib -lmpich
Before measuring the performance of the program (as wall-clock time), running Qs with a plain 'mpirun' command was compared with running it via SGE (Sun Grid Engine). The settings for this test were:
- 4156 unique reflections
- Space group P422
- 100000 steps per minimisation
- 1 minimisation
- 4 processors (server + 3 Celerons)
The results were:
Method | Wall-clock time in seconds |
mpirun | 919 |
Grid Engine | 894 |
The tests: Celerons & server
The parameters tested are:
- Number of unique reflections
- Number of crystallographic symmetry operators (space group)
- Number of processors
In all cases a single minimisation of 100,000 steps was performed. The results are:
8 symmetry operators
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
4156 | P422 | 1 (celeron) | 2032 | 1.000 |
4156 | P422 | 1 (Pentium IV) | 1690 | 1.202 |
4156 | P422 | 2 (mixed) | 1241 | 1.637 |
4156 | P422 | 4 | 894 | 2.272 |
4156 | P422 | 6 | 802 | 2.533 |
4156 | P422 | 8 | 753 | 2.698 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
2049 | P422 | 1 (celeron) | 1058 | 1.000 |
2049 | P422 | 1 (Pentium IV) | 818 | 1.293 |
2049 | P422 | 2 | 664 | 1.593 |
2049 | P422 | 4 | 500 | 2.116 |
2049 | P422 | 6 | 487 | 2.172 |
2049 | P422 | 8 | 449 | 2.356 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
8224 | P422 | 1 (celeron) | 4148 | 1.000 |
8224 | P422 | 1 (Pentium IV) | 3449 | 1.202 |
8224 | P422 | 2 | 2377 | 1.745 |
8224 | P422 | 4 | 1685 | 2.501 |
8224 | P422 | 6 | 1564 | 2.652 |
8224 | P422 | 8 | 1329 | 3.121 |
4 symmetry operators
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
4156 | P222 | 1 (celeron) | 1056 | 1.000 |
4156 | P222 | 1 (Pentium IV) | 870 | 1.213 |
4156 | P222 | 2 (mixed) | 752 | 1.404 |
4156 | P222 | 4 | 659 | 1.602 |
4156 | P222 | 6 | 661 | 1.597 |
4156 | P222 | 8 | 650 | 1.624 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
2049 | P222 | 1 (celeron) | 568 | 1.000 |
2049 | P222 | 1 (Pentium IV) | 428 | 1.327 |
2049 | P222 | 2 | 409 | 1.388 |
2049 | P222 | 4 | 374 | 1.518 |
2049 | P222 | 6 | 401 | 1.416 |
2049 | P222 | 8 | 408 | 1.392 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
8224 | P222 | 1 (celeron) | 2180 | 1.000 |
8224 | P222 | 1 (Pentium IV) | 1796 | 1.213 |
8224 | P222 | 2 | 1466 | 1.487 |
8224 | P222 | 4 | 1211 | 1.800 |
8224 | P222 | 6 | 1158 | 1.882 |
8224 | P222 | 8 | 1114 | 1.956 |
2 symmetry operators
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
10044 | P2 | 1 (celeron) | 1464 | 1.000 |
10044 | P2 | 1 (Pentium IV) | 1208 | 1.211 |
10044 | P2 | 2 (mixed) | 1215 | 1.204 |
10044 | P2 | 4 | 1163 | 1.258 |
10044 | P2 | 6 | 1216 | 1.203 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
19879 | P2 | 1 (celeron) | 2718 | 1.000 |
19879 | P2 | 1 (Pentium IV) | 2271 | 1.196 |
19879 | P2 | 2 (mixed) | 2321 | 1.171 |
19879 | P2 | 4 | 2169 | 1.253 |
19879 | P2 | 6 | 2158 | 1.259 |
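The Scale-up column is not defined in the text. The tabulated values are consistent with the single-Celeron wall-clock time divided by the wall-clock time of the run in question, truncated (not rounded) to three decimals. A minimal sketch under that assumption, using the 4156-reflection P422 table; the function and variable names are illustrative only and are not part of Qs:

```python
import math

# Wall-clock times in seconds, copied from the 4156-reflection P422 table.
times = {
    "1 (celeron)": 2032,
    "1 (Pentium IV)": 1690,
    "2 (mixed)": 1241,
    "4": 894,
    "6": 802,
    "8": 753,
}


def scale_up(reference_seconds, run_seconds):
    """Reference time divided by run time, truncated to three decimals.

    Assumption: the tables truncate rather than round; this reproduces
    the tabulated values for this data set.
    """
    return math.floor(1000 * reference_seconds / run_seconds) / 1000


# The single-Celeron run is taken as the reference (scale-up 1.000).
reference = times["1 (celeron)"]
scale_ups = {label: scale_up(reference, t) for label, t in times.items()}

for label, value in scale_ups.items():
    print(f"{label:>15} | {times[label]:5d} s | {value:.3f}")
```

Dividing a scale-up by the number of processors gives a rough parallel efficiency (about a third for the 8-processor run above). That figure is only indicative here, because the processors are a mix of Celerons and a faster Pentium IV.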
Conclusions
On the newer nodes:
For high-symmetry space groups it is probably worth using the parallel version. For orthorhombic space groups you will have to measure the scale-up yourself. For the low-symmetry cases (monoclinic, triclinic), forget it. Just spawn 5 jobs (each with a different seed for the random number generator) and let them run.
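The advice for low-symmetry cases can be checked with simple arithmetic on the P2 timings (10044 reflections). The sketch below compares one parallel minimisation on 4 processors with 4 independent serial jobs, one per processor. It assumes that independent jobs do not slow each other down and that each runs at the single-Celeron speed, which is pessimistic for the job that lands on the Pentium IV. The names are illustrative only.

```python
# Wall-clock seconds from the 10044-reflection P2 table.
serial_celeron = 1464  # one minimisation on one Celeron
parallel_4 = 1163      # one minimisation spread over 4 processors

n_procs = 4


def seconds_per_minimisation(batch_seconds, minimisations_in_batch):
    """Average wall-clock cost of one minimisation within a batch."""
    return batch_seconds / minimisations_in_batch


# Parallel version: the 4 processors together deliver 1 minimisation.
cost_parallel = seconds_per_minimisation(parallel_4, 1)

# Independent serial jobs (assumed not to interfere with each other):
# the same 4 processors deliver 4 minimisations in one serial run time.
cost_independent = seconds_per_minimisation(serial_celeron, n_procs)

# How many times more throughput the independent jobs give.
advantage = cost_parallel / cost_independent

print(f"parallel:    {cost_parallel:.0f} s per minimisation")
print(f"independent: {cost_independent:.0f} s per minimisation")
print(f"independent jobs give about {advantage:.1f}x the throughput")
```

Under these assumptions the independent jobs deliver roughly three times as many finished minimisations per unit of wall-clock time, which is why the parallel version is not worth using when there are only 2 symmetry operators.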