The version of Qs used is dated September 10th, 2004. The parallel version was compiled with
mpicc -DMPI -static -I/usr/local/include -O3 -unroll -tpp6 -xK -wp_ipo Qs_working.c -lsrfftw_intel -lsfftw_intel -limf -lm
where mpicc resolves to
icc -O2 -tpp6 -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1 -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1 -L/usr/local/mpich-ssh/lib -lmpich
Before measuring the performance of the program (using wall-clock time), running Qs with a simple 'mpirun' command was compared with running it via SGE (Sun Grid Engine). The settings for this test were
- 4156 unique reflections
- Space group P422
- 100000 steps per minimisation
- 1 minimisation
- 4 processors (server + 3 celerons)
The results were
Method | Wall-clock time |
mpirun | 919 seconds |
Grid Engine | 894 seconds |
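The gap between the two launch methods is small. As a rough illustration, the difference can be expressed as a percentage of the mpirun time. The function name below is made up for this sketch, and with a single run per method nothing can be said about run-to-run noise.

```python
def relative_difference(reference_s, other_s):
    """Return (reference - other) / reference as a percentage."""
    return 100.0 * (reference_s - other_s) / reference_s


# Wall-clock times from the table above, in seconds.
mpirun_s = 919
grid_engine_s = 894

saving = relative_difference(mpirun_s, grid_engine_s)
print(f"Grid Engine run was {saving:.1f}% shorter than the mpirun run")
```

The difference comes out at under 3% of the mpirun time.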
The tests: Celerons & server
The parameters tested are
- Number of unique reflections
- Number of crystallographic symmetry operators (space group)
- Number of processors
In all cases only one minimisation of 100,000 steps was performed. The results are:
8 symmetry operators
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
4156 | P422 | 1 (celeron) | 2032 | 1.000 |
4156 | P422 | 1 (Pentium IV) | 1690 | 1.202 |
4156 | P422 | 2 (mixed) | 1241 | 1.637 |
4156 | P422 | 4 | 894 | 2.272 |
4156 | P422 | 6 | 802 | 2.533 |
4156 | P422 | 8 | 753 | 2.698 |
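The Scale-up column appears to be the single-Celeron wall-clock time divided by the wall-clock time of each run. A minimal sketch of that calculation for the table above follows. The helper names are invented for illustration, the truncation to three decimals is inferred from the tabulated values rather than stated in the text, and the efficiency figure (scale-up divided by processor count) is an addition that ignores the fact that the processors are not identical.

```python
def scale_up(reference_s, run_s):
    """Scale-up = reference time / run time, truncated to 3 decimals.

    Integer arithmetic keeps the truncation free of floating-point surprises.
    """
    return (reference_s * 1000 // run_s) / 1000


def efficiency(reference_s, run_s, processors):
    """Scale-up per processor (1.0 would be ideal on identical processors)."""
    return (reference_s / run_s) / processors


# (label, processors, wall-clock seconds) for 4156 reflections, P422.
runs = [
    ("1 (celeron)", 1, 2032),
    ("1 (Pentium IV)", 1, 1690),
    ("2 (mixed)", 2, 1241),
    ("4", 4, 894),
    ("6", 6, 802),
    ("8", 8, 753),
]

reference_s = runs[0][2]  # the single-Celeron run is the baseline
for label, processors, seconds in runs:
    print(f"{label:>15}  scale-up {scale_up(reference_s, seconds):.3f}  "
          f"efficiency {efficiency(reference_s, seconds, processors):.2f}")
```

The same two functions can be applied to any of the other tables by swapping in their times.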
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
2049 | P422 | 1 (celeron) | 1058 | 1.000 |
2049 | P422 | 1 (Pentium IV) | 818 | 1.293 |
2049 | P422 | 2 | 664 | 1.593 |
2049 | P422 | 4 | 500 | 2.116 |
2049 | P422 | 6 | 487 | 2.172 |
2049 | P422 | 8 | 449 | 2.356 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
8224 | P422 | 1 (celeron) | 4148 | 1.000 |
8224 | P422 | 1 (Pentium IV) | 3449 | 1.202 |
8224 | P422 | 2 | 2377 | 1.745 |
8224 | P422 | 4 | 1685 | 2.501 |
8224 | P422 | 6 | 1564 | 2.652 |
8224 | P422 | 8 | 1329 | 3.121 |
4 symmetry operators
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
4156 | P222 | 1 (celeron) | 1056 | 1.000 |
4156 | P222 | 1 (Pentium IV) | 870 | 1.213 |
4156 | P222 | 2 (mixed) | 752 | 1.404 |
4156 | P222 | 4 | 659 | 1.602 |
4156 | P222 | 6 | 661 | 1.597 |
4156 | P222 | 8 | 650 | 1.624 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
2049 | P222 | 1 (celeron) | 568 | 1.000 |
2049 | P222 | 1 (Pentium IV) | 428 | 1.327 |
2049 | P222 | 2 | 409 | 1.388 |
2049 | P222 | 4 | 374 | 1.518 |
2049 | P222 | 6 | 401 | 1.416 |
2049 | P222 | 8 | 408 | 1.392 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
8224 | P222 | 1 (celeron) | 2180 | 1.000 |
8224 | P222 | 1 (Pentium IV) | 1796 | 1.213 |
8224 | P222 | 2 | 1466 | 1.487 |
8224 | P222 | 4 | 1211 | 1.800 |
8224 | P222 | 6 | 1158 | 1.882 |
8224 | P222 | 8 | 1114 | 1.956 |
2 symmetry operators
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
10044 | P2 | 1 (celeron) | 1464 | 1.000 |
10044 | P2 | 1 (Pentium IV) | 1208 | 1.211 |
10044 | P2 | 2 (mixed) | 1215 | 1.204 |
10044 | P2 | 4 | 1163 | 1.258 |
10044 | P2 | 6 | 1216 | 1.203 |
Reflections | Space group | Processors | Wall-clock time in seconds | Scale-up |
19879 | P2 | 1 (celeron) | 2718 | 1.000 |
19879 | P2 | 1 (Pentium IV) | 2271 | 1.196 |
19879 | P2 | 2 (mixed) | 2321 | 1.171 |
19879 | P2 | 4 | 2169 | 1.253 |
19879 | P2 | 6 | 2158 | 1.259 |
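To see the trend across the tables at a glance, the scale-up of the largest run in each table can be grouped by the number of symmetry operators. This is only a sketch for reading the data: the function name and the tuple layout are arbitrary choices, and the P2 runs stop at 6 processors while the others go to 8, so the groups are not strictly comparable.

```python
# (symmetry operators, reflections, processors, scale-up), copied from the
# last row of each table above.
largest_runs = [
    (8, 4156, 8, 2.698),
    (8, 2049, 8, 2.356),
    (8, 8224, 8, 3.121),
    (4, 4156, 8, 1.624),
    (4, 2049, 8, 1.392),
    (4, 8224, 8, 1.956),
    (2, 10044, 6, 1.203),
    (2, 19879, 6, 1.259),
]


def scale_up_range(rows):
    """Map each symmetry-operator count to its (min, max) scale-up."""
    ranges = {}
    for operators, _reflections, _processors, value in rows:
        low, high = ranges.get(operators, (value, value))
        ranges[operators] = (min(low, value), max(high, value))
    return ranges


for operators, (low, high) in sorted(scale_up_range(largest_runs).items(),
                                     reverse=True):
    print(f"{operators} operators: scale-up {low:.3f} to {high:.3f}")
```

In these measurements the scale-up falls steadily as the number of symmetry operators drops from 8 to 2.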
Conclusions
On the newer nodes:
For high-symmetry space groups it is probably worth using the parallel version. For orthorhombic space groups you will have to measure the scale-up yourself. For the low-symmetry cases (monoclinic, triclinic), forget it: just spawn 5 independent jobs (each with a different seed for the random number generator) and leave them to run.
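The advice for low-symmetry cases can be illustrated with a back-of-the-envelope throughput comparison. The sketch below uses the P2, 10044-reflection timings from the Celerons & server tests as a stand-in (not timings from the newer nodes), and it assumes that several minimisations are wanted anyway, that each independent job runs at single-Celeron speed, and that the jobs do not slow each other down. The function name is hypothetical.

```python
def seconds_per_minimisation(wall_s, minimisations):
    """Wall-clock seconds spent per completed minimisation."""
    return wall_s / minimisations


# 10044 reflections, P2, one minimisation of 100,000 steps per run.
serial_celeron_s = 1464  # one processor
parallel_4cpu_s = 1163   # four processors working on a single minimisation

# One parallel job on 4 processors yields 1 minimisation per run.
parallel = seconds_per_minimisation(parallel_4cpu_s, 1)

# Four independent serial jobs (different seeds) yield 4 minimisations
# in the time of one serial run, under the assumptions stated above.
independent = seconds_per_minimisation(serial_celeron_s, 4)

print(f"parallel:    {parallel:.0f} s per minimisation")
print(f"independent: {independent:.0f} s per minimisation")
```

Under these assumptions the independent jobs deliver a finished minimisation roughly every 366 seconds on average, against 1163 seconds for the parallel run on the same number of processors. The parallel version only pays off when the time to a single result matters more than the number of results.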