MBG wiki: HPL benchmarks (9 nodes)

The 9 nodes are the server (a Pentium 4) and the 8 newest nodes (Celerons). The MPICH library and the HPL executables use ssh to start remote processes and were compiled with icc.
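The page does not record which process grid was used. Assuming one MPI process per node, the 9 processes can be arranged as a 1x9 or a 3x3 HPL grid. HPL's TUNING notes suggest a roughly square grid with P no larger than Q. The following is a minimal Python sketch that lists the candidate grids, most square first; the function name is hypothetical.

```python
"""List candidate P x Q process grids for an HPL run (sketch).

Assumption: one MPI process per node, so 9 nodes give 9 processes.
HPL's tuning advice favours P <= Q with the grid close to square.
"""


def process_grids(nprocs):
    """Return all (P, Q) pairs with P * Q == nprocs and P <= Q,
    ordered from the most square shape to the least square."""
    grids = [(p, nprocs // p) for p in range(1, nprocs + 1)
             if nprocs % p == 0 and p <= nprocs // p]
    # Smallest gap between Q and P means the most square grid.
    grids.sort(key=lambda pq: pq[1] - pq[0])
    return grids


if __name__ == "__main__":
    for p, q in process_grids(9):
        print(f"P={p} Q={q}")
```

For 9 processes, process_grids(9) returns [(3, 3), (1, 9)], so the 3x3 grid would be the first one to try.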

Initial tests:

The last of these used ATLAS 3.6.0, compiled on one of the Celeron nodes (aspera), and looks reasonably good, reaching 8.2 Gflops.

Optimisation:

Block size (problem size held constant): NB=100 seems best. Stick to that.
Broadcast algorithm (N and NB held constant): BlongM (BCAST=5) looks best. Stick to that.
Re-refine the block size with BCAST held constant: NB=40 or 60?
NB=40 or 60, swapping threshold 40 or 60: OK, stick to NB=60 and swapping threshold 60.
NBMIN=8 looks better.
Look-ahead depth set to 0 (DEPTH=0).
Final test varying the problem size: top speed 9.5 Gflops.
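The settings above correspond to lines in HPL's input file, HPL.dat. The following is a minimal Python sketch that prints just those lines, assuming the rest of HPL.dat (Ns, PMAP, P and Q, PFACTs, NDIVs, RFACTs, SWAP, ...) is kept from the existing file. The helper names are hypothetical. Listing two values, e.g. nbs=(40, 60), makes HPL run every combination in a single invocation, which is one way to do a sweep like the NB=40/60 comparison. The swapping threshold, however, takes a single value.

```python
"""Print the tuned parameter lines of an HPL.dat file (sketch).

Only the lines touched during tuning are generated; every other line of
HPL.dat is assumed to come from the existing input file.
"""


def list_lines(values, values_label, count_label):
    """HPL.dat gives a list parameter as a count line then a value line.

    HPL reads the leading number(s) on each line; the rest is a label.
    """
    count_line = f"{len(values):<13}{count_label}"
    value_line = f"{' '.join(map(str, values)):<13}{values_label}"
    return [count_line, value_line]


def tuned_lines(nbs=(60,), nbmins=(8,), bcasts=(5,), depths=(0,),
                swap_threshold=60):
    """Return the tuned lines in the order they appear in HPL.dat."""
    lines = []
    lines += list_lines(nbs, "NBs", "# of NBs")
    lines += list_lines(nbmins, "NBMINs (>= 1)",
                        "# of recursive stopping criterium")
    lines += list_lines(bcasts,
                        "BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)",
                        "# of broadcast")
    lines += list_lines(depths, "DEPTHs (>=0)", "# of lookahead depth")
    # The swapping threshold is a single value, not a list, so comparing
    # 40 and 60 needs two separate runs.
    lines.append(f"{swap_threshold:<13}swapping threshold")
    return lines


if __name__ == "__main__":
    print("\n".join(tuned_lines()))
```

BCAST=5 is the "LnM" (long, modified) broadcast that the notes above call BlongM.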

Final HPL 9-node benchmark, best performance 10.32 Gflops.
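The Gflops figures HPL reports come from a fixed operation count of 2/3 N^3 + 3/2 N^2 divided by the wall time. The sketch below reproduces that calculation. It also implements a common rule of thumb for choosing the largest N, which is not an HPL setting: size the 8-byte matrix entries to fill about 80% of the cluster's total memory, then round N down to a multiple of NB. Function names are hypothetical, and no memory figures for these nodes are assumed.

```python
"""Relate HPL problem size N to memory use and reported Gflops (sketch).

The 80% memory fraction is a rule of thumb, not a value taken from
these runs.
"""

import math


def hpl_gflops(n, seconds):
    """Rate HPL would report for an n x n system solved in `seconds`."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def max_problem_size(total_mem_bytes, nb, fraction=0.8):
    """Largest N, rounded down to a multiple of nb, whose matrix of
    8-byte doubles uses about `fraction` of the aggregate memory."""
    n = int(math.sqrt(fraction * total_mem_bytes / 8))
    return n - n % nb


if __name__ == "__main__":
    print(hpl_gflops(1000, 1.0))
    print(max_problem_size(10**9, 60))
```

For example, max_problem_size(10**9, 60) gives 9960 for a hypothetical 1 GB of aggregate memory with NB=60.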