For trivially parallel problems (running, for example, many independent instances of the same program without any communication between them), the total performance is the sum of the individual nodes' performance, and the scale-up (assuming a homogeneous cluster) is linear. For non-trivially parallel problems, however, the multiple instances of the parallel program must communicate to complete the task. This means that the performance (and scale-up) depends on the communication medium (in this case switched Fast Ethernet) and its characteristics, mainly bandwidth and latency. Because different problems (and programs) have different communication requirements and patterns, one of the most meaningful benchmarks for any given cluster is to examine its performance on the actual (real-life) calculations that will be run on it (for example, the NAMD runs shown below), instead of trying to infer its real-life performance from general cluster benchmarks (like the HPL runs shown below).
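To make the bandwidth/latency point concrete, here is a minimal sketch of a toy scaling model; the function, its name, and every parameter value in it are illustrative assumptions, not measurements from this cluster:

```python
# A toy model (assumed, not measured) of why latency and bandwidth dominate
# non-trivially parallel scaling: compute time shrinks with the node count,
# while a fixed latency-plus-bandwidth communication cost per iteration does not.

def step_time(nodes, work_s=10.0, msg_bytes=1_000_000,
              latency_s=100e-6, bandwidth_bps=12.5e6):
    """Time for one iteration on `nodes` nodes under a naive model.

    work_s        -- serial compute time per iteration (assumed)
    msg_bytes     -- bytes each node exchanges per iteration (assumed)
    latency_s     -- per-message latency of the interconnect (assumed)
    bandwidth_bps -- usable bytes/s (Fast Ethernet: ~12.5 MB/s)
    """
    compute = work_s / nodes  # perfectly divisible work
    comm = 0.0 if nodes == 1 else latency_s + msg_bytes / bandwidth_bps
    return compute + comm

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} nodes: speedup {step_time(1) / step_time(n):5.2f}")
```

Even in this optimistic model the fixed communication cost eventually caps the speedup, which is why the real-life NAMD runs below say more about the cluster than any single peak figure.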
All that being said, for embarrassingly parallel problems the present cluster would ideally deliver approximately [ 1·2.6 + 8·2.4 + 9·0.733 ] = 28.4 Gflops (this sum is also spelled out in the short snippet after the table). To get a feeling for this number, the following table shows what position a computer delivering 30 Gflops would have reached in the Top500 list of the fastest supercomputers (http://www.top500.org) in various years since 1993:
Year      | Position
1993      | 3
1995      | 19
1997      | 83
June 1999 | 362
Nov 1999  | Out of list (last entry at 33 Gflops)
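As a sanity check of the arithmetic above, the ideal aggregate is just the sum of each node group's count times its per-node rating; the group figures below come from the text, the variable names are illustrative:

```python
# Quick check of the ideal aggregate for embarrassingly parallel work.
node_groups = [   # (number of nodes, Gflops per node), from the text
    (1, 2.6),
    (8, 2.4),
    (9, 0.733),
]
ideal_peak = sum(count * gflops for count, gflops in node_groups)
print(f"Ideal aggregate: {ideal_peak:.1f} Gflops")  # prints 28.4
```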
It is worth noting that the slowest supercomputer in the 2005 Top500 list delivers a hefty 1.166 Teraflops. So, now you know. The tests now: