Performance
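Turnaround times are greatly reduced for the parallel code. For a 10 PE job
on the SGI-Challenge, turnaround time drops by a factor of about 15 relative
to a single PE. This speedup is superlinear (greater than the PE count) and
is attributable to increased cache availability: each PE works on a smaller
share of the data, so more of it fits in cache. The code runs on the
SGI-Challenge and the T3E. Timing results for a 3-row, 10 PE run: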
| Machine | NPES | time/(iter*grid pt) |
|---|---|---|
| SGI-Challenge | 1 | 397.E-6 |
| C-90 | 1 | 33.5E-6 |
| SGI-Challenge | 10 | 25.5E-6 |
| T3E-900 | 10 | 25.0E-6 |
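The speedup quoted above can be checked directly from the tabulated timings. The following is a minimal sketch, assuming the conventional definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of PEs); the function names are illustrative and do not come from the original code.

```python
# Per-iteration, per-grid-point times copied from the timing table.
timings = {
    ("SGI-Challenge", 1): 397e-6,
    ("C-90", 1): 33.5e-6,
    ("SGI-Challenge", 10): 25.5e-6,
    ("T3E-900", 10): 25.0e-6,
}


def speedup(t_serial, t_parallel):
    """Ratio of single-PE time to multi-PE time (assumed definition)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, npes):
    """Speedup per PE; a value above 1.0 indicates superlinear speedup."""
    return speedup(t_serial, t_parallel) / npes


t1 = timings[("SGI-Challenge", 1)]
t10 = timings[("SGI-Challenge", 10)]
s = speedup(t1, t10)
e = efficiency(t1, t10, 10)
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
# prints "speedup = 15.57, efficiency = 1.56"
```

An efficiency above 1.0 is what marks the Challenge result as superlinear.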
The scalability of the code on the Challenge is limited by the number of
processors available. Real-world configurations require a larger number of
PEs. For a 15 PE run on the T3E, the following timings were recorded:
| Machine | Clock (MHz) | Time (s) | Speedup |
|---|---|---|---|
| T3E-600 | 300 | 34 | 1.000 |
| T3E-900 | 450 | 27 | 1.259 |
| T3E-1200 | 600 | 20 | 1.700 |
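The Speedup column can be reproduced by dividing the T3E-600 time by each machine's time. The sketch below does that and, as an additional illustration not taken from the original, also computes the clock-rate ratio so the two can be compared; the helper name `relative_speedups` is hypothetical.

```python
# (machine, clock in MHz, wall time in s) for the 15 PE run, from the table.
runs = [
    ("T3E-600", 300, 34.0),
    ("T3E-900", 450, 27.0),
    ("T3E-1200", 600, 20.0),
]


def relative_speedups(runs):
    """Return (name, speedup, clock ratio), both relative to the first entry."""
    _, base_clock, base_time = runs[0]
    return [
        (name, base_time / time_s, clock / base_clock)
        for name, clock, time_s in runs
    ]


rows = relative_speedups(runs)
for name, s, c in rows:
    print(f"{name:9s} speedup {s:.3f}  clock ratio {c:.2f}")
```

On these figures the measured speedup grows more slowly than the clock rate (1.7 for a doubling of the clock), which suggests the run is not limited by processor speed alone.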