NAS Integer Sort on Multi-Threaded Shared Memory Machines

Thomas Gr{\"u}n and Mark A. Hillebrand

Abstract
We investigate the performance of multi-threaded shared memorymachines, like the commercial Tera MTA or the experimentalSB-PRAM, on the Integer Sort benchmark of the NAS ParallelBenchmark Suite. The benchmark models the ranking phase of acounting sort algorithm, i.e. it computes for each element xof a vector of small integers the position of the firstoccurrence of x in the sorted version of that vector. Bothaforementioned machines require for the execution of thebenchmark about three times less clock cycles than vectorcomputers and an order of magnitude less cycles than generalpurpose distributed memory machines or share memory machines.The reasons for this behavior are investigated. It turns outthat both processors can take advantage of a fetch-and-addoperation and that due to multi-threading no time is lostwaiting for memory accesses to complete.
Contact
Dr. Thomas Gruen
Computer Science Department,- Chair of Prof. Dr. Paul,University of the Saarland,Postfach 151150,66041 Saarbruecken, Germany
gruen@cs.uni-sb.de