Dimitrios S. Nikolopoulos, Eleftherios D. Polychronopoulos, and Theodore S. Papatheodorou

Abstract
Autoscheduling is a parallel program compilation and execution model that combines three unique features: automatic extraction of loop and functional parallelism at any level of granularity, dynamic scheduling of parallel tasks, and dynamic program adaptability to the machine resources on multiprogrammed shared memory multiprocessors. This paper presents a technique that attempts to enhance the performance of autoscheduling on Distributed Shared Memory (DSM) multiprocessors, targeting mainly medium- to large-scale systems, where poor data locality and excessive data communication impose performance bottlenecks. Our technique partitions the application's Hierarchical Task Graph (HTG) and maps the derived partitions to clusters of processors in the DSM architecture. Autoscheduling is then applied independently to each partition in order to enhance data locality and reduce communication costs. Our results for application and synthetic benchmarks show that this technique achieves performance improvements of up to 44% on average compared to an existing autoscheduling environment and 54% on average compared to a commercial parallelizing compiler.
Contact
Dimitrios S. Nikolopoulos, High Performance Computing Architectures Laboratory, Computer Engineering and Informatics Department, University of Patras, 26500, Rion-Patras, Achaia, Greece, dsn@hpclab.ceid.upatras.gr