


The UPC programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory sub-systems. One convenient feature of UPC is its ability to automatically execute between-thread data movement, such that the entire content of a shared data array appears to be freely accessible by all the threads. The programmer friendliness, however, can come at the cost of substantial performance penalties. This is especially true when indirectly indexing the elements of a shared array, for which the induced between-thread data communication can be irregular and have a fine-grained pattern.
BERKELEY UPC CODE
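The behaviour described above can be made concrete with a small example. The listing below is an illustrative sketch, not the original Berkeley UPC code: the array names (A, idx), the block size B, and the index pattern are assumptions chosen only to show how indirect indexing of a shared array turns each read into a potential fine-grained remote access.

/* Illustrative UPC sketch; compiles with a UPC compiler such as Berkeley UPC. */
#include <upc.h>
#include <stdio.h>

#define B 1024                      /* elements with affinity to each thread  */

shared [B] double A[B * THREADS];   /* shared data array, blocked layout      */
shared [B] int  idx[B * THREADS];   /* indirection array (assumed)            */

int main(void) {
    int i, n = B * THREADS;

    /* each thread initializes the elements it has affinity to (local writes) */
    upc_forall (i = 0; i < n; i++; &A[i]) {
        A[i]   = (double)i;
        idx[i] = (i * 7919) % n;    /* scattered indices, purely illustrative  */
    }
    upc_barrier;

    /* Indirect indexing: A[idx[i]] may have affinity to any thread, so each
       read can become a small remote GET issued implicitly by the runtime,
       producing the irregular, fine-grained communication discussed above.   */
    double sum = 0.0;
    upc_forall (i = 0; i < n; i++; &idx[i])
        sum += A[idx[i]];

    printf("thread %d of %d: partial sum = %f\n", MYTHREAD, THREADS, sum);
    return 0;
}

Because the affinity expression &idx[i] only guarantees that the index itself is local, every dereference of A[idx[i]] that lands on another thread is serviced as an individual remote access, which is the source of the performance penalty noted above.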
Large scale irregular applications, such as sparse linear algebra and graph analytics, exhibit fine-grained memory access patterns and operate on very large data sets. The Partitioned Global Address Space (PGAS) model simplifies the development of distributed-memory irregular applications, as all the memory in the system is viewed logically as a single shared address space. The Chapel programming language provides a PGAS programming model and offers high productivity for irregular application developers, as remote communication is performed implicitly. However, irregular applications written in Chapel often struggle to achieve high performance due to implicit fine-grained remote communication. In this work, we explore techniques to bridge the gap between high productivity and high performance for irregular applications using the Chapel programming language. We present high-level implementations of the Breadth First Search (BFS) and PageRank applications. We then describe optimized versions that utilize message aggregation and data replication in ways that could potentially be applied automatically, improving performance by as much as 1,219x for BFS and 22x for PageRank. When compared to MPI+OpenMP implementations that employ optimizations of the same type as those applied to the Chapel codes, our optimized code is 3.7x faster on average for BFS but 1.3x slower for PageRank.
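Message aggregation and data replication are general PGAS optimizations rather than Chapel-specific ones. The sketch below, kept in UPC for consistency with the listing above rather than in Chapel, is a hypothetical illustration of the idea: a small, read-mostly shared array is replicated into private memory with one bulk transfer per owning thread, so that later indirect reads of it stay local instead of generating per-element remote accesses. The array name rank, the block size B, and the loop structure are assumptions and are not taken from the paper's Chapel or MPI+OpenMP codes.

/* Illustrative UPC sketch of replication plus aggregated (bulk) transfers.   */
#include <upc.h>
#include <stdio.h>
#include <stdlib.h>

#define B 512                          /* elements owned by each thread       */

shared [B] double rank[B * THREADS];   /* read-mostly shared array (assumed)  */

int main(void) {
    int i, t, n = B * THREADS;

    /* each thread initializes its own block (local writes) */
    upc_forall (i = 0; i < n; i++; &rank[i])
        rank[i] = 1.0 / n;
    upc_barrier;

    /* Replication with aggregation: pull the whole array into private memory
       using one bulk GET per owning thread, instead of issuing a fine-grained
       remote read for every later access to a remote element.                */
    double *local_rank = malloc(n * sizeof(double));
    for (t = 0; t < THREADS; t++)
        upc_memget(&local_rank[t * B], &rank[t * B], B * sizeof(double));

    /* subsequent indirect reads of the replicated values are purely local */
    if (MYTHREAD == 0)
        printf("each of %d threads holds a private copy of %d ranks\n",
               THREADS, n);

    free(local_rank);
    return 0;
}

The same idea underlies the optimizations summarized above: fewer, larger messages (aggregation) and local copies of frequently read data (replication) remove most of the implicit fine-grained communication while keeping the high-level structure of the code.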
