Harshvardhan,; West, Brandon; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence(2015 IEEE International Parallel and Distributed Processing Symposium, Institute of Electrical and Electronics Engineers (IEEE), 2015-05)[Conference Paper]
With the advent of big-data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques that can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance. In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into sub graphs that fit in RAM and uses a paging-like technique to load sub graphs. We show that without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16, 000+ cores.
Oancea, Cosmin E.; Rauchwerger, Lawrence(2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Institute of Electrical and Electronics Engineers (IEEE), 2015-02)[Conference Paper]
Subscripts using induction variables that cannot be expressed as a formula in terms of the enclosing-loop indices appear in the low-level implementation of common programming abstractions such as Alter, or stack operations and pose significant challenges to automatic parallelization. Because the complexity of such induction variables is often due to their conditional evaluation across the iteration space of loops we name them Conditional Induction Variables (CIV). This paper presents a flow-sensitive technique that summarizes both such CIV-based and affine subscripts to program level, using the same representation. Our technique requires no modifications of our dependence tests, which is agnostic to the original shape of the subscripts, and is more powerful than previously reported dependence tests that rely on the pairwise disambiguation of read-write references. We have implemented the CIV analysis in our parallelizing compiler and evaluated its impact on five Fortran benchmarks. We have found that that there are many important loops using CIV subscripts and that our analysis can lead to their scalable parallelization. This in turn has led to the parallelization of the benchmark programs they appear in.
Harshvardhan,; Amato, Nancy M.; Rauchwerger, Lawrence(Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015, Association for Computing Machinery (ACM), 2015)[Conference Paper]
Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL shows improved scalability and performance over the traditional level-synchronous approach, with 2.5 × - 8× improvement for a variety of graph algorithms at 12, 000+ cores.
Papadopoulos, Ioannis; Thomas, Nathan; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence(Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15, Association for Computing Machinery (ACM), 2015)[Conference Paper]
Fidel, Adam; Jacobs, Sam Ade; Sharma, Shishir; Amato, Nancy M.; Rauchwerger, Lawrence(2014 IEEE 28th International Parallel and Distributed Processing Symposium, Institute of Electrical and Electronics Engineers (IEEE), 2014-05)[Conference Paper]
Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence(Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14, Association for Computing Machinery (ACM), 2014)[Conference Paper]
Tomkins, Daniel; Smith, Timmie; Amato, Nancy M.; Rauchwerger, Lawrence(Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14, Association for Computing Machinery (ACM), 2014)[Conference Paper]
Tarjan's famous linear time, sequential algorithm for finding the strongly connected components (SCCs) of a graph relies on depth first search, which is inherently sequential. Deterministic parallel algorithms solve this problem in logarithmic time using matrix multiplication techniques, but matrix multiplication requires a large amount of total work. Randomized algorithms based on reachability - the ability to get from one vertex to another along a directed path - greatly improve the work bound in the average case. However, these algorithms do not always perform well; for instance, Divide-and-Conquer Strong Components (DCSC), a scalable, divide-and-conquer algorithm, has good expected theoretical limits, but can perform very poorly on graphs for which the maximum reachability of any vertex is small. A related algorithm, MultiPivot, gives very high probability guarantees on the total amount of work for all graphs, but this improvement introduces an overhead that increases the average running time. This work introduces SCCMulti, a multi-pivot improvement of DCSC that offers the same consistency as MultiPivot without the time overhead. We provide experimental results demonstrating SCCMulti's scalability; these results also show that SCCMulti is more consistent than DCSC and is always faster than MultiPivot.
Oancea, Cosmin E.; Rauchwerger, Lawrence(Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation - PLDI '12, Association for Computing Machinery (ACM), 2012)[Conference Paper]
The export option will allow you to export the current search results of the entered query to a file. Different
formats are available for download. To export the items, click on the button corresponding with the preferred download format.
By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.
For anonymous users the allowed maximum amount is 50 search results.
To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export.
The amount of items that can be exported at once is similarly restricted as the full export.
After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.