Declarative Data Collections for Portable Parallelism


Abstract

I would like to introduce Declarative Abstractions for Data Collections, a novel, declarative approach to data collections for convenient, portable, and efficient parallel computation. Modern programming languages provide rich abstractions for data collections as part of their standard libraries, e.g., the containers in the C++ STL, the Java Collections Framework, or the Scala Collections API. Typically, these collections frameworks are organized as hierarchies of common abstract data types (ADTs) such as lists, queues, and stacks. While convenient, this approach leads users to over-specify their collection data types, which limits implementation flexibility and ultimately hurts application performance. With the introduced framework, programmers instead explicitly select the properties their collections need, decoupling specification from implementation. Making collection properties explicit reduces the risk of over-specification and increases implementation flexibility. In terms of computational performance, the framework shields the application developer from parallel implementation details: a property-based data collection can be ported to multiple platforms, including GPUs and FPGAs, without modifying its property declaration. The framework takes a data-centric approach to high-performance computation, in which users focus on what properties the container (collection) should have and do not need to deal with implementation details. It is built on C++ metaprogramming and provides a modern C++ API. The framework offers the community a convenient, high-performance programming model for parallel data processing in heterogeneous environments.
The audience will come away with a practical programming model for data-centric parallelism that is useful in everyday work on parallel data analysis, data storage, filtering, and similar tasks.
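To make the idea of selecting properties instead of a concrete container type more tangible, here is a minimal C++17 sketch. It is not the framework's actual API: the names `Unique`, `Sorted`, `has_property`, `select_storage`, `Collection`, and `usage_example` are hypothetical and chosen only for illustration, and the sketch maps properties to standard-library containers on the CPU rather than to GPU or FPGA backends.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <unordered_set>
#include <vector>

// Hypothetical property tags (illustrative only, not the framework's names).
struct Unique {};  // the collection holds no duplicate elements
struct Sorted {};  // iteration visits elements in ascending order

// True if property P appears among the declared properties Ps.
// With an empty pack the fold expression evaluates to false.
template <class P, class... Ps>
inline constexpr bool has_property = (std::is_same_v<P, Ps> || ...);

// Map the declared properties to a backing container.
// No properties declared: a plain sequence is enough.
template <class T, bool IsUnique, bool IsSorted>
struct select_storage { using type = std::vector<T>; };
template <class T>
struct select_storage<T, true, true> { using type = std::set<T>; };
// Uniqueness without ordering leaves room for a hash-based container.
template <class T>
struct select_storage<T, true, false> { using type = std::unordered_set<T>; };
template <class T>
struct select_storage<T, false, true> { using type = std::multiset<T>; };

// The user declares properties; the implementation is chosen at compile time.
template <class T, class... Props>
class Collection {
public:
    using storage_type =
        typename select_storage<T, has_property<Unique, Props...>,
                                has_property<Sorted, Props...>>::type;

    void add(const T& value) {
        if constexpr (std::is_same_v<storage_type, std::vector<T>>)
            data_.push_back(value);
        else
            data_.insert(value);
    }
    std::size_t size() const { return data_.size(); }
    auto begin() const { return data_.begin(); }
    auto end() const { return data_.end(); }

private:
    storage_type data_;
};

// The order in which properties are listed does not matter.
static_assert(std::is_same_v<Collection<int>::storage_type, std::vector<int>>);
static_assert(std::is_same_v<Collection<int, Unique>::storage_type,
                             std::unordered_set<int>>);
static_assert(std::is_same_v<Collection<int, Sorted, Unique>::storage_type,
                             std::set<int>>);

// Usage sketch: duplicates collapse only when Unique is declared.
inline void usage_example() {
    Collection<int, Unique> ids;
    ids.add(7); ids.add(7); ids.add(3);
    assert(ids.size() == 2);

    Collection<int> log;
    log.add(7); log.add(7);
    assert(log.size() == 2);
}
```

In this sketch, user code names only the properties it relies on, so the mapping in `select_storage` could be swapped for a different backend without changing any declaration. That is the kind of decoupling the abstract describes, shown here under the stated assumptions only.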

Date
Jun 19, 2023 10:40 AM — 11:00 AM