Collection Skeletons: Declarative Abstractions for Data Collections

Björn Franke, Zhibo Li, Magnus Morton, Michel Steuwer

July 2024


Abstract

Modern programming languages provide programmers with rich abstractions for data collections as part of their standard libraries, e.g., containers in the C++ STL, the Java Collections Framework, or the Scala Collections API. Typically, these collections frameworks are organised as hierarchies that provide programmers with common abstract data types (ADTs) like lists, queues, and stacks. While convenient, this approach introduces problems that ultimately affect application performance: users over-specify collection data types, which limits implementation flexibility. In this article, we develop Collection Skeletons, which provide a novel, declarative approach to data collections. Using our framework, programmers explicitly select properties for their collections, thereby truly decoupling specification from implementation. Making collection properties explicit yields immediate benefits: a reduced risk of over-specification and increased implementation flexibility. We have prototyped our declarative abstractions for collections as a C++ library, and demonstrate that benchmark applications rewritten to use Collection Skeletons incur little or no overhead. We also show how Collection Skeletons help shield the application developer from parallel implementation details, either by encapsulating implicit parallelism or through explicit properties that capture the requirements of parallel algorithmic skeletons. Across most of the 17 benchmarks, we observe performance improvements from the use of Collection Skeletons alone, before the benchmarks are parallelised, and Collection Skeletons also enhance performance portability across three different hardware platforms.
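To make the idea of selecting properties instead of an ADT more concrete, here is a minimal sketch, assuming C++17. All names in it (`Unique`, `Sorted`, `Properties`, `Backend`, `Collection`) are hypothetical and chosen for illustration; they are not the API of the authors' C++ library. The sketch only shows the general principle: client code states the properties it needs, and a separate, replaceable rule maps those properties onto a standard container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <unordered_set>
#include <vector>

// Hypothetical property tags: client code states *what* it needs,
// not *which* container should provide it.
struct Unique {};  // no duplicate elements
struct Sorted {};  // iteration in ascending order

// A list of requested properties; has<P> is true if P was requested.
template <typename... Ps>
struct Properties {
    template <typename P>
    static constexpr bool has = (std::is_same_v<P, Ps> || ...);
};

// Illustrative, fixed compile-time rule mapping properties to a standard
// container. This is an assumption for demonstration only; a real framework
// may choose implementations differently.
template <typename T, typename Props>
using Backend = std::conditional_t<
    Props::template has<Unique>,
    std::conditional_t<Props::template has<Sorted>, std::set<T>,
                       std::unordered_set<T>>,
    std::conditional_t<Props::template has<Sorted>, std::multiset<T>,
                       std::vector<T>>>;

// Thin wrapper exposing one interface regardless of the chosen backend.
template <typename T, typename Props>
class Collection {
public:
    void insert(const T& x) {
        if constexpr (is_vector) {
            data_.push_back(x);  // no property requested: keep insertion order
        } else {
            data_.insert(x);     // set-like backends enforce Unique / Sorted
        }
    }

    bool contains(const T& x) const {
        if constexpr (is_vector) {
            return std::find(data_.begin(), data_.end(), x) != data_.end();
        } else {
            return data_.find(x) != data_.end();
        }
    }

    std::size_t size() const { return data_.size(); }
    auto begin() const { return data_.begin(); }
    auto end() const { return data_.end(); }

private:
    using Impl = Backend<T, Props>;
    static constexpr bool is_vector = std::is_same_v<Impl, std::vector<T>>;
    Impl data_;
};
```

A client that only needs uniqueness and membership tests would declare `Collection<int, Properties<Unique>>` rather than hard-coding `std::set<int>`. Because no ordering requirement is stated, the rule above is free to pick a hash-based backend, which illustrates how over-specification is avoided. Changing how properties map to implementations touches only `Backend`, not client code.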

Type

Journal article

Publication

Journal of Systems and Software

Zhibo Li

Research Assistant

My research interests include systems and architecture, especially data-centric parallelism, code generation, and programming models.