Representing Data Collections in an SSA Form [pdf]

matt_d · 2024-01-12T20:15:44.000000Z

Abstract:

"Compiler research and development has treated computation as the primary driver of performance improvements in C/C++ programs, leaving memory optimizations as a secondary consideration. Developers are currently handed the arduous task of describing both the semantics and layout of their data in memory, either manually or via libraries, prematurely lowering high-level data collections to a low-level view of memory for the compiler. Thus, the compiler can only glean conservative information about the memory in a program, e.g., alias analysis, and is further hampered by heavy memory optimizations. This paper proposes the Memory Object Intermediate Representation (MEMOIR), a language-agnostic SSA form for sequential and associative data collections, objects, and the fields contained therein. At the core of MEMOIR is a decoupling of the memory used to store data from that used to logically organize data. Through its SSA form, MEMOIR compilers can perform element-level analysis on data collections, enabling static analysis on the state of a collection or object at any given program point. To illustrate the power of this analysis, we perform dead element elimination, resulting in a 26.6% speedup on mcf from SPECINT 2017. With the degree of freedom to mutate memory layout, our MEMOIR compiler performs field elision and dead field elimination, reducing peak memory usage of mcf by 20.8%."

Programming Languages Amenable to MEMOIR

"At its core, MEMOIR proposes collections as value types. In this paper, we implement a library in C/C++ to provide this functionality, however many languages exist which provide the guarantees needed for a MEMOIR compiler. Languages with mutable value semantics [48], which degrades references to second-class citizens, are amenable to SSA construction, as they are analogous to our MUT library. Such languages include Swift’s struct types [69] and Hylo [70].

Languages with single-ownership, i.e., “borrowing”, which guarantee that only one mutable reference will exist at a time can be used to construct a MEMOIR program. An example of this is Rust [71], which is steadily entering the programming zeitgeist. Similarly to Rust, newer languages such as Mojo [72] and Vale [73] have similar ownership models. Of note, use ϕ ’s cannot be constructed for these languages, as multiple immutable references may exist at once. While the aforementioned languages are promising directions of future work, the lack of accepted benchmark suites implemented in them, unlike C/C++, was deemed too large a barrier to adoption in our research at present.

Collection-oriented languages [74–77] have existed for many years now. APL [74] and SETL [77] serve as prime examples of their philosophy, focusing on arrays and sets, respectively, as prime concepts of the language. As such they are interesting source languages for compilation and, furthermore, their implementations provide a wealth of resources on optimizing collection-oriented programs [78]. A recent example of these concepts being exploited outside of their original languages is parallel block-delayed sequences [79], which implements loop-fusion techniques on sequences as a library for Parallel ML and C++. Investigating the extent these optimizations could be performed statically with MEMOIR provides an interesting starting point for this line of research."