In compilers, the N in complexity claims is commonly taken to be the number of nodes in the AST: a compiler's performance is evaluated over the input program, and the work done is generally bounded by the AST's node count.
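
As a point of reference, here is a minimal Haskell sketch (a hypothetical toy AST, not Co-dfns code) of the node count that the complexity claims here are measured against:

    -- Toy expression AST; the n in the O(n) claims below is its node count.
    data Expr = Lit Int | Add Expr Expr

    -- Count the nodes in the tree.
    size :: Expr -> Int
    size (Lit _)   = 1
    size (Add l r) = 1 + size l + size r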

In almost all functional compiler designs, the identity pass is linear in the size of the AST; with adequate optimization, most functional language implementations can reduce it to constant time. In Co-dfns, the identity pass is the single character ⊢, and the compiler produces constant-time code for it.

Likewise, with a bit of work, the same would be true of a Nanopass compiler pass, unless you insisted on traversing and rebuilding the AST yourself, in which case the framework could not optimize the traversal away.
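
To make the distinction concrete, here is a hedged Haskell sketch (the same toy Expr as above, restated so the snippet stands alone; this is neither Co-dfns nor Nanopass code) contrasting an identity pass that traverses and rebuilds, which is O(n), with one that returns its input unchanged, which is O(1), the analogue of ⊢:

    -- Same toy AST as above.
    data Expr = Lit Int | Add Expr Expr

    -- Identity by traversal and rebuilding: O(n) in the node count.
    identityRebuild :: Expr -> Expr
    identityRebuild (Lit n)   = Lit n
    identityRebuild (Add l r) = Add (identityRebuild l) (identityRebuild r)

    -- Identity in the spirit of Co-dfns' ⊢: return the input unchanged, O(1).
    identityConst :: Expr -> Expr
    identityConst = id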

The majority of compilers use algorithms that are quadratic in the size of the AST (the node count) for a number of passes. Many use some variant of graph-coloring register allocation, which is NP-complete, although better register allocation algorithms exist that are quadratic, or quadratic with a log factor, in the number of AST nodes. The Nanopass research has demonstrated that in real-life compilers the cost of a few such passes dominates the cost of the rest of the compiler. Splitting a pass into multiple linear passes therefore generally does not change the big-O complexity of the whole compiler, and thus its performance at scale, though it can change the constant factors by a small amount, depending on how good the allocator in your host language is.
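
The shape of that argument in code, as a hedged sketch with hypothetical passes over a flat stand-in for an AST: composing a fixed number of linear passes stays linear, and a single quadratic pass dominates the pipeline either way:

    -- Flat stand-in for an AST; n is the length of the list.
    type AST = [Int]

    linearPass :: AST -> AST
    linearPass = map (+ 1)                              -- O(n)

    quadraticPass :: AST -> AST
    quadraticPass xs = [length (filter (<= x) xs) | x <- xs]  -- O(n^2): full scan per node

    -- Splitting the linear work into more linear passes leaves the big-O of
    -- the whole pipeline unchanged: the quadratic pass still dominates.
    pipeline :: AST -> AST
    pipeline = quadraticPass . linearPass . linearPass  -- O(n^2) overall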

This Nanopass research has only been done on compilers implemented on the CPU. My research explores whole compilers on the GPU, an area that shows promise: even memory-bound computations, such as most core compiler transformations, can see significant speedups, even if they move from linear to log-linear complexity.

The passes are not run serially. Each pass is data parallel, and while there is a serial dependency in that each pass depends on the results of the previous one, nothing in the way the compiler is written forces a hard barrier (such as a join in fork-join programming) at the pass boundaries. It is conceivable that optimization techniques for stencils, such as trapezoidal decomposition, could be applied to compilers written in this fashion to allow optimization across pass boundaries, but that is future work.
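
One small illustration of why no hard barrier is forced, as a hedged sketch (not how Co-dfns is implemented): two element-wise data-parallel passes obey the map fusion law, map f . map g = map (f . g), so a runtime is free to interleave work from adjacent passes per element instead of synchronizing on the whole array between them:

    -- Two element-wise passes.
    passA, passB :: [Int] -> [Int]
    passA = map (* 2)
    passB = map (+ 1)

    -- Barrier-style composition: conceptually, passA finishes before passB starts.
    barriered :: [Int] -> [Int]
    barriered = passB . passA

    -- Fused composition: one traversal, no synchronization point between passes.
    fused :: [Int] -> [Int]
    fused = map ((+ 1) . (* 2))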

The passes themselves execute in parallel on the GPU, or at least, that is their design. They are expressed as natively data-parallel, or vectorized, algorithms. If run on the CPU through the interpreter, they execute on Intel's vector (SIMD) units via the optimized primitives in the Dyalog APL interpreter.
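
For flavor, a hedged Haskell sketch (a hypothetical flat node layout, not Co-dfns' actual representation) of a pass written as whole-array operations, the form that a vectorizing runtime, whether GPU kernels or an interpreter's SIMD primitives, can execute in a data-parallel fashion:

    import Data.Vector (Vector)
    import qualified Data.Vector as V

    data NodeKind = LitK | AddK deriving (Eq, Show)

    -- Nodes stored column-wise ("struct of arrays"): one flat array per field.
    -- The pass is a single zipWith over every node; there is no per-node recursion.
    bumpLiterals :: Vector NodeKind -> Vector Int -> Vector Int
    bumpLiterals kinds vals = V.zipWith bump kinds vals
      where
        bump LitK v = v + 1   -- applied to all literal nodes in one parallel sweep
        bump _    v = v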




Thanks. Your work is inspiring.



