> Out in the literature we can find a huge number of dataflow analyses, some of which are useful to optimize some kinds of code — but it’s hard to know which ones to actually implement. […] First, implementing the analysis itself, which requires creating an abstract version of each instruction in the compiler’s IR: these are called dataflow transfer functions. For example, to implement the addition operation for integer ranges, we can use [lo1, hi1] + [lo2, hi2] = [lo1 + lo2, hi1 + hi2] as the transfer function.
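To make the quoted transfer function concrete, here is a minimal Python sketch of an interval domain with that addition rule. The class and method names are illustrative, not taken from any particular compiler:

```python
# Minimal sketch of an interval domain with the addition transfer function
# from the quote: [lo1, hi1] + [lo2, hi2] = [lo1 + lo2, hi1 + hi2].
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float  # use -inf / +inf for unknown bounds
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Abstract addition: adding any x in [lo1, hi1] to any y in
        # [lo2, hi2] yields a value in [lo1 + lo2, hi1 + hi2].
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def join(self, other: "Interval") -> "Interval":
        # Least upper bound, used when control-flow paths merge.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

# Example: x in [1, 3], y in [10, 20]  =>  x + y in [11, 23]
print(Interval(1, 3) + Interval(10, 20))   # Interval(lo=11, hi=23)
```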
What papers would you recommend for learning to implement dataflow analysis? For example, foundational or tutorial papers.
Every good book on compilers. I learned this from [1]. The (hard to read) original papers include [2, 3]. Interesting tidbit: Gary Kildall, one of the pioneers of abstracting dataflow analyses into a general framework, later played a role in the (in)famous IBM–Microsoft deal that became the starting point for Microsoft's dominance (IBM turned to Microsoft for an operating system after its talks with Kildall's Digital Research fell through). The foundational theory behind all this is abstract interpretation, for which there are many introductions, including [5, 6], with the (hard to read) original being [7].
[5] F. Nielson, H. Riis Nielson, C. Hankin, Principles of Program Analysis.
[6] X. Rival, K. Yi, Introduction to Static Analysis: An Abstract Interpretation Perspective.
[7] P. Cousot, R. Cousot, Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints.
The list below is ordered from simpler to more complex dataflow analysis frameworks. These frameworks allow you to encode dataflow problems and solve them.
* Kam, John B., and Jeffrey D. Ullman. "Monotone data flow analysis frameworks." Acta informatica 7.3 (1977): 305-317.
* Reps, Thomas, Susan Horwitz, and Mooly Sagiv. "Precise interprocedural dataflow analysis via graph reachability." Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 1995.
* Sagiv, Mooly, Thomas Reps, and Susan Horwitz. "Precise interprocedural dataflow analysis with applications to constant propagation." TAPSOFT '95: Theory and Practice of Software Development. Springer, 1995.
* Reps, Thomas, et al. "Weighted pushdown systems and their application to interprocedural dataflow analysis." Science of Computer Programming 58.1-2 (2005): 206-263.
* Späth, Johannes, Karim Ali, and Eric Bodden. "Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems." Proceedings of the ACM on Programming Languages 3.POPL (2019): 1-29.
One thing to keep in mind is that you can think of static analysis as building up facts about the program. For example, you can start by assuming nothing, then add facts and iteratively propagate them from one line of code to the next until nothing changes. But you can also go the other way: start by assuming the universe of all facts and remove facts from it.
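To make that iterative propagation concrete, here is a minimal worklist-fixpoint sketch in Python for a forward "may" analysis in the style of reaching definitions. The CFG, gen, and kill sets are made up for illustration; it starts from the empty set of facts and keeps adding until a fixed point:

```python
# Forward "may" analysis: start from nothing and add facts until nothing changes.
# A "must" analysis would instead start from the full universe and remove facts,
# using intersection rather than union at merge points.
cfg_succ = {"entry": ["b1"], "b1": ["b2", "b3"], "b2": ["b4"], "b3": ["b4"], "b4": []}
cfg_pred = {"entry": [], "b1": ["entry"], "b2": ["b1"], "b3": ["b1"], "b4": ["b2", "b3"]}

gen  = {"entry": set(), "b1": {"x@b1"}, "b2": {"x@b2"}, "b3": {"y@b3"}, "b4": set()}
kill = {"entry": set(), "b1": {"x@b2"}, "b2": {"x@b1"}, "b3": set(),    "b4": set()}

in_facts  = {b: set() for b in cfg_succ}   # "assume nothing"
out_facts = {b: set() for b in cfg_succ}

worklist = list(cfg_succ)
while worklist:
    b = worklist.pop()
    # Merge: union over predecessors (this is what makes it a "may" analysis).
    in_facts[b] = set().union(*(out_facts[p] for p in cfg_pred[b]))
    # Transfer function: out = gen ∪ (in − kill).
    new_out = gen[b] | (in_facts[b] - kill[b])
    if new_out != out_facts[b]:            # a fact changed: revisit the successors
        out_facts[b] = new_out
        worklist.extend(cfg_succ[b])

print(sorted(out_facts["b4"]))   # ['x@b1', 'x@b2', 'y@b3'] — both paths reach b4
```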
Some classes of program analysis are safe to stop early. For example, if I have a static analysis that tries to find the targets of virtual calls (to enable devirtualization), it can stop early after a timeout: not finding a target just means a missed optimization.
Other classes of program analysis are not safe to use until the algorithm finishes. For example, if you have to prove that two variables do not alias each other, you cannot stop until you have computed the complete points-to sets and verified that the points-to sets of those two variables do not overlap.
So, given the above restriction, the first class (early termination) is perhaps more desirable: throwing more compute at it yields a better approximation. For the second class, it does not.
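A toy sketch of the aliasing example, with made-up points-to sets, shows why the answer is only trustworthy once the sets are complete:

```python
# A "no alias" claim is only safe once the points-to sets are complete.
# The sets here are hypothetical; a real analysis computes them.
points_to = {
    "p": {"heap_obj_1"},
    "q": {"heap_obj_2"},
}

def may_alias(a: str, b: str) -> bool:
    # Two pointers may alias iff their points-to sets overlap.
    return bool(points_to[a] & points_to[b])

print(may_alias("p", "q"))    # False — but only valid if the sets are complete

# If the analysis stopped early and missed an assignment like `q = p`,
# the real set for "q" would also contain "heap_obj_1" and the pointers alias.
points_to["q"].add("heap_obj_1")
print(may_alias("p", "q"))    # True
```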
Another thing to keep in mind is that most of these dataflow frameworks are not easily parallelized. The only paper I've read (though I haven't kept up with this avenue of research) that implemented a control-flow analysis on the GPU is the following:
* Prabhu, Tarun, et al. "EigenCFA: Accelerating flow analysis with GPUs." ACM SIGPLAN Notices 46.1 (2011): 511-522.
I'm sure people are working on it. (I should mention that there are some program analyses written in Datalog, and Datalog can be parallelized, but I believe that is CPU-based parallelization rather than GPU-based.)
The third thing: when you ask whether we are limited by algorithms or by compute, it is important to note that it is impossible to find all possible facts about a program *precisely* without running it. There is a relation between static program analysis and the halting problem: we want to guarantee termination of our static analysis, and some facts are simply unobtainable without running the program. However, there are not just static program analyses but also dynamic program analyses, which analyze a program as it runs. An example of a dynamic program analysis is value profiling. Imagine you have a conditional that is false 99% of the time. With a virtual machine, you can add instrumentation to learn the probability distribution of this conditional, generate specialized code that assumes the common case, optimize it, and only when that assumption is violated fall back to a less optimized version of the code, paying an additional penalty. Some virtual machines already do this for types and values: type profiling and value profiling.
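As a toy sketch of that idea (all names invented; a real JIT does this with machine code and deoptimization): instrument the condition, and once the profile says it is almost always false, switch to a specialized fast path guarded by a check that falls back to the generic code when the assumption breaks.

```python
# Toy sketch of value/branch profiling plus guarded specialization.
import random

profile = {"cond_true": 0, "cond_total": 0}

def generic(x):
    cond = (x % 100 == 0)                 # true only ~1% of the time
    profile["cond_true"] += cond          # instrumentation: branch profile
    profile["cond_total"] += 1
    if cond:
        return slow_rare_path(x)
    return x + 1                          # the common work

def slow_rare_path(x):
    return x * 2

def specialized(x):
    # Compiled under the assumption "cond is false".
    if x % 100 == 0:                      # guard: assumption violated?
        return generic(x)                 # fall back ("deoptimization" penalty)
    return x + 1                          # straight-line fast path, no profiling

# Warm up with the instrumented version, then switch if the profile agrees.
for _ in range(1000):
    generic(random.randrange(10_000))

mostly_false = profile["cond_true"] / profile["cond_total"] < 0.05
run = specialized if mostly_false else generic
print(run(7), run(200))
```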
One last thing: when you say a meaningful performance boost, it depends on your code. If your code can be folded away completely at compile time, then yes, we could just generate the solution at compile time and be done. But if it cannot, or if parts of it cannot be folded away and the facts cannot be used to optimize the code, then no matter how much you search, you cannot optimize it statically.
Compilers are awesome :)
As an addendum, it might be desirable in the future to have a repository of analyzed code. Compilers right now re-analyze code on every single compile and do not share their results across the web. It is a fantasy of mine to have a repository that maps code to equivalent representations, so that each time anyone does a local compile it explores a new part of the optimization space and stores what it finds online.
yes, this is obviously what you want. there shouldn't be a notion of 'running' the compiler; it should simply be a persistent process. there are two orthogonal but critically important issues here: iterativity (produce an acceptable result immediately, then continue to refine it) and incrementality (when making a small change to the code, take advantage of information learned about the previous version).
there is a recent paper on incrementality, https://arxiv.org/pdf/2104.01270.pdf, though it has some caveats (e.g. it does not handle iterativity at all, so there is only so far it can go; one can do much better). it is interesting non-technically because the authors work for amazon, so perhaps some of this is headed for industry.
> it might be desirable in the future to have a repository of analyzed code.
We built a code-verifier back end for a small app store. The team was mostly MScs plus some PhD consulting.
The world is indeed waiting for a CloudOptimizingCompiler (CoCo?!) - there are so many possibilities for shared compute, library caching, and machine learning.