Not really publication quality. Just a Sunday-morning effort to understand the ceiling on the phased-execution model. All resources (decode, number of pipes, cache, branch prediction) are unmodeled, i.e. assumed infinite. I'm sure anyone could improve on it, but I lost interest after the disappointing results.
gunzip the data, run it through the script, and compare the number of lines out against the number of lines in (I get an "IPC" of ~2.12). For a bit more insight, the number on the left of each output line is how many instructions were fused into that group. You can make a histogram like:
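A minimal Python sketch of both measurements: the lines-in/lines-out ratio and a histogram of fused-group sizes. It assumes the unzipped trace and the script's output have been saved as two text files, and that every non-blank output line starts with the integer count of fused instructions. The file names and the `fused_stats` helper are hypothetical; blank lines are ignored in both counts.

```python
import sys
from collections import Counter


def fused_stats(in_lines, out_lines):
    """Return ("IPC", histogram of fused-group sizes).

    Assumes every non-blank output line starts with an integer: the number
    of instructions the script fused into that group.
    """
    groups = [ln for ln in out_lines if ln.strip()]
    n_in = sum(1 for ln in in_lines if ln.strip())
    ipc = n_in / len(groups) if groups else 0.0
    hist = Counter(int(ln.split()[0]) for ln in groups)
    return ipc, hist


def main(in_path, out_path):
    with open(in_path) as f_in, open(out_path) as f_out:
        ipc, hist = fused_stats(f_in, f_out)
    print(f'"IPC" ~{ipc:.2f}')
    peak = max(hist.values(), default=1)
    for size in sorted(hist):
        bar = "#" * max(1, 60 * hist[size] // peak)  # scale bars to 60 columns
        print(f"{size:3d} {hist[size]:8d} {bar}")


# Only run when given both file names, so the module can also be imported.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as `python3 fused_hist.py trace.txt grouped.txt` (all three names hypothetical); it prints the ratio, then one row per group size with its count and a scaled bar.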
To think I messed around with Valgrind ;) This is much easier.
I look at the particulars of the grouping and think 'no, that's not right', and 'that can't be with this', and 'that can be with that', and so on. The results are far too back-of-the-envelope to be right. But the approach is excellent, and that's the main thing :)
I'm going to crunch through the IR recording and work out the ILP with/without speculation etc. Will leave pipelining and auto-vectorization off the table. As this is all public stuff, I reckon the results can be made public.
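For the ILP part, one common way to get an upper bound is an oracle dataflow limit study: give every instruction unit latency and unlimited resources, issue it as soon as its inputs are ready, and divide the instruction count by the length of the critical path. The sketch below assumes a hypothetical trace format (`Insn` with a destination, its sources, and a branch flag), perfect renaming (only true read-after-write dependences count) and no memory dependences. "Without speculation" is modelled crudely as nothing issuing before the latest earlier branch has completed.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    """One dynamic instruction in a hypothetical trace format."""
    dest: Optional[str]      # value written, or None
    srcs: Tuple[str, ...]    # values read
    is_branch: bool = False


def dataflow_ilp(trace: List[Insn], speculate: bool = True) -> float:
    """Instructions / critical-path length with unbounded resources.

    Every instruction has unit latency and issues as soon as its last
    producer completes. With speculate=True branches are perfectly
    predicted, so only data dependences matter. With speculate=False
    nothing may issue before the most recent earlier branch completes.
    """
    ready = {}        # value name -> cycle its producer completes
    last_branch = 0   # completion cycle of the latest branch seen so far
    depth = 0         # length of the critical path so far
    for insn in trace:
        start = max((ready.get(s, 0) for s in insn.srcs), default=0)
        if not speculate:
            start = max(start, last_branch)
        finish = start + 1
        if insn.dest is not None:
            ready[insn.dest] = finish
        if insn.is_branch:
            last_branch = max(last_branch, finish)
        depth = max(depth, finish)
    return len(trace) / depth if depth else 0.0
```

On a five-instruction trace where an independent chain follows a branch, the speculative bound overlaps the chain with the branch, while the non-speculative one serializes everything.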
Wanna work with me on this? I'd like you back. This time it might work. Mail me :)
https://pastebin.com/14E7X41j
I dumped the instruction stream for coremark out of the LLI interpreter (hacked a tiny bit). To save trouble, here's one iteration of coremark:
https://drive.google.com/open?id=0B0ygb7T_Ab3kM01DQnlvQi1teG...