Short loops which have very few iterations and are not very long running comprise anywhere from 50-60% of the execution time on SpecINT type workloads (we'll extrapolate that to general purpose code). These are the types of things that kill software pipelining. Furthermore, sometimes these loops have dependencies between iterations.
As a sidenote, the primary advantage of the Mill in software pipelining seems to be that they can software pipeline loops without a need for a prologue or epilogue and thus save a lot of cache space.
As a sidenote, the primary advantage of the Mill in software pipelining seems to be that they can software pipeline loops without a need for a prologue or epilogue and thus save a lot of cache space.
Elbrus (https://www.google.ch/patents/US20020133813) had this too, didn't help them get good ILP.