It doesn't look like the architecture in this paper supports pervasive speculation, which the Mill does via its "Not-a-Result" (NaR) bit on each data element. Vectors on the Mill carry such a bit both for the entire vector and for each element within it, which (along with other novelties) is supposed to help vectorize and software-pipeline code that can't be on conventional architectures. Ivan emphasizes in the videos that these features are what make the Mill capable of high ILP, which is why they go very wide (33 operations per cycle on "Gold").
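To make the NaR idea concrete, here is a toy Python model of the behavior (the function names and the sentinel are my own illustration, not the Mill's actual ISA): a speculative load that would fault produces NaR instead of trapping, arithmetic silently propagates NaR through the dataflow, and the fault only materializes if a NaR reaches a non-speculative operation like a store.

```python
NaR = object()  # sentinel standing in for the hardware NaR metadata bit

def spec_load(mem, addr):
    """Speculative load: an invalid address yields NaR instead of faulting."""
    return mem[addr] if 0 <= addr < len(mem) else NaR

def spec_add(a, b):
    """Speculative ops silently propagate NaR through the dataflow."""
    return NaR if a is NaR or b is NaR else a + b

def store(dest, idx, value):
    """Only a non-speculative use (here, a store) realizes the fault."""
    if value is NaR:
        raise RuntimeError("fault: NaR reached a non-speculative op")
    dest[idx] = value

mem = [10, 20, 30]
out = [0, 0]
# Both loads can be hoisted and issued speculatively; the out-of-bounds
# one just produces NaR rather than trapping.
a = spec_load(mem, 1)    # 20
b = spec_load(mem, 99)   # NaR, no trap
store(out, 0, spec_add(a, a))    # fine: stores 40
# store(out, 1, spec_add(a, b))  # would fault only here, at the real use
```

This is why speculation can be "pervasive": the compiler can hoist loads above branches freely, because a wrong-path load is harmless until its result is actually consumed.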
Short, low-iteration-count loops account for roughly 50-60% of execution time on SPECint-type workloads (we'll extrapolate that to general-purpose code). These are exactly the loops that kill software pipelining, and on top of that they sometimes carry dependencies between iterations.
As a side note, the primary advantage of the Mill in software pipelining seems to be that it can pipeline loops without a prologue or epilogue, saving a lot of instruction-cache space.
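A toy model of how NaR makes the prologue and epilogue unnecessary (again my own sketch, not the Mill's actual mechanism): a 3-stage pipeline (load, compute, store) runs as one uniform steady-state loop, and the extra iterations at either end simply produce or consume NaR, so the fill and drain phases need no separate code.

```python
NaR = object()  # stands in for the hardware NaR bit

def pipelined_inc(src):
    """Add 1 to each element with a simulated 3-stage software pipeline."""
    n = len(src)
    dst = [None] * n
    loaded = added = NaR  # pipeline registers start out "empty" (NaR)
    # One uniform loop: n + 2 iterations cover fill, steady state, and drain.
    for i in range(n + 2):
        if added is not NaR:                  # stage 3: store element i-2
            dst[i - 2] = added
        added = NaR if loaded is NaR else loaded + 1  # stage 2: compute
        loaded = src[i] if i < n else NaR             # stage 1: load
    return dst

print(pipelined_inc([10, 20, 30]))  # [11, 21, 31]
```

In a classic software-pipelined loop, the fill (first two iterations here) and drain (last two) would be unrolled into a separate prologue and epilogue; letting NaR mask the empty stages keeps the loop body as the only copy of the code, which is what saves cache space.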