It raises the question of whether current compiler optimizations would give a new, theoretical VLIW-ish machine (Mill?) an effective leg up over the Itanium.
Being a bit more charitable, I think the problem is that people look at VLIW-generated code and think 'wow, that's so wasteful, look at all the empty slots' without realising that those 'slots' (in the form of idle execution units and pipeline stages) are empty in OoO processors right now anyway. The additional cost is in the I-cache, as already described.
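To make the 'empty slots' point concrete, here is a rough toy model, not a description of any real machine. It assumes unit-latency ops, a greedy scheduler, and an idealised OoO core that finds exactly the same schedule the compiler would; the names `schedule`, `footprint` and `DEPS` are made up for illustration. Under those assumptions both machines leave the same number of issue slots idle, and the only difference is that the VLIW encoding stores the padding in the instruction stream.

```python
def schedule(deps, width):
    """Greedily schedule unit-latency ops onto `width` issue slots per cycle.

    deps maps an op name to the list of op names it depends on.
    Returns a list of cycles, each a list of the ops issued in that cycle.
    """
    done = set()
    remaining = list(deps)  # program order
    cycles = []
    while remaining:
        # An op is ready once all of its inputs were issued in earlier cycles.
        ready = [op for op in remaining if all(d in done for d in deps[op])]
        issued = ready[:width]
        if not issued:
            raise ValueError("dependency cycle or unknown dependency")
        cycles.append(issued)
        done.update(issued)
        remaining = [op for op in remaining if op not in issued]
    return cycles


def footprint(deps, width):
    """Compare idle issue slots with what each (toy) encoding has to store."""
    cycles = schedule(deps, width)
    return {
        "cycles": len(cycles),
        # Idle units per the schedule: identical for both toy machines.
        "idle_unit_slots": sum(width - len(c) for c in cycles),
        # Toy VLIW: every slot of every bundle is encoded, NOPs included.
        "vliw_encoded_slots": width * len(cycles),
        # Toy OoO: only the real ops are encoded.
        "ooo_encoded_ops": len(deps),
    }


# Hypothetical workload: a dependency chain a -> b -> c plus an independent d.
DEPS = {"a": [], "b": ["a"], "c": ["b"], "d": []}

if __name__ == "__main__":
    print(footprint(DEPS, width=4))
```

On this workload a 4-wide machine takes 3 cycles and leaves 8 of 12 issue slots idle either way; the toy VLIW encoding stores 12 slots where the OoO one stores 4 ops, and that gap is the instruction-cache cost. Real encodings can be less naive about padding than this sketch, so treat the ratio as illustrative only.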
Also, these days you would pretty much just need to fix LLVM, C2, ICC, and the MS compiler, and almost everyone would be happy.