All software will use V once it is available. Any operation done in a loop over fixed-size elements can have V applied to it. It is much more powerful than SIMD.
I must admit, I've yet to find a performance-relevant loop that I can't do in SIMD: the ones that resist it have dependencies on previous iterations, such that no magic instruction set is going to help unless it's capable of time travel.
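To illustrate the kind of loop I mean, a minimal sketch in C (a first-order IIR filter; the function name is just for illustration):

    #include <stddef.h>

    /* Each y[i] needs the freshly computed y[i-1], so the loop carries a
       true dependency from one iteration to the next and can't simply be
       chopped into independent vector chunks. */
    void iir_filter(float *y, const float *x, float a, size_t n) {
        for (size_t i = 1; i < n; i++) {
            y[i] = x[i] + a * y[i - 1];  /* depends on the previous iteration */
        }
    }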
LLVM's Polly[0] can often do magic there, by renumbering[1] the iteration space.
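A hypothetical before/after in C of the kind of renumbering meant here (a plain loop interchange; this is an illustration of the technique, not a claim about what Polly emits for this exact code):

    /* Before: a 2D recurrence where the dependency runs only along i.
       With i innermost, the inner loop carries the dependency and
       won't vectorize as written. */
    void before(int N, int M, float a[N][M], float b[N][M]) {
        for (int j = 0; j < M; j++)
            for (int i = 1; i < N; i++)
                a[i][j] = a[i - 1][j] + b[i][j];
    }

    /* After: same iteration space, renumbered. With the loops interchanged
       the dependency is carried by the outer loop only; the inner loop
       over j is independent and vectorizes cleanly. */
    void after(int N, int M, float a[N][M], float b[N][M]) {
        for (int i = 1; i < N; i++)
            for (int j = 0; j < M; j++)
                a[i][j] = a[i - 1][j] + b[i][j];
    }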
Where variable vector length instructions help is in decoupling the chunk size from the machine code: they take care of the remainder that doesn't fit into a whole vector/chunk in a length-agnostic fashion.
It's so you can get most (if not all) of the gains from wider vector units without needing to change the code.
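A minimal sketch of such a length-agnostic loop in C, using the RISC-V vector (RVV) intrinsics; intrinsic names follow the ratified RVV C intrinsics spec (older toolchains spell them without the __riscv_ prefix), so treat it as a sketch rather than copy-paste code:

    #include <stddef.h>
    #include <stdint.h>
    #include <riscv_vector.h>

    /* dst[i] = a[i] + b[i].  vsetvl asks the hardware how many elements it
       will handle this trip (vl), so the same binary works for any vector
       width, and the final partial chunk needs no separate tail loop. */
    void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
        for (size_t i = 0; i < n; ) {
            size_t vl = __riscv_vsetvl_e32m8(n - i);           /* chunk size chosen by hardware */
            vint32m8_t va = __riscv_vle32_v_i32m8(a + i, vl);  /* load vl elements */
            vint32m8_t vb = __riscv_vle32_v_i32m8(b + i, vl);
            vint32m8_t vc = __riscv_vadd_vv_i32m8(va, vb, vl);
            __riscv_vse32_v_i32m8(dst + i, vc, vl);            /* store vl elements */
            i += vl;
        }
    }

The same source runs unchanged whether the hardware's vectors are 128 or 1024 bits wide, which is the decoupling being described.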
Right, but portable code != portable performance. See also: OpenCL.
There's also the observation that "keep SIMD vectors small but many" (e.g. Apple's ARM chips) beats "super long vectors" (Intel AVX-512), since it's much more flexible whilst delivering similar performance for tasks that are amenable to longer vectors. Having an architecture push towards the latter seems like a retrograde step to me.