For video encoding, SoCs like OMAP et al. already give you hardware video encoding next to the ARM, usually based on an ARM "DSP" like a Cortex-M3/M4.
Given this is far cheaper than using general-purpose cores, why wouldn't you just build yourself an army of Cortex-M3/M4s and use them for video encoding now?
ARM doesn't really need to do anything here to make this happen. Since they give you all the chip-related stuff necessary to build your own, just go and do it. If you want 64 M3s, go make it.
All the architecture documentation is available. The instruction set is custom, and there's a GCC port. The OS is hosted on the ARM, but several of their demos offload calculations to the Epiphany chip; otherwise there wouldn't be much point.
It is not SIMD. Each core is totally independent of the others. The cores are arranged in a grid, and each core has 32 KB of in-core static RAM, but can also access the memory of any other core (at the cost of a few cycles' delay) or main memory.
You could certainly use them much like a SIMD machine if you wanted to, but then you'd likely get better performance out of a GPU.
We'll see what kind of performance people get out of it in reality once they deliver on their Kickstarter project...
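To make the grid layout described above a bit more concrete, here is a rough toy model of it. The 32 KB per-core figure comes from the comment; everything else (the function names, Manhattan-distance routing, and the one-cycle-per-hop cost) is an assumption made purely for illustration, not a number taken from the Epiphany documentation.

```python
# Toy model of a grid of independent cores, each with its own local SRAM.
# ASSUMPTION: access cost grows with Manhattan distance and each hop costs
# one extra cycle. These numbers are illustrative, not measured.

LOCAL_SRAM_BYTES = 32 * 1024  # per-core SRAM, as stated in the thread


def hops(src, dst):
    """Manhattan distance between two (row, col) core coordinates."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def access_cycles(src, dst, base_cycles=1, cycles_per_hop=1):
    """Hypothetical cost for core `src` to read memory owned by core `dst`."""
    return base_cycles + cycles_per_hop * hops(src, dst)


def total_sram_bytes(rows, cols):
    """Total on-chip SRAM when every core in the grid contributes its 32 KB."""
    return rows * cols * LOCAL_SRAM_BYTES


if __name__ == "__main__":
    print(access_cycles((0, 0), (0, 0)))  # a core reading its own memory
    print(access_cycles((0, 0), (3, 3)))  # far corner of a 4x4 grid
    print(total_sram_bytes(4, 4))         # all local memories combined
```

Under these assumptions a 4x4 grid has 512 KB of SRAM in total, but any single core only gets the cheap access to its own 32 KB.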
Nvidia intends to do that with Project Denver/Boulder (2014), with 64-bit chips and supposedly a new Maxwell-based GPU. I think they will be making 16-core dies, but they'll also help their customers build 512-1024 core racks. They already did this with Tegra 3 for a customer from Barcelona.
Applied Micro also intends to introduce a 128-core SoC in 2014.
Actually, video encoding for most codecs is a task that doesn't lend itself to parallelism, because of temporal (inter-frame) encoding. Since each frame depends on the one before it, you really don't gain much from going to multiple cores (at least not yet).
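A small sketch of the dependency argument, under heavy simplification: every frame costs one time step, a "P" frame must wait for the frame before it, and only an "I" (key) frame starts a chain that can be encoded independently. The frame-type strings and function names are hypothetical, and real encoders have more options than this model allows.

```python
# Toy scheduling model: how many time steps does it take to encode a
# sequence when each non-key frame depends on the previous frame?
# ASSUMPTION: one frame per core per step, and chains that start at an
# "I" frame are fully independent of each other.


def split_into_chains(frame_types):
    """Return the length of each dependency chain in a string like 'IPPPIPP'."""
    chains = []
    for frame_type in frame_types:
        if frame_type == "I" or not chains:
            chains.append(0)  # a key frame starts a new independent chain
        chains[-1] += 1
    return chains


def encode_steps(frame_types, cores):
    """Steps needed when whole chains are handed to the least-loaded core."""
    loads = [0] * cores
    for length in sorted(split_into_chains(frame_types), reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += length
    return max(loads)


if __name__ == "__main__":
    print(encode_steps("IPPPPPPP", 1))   # one chain, one core
    print(encode_steps("IPPPPPPP", 64))  # one chain, 64 cores: no faster
    print(encode_steps("IPPPIPPP", 2))   # two chains can run side by side
```

In this model a single chain of 8 frames takes 8 steps whether you have 1 core or 64; extra cores only help once there are several independent chains to hand out.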
To start each core encoding at a different point in the video, you would need really fast (read: expensive) storage. Every core needs a steady stream of frames to compress, so with 64 cores you need storage that can serve 64 parallel requests really fast, or a lot of RAM to cache everything, and the chips have to handle really heavy I/O. The current many-core ARMs only have a small amount of dedicated RAM for each core.
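For a feel of the numbers involved, here is a back-of-the-envelope calculation. It assumes raw 8-bit YUV 4:2:0 input (1.5 bytes per pixel), 1080p at 30 fps, and that each of the 64 cores consumes its own stream at real-time rate; none of those figures come from the thread, and the 32 KB per-core figure is borrowed from the Epiphany description earlier.

```python
# Rough I/O estimate for feeding many cores with raw video.
# ASSUMPTIONS: 8-bit YUV 4:2:0 (1.5 bytes/pixel), every stream read at
# real-time rate, and 32 KB of local RAM per core.

LOCAL_RAM_BYTES = 32 * 1024


def frame_bytes(width, height):
    """Size of one raw YUV 4:2:0 frame in bytes (1.5 bytes per pixel)."""
    return width * height * 3 // 2


def aggregate_bytes_per_second(width, height, fps, streams):
    """Total read bandwidth if `streams` cores each pull frames at `fps`."""
    return frame_bytes(width, height) * fps * streams


if __name__ == "__main__":
    one_frame = frame_bytes(1920, 1080)
    print(one_frame)                                      # bytes per 1080p frame
    print(aggregate_bytes_per_second(1920, 1080, 30, 64)) # bytes/s for 64 streams
    print(one_frame / LOCAL_RAM_BYTES)                    # frames vs. local RAM
```

With these assumptions a single 1080p frame is about 3.1 MB, roughly 95 times the size of a 32 KB local memory, and 64 real-time streams add up to about 6 GB/s of reads.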
Also, in many encoders the keyframe positions are not known from the start. A keyframe is inserted when the difference between a frame and the next one is large, and this can only be detected while encoding or with multi-pass encoding.
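The keyframe rule described above can be sketched like this. It is a deliberately naive version: frames are flat lists of pixel values, the "difference" is the mean absolute pixel difference, and the threshold of 30 is an arbitrary value chosen for the example. Real encoders use more elaborate scene-change heuristics, so treat the names and numbers here as hypothetical.

```python
# Naive keyframe placement: start a new keyframe whenever a frame differs
# "a lot" from the one before it.
# ASSUMPTIONS: frames are equal-length lists of pixel values, the metric is
# mean absolute difference, and the threshold is picked arbitrarily.


def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def pick_keyframes(frames, threshold=30.0):
    """Indices of frames that would be encoded as keyframes."""
    keyframes = [0] if frames else []  # the first frame is always a keyframe
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


if __name__ == "__main__":
    frames = [
        [10, 10, 10, 10],
        [12, 11, 10, 9],       # small change: predicted from frame 0
        [200, 210, 190, 205],  # scene cut: large change
        [201, 209, 191, 204],  # small change again
    ]
    print(pick_keyframes(frames))  # [0, 2]
```

Note that the decision for frame i needs frame i-1 to be available, which is why the positions only fall out during encoding or from an earlier analysis pass.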