libaom is slow because it's fundamentally a research product: the folks hacking on it are mostly concerned with the workload of encoding a basket of twenty short clips and demonstrating a tradeoff between Bjøntegaard scores and increased encode time, and much less with holistically making a usable encoder.
Source: wrote part of the AV2 spec and traded off a 0.3% improvement in compression for a 1% increase in encode time
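For anyone unfamiliar with the "Bjøntegaard scores" mentioned above: BD-rate is the standard way codec researchers quantify that compression-vs-quality tradeoff. A rough sketch of the classic method (cubic fit of log-bitrate vs. PSNR, then integrate the gap between the two curves; the exact variant used by AOM may differ):

```python
# Toy BD-rate sketch, following the original Bjontegaard approach:
# fit log-bitrate as a cubic polynomial of PSNR for each encoder,
# then average the gap over the overlapping quality range.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate change (%) of test vs. reference at equal quality."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate both fitted curves over the shared PSNR interval.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100

# Made-up example points: encoder B gains 0.5 dB at every bitrate.
rates = [100, 200, 400, 800]            # kbps
psnr_a = [32.0, 35.0, 38.0, 41.0]
psnr_b = [32.5, 35.5, 38.5, 41.5]
print(round(bd_rate(rates, psnr_a, rates, psnr_b), 1))  # about -10.9
```

A negative number means the test encoder needs less bitrate for the same quality, which is why a headline like "0.3% BD-rate improvement for 1% more encode time" is exactly the kind of tradeoff the tooling is built to measure.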
The benchmarks below show that you need very powerful machines to get software encoding up to the sort of frame rates that hardware encoding will give you even on low-end CPU/GPU combinations.
Even given the numbers in the benchmarks below, I think the cheapest of the Intel cards are going to make a lot more sense than using the CPU of some honking great machine to do software AV1 conversions.
As usual with benchmarks, it takes some effort to understand what they mean. Trying to extract some tangible information with minimal effort: some benchmarks; use the dropdown to choose different preset/resolution combinations.
I'm expecting next-gen video codecs to all be neural-network based.
I.e., every frame is generated from the previous internal state plus a few more bytes of input data that define how the car moves or who the bad guy shoots.
Current state-of-the-art systems require huge GPUs and minutes per frame. But smaller models will do a decent-ish job faster, and better hardware and algorithms are coming out every few months.
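To make the idea concrete, here's a toy sketch of that "generative decoder" loop: each frame is a deterministic function of the previous internal state plus a few bytes of side information. Every name, shape, and weight here is made up for illustration; a real neural codec would use a trained network, not fixed random projections.

```python
# Toy sketch: decoder state advances on a few bytes per frame, then
# renders an image from the state. Purely illustrative, not a real codec.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, BYTES_PER_FRAME, SIDE = 64, 8, 16

# Frozen random "weights" standing in for a trained model.
W_state = rng.standard_normal((STATE_DIM, STATE_DIM)) * 0.1
W_input = rng.standard_normal((BYTES_PER_FRAME, STATE_DIM))
W_render = rng.standard_normal((STATE_DIM, SIDE * SIDE))

def decode_frame(state, payload):
    """Advance the state with a few bytes of input, then render a frame."""
    x = np.frombuffer(payload, dtype=np.uint8).astype(np.float64) / 255.0
    state = np.tanh(state @ W_state + x @ W_input)
    frame = np.tanh(state @ W_render)   # pixel values in [-1, 1]
    return state, frame.reshape(SIDE, SIDE)

state = np.zeros(STATE_DIM)
bitstream = [rng.bytes(BYTES_PER_FRAME) for _ in range(3)]  # 8 bytes/frame
for payload in bitstream:
    state, frame = decode_frame(state, payload)
print(frame.shape)  # (16, 16)
```

The point of the sketch is the data flow: the bitstream carries only the per-frame conditioning bytes, and everything visual is synthesized from state the decoder already holds.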
Sure, that's been predicted for about ten years now. I think it's far more than one generation away, simply because you'd need a decoder everywhere, and that takes a boatload of standardization. JPEG XL failed to gain traction and got dumped, for instance.
The best we can get next generation would be an experimental fork of x265, or some AV1 codec with a neural-net baseline to improve on the default, like mkv was at first.
I worked at a startup (WaveOne) doing neural net-based compression. Your description of the state of the art seems a bit off; here's a 2021 paper [1] from WaveOne for comparison. Large GPUs, maybe, but only seconds (or less) per 720p+ frame.
I can understand the disagreement with your comment in general, but this is a media codec. It's time to just stop with memory-unsafe languages in this domain. While I am personally more inclined to use Rust, there is that provable language from Google designed exactly for this. It's blatant negligence at this point, and liability seriously needs to be on the table.
A redeeming factor is that approximately no one is using the SVT-AV1 decoder; it's almost only used for encodes, where you don't care as much, since there's nothing privileged in your encoding container besides the current input being encoded.
(But... the actual decoder everyone uses is dav1d, and that's mostly hand-written assembly with a little C glue code. So: still not a solved problem.)
Get a cheap Intel Arc GPU instead like the SPARKLE Intel Arc A310 or SPARKLE Intel Arc A380 ELF 6GB.