HTTP/1 has parallelism limitations due to limits on the number of simultaneous connections (both browser- and server-side). HTTP/2 lets you retrieve multiple resources over the same connection, improving parallelism, but suffers head-of-line blocking when it does so. HTTP/3, built on QUIC, solves both parallelism and head-of-line blocking.
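For illustration, a minimal Go sketch: it fires a few requests concurrently and traces which underlying TCP connections the client ends up using. Against an HTTP/1.1-only server you'd typically see several distinct local addresses (separate connections); against an HTTP/2 server the same connection gets reused for multiplexed streams. The example.com URLs are just placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptrace"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Log which TCP connection each request rides on.
			trace := &httptrace.ClientTrace{
				GotConn: func(info httptrace.GotConnInfo) {
					fmt.Printf("request %d: local addr %v, reused=%v\n",
						i, info.Conn.LocalAddr(), info.Reused)
				},
			}
			// Placeholder URL; substitute a server you control.
			url := fmt.Sprintf("https://example.com/resource-%d", i)
			req, err := http.NewRequest("GET", url, nil)
			if err != nil {
				fmt.Println("bad request:", err)
				return
			}
			req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				fmt.Println("request failed:", err)
				return
			}
			resp.Body.Close()
			fmt.Printf("request %d: proto %s\n", i, resp.Proto)
		}(i)
	}
	wg.Wait()
}
```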
If the first part of your sentence is correct, the second part can't be. HTTP/1 pipelining suffers from head-of-line blocking just as badly (arguably worse), and the only workaround was opening multiple connections, which HTTP/2 also supports.
Given that the alternative, switching to a new protocol, requires far more code changes than fixing bad implementations of the old protocol, your argument falls rather flat.
There are very real reasons why increasing the number of connections isn't a thing, and they have to do with the expense of creating and managing a connection server-side. In fact, switching to HTTP/2 was the largest speedup because you could fetch multiple resources at once. The problem, of course, is that multiplexing resources over a single TCP connection has head-of-line blocking problems.
Could this have been solved a different way? Perhaps, but QUIC is tackling a number of TCP limitations at once (e.g. protocol ossification caused by middleboxes).
Server-side, there is no difference between a separate context per TCP connection for an HTTP/1 server and a separate context per HTTP/2 stream within an HTTP/2 server.
If anything, HTTP/2 requires more state due to the nested nature of the contexts (a connection-level context plus one per stream).
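To illustrate what I mean by the application-level contexts looking the same, a small sketch using Go's net/http (which negotiates HTTP/2 over TLS automatically): the handler and its per-request state are identical whether the request arrived on its own HTTP/1 connection or as a stream multiplexed onto a shared HTTP/2 connection.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Runs once per request; r.Proto tells you whether it came in as
	// HTTP/1.1 or as an HTTP/2 stream, but the handler's context is the same.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "proto=%s remote=%s\n", r.Proto, r.RemoteAddr)
	})
	// ListenAndServeTLS("cert.pem", "key.pem", ...) would enable HTTP/2
	// (file names are placeholders); plain ListenAndServe stays on HTTP/1.1.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```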
There 100% is. You're thinking about user-space context. Establishing a new socket in the kernel requires a fair amount of under-the-covers bookkeeping, and the total number of file descriptors keeping those resources alive is a real problem for very popular web services that have to handle many concurrent connections. At the application layer it's less of an issue.
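A quick way to see the kernel-side ceiling I'm talking about, sketched in Go (Linux/macOS): every accepted TCP connection costs a file descriptor plus kernel socket buffers, and the per-process RLIMIT_NOFILE limit bounds how many such connections one server process can keep open.

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// RLIMIT_NOFILE caps open file descriptors per process, and each
	// accepted socket consumes one, so it effectively caps concurrent
	// HTTP/1 connections per server process.
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("open-file limit: soft=%d hard=%d\n", rl.Cur, rl.Max)
}
```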
HTTP/2 is indeed more complex to maintain in the application layer, but that has less to do with the memory implications of multiplexing requests/responses over a single TCP connection.