I think the difference is that the browser already ships with the shadow DOM needed to render the video tag. For a custom tag, it would need to fetch, parse, and execute your component code before it can begin to render.
That's an interesting read. I'm wondering whether the real question is: what is the declarative difference between the video tag, whose content hidden in the shadow DOM everyone feels fine about in normal use, and the APIs we're developing for our custom elements? There seems to be an important lesson here, not just for SSR but for custom element development in general, that I think we can build on.
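
To make the "declarative difference" a bit more concrete, here is a rough sketch of how a server might emit a custom element's shadow tree as markup, using Declarative Shadow DOM (`<template shadowrootmode="open">`), so the browser could paint it before any component script has been fetched or run. The helper names (`escapeHtml`, `renderDeclarative`), the `my-player` tag, and the markup are all made up for illustration; this is not how any particular framework does it.

```typescript
// Hypothetical SSR helper: serialize a custom element together with its
// shadow tree, so the parser can attach the shadow root without running JS.
// All names and markup below are illustrative assumptions.

// Escape text that will be placed in the light DOM as plain content.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// shadowHtml is trusted markup authored by the component;
// lightText is untrusted text content and is escaped.
function renderDeclarative(
  tagName: string,
  shadowHtml: string,
  lightText: string
): string {
  return (
    `<${tagName}>` +
    // Browsers that support Declarative Shadow DOM turn this template
    // into an open shadow root while parsing the HTML.
    `<template shadowrootmode="open">${shadowHtml}</template>` +
    escapeHtml(lightText) +
    `</${tagName}>`
  );
}

const html: string = renderDeclarative(
  "my-player",
  "<button>Play</button><slot></slot>",
  "Intro <clip>"
);
console.log(html);
```

The component's script would still have to load before the element becomes interactive, so this only narrows the gap with the built-in video tag for first render; it doesn't close it.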