Could also have speech problems. Could be lazy. Could want to save time. Could be useful for producing consistent closed-caption (CC) information across mediums. Could allow people to choose arbitrary voice synthesis in the future, an idea super-futurists may like. Could have used a translator to produce the text (I haven't listened) and not know English at all.
Personally, I'll take the human voice unless you literally cannot speak (e.g., due to a disability) or feel uncomfortable doing so.
Many academics like the ability to "compile" LaTeX, and probably want to "compile" their videos too, complete with an autogenerated script. That way, a small change to the source automatically regenerates the video with an updated script.
I like the real voice, but I could see that if you were generating lots of videos, and doing so in multiple languages, you could abstract the narration away a bit and generate it dynamically these days. This is part of the reason video containers are often separated into components/layers (separate video, audio, and subtitle tracks). I don't see why you couldn't have the subtitle data read and the audio generated dynamically based on language. Some of this probably already happens somewhere, by some group. Just an idea I found interesting, similar to composing documents with LaTeX etc. Think of the audio as a "presentation" layer, the way a lot of visual frameworks are structured, and imagine similar layering for audio. It's especially useful for videos where the speaker isn't visible, so syncing audio with lip movements across languages isn't a problem.
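As a rough sketch of what I mean (not something any site actually does, as far as I know): assuming you have one SRT subtitle file per language and that espeak-ng and ffmpeg are installed, you could synthesize a narration track per language and mux it back into the container. The file names are made up, and this naive version flattens the captions into one block of text and ignores cue timing, which a real pipeline would have to respect to stay in sync:

    import subprocess

    # espeak-ng voice codes per target language (assumes these voices are installed)
    LANG_VOICES = {"en": "en", "de": "de", "fr": "fr"}

    def read_srt_text(path):
        # Tiny SRT reader: keeps only the caption text, dropping indices and timestamps.
        lines = []
        with open(path, encoding="utf-8") as f:
            for block in f.read().strip().split("\n\n"):
                parts = block.splitlines()
                lines.extend(parts[2:])  # parts[0] is the cue index, parts[1] the timing line
        return " ".join(lines)

    def synthesize(srt_path, lang, out_wav):
        # -v picks the voice/language, -w writes the synthesized speech to a wav file
        text = read_srt_text(srt_path)
        subprocess.run(["espeak-ng", "-v", LANG_VOICES[lang], "-w", out_wav, text], check=True)

    def mux(video_in, wav_in, video_out):
        # Keep the original video stream, swap in the generated narration track.
        subprocess.run([
            "ffmpeg", "-y", "-i", video_in, "-i", wav_in,
            "-map", "0:v", "-map", "1:a", "-c:v", "copy", "-shortest", video_out,
        ], check=True)

    for lang in LANG_VOICES:
        synthesize(f"talk.{lang}.srt", lang, f"narration.{lang}.wav")
        mux("talk.mp4", f"narration.{lang}.wav", f"talk.{lang}.mp4")

The point is just that the narration becomes a build artifact derived from the text layer, the same way a PDF is derived from LaTeX source, so regenerating it for another language (or after an edit) is one more pass of the same pipeline.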