This seems to be a prime example of the Pareto Principle, or the 80/20 rule. About 12 years ago, I had built something to do something similar and published how I did it [1]. It was a very simple process, which my younger self called "the meat algorithm", as in, how to get the meat of an article.
It was far less code and worked perfectly 95% of the time (though, the average web-page was also a little simpler 12 years ago). But that code would have quickly ballooned out if our use-case had called for addressing the other 5% of webpages, as the Firefox Reader View must do.
It was far less code and worked perfectly 95% of the time (though, the average web-page was also a little simpler 12 years ago). But that code would have quickly ballooned out if our use-case had called for addressing the other 5% of webpages, as the Firefox Reader View must do.
[1] https://www.alfajango.com/blog/create-a-printable-format-for...