Another approach to completeness could be to remove noise from the original page instead of extracting just the text from it. In the worst case, the page isn't changed at all, but it's still usable (like when an ad blocker misses some ads). [0]
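A minimal sketch of the "remove noise" idea in Python. It assumes the page has already been parsed into a simple tree. The `Node` type, `NOISE_TAGS`, and the class/id pattern are hypothetical illustrations, not how unclutter or any real extension works. Anything the heuristics don't recognize is kept, so the worst case is the unchanged page:

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parsed DOM element."""
    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    text: str = ""
    children: list[Node] = field(default_factory=list)


# Hypothetical heuristics, chosen for illustration only.
NOISE_TAGS = {"nav", "aside", "footer", "script", "iframe"}
NOISE_PATTERN = re.compile(
    r"\b(ad|ads|advert|sidebar|share|comments?|newsletter)\b", re.IGNORECASE
)


def is_noise(node: Node) -> bool:
    """Flag a node as noise by its tag or by hints in its class/id."""
    if node.tag in NOISE_TAGS:
        return True
    hints = " ".join([node.attrs.get("class", ""), node.attrs.get("id", "")])
    return bool(NOISE_PATTERN.search(hints))


def declutter(node: Node) -> Node:
    """Return a copy of the tree with noisy subtrees dropped; everything else is kept."""
    kept = [declutter(child) for child in node.children if not is_noise(child)]
    return Node(node.tag, dict(node.attrs), node.text, kept)


page = Node("body", children=[
    Node("nav", text="Menu"),
    Node("article", children=[Node("p", text="The actual content.")]),
    Node("div", {"class": "ad-slot"}, "Buy now"),
])
print([child.tag for child in declutter(page).children])
```

The design choice is subtractive: a missed ad only leaves some clutter behind, whereas a text extractor that misjudges the main content can drop part of the article.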
I assume you only want always-on reader mode for articles -- and detecting what's an article is another NLP problem. Yet both completeness and article detection can probably be solved with heuristics in 90% of cases (the evidence is that we DO use reader modes). Maybe it depends on how much the remaining 10% frustrates you.
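As an illustration of the kind of heuristic meant here, a rough article detector in Python might just count substantial paragraphs. The `(tag, text)` input format, the `looks_like_article` name, and the thresholds are all hypothetical and untuned:

```python
def looks_like_article(
    blocks: list[tuple[str, str]],
    min_paragraphs: int = 3,
    min_words: int = 250,
) -> bool:
    """Guess whether a page is an article from its (tag, text) blocks.

    Hypothetical heuristic: an article has several long <p> blocks
    adding up to a reasonable amount of prose. Thresholds are illustrative.
    """
    # Only count paragraphs long enough to be real prose (>= 20 words).
    paragraphs = [text for tag, text in blocks if tag == "p" and len(text.split()) >= 20]
    total_words = sum(len(text.split()) for text in paragraphs)
    return len(paragraphs) >= min_paragraphs and total_words >= min_words


article = [("h1", "Title")] + [("p", "word " * 100)] * 3
listing = [("a", "Link")] * 50 + [("p", "Short blurb.")]
print(looks_like_article(article), looks_like_article(listing))
```

Something this crude handles the obvious cases; the last 10% (galleries, forum threads, pages with short paragraphs) is exactly where it would misfire.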
[0] I'm working on a browser extension that does this: https://github.com/lindylearn/unclutter