"Without that, Reader Mode would look for the first HTML element that contained the string "author", assume its content was the name of the author of the article, and make it a byline right below the title. Of course the element it picked was a section header, with the content "Completing an authorization"."

Why would a section header in your article contain "author" if you didn't mean for it to designate who the author of the article was?




> Why would a section header in your article contain "author" if you didn't mean for it to designate who the author of the article was?

It's bad substring matching. It's "authorization", but apparently Firefox just interprets that as "author".
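To make the failure mode concrete, here's a rough sketch of the difference between a substring test and a whole-token test. I haven't checked what Reader Mode actually runs, so the function names and the token list are made up for illustration:

```python
import re

def naive_is_byline(value: str) -> bool:
    # Substring check: fires on "authorization" because it contains "author".
    return "author" in value.lower()

def token_is_byline(value: str) -> bool:
    # Split on whitespace, hyphens and underscores, then compare whole tokens,
    # so "post-author" matches but "authorization" does not.
    tokens = re.split(r"[\s_-]+", value.lower())
    return "author" in tokens or "byline" in tokens
```

With these, `naive_is_byline("authorization")` is true while `token_is_byline("authorization")` is false, and both accept something like `"post-author"`.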


Oh, I see. Yeah, that's a bug in the design, because it's trying to infer metadata from unstructured content.

There's no reliable way to do it. The right way would be to use only explicitly defined metadata and, if it's not there, to default to showing nothing.
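Roughly what I have in mind, as a minimal sketch: read only the standard `<meta name="author">` tag and return nothing otherwise. The class and function names are hypothetical, and this isn't how any browser does it as far as I know:

```python
from html.parser import HTMLParser
from typing import Optional

class AuthorMetaParser(HTMLParser):
    """Records the first non-empty <meta name="author" content="..."> value."""

    def __init__(self) -> None:
        super().__init__()
        self.author: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once an author has been found.
        if tag != "meta" or self.author is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "author":
            content = (attributes.get("content") or "").strip()
            self.author = content or None

def explicit_author(html: str) -> Optional[str]:
    # Returns None when the page declares no author; no guessing from body text.
    parser = AuthorMetaParser()
    parser.feed(html)
    parser.close()
    return parser.author
```

A page whose only "author"-looking thing is a heading such as "Completing an authorization" yields `None` here, so no byline would be shown.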


Presumably there is a

    class="authorization"

or something like that in the enclosing element, so it's not that they're inferring metadata from unstructured content - it's that they're drawing bad inferences from the metadata provided.

That said, inferring metadata from unstructured content is literally the whole goal of Reader mode - to make pages more readable even if the original source wasn't designed to be - so while this particular bug is avoidable, others may not be.


"inferring metadata from unstructured content is literally the whole goal of Reader mode"

Is that the whole goal, or is it to get around the design choices of the original?

These are two different things.


We can call it wrongly structured. It might be unstructured, or it might be badly structured due to design choices (including user-hostile ones).



