
Depending on the type of content, you might want to look into using Readability (the library behind the browser's reader view) to parse the webpage. It gives you the useful content without the junk (navigation, ads, scripts and so on). Then you can put the result in the DB as needed.
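To make the "extract, then store" flow concrete, here is a rough Python sketch using only the standard library. To be clear, this is not Readability: `ToyReaderView` is a hypothetical, much cruder stand-in that just keeps the `<title>` and the `<p>` text outside obvious chrome tags, and the `articles` table layout is my own assumption. In practice you would swap `extract()` for a real Readability port and keep the storage step.

```python
import sqlite3
from html.parser import HTMLParser

# Tags whose contents this toy extractor treats as junk (an assumption;
# the real Readability scores content with far richer heuristics).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class ToyReaderView(HTMLParser):
    """Collects <title> text and the text of <p> elements outside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._skip_depth = 0  # > 0 while inside any SKIP_TAGS element
        self._in_title = False
        self._in_p = False
        self._buf = []

    def _flush_paragraph(self):
        # Store the current paragraph with whitespace runs collapsed.
        if self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._flush_paragraph()  # tolerate an unclosed <p>
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p":
            self._flush_paragraph()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p and self._skip_depth == 0:
            self._buf.append(data)


def extract(html):
    """Return (title, body) for an HTML string; body joins paragraphs with blank lines."""
    parser = ToyReaderView()
    parser.feed(html)
    parser.close()
    parser._flush_paragraph()  # pick up a trailing unclosed <p>
    return parser.title.strip(), "\n\n".join(parser.paragraphs)


def save_article(conn, url, html):
    """Extract the readable part of a page and upsert it into a hypothetical articles table."""
    title, body = extract(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    conn.commit()
    return title, body


# Made-up sample page, only for illustration.
SAMPLE = """<html><head><title>Example post</title><style>p{color:red}</style></head>
<body><nav><p>Home | About</p></nav>
<article><p>First   paragraph.</p><p>Second <b>bold</b> paragraph.</p></article>
<footer><p>Copyright</p></footer><script>track()</script></body></html>"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(save_article(conn, "https://example.com/post", SAMPLE))
```

Keying the table on the URL means re-crawling a page overwrites the old row instead of piling up duplicates, which is usually what you want for this kind of archive.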


Btw, Readability is also available in a few other languages, such as Kotlin.


