+1 for text-only information. I investigated this a few years ago with the intention of auto-generating a news podcast for my train commute. The audio part never happened, but I did auto-generate a text document, sync it to my phone every morning, and read it on the train.

I used RSS and a simple home-brewed page scraper to fetch the data and compile the document. However, this was a pain to update, and I found that news sites didn't always manage their feeds very well.
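For anyone curious, the RSS half of that kind of setup can be sketched roughly as below. This is not the original code, just a minimal illustration using only Python's standard library. The function names (`fetch`, `parse_rss`, `compile_digest`) are made up for the example, and it assumes plain RSS 2.0 feeds with `item`, `title`, `link` and `description` elements.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url, timeout=10):
    """Download a feed and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_rss(xml_text):
    """Return (title, link, description) tuples from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None for a missing element, so fall back to "".
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        desc = (item.findtext("description") or "").strip()
        items.append((title, link, desc))
    return items


def compile_digest(items):
    """Join the items into one plain-text document, skipping empty fields."""
    blocks = []
    for title, link, desc in items:
        blocks.append("\n".join(part for part in (title, desc, link) if part))
    return "\n\n".join(blocks) + "\n"
```

Calling `compile_digest(parse_rss(fetch(url)))` for each feed and writing the result to a file gives a text document that can be synced to a phone. It does nothing about badly maintained feeds, which is exactly where a proper scraping library would help.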

I wouldn't mind resurrecting this project. Are there any recommendations for open-source web scraping software that I could try, rather than rolling my own?