
The server is written in Go. The news scraper is written in Python. I don't have the full list of sites offhand, but I'll try to get back to this comment!



So your Python scraper has a list of websites whose home pages it scrapes?


It mainly uses RSS feeds rather than HTML scraping, but there are a couple of news sources where HTML scraping is required.
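For anyone curious what the RSS side of such a scraper might look like, here is a minimal sketch using only the Python standard library. It is not the author's code: the function name `parse_rss_items`, the sample feed, and the choice of `xml.etree.ElementTree` are all assumptions for illustration, and it only handles plain RSS 2.0 `<item>` elements.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of {"title", "link"} dicts from an RSS 2.0 document.

    Hypothetical helper for illustration; items missing a title or a
    link are skipped.
    """
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 nests <item> elements under <channel>; iter() finds them
    # regardless of depth.
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:
            items.append({"title": title, "link": link})
    return items


# Made-up feed content, used only to exercise the parser.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
    </item>
    <item>
      <title>No link here</title>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

A structured feed like this avoids depending on each site's page layout, which is presumably why HTML scraping is kept as the fallback for the few sources without a usable feed.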


You wrote elsewhere that your output HTML is highly optimized. I viewed the source of the page and I can see that it contains the CSS inside <style> tags and images embedded as base64-encoded text. What software do you use to generate this output?


The Python scraper base64-encodes all the images, so that data is already prepared for the web server. The Python library being used is PIL. The CSS is hand-written in a Go template. The goal was to have all the rendering done in a single request, so I chose to embed the CSS and relax the CSP on styles.
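As a rough illustration of the image-embedding step, the sketch below turns raw image bytes into a `data:` URI and an `<img>` tag using only the standard library. The names `to_data_uri` and `img_tag` are hypothetical, and the sketch assumes the image bytes have already been prepared (for example, loaded and saved with PIL); it does not reproduce the author's actual pipeline.

```python
import base64
import html


def to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(image_bytes, alt, mime_type="image/png"):
    """Build an <img> tag whose src is the inlined image data.

    The alt text is HTML-escaped so it cannot break out of the attribute.
    """
    src = to_data_uri(image_bytes, mime_type)
    return f'<img src="{src}" alt="{html.escape(alt, quote=True)}">'


if __name__ == "__main__":
    # Placeholder bytes; a real caller would pass actual image data.
    print(img_tag(b"hello", "Example thumbnail"))
```

Inlining images this way means the browser needs no extra requests to render the page, at the cost of roughly a third more bytes per image from the base64 encoding. Note that `data:` image sources also have to be permitted by the page's CSP if one is set.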


Thank you for sharing. This is seriously cool stuff. Please write all this up in a blog post and do a Show HN. :-)



