
Thanks for the detailed reply. :-)

> As for the aggregation (grouping) algorithm, I'll just say that it's straight out of the textbook http://infolab.stanford.edu/~ullman/mmds/ch3.pdf

So, in other words, you're using the MinHash algorithm as well as Locality-sensitive hashing (LSH)? How much volume are you able to process in how much time?

By the way, I first learned about this topic through Stanford’s “Mining of Massive Datasets” (MMDS) course that used to be free on Coursera. So it's thrilling to see someone put it to use in the real world and talk about it, too! :-)




Yup, MinHash with LSH. It's quite fast and not compute-intensive, because the articles shown are limited by recency (e.g. the past 24 hours): say on the order of hundreds to thousands of articles in a few seconds. Someone wrote an open-source LSH implementation in Go on GitHub, so no credit to me :) I probably would not have been able to code LSH myself.
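For anyone following along, here is a rough sketch of what MinHash with banded LSH looks like, along the lines of the MMDS chapter linked above. It is in Python rather than Go and is not the library mentioned in this thread. The shingle size, the signature length (`NUM_HASHES`), the band count (`BANDS`), and all function names are assumptions picked for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 64              # signature length (assumed value)
BANDS = 16                   # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS   # rows per band


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted by seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens):
    """One minimum per seeded hash function gives the MinHash signature."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES)]


def candidate_pairs(signatures):
    """Pairs of ids whose signatures agree on every row of at least one band."""
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        for b in range(BANDS):
            band = tuple(sig[b * ROWS:(b + 1) * ROWS])
            buckets[(b, band)].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for pair in combinations(sorted(ids), 2):
            pairs.add(pair)
    return pairs


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Usage would be something like building `{article_id: minhash_signature(shingles(title))}` for the articles inside the recency window, calling `candidate_pairs` on it, and then confirming each candidate with `estimated_jaccard` against a threshold. Because only articles that share a bucket are compared, the cost stays far below comparing every pair.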


It would be awesome if you blogged about your entire experience setting up your news aggregator. But I guess your first priority is PageDash these days so I can keep dreaming. :-)



