Hacker News new | past | comments | ask | show | jobs | submit login
Write an Internet search engine with 200 lines of Ruby code (saush.com)
35 points by nickb on March 17, 2009 | hide | past | favorite | 6 comments



Web search only becomes interesting when you scale it up. At small scale it's just fetching pages and tying together library code, which is fine but not particularly interesting or educational. Once you scale up, though, it involves tons of complicated and fascinating topics (math, distributed systems, compression, networking, query planning, databases, etc.). You also start having to make trade-offs among features, which is often a good sign that a problem is complex enough to be worthy of your time.


Did he really need to engage in monkeypatching even in this small example?

Do you really pollute a class as fundamental as String as cavalierly as in this code?


There's absolutely no reason. Then again, there's plety of bad code to go around in this crawler. Just a few nuggets:

* Magic strings with no comments.

* Global variables

* Attempting to write his own URI parser.


Eh. There are different levels of monkeypatching. I have no problem with extending even a fundamental class with a new method if it's done with care, despite the potential downsides. I think the upside of monkeypatching can outweigh the downside when it facilitates tight, readable code (not that I'd consider the code in this post an example of that). The real problems show up when changing the behavior of existing methods, and that's where I think the downsides often outweigh the upsides.


It's actually not that difficult to write a text database from scratch (without libraries like Lucene). Most text databases are based on inverted indexes at their core. Implementing basic boolean operators turns out to be pretty easy to implement as well.

It's one of the exercises I like to do when learning a new language. I'd recommend giving it a shot if you haven't tried it before.


Or, you could use Y! Boss in 3.

http://github.com/jpignata/bossman-gem/tree/master

However, this is a good first toe-in-the-water for people who want to get more into ruby and web-crawling.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: