
It's a brute-force linear search: _gethtent() returns one line from the hosts file at a time, and the loop itself is in _gethtbyname2(). It opens and parses the hosts file on every single lookup. If I were asked to improve this, I'd probably parse the file only once and reparse it only when the file has been updated, maybe backed by an on-disk cache file. The second change would be to use a hash table. I don't think anything as sophisticated as a Bloom filter is necessary unless you have truly huge hosts files.
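To make the two suggested changes concrete, here's a rough Python sketch (not the libc code; `HostsCache` and its methods are hypothetical names). It parses the file once into a dict keyed by lowercased hostname and reparses only when the file's (mtime, size) signature changes. It assumes the usual hosts format: an address, then one or more names, with `#` starting a comment.

```python
import os
from typing import Dict, List, Optional, Tuple


class HostsCache:
    """Hypothetical hosts-file index: parse once, reparse only when the file changes."""

    def __init__(self, path: str = "/etc/hosts") -> None:
        self.path = path
        self._signature: Optional[Tuple[int, int]] = None  # (mtime_ns, size) at last parse
        self._index: Dict[str, List[str]] = {}

    def _reload_if_changed(self) -> None:
        st = os.stat(self.path)
        signature = (st.st_mtime_ns, st.st_size)
        if signature == self._signature:
            return  # unchanged since the last parse: reuse the hash table
        index: Dict[str, List[str]] = {}
        with open(self.path, encoding="utf-8", errors="replace") as f:
            for line in f:
                fields = line.split("#", 1)[0].split()  # drop comment, split on whitespace
                if len(fields) < 2:
                    continue  # blank, comment-only, or malformed line
                addr = fields[0]
                for name in fields[1:]:
                    # Hostnames compare case-insensitively; addresses kept in file order.
                    index.setdefault(name.lower(), []).append(addr)
        self._index = index
        self._signature = signature

    def lookup(self, name: str) -> List[str]:
        """Return every address listed for name (empty list if none)."""
        self._reload_if_changed()
        return list(self._index.get(name.lower(), []))
```

Each lookup still costs one stat() call, but that's far cheaper than re-reading and re-tokenizing a hosts file with tens of thousands of lines, and the dict lookup itself is O(1) on average instead of O(n).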

Since gethnamaddr.c appears to be BSD-licensed, I'm willing to bet that 99% of all OSes out there (including Windows) have similar if not identical code.




I see. Thank you!

There are a few packages that take a serious approach to adblocking through the hosts file, including hostsblock[0], linked below. The cost of the linear search on every lookup is mitigated by running a DNS caching daemon such as dnsmasq or pdnsd.
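A caching daemon helps because repeated queries for the same name never reach the linear scan. The sketch below models that idea in-process only; it is not how dnsmasq or pdnsd work internally (they are separate resolver processes honoring DNS TTLs). `linear_lookup` and `TTLCache` are hypothetical names, and the fixed TTL is an assumption.

```python
import time
from typing import Callable, Dict, List, Tuple


def linear_lookup(hosts_lines: List[str], name: str) -> List[str]:
    """Scan every line on every call, like the per-lookup loop described above."""
    target = name.lower()
    addrs: List[str] = []
    for line in hosts_lines:
        fields = line.split("#", 1)[0].split()  # strip comment, split on whitespace
        if len(fields) >= 2 and target in (n.lower() for n in fields[1:]):
            addrs.append(fields[0])
    return addrs


class TTLCache:
    """Hypothetical in-process stand-in for a caching resolver such as dnsmasq or pdnsd."""

    def __init__(self, resolve: Callable[[str], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[List[str], float]] = {}  # name -> (result, expiry)

    def lookup(self, name: str) -> List[str]:
        key = name.lower()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[1] > now:
            return list(cached[0])  # fresh hit: no hosts-file scan at all
        result = self._resolve(key)  # miss or expired: pay for one linear scan
        self._entries[key] = (result, now + self._ttl)
        return list(result)
```

Only the first query for a name (or the first after its entry expires) pays for the scan; everything else is a dict hit.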

Indeed a Bloom filter would be overkill, and I'd rather avoid false positives!
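For context on the false-positive point: a Bloom filter can say "definitely not present" or "maybe present", so a blocker built on one still needs an exact set to confirm the "maybe" answers. A minimal sketch, assuming SHA-256-based double hashing (my choice, not anything hostsblock uses); `BloomFilter` and `is_blocked` are hypothetical names.

```python
import hashlib
from typing import Iterator, Set


class BloomFilter:
    """Hypothetical Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits < 1 or num_hashes < 1:
            raise ValueError("num_bits and num_hashes must be positive")
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def is_blocked(name: str, bloom: BloomFilter, exact: Set[str]) -> bool:
    # A "no" from the filter is definitive; a "yes" may be a false positive,
    # so it must be confirmed against the exact set before blocking.
    return bloom.might_contain(name) and name in exact
```

Because the exact set has to be kept around anyway to avoid false positives, the filter only saves time on negative lookups, which a hash table already answers in O(1).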

0: http://gaenserich.github.io/hostsblock/



