
Umm.

"If two records had a similar name and the same date of birth, or a similar name and exactly the same postcode, then they were considered a match. Important: Companies House data only publicly provides the month of birth - not the day. In a general population that would not be reliable for matching identities because there is a high probability that two people with the same name are born in the same month and year. However, given company directors represent a significant sub-section of the population, we felt that this matching would be acceptable for a public showcase."

So in the (much much much more common) case of all the directors called John Smith, rather than the directors called David Robert Joseph Beckham, just how reliable is this assumption?
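
To make the question concrete, here is a minimal sketch of the quoted rule, not the authors' actual code. The write-up does not say how "similar name" was defined, so the `difflib` ratio and the 0.9 threshold are assumptions, as are the `Director` type and the made-up records.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Director:
    name: str
    birth_year: int
    birth_month: int  # the public data carries no day of birth
    postcode: str


def similar_name(a: str, b: str, threshold: float = 0.9) -> bool:
    # Assumed similarity test: the quoted text does not define "similar".
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def normalise_postcode(postcode: str) -> str:
    # Compare postcodes ignoring spacing and case.
    return postcode.replace(" ", "").upper()


def is_match(a: Director, b: Director) -> bool:
    # Quoted rule: similar name AND (same birth month/year OR same postcode).
    same_birth = (a.birth_year, a.birth_month) == (b.birth_year, b.birth_month)
    same_postcode = normalise_postcode(a.postcode) == normalise_postcode(b.postcode)
    return similar_name(a.name, b.name) and (same_birth or same_postcode)


# Made-up records: two directors sharing a common name and a birth month.
smith_a = Director("John Smith", 1970, 5, "LS1 4AP")
smith_b = Director("John Smith", 1970, 5, "M1 1AE")
beckham = Director("David Robert Joseph Beckham", 1975, 5, "E1 6AN")

print(is_match(smith_a, smith_b))  # merged even if they are different people
print(is_match(smith_a, beckham))  # names differ, so not merged
```

If the two John Smiths are different people, the rule as sketched merges them anyway, which is the case being asked about.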




Indeed. Unfortunately, with the John Smiths of this world there will be false positive matches. We could additionally require that they be from the same town or postcode, but that is quite an unreliable attribute too.

Similarly, there are a lot of false negatives: pairs of records that we know should match but that we could not link, because doing so would require a rule that creates more false positives.

In the end, it was the best we could do with the public view of the data. If we were working with the data Companies House actually holds itself, it would of course be much better.
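
The trade-off described in this reply can be sketched roughly as follows. This is an illustration on made-up records, with exact name equality standing in for "similar name"; `Record`, `loose_match` and `strict_match` are hypothetical names, not anything from the project.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    birth_year: int
    birth_month: int
    postcode: str


def same_name(a: Record, b: Record) -> bool:
    # Stand-in for "similar name": case-insensitive equality.
    return a.name.lower() == b.name.lower()


def same_birth(a: Record, b: Record) -> bool:
    return (a.birth_year, a.birth_month) == (b.birth_year, b.birth_month)


def loose_match(a: Record, b: Record) -> bool:
    # Name plus EITHER birth month/year OR postcode.
    return same_name(a, b) and (same_birth(a, b) or a.postcode == b.postcode)


def strict_match(a: Record, b: Record) -> bool:
    # Name plus BOTH birth month/year AND postcode.
    return same_name(a, b) and same_birth(a, b) and a.postcode == b.postcode


# Suppose these are two different people who share a name and birth month.
stranger_1 = Record("John Smith", 1970, 5, "LS1 4AP")
stranger_2 = Record("John Smith", 1970, 5, "M1 1AE")

# Suppose these are one person whose registered address changed.
before_move = Record("Jane Doe", 1980, 3, "LS1 4AP")
after_move = Record("Jane Doe", 1980, 3, "M1 1AE")

print(loose_match(stranger_1, stranger_2))    # True: a false positive
print(strict_match(stranger_1, stranger_2))   # False: false positive removed
print(strict_match(before_move, after_move))  # False: now a false negative
```

On the attributes being compared, the two pairs look identical, so any rule that separates the strangers also splits the person who moved.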



