Show HN: TiloDB – serverless entity resolution technology

nathanappere · on June 15, 2021

Interesting use case.

Some quick feedback on the homepage:
- "predictable": serverless is not exactly "predictable". On the other hand, having a set number of instances is :)
- "low cost": this assumes specific patterns of usage under a threshold where it's profitable to use serverless. That's entirely use-case dependent and a bit misleading.
- "constant speed": so you're saying that the worst-case scenario is bounded to 150ms, no matter the data set?

Pointing these out because it makes it read as "yet another DB vendor with some unrealistic claims" :)

skafoi · on June 15, 2021

Let's start with the last point: it is close to constant speed. As pointed out in the article, the more data sets an entity has, the longer it takes to download and aggregate that entity. But the search itself always takes the same steps and is therefore constant. Obviously, an entity with 10,000 data sets in it will not load in 150ms, but finding the place where the full data is stored is easily doable in that time.
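To make the distinction concrete, here is a minimal sketch of the two phases described above: a fixed-step lookup that finds where an entity lives, followed by a load step whose cost grows with the number of data sets in that entity. The dictionaries and the names `index`, `store`, `find_entity_id` and `load_entity` are hypothetical stand-ins for illustration, not TiloDB's actual storage layout or API.

```python
from typing import Dict, List, Optional

# Hypothetical lookup index: normalized key -> entity ID.
index: Dict[str, str] = {
    "jane doe|1980-01-01": "entity-1",
    "j. doe|1980-01-01": "entity-1",
    "john roe|1975-05-05": "entity-2",
}

# Hypothetical entity store: entity ID -> all data sets assigned to it.
store: Dict[str, List[dict]] = {
    "entity-1": [
        {"name": "Jane Doe", "city": "Berlin"},
        {"name": "J. Doe", "city": "Hamburg"},
    ],
    "entity-2": [{"name": "John Roe", "city": "Munich"}],
}


def find_entity_id(key: str) -> Optional[str]:
    """Phase 1: one hash lookup, the same steps regardless of entity size."""
    return index.get(key)


def load_entity(entity_id: str) -> dict:
    """Phase 2: fetch and aggregate every data set, linear in their count."""
    records = store[entity_id]
    return {
        "id": entity_id,
        "names": sorted({r["name"] for r in records}),
        "cities": sorted({r["city"] for r in records}),
        "record_count": len(records),
    }
```

Under these assumptions, only `load_entity` gets slower as an entity accumulates data sets, while `find_entity_id` stays flat.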

I agree that a server which you bought once (or rented) has a predictable cost. But what happens in case of a burst? You would have to buy another server (without knowing if you will still need it tomorrow). Predictable in this case means that, since you have the same steps for each request, you can tell the exact cost per request. And that is, from my point of view, what is important.
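The trade-off in this exchange can be sketched with some simple arithmetic. All prices and capacities below are made-up placeholders, not TiloDB or cloud-provider figures; amounts are integers in millionths of a euro to avoid floating-point rounding.

```python
# Hypothetical figures, chosen only for illustration.
COST_PER_REQUEST = 20            # 0.00002 EUR per request
SERVER_PRICE = 100_000_000       # 100 EUR per server per month
SERVER_CAPACITY = 10_000_000     # requests one server can handle per month


def serverless_cost(requests: int) -> int:
    """Cost scales linearly: same steps per request, same price per request."""
    return requests * COST_PER_REQUEST


def fixed_cost(requests: int) -> int:
    """Cost jumps in steps: a burst past capacity needs a whole extra server."""
    servers = max(1, -(-requests // SERVER_CAPACITY))  # ceiling division
    return servers * SERVER_PRICE


def cheaper_option(requests: int) -> str:
    """Return which model is cheaper for a given monthly request volume."""
    return "serverless" if serverless_cost(requests) < fixed_cost(requests) else "fixed"
```

With these placeholder numbers, per-request pricing is cheaper at low volume and fixed servers win at sustained high volume, which is the usage threshold raised in the parent comment.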

skafoi · on June 15, 2021

Just had another look at what is written on the home page regarding "constant speed". It is indeed misleading. Sorry for that.

Major_Grooves · on June 15, 2021

We were the tech team at a European consumer credit bureau. We could not find any technology that could handle our scale of data with the required speed, efficiency and cost. So we spent the last three years developing a new entity resolution technology that we call TiloDB.

We think TiloDB has lots of potential outside this credit bureau, so we hope to release the technology as open source software in the near future.

We would welcome your comments, thoughts or potential use cases. In the article you will find a description of the technical challenge we faced and how we solved it. There is also an interactive demo so you can play with the entity resolution yourself, and see the livestream of data submitted by other people.

yamalight · on June 15, 2021

That looks pretty neat, although not all things are exactly clear.

1. You claim that existing graph databases were not fast enough - do you have any benchmark data that compares them with your solution on a given dataset?

2. From the description - it seems like you are focusing purely on Person type of data - is that correct? Or is that just the first use case / demo?

3. Do you support more advanced query langs, e.g. SPARQL?

edit: formatting

skafoi · on June 15, 2021

1. We did run benchmarks when we started on this. But since that was quite some time ago, I would not want to quote exact numbers right now. For our use case, there are basically two extremes:

a) Everything being the same data: assuming proper deduplication happened, you have one central node and everything else is one node away from it without being connected to each other. This is the case where graph databases still work quite OK (somewhere around 6 seconds, if I remember correctly).

b) Having a long chain of data: A->B->C->D (in that use case, basically a person who moves very often). Besides having to write an utterly complex query for that, I remember that I was not able to get any results within an acceptable time.

2. That is only for that use case. But the underlying matching library we developed can work with any kind of structured data. It would be interesting to actually use it in some other contexts, as we have not tested that yet.

3. Not currently; right now it is a pure GraphQL API. But I have been thinking about that. To actually support something like this, it would be very interesting to also focus on cross-entity linking to make it really cool. We have something like this, but haven't really focused on it yet.
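As an illustration of the long-chain case in point 1, here is a minimal sketch using a disjoint-set (union-find) structure, a standard technique for grouping transitively linked records. This is not confirmed to be how TiloDB works; it only shows why resolving A->B->C->D to one entity at write time avoids walking the chain at query time. The class name `DisjointSet` and the record labels are hypothetical.

```python
class DisjointSet:
    """Groups records so that all transitively linked records share one root."""

    def __init__(self) -> None:
        self.parent = {}

    def find(self, x: str) -> str:
        """Return the representative (entity) of record x."""
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every record on the path directly at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a: str, b: str) -> None:
        """Record that a and b belong to the same entity."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


# A person who moved three times: each record matches only its neighbor.
entities = DisjointSet()
for left, right in [("A", "B"), ("B", "C"), ("C", "D")]:
    entities.union(left, right)
```

After the links are recorded, `entities.find("D")` and `entities.find("A")` return the same representative without any multi-hop graph query.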

yamalight · on June 15, 2021

1. Since there are plans to open source - would be very interested to see benchmarks published alongside code!

2-3. Got it, thanks!

ewe · on June 15, 2021

Neat! I can see a few use cases where it would make sense for us. Is it publicly available?

Major_Grooves · on June 15, 2021

Not yet, I am afraid. We need to do some work to make it available as OSS. If you sign up on the website we will let you know when it is released. Also happy to discuss your use cases in the meantime.

skafoi · on June 15, 2021

Not yet. But we hope to make it open source in the future.