I wrote about a (relatively) new class of join algorithms for DBMSs called "worst-case optimal join algorithms". This topic has kept me very busy for the last 5-6 years, and the post aims to explain the simple ideas behind these algorithms and why they are the type of algorithms graph DBMSs should integrate (as we have at KuzuDB: https://github.com/kuzudb/kuzu).
If you're curious why database researchers/developers still work on joins after 50 years, it's because what is and isn't possible is actually not that well understood. After many decades, wcoj algorithms were probably the biggest algorithmic leap in the field. Many people are looking into information theory and computational geometry to improve our understanding. There are even more advanced but currently impractical algorithms called "beyond worst-case optimal" join algorithms, to which I briefly leave pointers at the end of the post.
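To give a taste of the core wcoj idea in code: below is a minimal sketch in Python (my own toy illustration with in-memory hash indices, not how any real system implements it) of a "generic join" on the canonical triangle query Q(a, b, c) = R(a, b) JOIN S(b, c) JOIN T(a, c). The trick is to extend the output one attribute at a time, intersecting the candidate values that every relevant relation allows:

    # Toy "generic join" on the triangle query
    # Q(a, b, c) = R(a, b) JOIN S(b, c) JOIN T(a, c).
    # Instead of joining two tables at a time, extend the output one
    # attribute at a time, intersecting the candidates each relation allows.
    from collections import defaultdict

    def index_by_first(rel):
        """Index a set of pairs by their first column."""
        idx = defaultdict(set)
        for x, y in rel:
            idx[x].add(y)
        return idx

    def triangles(R, S, T):
        R_ab = index_by_first(R)  # a -> {b}
        S_bc = index_by_first(S)  # b -> {c}
        T_ac = index_by_first(T)  # a -> {c}
        for a in R_ab.keys() & T_ac.keys():   # candidates for attribute a
            for b in R_ab[a] & S_bc.keys():   # ... then attribute b
                for c in S_bc[b] & T_ac[a]:   # ... then attribute c
                    yield (a, b, c)

    edges = {(1, 2), (2, 3), (1, 3)}
    print(list(triangles(edges, edges, edges)))  # [(1, 2, 3)]

The point of the attribute-at-a-time strategy is that it matches the O(N^{3/2}) worst-case output bound of the triangle query, whereas any plan made of binary joins can produce Θ(N^2) intermediate tuples.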
Damn, I hadn't seen Kuzu before, but the Cypher-esque query language is beautiful, nearly on par with RedisGraph. That's a rare thing in the world of graph databases; most of the query languages are horrid, and even ones claiming Cypher compliance rarely live up to the hype. I'll have to check out Kuzu.
If you have any good ideas about how to implement Tetris in a practical way, you could make a real impact here. This is the only work I know of that tried to implement these algorithms: https://arxiv.org/pdf/1503.04169.pdf (and, excitingly, it was published at the GRADES workshop at SIGMOD, which I co-chaired twice and which is very dear to my heart). But I talked to the authors, and they all agree that despite the tone of the paper, they found these algorithms quite difficult to implement in a performant way.
So here's the problem. The core algorithmic step of "beyond wcoj" algorithms is the "geometric resolution". The core idea is to work with gaps in the space. For example, suppose we are joining two relations R(A) and S(A), which is an intersection, and suppose A is an integer domain. Suppose further that R's maximum A value is 100 and S's minimum value is 101. Then there is a gap of (100, \infty) in R's space and another gap of (-\infty, 101) in S's space. If you "resolve"/join these gaps, you get a gap of (-\infty, \infty), which tells you in one operation that the join's output is empty.
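Here is a toy sketch of that resolution step in Python (my own illustration of the single-attribute example above, not the actual Tetris algorithm). A gap is an open interval of A-values that a relation provably does not contain, and resolving two overlapping gaps yields their union:

    # Toy geometric "resolution" on a single integer attribute A.
    # If resolved gaps grow to (-inf, inf), R(A) JOIN S(A) is empty.
    import math

    def gaps(sorted_vals):
        """Open intervals NOT covered by a non-empty sorted list of ints."""
        out = [(-math.inf, sorted_vals[0])]
        for lo, hi in zip(sorted_vals, sorted_vals[1:]):
            if hi > lo + 1:                 # integers missing between lo, hi
                out.append((lo, hi))
        out.append((sorted_vals[-1], math.inf))
        return out

    def resolve(g1, g2):
        """If two open gaps overlap, their union is one bigger gap."""
        if max(g1[0], g2[0]) < min(g1[1], g2[1]):
            return (min(g1[0], g2[0]), max(g1[1], g2[1]))
        return None                         # disjoint: no resolution

    R = [1, 50, 100]                        # max(R.A) = 100
    S = [101, 200]                          # min(S.A) = 101
    print(resolve(gaps(R)[-1], gaps(S)[0])) # (-inf, inf): output is empty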
On this simple query this works fine, but with general relations (let alone non-integer data types), doing such geometric "resolutions" and finding efficient ways to index those "gaps" becomes quite challenging. But any good idea here will push the field!
Can you clarify what you mean by byte-sized implementations? There are several systems now that implement these algorithms; KuzuDB, Umbra, and LogicBlox (the earliest) are the ones I know of. I'm sure more will come.
That's a topic quite separate from new join algorithms. Frankly, I also don't know enough about it, but I don't yet see them being integrated into the systems I'm aware of. I would be hesitant to put them inside KuzuDB, since I don't see them resolving major performance bottlenecks. On the indices side, I think one interesting topic is finding more update-friendly disk-based CSR-based indices for graph DBMSs.
I think Dean De Leo's work in this space is good; it's certainly the right place to start. This work uses packed memory arrays (PMAs) but is focused on in-memory versions of them. I can recommend these two papers: Teseo: https://dl.acm.org/doi/abs/10.14778/3447689.3447708, and Packed Memory Arrays Rewired: https://ieeexplore.ieee.org/abstract/document/8731468. In Kuzu, we will be implementing a PMA version of our disk-based CSR-based join indices, which we also use to store relationship properties, so stay tuned for that!
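For readers unfamiliar with PMAs: the idea is to keep a sorted array with deliberate empty slots, so an insert only shifts a small neighborhood instead of rewriting everything. Applied to CSR, that means leaving slack inside each node's adjacency segment. A rough in-memory sketch in Python (my own toy code to convey the idea, not Teseo's or Kuzu's actual layout, and without a real PMA's density-based rebalancing):

    # Sketch of the PMA idea behind an update-friendly CSR: keep each node's
    # adjacency segment over-allocated, as a sorted prefix followed by empty
    # slots (None), so an insert shifts only within one segment instead of
    # rewriting the whole CSR. This toy just doubles a full segment; a real
    # PMA rebalances slack across segments to bound the shifting cost.

    class GappedCSR:
        def __init__(self, adjacency, slack=2):
            self.seg = []
            for nbrs in adjacency:
                cap = max(len(nbrs) * slack, slack)
                self.seg.append(sorted(nbrs) + [None] * (cap - len(nbrs)))

        def neighbors(self, u):
            return [v for v in self.seg[u] if v is not None]

        def add_edge(self, u, v):
            seg = self.seg[u]
            if seg[-1] is not None:            # no slack left in the segment
                seg.extend([None] * len(seg))  # grow it (a real PMA rebalances)
            pos = 0
            while seg[pos] is not None and seg[pos] < v:
                pos += 1                       # find v's sorted position
            carry = v
            while carry is not None:           # shift the rest of the prefix
                seg[pos], carry = carry, seg[pos]
                pos += 1

    g = GappedCSR([[1, 2], [2], []])
    g.add_edge(0, 3)
    print(g.neighbors(0))  # [1, 2, 3]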
Thanks for the pointers, I'll look into them. I read your blog post series yesterday and found it very well written and interesting. Gonna read the CIDR paper, too. FYI, I'm the author of https://github.com/s1ck/graph, where we use a read-only CSR, and my day job is https://github.com/neo4j/graph-data-science, which is built on top of a read-only, compressed CSR. Maybe we can have a chat at some point :)
Enjoy reading!