> ... like ACID. shelve doesn't provide this, and IIUC nor does kdb+. So if you'...

Veedrac · on March 11, 2020

I mean, sure, if your problem is such that a strategy like that works for you, I'm not going to tell you otherwise. You can log incoming messages and dump data out to files easily with Python too. I wouldn't want to call that a ‘database’, though, since it's no more than a daily archive.

geocar · on March 12, 2020

Yes! "databases" are all overrated too. Slow expensive pieces of shit. No way you could do 50 billion inserts from Python to SQL server on a single core in a day!

I'm so glad k isn't a "database" like that.

Veedrac · on March 12, 2020

It's a bit unfair to compare the speed of wildly inequivalent things. RocksDB would be more comparable, but even there it is offering much stronger resilience guarantees, multicore support, and gives you access to all your data at once.

Calling them expensive is ironic AF. Most of them are free and open source.

geocar · on March 12, 2020

> It's a bit unfair to compare the speed of wildly inequivalent things.

Yes, but I understand you keep doing it because you don't understand this stuff very well yet.

> RocksDB would be more comparable

How do you figure that? RocksDB is not a programming language.

If you combine it with a programming language like C++, it can, with only 4x the hardware just about keep up 1/6th of my macbook air[1].

RocksDB might be more comparable to gdbm, but it's not even remotely like q or k.

[1]: And that's taking facebook's benchmarks at their word here, ignoring how utterly synthetic these benchmarks read: https://github.com/facebook/rocksdb/wiki/Performance-Benchma...

> much stronger resilience guarantees,

You're mistaken. There is no resilience guarantee offered by rocksdb. In q I can backup the checkpoints and the logs independently. It is trivial to get whatever level of resilience I want out of q just by copying regular files around. RocksDB requires more programming.

> gives you access to all your data at once

You're mistaken. This is no problem in q. All of the data is mmap'd as soon as I access it (if it isn't mmap'd already).

> Calling them expensive is ironic AF. Most of them are free and open source.

If they require 4x the servers, they're at least 4x as expensive. If it takes 20 days to implement instead of 5 minutes, then it's over 5000x as expensive.

No, calling that "free" is what's ironic, and believing it is moronic.

Veedrac · on March 12, 2020

> How do you figure that? RocksDB is not a programming language.

I'm comparing to the code you showed. You're using the file system to dump static rows of data. All your data munging is on memory-sized blocks at program-level. Key-value stores are the comparable database for that.

> You're mistaken. This is no problem in q. All of the data is mmap'd as soon as I access it (if it isn't mmap'd already).

Yes, because you're working on the tail end of small, immutable data tables, rather than an actually sizable database with elements of heterogeneous sizes.

> In q I can backup the checkpoints and the logs independently. It is trivial to get whatever level of resilience I want out of q just by copying regular files around.

Yes, because you don't want much resilience.

---

What you're doing here is incredibly simplistic. It's not proper resiliency, it's not scalable to more complex problems, and it's not scalable to larger workloads. An mmap'ed table and an actual database are different things.

It works fine for you, but for many other people it's not.

geocar · on March 12, 2020

> You're using the file system to dump static rows of data

That's what MySQL, PostgreSQL, SQL Server, and Oracle all do. They write to a logfile (called the "write ahead log") then periodically (and concurrently) process it into working sets that are checkpointed (checked) in much the same way. It's a lot slower because they don't know what is actually important in the statement except what they can deduce from analysis. Whilst that analysis is slow, they do this so that structural concerns can be handed off to a data expert (often called a DBA), since most programmers have no fucking clue how to work with data.

That can work for small data, but it doesn't scale past around the 5bn inserts/day mark currently, without some very special processing strategies, and even then, you don't get close to 50bn.

> All your data munging is on memory-sized blocks at program-level.

That is literally all a computer can do. If you think otherwise, I think you need a more remedial education than the one I've been providing.

> What you're doing here is incredibly simplistic. It's not proper resiliency, it's not scalable to more complex problems, and it's not scalable to larger workloads. An mmap'ed table and an actual database are different things.

Yes, except for everything you said, nothing you said is true in the way that you meant it.

Google.com does not query a "database" that looks any different from the one I'm describing; Bigtable was based on Arthur's work. So was Apache's Kafaka and Amazon's Kinesis. Stream processing is undoubtedly the future, but it started here:

https://pdfs.semanticscholar.org/15ec/7dd999f291a38b3e7f455f...

Not only does this strategy get used for some of the hardest problems and biggest workloads, it's quite possibly the only strategy that can be used for some of these problems.

Resiliency? Simplistic? I'm not even sure you know what those words mean. Putting "proper" in front of it is just weasel words...

goto11 · on March 12, 2020

> I'm not even sure you know what those words mean.

I don't think you are doing your case any favors with such rhetoric.

geocar · on March 13, 2020

It wasn't rhetorical.

Veedrac · on March 12, 2020

Ight imma head out, this is clearly not a good direction for the discussion.