As someone who generally finds "couple million rows? Sqlite is fine" works as a philosophy, what generally should I know here?
I get that there are huge timeseries based data flows going round the world (transactions, metrics, weather, self driving telemetry) and sticking those into something is useful.
I understood the world has settled on "use hdfs to store the data, use apache spark to run over those files and reduce your query data set till it is small enough to fit into pandas and then process"
This looks like it is trying to cut out the first two steps?
Ingress endpoints are usually the more annoying component, rather than the exact storage format.
> As someone who generally finds "couple million rows? Sqlite is fine"
There are still open annoyances with analog timeseries measurements which aren't easily handled in SQL with columns plus a btree index.
So combining an airplane altitude sensor sampled at 0.1Hz with a 10Hz fuel sensor starts giving you trouble in SQL, because the question someone might ask is "when the plane was at 31,000 feet, at 0.85 mach, what was the fuel rate?", while the plane might have gone from 30,999 to 31,001 feet without ever recording the intermediate point.
Put another way, you can do range scans for each measurement, but only if you treat time as something different from a generic column value in a generic SQL engine, so that rows can be joined by temporal proximity.
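For illustration, the "join by temporal proximity" idea is essentially an as-of join: for each fast sample, pick the most recent slow sample at or before it. A minimal sketch in plain Python (names and data are illustrative; a real engine does this inside the storage layer):

```python
import bisect

def asof_join(slow, fast):
    """For each (t, value) sample in `fast`, attach the most recent
    `slow` sample at or before t (an as-of join by temporal proximity).
    Both inputs are lists of (timestamp, value) sorted by timestamp."""
    slow_times = [t for t, _ in slow]
    joined = []
    for t, fv in fast:
        i = bisect.bisect_right(slow_times, t) - 1
        sv = slow[i][1] if i >= 0 else None  # no slow sample yet
        joined.append((t, fv, sv))
    return joined

# Toy data: 0.1 Hz altitude samples vs. 10 Hz fuel-rate samples
altitude = [(0.0, 30999.0), (10.0, 31001.0)]
fuel = [(9.9, 0.84), (10.0, 0.85), (10.1, 0.86)]
print(asof_join(altitude, fuel))
# [(9.9, 0.84, 30999.0), (10.0, 0.85, 31001.0), (10.1, 0.86, 31001.0)]
```

Note that even this sketch cannot answer "what was the fuel rate at exactly 31,000 feet" without interpolation, which is exactly the annoyance described above.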
The ASF proposal from IoTDB[1] might make interesting reading on the topic (the frequencies part), as might the Boeing talk from XLDB 2018 [2].
> So combining an airplane altitude sensor sampled at 0.1Hz with a 10Hz fuel sensor starts giving you trouble in SQL, because the question someone might ask is "when the plane was at 31,000 feet, at 0.85 mach, what was the fuel rate?", while the plane might have gone from 30,999 to 31,001 feet without ever recording the intermediate point.
PostgreSQL allows for all of that: user-defined types with user-defined index structures and whatnot. PostgreSQL is great and flexible, yet transaction safe and fast. You should really try it. It is one of the highest-quality FLOSS projects I have ever seen, not just in terms of code quality, but also in terms of project and community management.
I did a startup solving this problem (yet again) in an APL; we sucked at enterprise sales, but our data structures went rather faster than BTrDB's. The most famous example of solving this problem is Kx Systems. Less famous, but basically using the same ideas (these guys all worked with Art and the Iversons):
http://marketgridsystems.com/
And no, I won't explain it in detail. I consider the spark team basically cretins, and it would ruin several of my pals businesses while transferring wealth to hedge funds, exchanges and advertising groups making solutions to this problem available for free. I'll continue to snigger up my sleeve at the fact that decades of open source development still haven't noticed things which were obvious in the 1960s.
Edit add: a great example of this kind of thing, if not this exact thing: some poor guy trying to solve a 25TB problem using Spark and Parquet files ... finally solving it with ... Awk. Old tools written when 1MB was a lot of memory (sed, awk, tr, cut) are vastly better than new tools.
The nodes of the tree resemble Clojure's HAMTs, just skipping the hash. It's interesting, because Clojure itself opts for RB trees when building sorted maps, but for keys that can be cast to a bit sequence (strings, integers), AMTs are a pretty good option.
My side project is in some respects quite close to BTrDB; tree based analytics. Stumbling across it made my day, because it appears to have vindicated some design choices I was unsure about.
I've reimplemented Clojure's data structures in Clojure, and made the nodes serialize to a garbage collected KV backend (ardb or Redis depending on requirements). This means that they can persist between program runs, be sent over a network, and don't need to fit in memory. Using structural sharing, you can take a big hash map, add a single key to it, and have both the old and the new side by side.
The next piece is cached map/filter/reduce, which saves intermediate results on each node, making batch jobs compute incrementally. You can take a big hash map on disk (my main one contains ~20GB, a map of string -> set of data points), add some more values to it, and recalculate the analytics of the new map in a few milliseconds, having both side by side.
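The structural-sharing-plus-cached-aggregate idea can be sketched with a toy persistent tree in Python (a plain unbalanced BST caching a sum; the names and structure are mine, not the actual implementation being described):

```python
class Node:
    __slots__ = ("key", "val", "left", "right", "total")
    def __init__(self, key, val, left=None, right=None):
        self.key, self.val, self.left, self.right = key, val, left, right
        # cached aggregate: sum of vals in this subtree, computed once
        self.total = val + (left.total if left else 0) \
                         + (right.total if right else 0)

def assoc(node, key, val):
    """Return a new root. Only the O(depth) nodes on the path are
    rebuilt (and recompute their cached total); every other subtree
    is shared verbatim with the old version."""
    if node is None:
        return Node(key, val)
    if key < node.key:
        return Node(node.key, node.val, assoc(node.left, key, val), node.right)
    if key > node.key:
        return Node(node.key, node.val, node.left, assoc(node.right, key, val))
    return Node(key, val, node.left, node.right)

old = None
for k in [4, 2, 6, 1, 3, 5, 7]:
    old = assoc(old, k, k)        # total = 1+2+...+7 = 28
new = assoc(old, 100, 5)          # one extra entry

assert old.total == 28            # the old version is still intact
assert new.total == 33            # the new version sees the update
assert new.left is old.left       # untouched subtree is shared, not copied
```

Recomputing an aggregate after a small change only touches the rebuilt path, which is what makes "recalculate the analytics of a 20GB map in milliseconds" plausible.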
The ultimate goal is to enable you to write more or less idiomatic Clojure programs that behave the same whether you're reducing a map with a thousand entries, or a trillion (e.g. by firing up a bunch of AWS Lambda instances).
Not yet. I'm using it in prod for my own analytics, so I know it works, but it'll need some clean up and features before release. I do plan on it though.
Those interested in TSDBs might also be interested in M3DB.io, which is Apache 2 licensed and supports a modified TSZ compression implementation with an 11x compression ratio, replication of data within the cluster, streaming of data between nodes on add/remove, and a Kubernetes operator.
The compression technique to achieve the 11x compression ratio is pretty clever and is detailed in the Gorilla TSDB paper. Timestamps are encoded with a delta of delta encoding scheme which allows 96% of all timestamps to compress to a single bit within a compressed block.
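A sketch of the delta-of-delta idea in Python (illustrative only; the actual Gorilla format additionally variable-length bit-packs each term, which is where the single-bit encoding of zero comes from):

```python
def delta_of_delta(timestamps):
    """Emit the first timestamp, then the first delta, then the
    delta-of-delta for each subsequent point. Regularly sampled data
    produces long runs of 0, which Gorilla encodes as one bit each."""
    out = [timestamps[0]]
    prev, prev_delta = timestamps[0], None
    for t in timestamps[1:]:
        delta = t - prev
        out.append(delta if prev_delta is None else delta - prev_delta)
        prev, prev_delta = t, delta
    return out

# A 10-second scrape interval with one second of jitter
ts = [1000, 1010, 1020, 1031, 1041]
print(delta_of_delta(ts))  # [1000, 10, 0, 1, -1]
```

The decoder reverses the process by accumulating the deltas, so the stream is lossless despite most entries being tiny or zero.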
It is definitely a huge step forward for time series data collection at scale, great props to the authors.
The major difference between M3TSZ and vanilla TSZ is the value component: instead of XORing the float bits of each datapoint, the encoder checks whether the value's precision fits within a few significant digits. If so, the value is converted to an int representation, and delta-of-delta of the ints from value to value is used to represent the changes, rather than XORing the float bits. The encoder also tracks the number of significant digits and damps the need to re-encode changes to it from value to value.
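To illustrate the value path, here is a sketch of the idea in Python (this is not M3TSZ's actual bit format; the function name, the digit cap, and the float tolerance are all mine):

```python
def int_delta_of_delta(values, max_digits=6):
    """If every value fits within `max_digits` decimal digits of
    precision, scale to ints and return (digits, delta-of-delta stream);
    otherwise return None, signalling a fallback to float-XOR encoding."""
    for digits in range(max_digits + 1):
        scale = 10 ** digits
        # tolerance absorbs float representation error (e.g. 0.85 * 100)
        if all(abs(v * scale - round(v * scale)) < 1e-9 for v in values):
            break
    else:
        return None  # too much precision for the int path
    ints = [round(v * scale) for v in values]
    out = [ints[0]]
    prev, prev_delta = ints[0], None
    for i in ints[1:]:
        delta = i - prev
        out.append(delta if prev_delta is None else delta - prev_delta)
        prev, prev_delta = i, delta
    return digits, out

# A slowly drifting sensor reading with one decimal digit of precision
print(int_delta_of_delta([20.5, 20.7, 20.9, 21.0]))  # (1, [205, 2, 0, -1])
```

Sensor data with fixed precision and smooth drift collapses into tiny integers this way, which compress far better than XORed float bits.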
We should write up the specific algorithm differences for the value component encoding.
Thanks, I learned something new. It's not general purpose and has some strict limitations. As usual, this seems to skip concepts like the quality/status of a sensor (maybe qNaN can be used as a sort of sentinel value), but it is an interesting newer entry in the field.
It looks very interesting, and I am a bit surprised to see how a tree structure can deliver such high performance in retrieving the data. Querying a long period of time means that each top node has to be walked to the bottom, right? Isn't that fairly expensive?
The aggregations at node level are also interesting. It is something that the company I work for (https://www.energy21.com) also does for calculations, but we use a relational database and a notification-based system to perform the recalculations and keep the aggregations up to date.
I read about the kdb+ database in a post the other day calling it "the fastest" database, and the numbers given did seem incredibly impressive. How does this compare?
> This is an example of running the benchmark in a MacBook Air 11" using a pipelining of 16 commands:
$ redis-benchmark -n 1000000 -t set,get -P 16 -q
SET: 403063.28 requests per second
GET: 508388.41 requests per second
Testing against a MacBook Air is reasonable, as most developers will have access to one, or to a Windows machine of similar specs.
For instance, I have a 5 year old Macbook Air, and I run this test to benchmark (baseline) my database:
My / Redis test is not very interesting (not real-world use case), but it at least gives a baseline that removes a lot of variables.
Then every database should implement Jepsen-like tests for everything else. That is what we're trying to do by creating a Jepsen-like distributed-systems correctness and load-testing tool (https://github.com/gundb/panic-server) in a language that is easy for any developer to access and use.
> ... as most developers will have access to one or a windows machine of similar stats.
That's a bold assumption. Looking around my office in India here, I find not a single Macbook, a Windows machine, or any machine from '11. It would also be trouble to get my hands on a Macbook Air '11, if I had to do it for the purposes of reproducing a benchmark. I certainly couldn't buy one today, and even if I were to buy a currently-in-stores Macbook it'd be awfully expensive.
A much better proposition would be something far more ubiquitous and accessible, something that can be borrowed trivially or bought easily by the vast majority of people, something that stays stable and available for a number of years. An AWS instance or a Raspberry Pi (1/2/3), for instance.
Thank you for calling this out. It was the spirit of what I was trying to go for, because the website uses "60GB of RAM (EC2 c3.8xlarge)", which I wouldn't even pay for.
I totally agree with the RasPi suggestion. Though note, the RasPi 3 B+ is actually about as powerful as my 5 year old air, 1.4GHz quadcore vs 1.6GHz quadcore.
> "60GB of RAM (EC2 c3.8xlarge)", which I wouldn't even pay for.
The point isn't to restrict benchmarks to only high-end machines. The point is to use a stable standard that people wouldn't have trouble running for a few hours. I can rent a machine of that identical type for 20 hours, or buy an RPi, for the low price of $35.
> about as powerful as my 5 year old air, 1.4GHz quadcore vs 1.6GHz quadcore
Core counts and clock speeds do not translate to comparable computing power across processor families and instruction sets. E.g., the RPi 4 (1.5GHz, quadcore) is announced to deliver 3x the performance of an RPi 3B+ (1.4GHz, quadcore), and that's on the same instruction set.
> Though note, the RasPi 3 B+ is actually about as powerful as my 5 year old air, 1.4GHz quadcore vs 1.6GHz quadcore.
I don't think that's an apples-to-apples comparison. The Pi 3 doesn't have an out-of-order CPU, so it can't exploit much instruction-level parallelism. The new Pi 4 is better in this regard, but both are handily beaten by Intel CPUs from the last 10 years.
So even Berkeley has abandoned BSD? Or is GPLv3 part of the U.S. Department of Energy ARPA-E program (DE-AR0000430), National Science Foundation CPS-1239552 requirement?
BSD UNIX used the BSD license (among other things).
There are/were/always will be many projects, departments, professors, etc. doing things needing licenses at UCB, and each will choose the appropriate one for their circumstances.
That's an interesting choice for a system intended to be high performance. Go's gRPC implementation has been suffering from extremely high resource usage: https://github.com/grpc/grpc-go/issues/1455
Are use case driven DBs helpful? Is the use case and possible adoption crowd large enough to sustain a product this critical down the road?
I worry about the proliferation of infra level tools or adopting a new one. Why not make an extension or fork postgres to demonstrate your concept? (Just an example)
Timeseries databases have an entirely different set of technical challenges from relational databases. Also, I wouldn’t exactly call this proliferation. There are relatively few decent timeseries databases in existence, some of which have been around since 1999.
So yes, timeseries databases are incredibly helpful if you have to store, well, large amounts of timeseries data quickly.
For a view into the original Berkeley DB as still incorporated in the FreeBSD base system (BSD Berkeley DB v1; not the later lineage by the same authors which led to Sleepycat / Oracle Berkeley DB).
> Why not make an extension or fork postgres to demonstrate your concept?
This isn't that kind of database.
This is a k/v store optimized around a particular use case. K/v stores are what was more traditionally considered a 'database', before the term colloquially came to imply 'relational database management system' (RDBMS, i.e. a system to manage your k/v database files relationally).
Well, technically a 'database' is any 'base of data'. But yes: 'base of data' -> 'k/v store' -> 'RDBMS'.
Conceivably, if desired, one could take this library and integrate it into a higher-level application, whether that be a custom application datastore or a new backend for something like Postgres.
> BTrDB uses a K-ary tree to store timeseries data. The leaves of the tree store the individual time-value pairs. Each internal node stores associative statistics of the data in its subtree; currently, the statistics are the minimum value, mean value, maximum value, and number of data points. BTrDB uses the internal nodes to accelerate processing of these statistical aggregates over arbitrary time ranges.
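The quoted design can be sketched as a toy in-memory analogue: a binary tree (rather than K-ary) over an array of values, where each internal node caches associative statistics so a range query touches only O(log n) nodes. I store the sum instead of the mean, since sums compose associatively and the mean falls out as sum/count; none of the names or structure here are BTrDB's actual format.

```python
class StatNode:
    def __init__(self, lo, hi, values):
        self.lo, self.hi = lo, hi          # this node covers indices [lo, hi)
        if hi - lo == 1:                   # leaf: a single time-value pair
            v = values[lo]
            self.children = []
            self.mn = self.mx = self.sm = v
            self.cnt = 1
        else:                              # internal node: cache subtree stats
            mid = (lo + hi) // 2
            self.children = [StatNode(lo, mid, values),
                             StatNode(mid, hi, values)]
            self.mn = min(c.mn for c in self.children)
            self.mx = max(c.mx for c in self.children)
            self.sm = sum(c.sm for c in self.children)
            self.cnt = hi - lo

    def query(self, lo, hi):
        """(min, max, sum, count) over [lo, hi). Any subtree fully inside
        the range answers from its cached stats without descending."""
        if hi <= self.lo or lo >= self.hi:                 # disjoint
            return (float("inf"), float("-inf"), 0.0, 0)
        if lo <= self.lo and self.hi <= hi:                # fully covered
            return (self.mn, self.mx, self.sm, self.cnt)
        parts = [c.query(lo, hi) for c in self.children]   # boundary: recurse
        return (min(p[0] for p in parts), max(p[1] for p in parts),
                sum(p[2] for p in parts), sum(p[3] for p in parts))

vals = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
root = StatNode(0, len(vals), vals)
mn, mx, sm, cnt = root.query(2, 7)
print(mn, mx, sm / cnt)  # min, max, and mean over indices 2..6
```

Only the subtrees straddling the range boundaries are descended; everything strictly inside is answered from cached statistics, which also addresses the "walk every top node to the bottom" worry raised elsewhere in the thread.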
It's the same meme - a stack of pancakes as a visual pun on "software stack"/"full stack." Not the only place I've seen that before, either - I've seen the same idea on T-shirts and laptop stickers.
Funnily enough, I went to sequelpro.com/legal and found that while their software is under the MIT license, as opposed to the more restrictive GPL license used by BTrDB, the website specifically calls out the icon as property of the project and disallows its use in forks. So it looks like they really would care about the logo being ripped off more than anything.