
So where are we on Data Lakes vs NewSQL [1]?

[1]: https://en.wikipedia.org/wiki/NewSQL




Most “NewSQL” databases are designed for OLTP use cases (i.e. many small queries that do little aggregation). Data Lakes are optimized for OLAP (i.e. fewer queries, each aggregating over large amounts of data).

As an example, Athena would do a terrible job at finding a specific user by its ID, while Spanner would behave just as poorly at calculating the cumulative sales of all products for a given category, grouped by store location (assuming many millions of rows representing sales).
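To make the two query shapes concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in. The table names, columns, and sample rows are made up for illustration, and SQLite is neither Athena nor Spanner, so this only shows what the queries look like, not how either system performs.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical users and sales tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales (
            product_id INTEGER,
            category TEXT,
            store_location TEXT,
            amount REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            (10, "toys", "Berlin", 5.0),
            (11, "toys", "Berlin", 7.5),
            (10, "toys", "Paris", 2.0),
            (20, "books", "Paris", 9.0),
        ],
    )
    return conn


def find_user(conn, user_id):
    # OLTP shape: touch one row, located through the primary key.
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()


def sales_by_store(conn, category):
    # OLAP shape: scan every sale in a category and aggregate per store.
    return conn.execute(
        """
        SELECT store_location, SUM(amount)
        FROM sales
        WHERE category = ?
        GROUP BY store_location
        ORDER BY store_location
        """,
        (category,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_user(conn, 2))
    print(sales_by_store(conn, "toys"))
```

The first query reads a single row no matter how big the table gets; the second has to read every matching sale, which is where columnar storage and parallel scans pay off.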

Hope this analogy makes sense.


I think you're selling some of these "NewSQL" DBs short. TiDB/TiKV, for example, appears (I haven't personally used it yet) to be capable of supporting both OLTP and OLAP workloads thanks to some clever engineering and data structures behind the scenes.


TiDB relies on Spark to do analysis, using their TiSpark integration package. It's not built into the database but offers a smoother install than operating a Spark cluster separately.

The only "newsql" database that truly does OLAP+OLTP (now called HTAP) well is MemSQL with its in-memory rowstores and disk-based columnstores.


(I'm a TiDB dev, so I might be biased.) Yes and no. The yes part is that TiDB still relies on TiSpark for large join queries, as well as for bridging to the big-data world. TiDB itself cannot shuffle data like an MPP database yet. On the other hand, TiDB without TiSpark is still comfortable with dimensional aggregation queries (which are typical analytical queries as well). The no part is that TiDB now has a columnar engine (TiFlash) for analytics that also provides workload isolation. TiFlash runs on separate nodes and keeps up to date with the row store in real time via Raft (the latest, consistent data, to be more specific). IMO, HTAP should mean TP and AP at the same time, not just "TP or AP, choose one". In that case, workload interference is a real problem, especially when you are talking about banking transactions rather than streaming in logs. In that sense, very few "newsql" systems, if any, have achieved what I would consider true HTAP. For more details: https://pingcap.com/blog/delivering-real-time-analytics-and-...

You're welcome to try it in March with TiDB 3.1.



