They literally explained in the article why they have a data lake instead of just a data warehouse: their data model means it's slow and expensive to ingest that data into the warehouse from Postgres. The data lake is serving the same functions that the data warehouse did, but now that the volume of data has exceeded what the warehouse can handle, the data lake fills that gap.
I wrote another comment about why you'd need this in the first place: https://news.ycombinator.com/item?id=40961622
Frankly the argument "they shouldn't need to query the data in their system" is kind of silly. If you don't want your data processed for the features and services the company offers, don't use them.
> Frankly the argument "they shouldn't need to query the data in their system" is kind of silly.
Neutral party here: that's not what they said.
A) Quotes shouldn't be there.
B) Heuristic I've started applying to my comments: if I'm tempted to "quote" something that isn't a quote, it means I don't fully understand what they mean and should ask a question. This dovetails nicely with the spirit of HN's "come with curiosity."
It is disquieting because:
A) These are very much ill-defined terms (what, exactly, is a data lake, vs. a data warehouse, vs. a database?), and as far as I've had to understand this stuff, and as a quick spot check of Google shows, it's about accumulating more data in one place.
B) This is antithetical to a consumer's desired approach to data, which I'll describe parodically as: stored individually, on one computer, behind 3 locked doors and 20 layers of encryption.