
ClickHouse is awesome, but as the post shows, some code is involved in getting the data there.

I have been working on Scratchdata [1], which makes it easy to try out a columnar database to speed up aggregation queries (avg, sum, max). We have helped people [2] take a Postgres database with 1 billion rows (1.5 TB) and significantly cut their real-time analytics query times. Because the data was stored more efficiently, they also saved on their storage bill.

You can send data with a curl request and it will be batch-processed and flattened into ClickHouse:

curl -X POST "http://app.scratchdata.com/api/data/insert/your_table?api_ke..." --data '{"user": "alice", "event": "click"}'
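
If curl isn't handy, here is a rough Go equivalent of the same request (endpoint taken from the curl example above; the api_key value is a placeholder, since the real parameter is truncated there):

    package main

    import (
        "bytes"
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // Same JSON payload as the curl example.
        body := []byte(`{"user": "alice", "event": "click"}`)

        // Placeholder API key; substitute your own.
        url := "http://app.scratchdata.com/api/data/insert/your_table?api_key=YOUR_API_KEY"

        resp, err := http.Post(url, "application/json", bytes.NewReader(body))
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // Rows are batched before landing in ClickHouse, per the description above.
        fmt.Println(resp.Status)
    }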

The founder, Jay, is super nice and just wants to help people save time and money. If you give us a ring, he or I will personally help you [3].

[1] https://www.scratchdb.com/
[2] https://www.scratchdb.com/blog/embeddables/
[3] https://q29ksuefpvm.typeform.com/to/baKR3j0p?typeform-source...


My first big win for ClickHouse was replacing a 1.2 TB, billion+ row PostgreSQL DB with ClickHouse. It was static data with occasional full replacement loads. We got the DB down to ~60 GB, with queries about 45x faster.

Now, the Postgres schema wasn't ideal; with a refactor similar to the ClickHouse schema we could have saved ~3x on it, with corresponding speed increases for queries, but that wasn't really enough to move the needle to near-real-time queries.

Ultimately, the entire ClickHouse DB was smaller than the original Postgres primary key index. That index was too big to fit in memory on an affordable machine, so it's pretty obvious where the performance is coming from.


This is a nice illustration of the effects of different choices for storage layout and use of compute. ClickHouse blows away single-threaded queries over row-based data for analytical questions. On the other hand, PostgreSQL can offer far higher throughput and concurrency when updating a shopping cart.


Love it. It also makes me sad there aren't more technical blogs. I have been building an offline-first writing/publishing platform on top of Hugo: https://justshare.io


Hadn't heard of this, thanks for sharing!


sure thing!


Yes, this. It feels pretty magical to immediately find people across the Internet who have most likely "gone deep" on something I also have. It is surprisingly hard to connect with people like this in 2024.


I like the comparison to writing. In order to be a better writer, you must read works that are well written. The most common way that I read code is by traversing my source tree into the code I depend on (stdlib/third-party). When dependencies are compiled in some way, reading the code is not possible. That makes learning different code patterns impossible and causes a lot of "reinventing the wheel".


> I don't understand what the problem is here. Every installation of Node comes bundled with npm. If it doesn't, that is a package maintenance problem.

Node and npm are two commands, and when you go to find packages, you will see people telling you to use pnpm, yarn, or npm. I would expect one tool to do this for me, especially for the most popular language in the world.

> This feels like a massive antipattern. Why is this lauded as a "feature"? Why do I want my build system to automatically reach out to the Internet and download random code without an explicit request, like "npm install"?

https://chat.openai.com/share/8bd82c15-c939-4e82-aad8-086995...

> This is even more antipattern-ish when you consider that Go dependencies are just repoed on GitHub (or possibly on some random git server) instead of a centralized and moderated registry like npmjs.org.

I find this lends itself to a more decentralized future. I see notable projects owning their code and distributing it themselves as a positive. You still need the source code for something to run at the end of the day. If you are worried about the code continuing to be available, that is what a proxy cache is for, and https://proxy.golang.org/ makes it very easy. Also, the code is distributed on GitHub, so if GitHub going down is a concern, we probably have much bigger problems.
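
To make that concrete, here is a minimal sketch (the module paths below are made up for illustration) of how a go.mod expresses this: dependencies are identified by import paths pointing at whatever host serves them, rather than by entries in a central registry.

    // go.mod -- hypothetical module and dependency paths
    module example.com/myapp

    go 1.21

    require (
        github.com/some/library v1.2.3
        git.example.org/team/util v0.4.0
    )

Fetches for those paths go through whatever GOPROXY is set to (https://proxy.golang.org,direct by default), so the proxy cache layers on top without becoming a registry you have to publish to.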

> So it's now considered harmful to have multiple implementations of an open standard, compared to the exclusively Google-developed Go runtime? This sounds akin to arguing for a monopoly over a competitive market with consumer choice.

A hammer looks like a hammer because that is the most effective shape for hitting a nail. Since I am "building" code, I want my tools to feel as reliable as a hammer. I will not argue that Go is the best language ever invented; I see it as the most accessible language for making things happen fast and reliably until a better one emerges. When that happens, AI-powered refactoring tools will be good enough, and Go code is quick enough to parse, that I will let them loose on my Go codebases to refactor them into that language.


A hammer is not a screwdriver, yet you want the screwdriver and the hammer to blend into one tool.

Does your hammer have a built-in car or drone to bring nails from the store? No. That's why some people think it's reasonable to split programs that have different modes of operation.

Your choice, but note that it's not a universally accepted truth or demand.


So on the one hand, you're saying decentralization is a good thing, but you're also saying that a centralized proxy solves all these problems?

Also, a proxy and a package registry are not the same thing.


If I were a beginner developer, I would now have to have the tribal knowledge of the difference between .js and .mjs. I don't see anyone widely using .mjs to write their code either.


Bazel is pretty cool; I have seen it work at Uber for the Go monorepo at impressive scale (and of course it works for Google). When I need to scale up the build process, this will be the tool I reach for, but for starting out it is another technology that someone would have to learn.


Bazel is one of those tools where either someone teaches you or you need a PhD to figure it out, which is massively frustrating.

The open-source rulesets are also not fantastic, in my opinion, compared to the Google ones, so most people's first impression of the tool is sub-par.


I think the issue with Bazel may be that it works extremely well at Google, where for 80% of the code at the company building is just a completely solved problem, with amazing tooling integration, and it's glorious.

Whereas in the open-source world there is much more manual setup to get it working smoothly, and that manual setup is much easier in the tool your ecosystem already knows.

So it may not actually be a wonderful system in and of itself (outside of the Google monorepo). My comment was mainly about the principle rather than an endorsement to adopt Bazel!


An interesting project. Go binaries retain many source artifacts, which makes decompilation a bit more straightforward as well. I haven't seen anyone really attempt this for Go, but it would be notable research.


If it turns out that it's easier for a language model to translate "Ghidra C" into readable Go code than to deal with CMake/Bazel/GNU autoconf/Ninja/Meson/etc., I wonder if that says more about the language model or the state of C/C++ toolchains...


Have you ever used TinyGo? I have been curious how much that project gets used. It seems to me that Rust is probably going to be the language of choice at some point in the future.


I looked at that recently for a project I'm working on, but walked away when I found that important parts of the net package are pretty much nonexistent on ESP32.

You know, like net/http, for example...

I might have misread the docs, but somehow I doubt it.


The Go standard library's net/http package doesn't yet compile under TinyGo due to some dependency issues, but TinyGo provides its own net/http package to stand in as a replacement [1].

[1] https://pkg.go.dev/tinygo.org/x/drivers/net/http
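
If that replacement mirrors a subset of the standard library's surface (an assumption on my part; check the package docs and your board's network setup first), using it should mostly be an import swap, roughly:

    package main

    import (
        "log"

        // Assumed drop-in for the stdlib "net/http" on TinyGo targets; the
        // real API coverage may be narrower than this sketch implies, and the
        // board's network device has to be brought up separately first.
        "tinygo.org/x/drivers/net/http"
    )

    func main() {
        // Hypothetical request; the endpoint is just an example.
        resp, err := http.Get("http://example.com/")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        log.Println(resp.Status)
    }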


Yeah, it looks like you are right: https://arc.net/l/quote/veycnsqt

