
You have 6 TiB of RAM?



If my business depended on it? I can click a few buttons and have an 8 TiB Supermicro server on my doorstep in a few days if I wanted to colo it. EC2 High Memory instances offer 3, 6, 9, 12, 18, and 24 TiB of memory in a single instance if that's the kind of service you want. Azure Mv2 also goes from 2,850 to 11,400 GiB.

So yes, if need be, I have 6 TiB of RAM.


You can have 8 TB of RAM in a 2U box for under $100K. Grab a couple and it will save you millions a year compared to an over-engineered big data setup.


BigQuery and Snowflake are software. They come with a SQL engine, data governance, integration with your LDAP, and auditing. Loading data into Snowflake isn't over-engineering. What you described is over-engineering.

No business is passing 6 TB of data around on their laptops.


So is ClickHouse, so what's your point? And please point out what a server being able to have 8 TB of RAM has to do with laptops.


I wonder how much this costs: https://www.ibm.com/products/power-e1080

And how that price would compare to the equivalent big data solution in the cloud.


1U box too.





I personally don't, but our compute cluster at work has around 50,000 CPU cores. I can request specific configurations through LSF, and there were at least 100 machines with over 4 TB of RAM as of 3 years ago. By now there are probably machines with more than that. Those machines are usually reserved for specific tasks that I don't do, but if I really needed one I could get approval.
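
For reference, requesting a big-memory slot is basically a one-liner. A hedged example with LSF's bsub (the queue name and job script are placeholders, and the units for mem are site-configured, often MB):

    bsub -q bigmem -R "rusage[mem=4194304]" ./my_analysis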


You don’t need that much RAM to use mmap(2).


To be fair, mmap doesn't put your data in RAM; it presents it as though it were in RAM and lets the OS deal with whether or not it actually is.


Right, which is why you can mmap way more data than you have RAM, and treat it as though you do have that much RAM.

It’ll be slower, perhaps by a lot, but most “big data” stuff is already so god damned slow that mmap probably still beats it, while being immeasurably simpler and cheaper.
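
To make that concrete, a minimal sketch in Python (whose mmap module wraps mmap(2)); the file name is a placeholder, and the file can be far bigger than physical RAM:

    import mmap

    with open("huge.csv", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The whole file is addressable as bytes; the kernel pages it
            # in and out on demand, so peak memory use stays bounded.
            lines = 0
            pos = mm.find(b"\n")
            while pos != -1:
                lines += 1
                pos = mm.find(b"\n", pos + 1)
            print(lines, "lines")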


Really depends on the shape of the data. mmap can be suboptimal in many cases.

For CSV it flat-out doesn't matter what you do, since the format is so inefficient and needs to be read start to finish, but something like Parquet probably benefits from explicit read syscalls, since it's block-based and highly structured, and you can predict the read patterns much better than the kernel can.
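
A sketch of why: a Parquet reader knows exactly which byte ranges it needs (the footer, then specific column chunks), so it can issue precise positioned reads instead of hoping the kernel's readahead guesses right. Hypothetical Python, with the file name as a placeholder:

    import os

    fd = os.open("data.parquet", os.O_RDONLY)
    size = os.fstat(fd).st_size
    # A Parquet file ends with a 4-byte footer length plus the b"PAR1" magic.
    tail = os.pread(fd, 8, size - 8)
    footer_len = int.from_bytes(tail[:4], "little")
    # One exact read fetches the metadata; column chunks follow the same
    # pattern: known offset, known length, one pread each.
    metadata = os.pread(fd, footer_len, size - 8 - footer_len)
    os.close(fd)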


We are decommissioning our 5-year-old 4 TB systems this year, and they could have been ordered with more.


The "(multiple times)" part probably means batching or streaming.

But yeah, they might have that much RAM. At a rather small company I worked at, we had a third of that in the virtualisation cluster. I routinely loaded customer databases in the hundreds of gigabytes into RAM to do bug triage and fixing.


Indeed, what I meant to say is that you can load it in multiple batches. However, now that I think about it, I did play around with servers with TiBs of memory :-)
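
For the curious, "multiple batches" can be as mundane as chunked reads. A sketch with pandas (file and column names are placeholders):

    import pandas as pd

    total = 0.0
    # Stream the CSV in fixed-size chunks so peak memory stays bounded
    # no matter how large the file is; keep only the running aggregate.
    for chunk in pd.read_csv("huge.csv", chunksize=1_000_000):
        total += chunk["amount"].sum()
    print(total)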



