
Indeed, in the not too distant future we expect to move to spinning disks, with the data arranged for streaming reads. Streaming off disk can be pretty fast. LZ4 compression is fast enough to give us another big boost without turning CPU into a bottleneck. But the big enabler will be spreading data across disks, so that in principle we can use every spindle we own in service of every nontrivial query. The fact that server logs have a very high write-to-read ratio really helps us out here.
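The tradeoff described here (spend a little CPU on a fast codec to boost effective disk throughput) can be sketched in plain JDK code. The JDK has no LZ4 binding, so this hypothetical stand-in uses Deflater at BEST_SPEED; a real deployment would use an LZ4 library, which gives a worse ratio but far higher throughput:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of a "fast codec in the read path". Deflater/Inflater here are a
// JDK-only stand-in for LZ4; the class and method names are made up.
public class FastCodec {
    public static byte[] compress(byte[] input) {
        Deflater d = new Deflater(Deflater.BEST_SPEED); // favor speed over ratio
        d.setInput(input);
        d.finish();
        byte[] buf = new byte[input.length + 64]; // slack for tiny/incompressible inputs
        int n = d.deflate(buf);
        d.end();
        return Arrays.copyOf(buf, n);
    }

    public static byte[] decompress(byte[] input, int originalLength) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(input);
            byte[] out = new byte[originalLength];
            inf.inflate(out);
            return out;
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        } finally {
            inf.end();
        }
    }
}
```

The point of the design is that decompression runs faster than the disk can stream, so the codec never becomes the bottleneck.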

Today, we're keeping everything on SSD -- twice, in fact, to maintain a hot spare for each server. SSD prices have fallen to the point where, even renting from Amazon, we can make the cost model work; and SSD is a great crutch as we tune our storage layer. But spinning disk will be a lot cheaper.

As for Java: we've been pretty surprised ourselves. Going in, we expected to need a lot more "native" (C) code. As it turned out, currently the only non-Java code we're using is the LZ4 library, and even there we only recently moved to the non-Java implementation. We do soon expect to move a few more bits of code to C, such as the core text-search loop. But based on the performance we've been getting in pure Java, we don't anticipate any need to move more than a fraction of 1% of our overall code base.

We do have some code that lives in a sort of middle ground -- Java code written like it was in C. Our database buffers live in a set of 2GB byte arrays, soon to be replaced by native memory blocks. We do our own (very simple) memory allocation within that, and use a sprinkling of Unsafe operations to access it. This gets us closer to C performance (not all the way there), without any of the hassles of bridging across languages. This part is still well under 1% of our overall codebase.
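A minimal sketch of this "Java written like C" style: one big preallocated byte array used as an arena, bump allocation within it, and manual integer encoding at byte offsets. The class and method names are hypothetical, not the actual code being described; the real version would add Unsafe-based access for speed:

```java
// Hypothetical arena: a preallocated byte[] treated like a C heap.
// alloc() bumps a pointer; putInt/getInt act like *(int*)(buf + off).
public class Arena {
    private final byte[] buf;
    private int top = 0; // next free offset

    public Arena(int size) { buf = new byte[size]; }

    // Very simple allocation: just bump the pointer. No free().
    public int alloc(int nbytes) {
        int off = top;
        top += nbytes;
        return off;
    }

    // Big-endian int at an arbitrary byte offset.
    public void putInt(int off, int v) {
        buf[off]     = (byte) (v >>> 24);
        buf[off + 1] = (byte) (v >>> 16);
        buf[off + 2] = (byte) (v >>> 8);
        buf[off + 3] = (byte) v;
    }

    public int getInt(int off) {
        return ((buf[off] & 0xFF) << 24)
             | ((buf[off + 1] & 0xFF) << 16)
             | ((buf[off + 2] & 0xFF) << 8)
             |  (buf[off + 3] & 0xFF);
    }
}
```

Keeping data in a few large arrays like this also sidesteps GC pressure: the collector sees one object, not millions of small ones.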




Amazing how much work 'simple' is :-)

I am reminded of John Carmack's comment on Oculus - he was amazed that the hardware had the power but the headsets performed badly - then he looked at the code and saw seas of abstractions on abstractions.

Good luck stripping out the layers, guys!


Has he written anything (or talked) about this experience or was it just a brief comment? I'd love to read it if he did.


I remember reading "Latency Mitigation Strategies", by Carmack. It doesn't mention the Oculus except in an acknowledgment at the end, but it might be about the same thing lifeisstillgood is talking about:

http://www.altdevblogaday.com/2013/02/22/latency-mitigation-...

HN discussion: https://news.ycombinator.com/item?id=5265513


I just remember a documentary/interview - iirc a five-minute interview at a trade show. It stuck in my head. I'm afraid I can't find the link now, but it was at a trade show: two people sitting down, a demo of the Oculus Rift, then comments.


Interesting. That must be why C/C++ is always the go-to solution.

Instead of thinking "too many abstractions are making this code slow, therefore let's get rid of the abstractions", I would rather pick a better language where abstractions have little or no penalty (Haskell, Scheme, etc.).


Maybe we are talking about different abstractions. Perhaps libraries? I reimplemented (probably badly) session-handling code in WSGI (Python), because the Django code was over 2000 lines long, used other libraries, and I did not understand it - especially when all I wanted was to store a 128-bit number in a client cookie and then look it up when I saw it again.

So the idea is simple (cookie sessions) but the different ways of implementing it can hold complexity, errors and abstractions.

There is nothing stopping the same thing happening with Haskell - I am sure I can write terrible code even in the best languages (see my entire output for proof :-)


Yeah, but the abstractions have a penalty on your mind.


They do? In my experience, lack of abstraction (often due to not analyzing the issue at hand thoroughly) results in non-abstract, verbose code that is hard to understand and refactor. It's the difference between, say, building an SQL query by appending strings to a buffer (move one line and everything blows up) and building a model of your query (projection, etc...). Sure, the abstraction means more code, but it's much easier to manipulate and considerably less risk-prone. It won't be faster than doing it the other way, but it won't necessarily be measurably slower.
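The contrast in that comment can be made concrete with a toy query model (entirely hypothetical names; real builders like jOOQ are far richer). Because the query is data until `toSql()` is called, parts can be added in any order without breaking the output:

```java
import java.util.ArrayList;
import java.util.List;

// Toy query model: table, columns, and predicates are held as data,
// and the SQL string is only assembled at the end.
public class Query {
    private final String table;
    private final List<String> columns = new ArrayList<>();
    private final List<String> predicates = new ArrayList<>();

    public Query(String table) { this.table = table; }
    public Query select(String col) { columns.add(col); return this; }
    public Query where(String pred) { predicates.add(pred); return this; }

    public String toSql() {
        String cols = columns.isEmpty() ? "*" : String.join(", ", columns);
        StringBuilder sql = new StringBuilder("SELECT ").append(cols)
                .append(" FROM ").append(table);
        if (!predicates.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", predicates));
        }
        return sql.toString();
    }
}
```

Reordering the `select` and `where` calls, or adding a predicate conditionally, never corrupts the result - which is exactly the fragility the string-appending version has.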


Euhm, abstractions in Haskell carry lots of penalties. Haskell is generally a language that only makes sense if you buy the "sufficiently smart compiler" argument: Haskell's abstractions shouldn't carry a penalty, because the compiler compiles them out when it recognizes them.

That's cute, but while it's impressive what the compiler recognizes, it's generally still stupid, and it will get beaten by programmers - especially by bad programmers. (Becoming good at Haskell means, amongst other things, learning what the compiler will screw up.)

Scheme, likewise, doesn't have free abstractions. Unless you mean macros, but those are not really free either imho.

There's one high-level language in wide use that has "free" abstractions, or at least, costs as low as possible, and that's C++.


Netty's buffer management libraries do a good job at what you are looking for. An instance of the pooled allocator configured to prefer "direct", i.e. off-heap, memory might be just the thing.
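Netty's pooled allocator is a third-party dependency, so here is a JDK-only sketch of the underlying "direct, off-heap" idea it builds on. The class name is made up for illustration; Netty's PooledByteBufAllocator adds pooling and reference counting on top of buffers like this:

```java
import java.nio.ByteBuffer;

// Direct ByteBuffers live outside the Java heap, so they are invisible
// to the garbage collector and can be handed to native I/O without copying.
public class DirectBufferDemo {
    public static long roundTrip(long value) {
        ByteBuffer buf = ByteBuffer.allocateDirect(16); // off-heap allocation
        buf.putLong(0, value); // absolute put: no position bookkeeping
        return buf.getLong(0);
    }
}
```

The catch, and one reason pooling matters, is that direct buffers are expensive to allocate and free, so long-lived systems allocate a few large ones and carve them up themselves.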



