I completely disagree that Java is well suited for distributed systems. Several ...

agibsonccc · on Aug 14, 2016

Tell that to facebook,twitter,linkedin, most big banks? The JVM (whether you like it or not) powers most of the bigger database systems in the world.

New code is still being written for the jvm. Lightbend and pivotal are also companies setting up large systems on the jvm. Seems to work fine for them. What are the alternatives? go?

Your point re: scala. Scala is STILL based on java. There isn't much of a difference for speed here. Akka and its ilk still rely on netty for the underlying transport mechanism (written in java) I'm very much well aware of what's written in scala. That includes spark and kafka among other things which is great. Those STILL rely on java libraries though.

Go is still slower yet: https://www.techempower.com/blog/2016/02/25/framework-benchm...

The other might be what? erlang? Good luck finding developers. The JVM is still the only platform that has not only big data mind share but things like microservices frameworks like lagom and spring boot while ALSO having things like message queues being written for it.

im_down_w_otp · on Aug 14, 2016

Overwhelmingly databases are not written in Java. Most of them are written in C and C++.

Secondly, I'm not denying that lot's of systems are built on Java. But they're built on Java because those places already have super heavy investment in Java ecosystem libraries, tools, and engineering resources. Not because Java or the JVM is somehow intrinsically awesome for building distributed databases or distributed systems things.

In fact it's quite the contrary. It's an epic pain in the ass to keep heap growth and GC latency under control in most of these systems. The former being critical for node stability and the latter being critical for dead-node detection consistency (among other things).

I used to be a distributed database engineer, and I count among my friends many people who still work at Elastic.co, Confluent, MapR, Cloudera, Twitter, and Couchbase. I currently work for a big FinTech company building core distributed transaction processing infrastructure, and I am 100% certain that compelling the use of Java for certain systems development tasks has nothing to do with it as a piece of technology and 100% to due with politics.

agibsonccc · on Aug 14, 2016

A lot of fintech stuff (and google's stuff) is written in c++. MapR's hadoop distribution is also c++.

If you look at a lot of the nosql databases you could list off the top of your head including: presto, cassandra, and hbase, those are all java.

Newer ones like kudu as well as some of the older ones like mysql, postgres etc are def written in c.

I wouldn't say it's complete politics there. Maintainability is a big factor in writing systems code that lasts and you can rely on. The JVM isn't great for everything but there's teams at say: twitter who's sole job is GC management and tuning. I know cloudera and co also have these people on staff. They (as well as us) know the JVM isn't ideal for every use case. Eg: I personally do a lot of GPU stuff. I wouldn't use java code for that (we use JNI/c++ for that)

If I had to argue for the jvm, I'd say with the right tuning it's reliable enough and you can also hire for it. There's trade offs of security and reliability when you start thinking about OTHER parts of this besides latency and speed. It's harder to screw up java code than c code..and a lot of people are pretty familiar with the internals (eg: off heap, unsafe)

The flink project is a great example of this. They took java and wrote their own memory manager allowing them to keep the java integrations but not deal with the GC. Many jvm based distributed systems have started working around this stuff now.

im_down_w_otp · on Aug 14, 2016

I really don't understand what you're arguing here. It's undisputed that several distributed systems are written in Java. What I'm disputing is the idea that Java is used to build these systems because Java is particularly well-suited to building distributed systems.

It isn't. I don't doubt that there are hordes of people whose entire responsibility it is to try to work around Java's rough, eclectic edges when it comes to systems engineering, but the fact that horde of people exists at all is one, among many, symptomatic indicators that Java itself is not intrinsically a particularly good fit for that problem space.

You've inverted the cause and effect.

Java is very popular, and Java is used in a lot of businesses, and some of those businesses have/had needs for different shapes of distributed systems, and those businesses have/had an easier time finding Java engineering resources, and those engineering resources built some distributed systems, and so now some distributed systems are built in Java or otherwise on the JVM.

There are even cases (Storm and Spark come to mind) where the selection of the JVM had seemingly as much to do with where those solutions were trying to position themselves in the larger ecosystem (augment Hadoop or eventually supplant Hadoop, respectively) as it did the technical merits of Java or the JVM itself.

Also, being able to hire for a thing is almost always a political issue. Like your prior comment, "The other might be what? erlang? Good luck finding developers." It's really easy to find Erlang developers. You just have to be willing to pay for them. Additionally, the proportion of Erlang developers who have a lot of experience building distributed, fault-tolerant systems is almost certainly higher than that of the proportion of Java engineers who can say the same. You don't even bother learning Erlang unless you're building distributed systems of some kind. Java is used for everything. That's just simple selection-bias. The hard part about using Erlang is getting management to sign off on using it instead of Java or some Silicon Valley darling language du jour. Anyway, hiring considerations aren't an issue of technical design or implementation merit. They're political considerations.

If you subtract the set of open source distributed systems that were born inside companies with large pre-existing JVM investments already (introducing a different kind of selection-bias), like Hadoop (Yahoo) and Cassandra (Facebook) and Kafka (LinkedIn), and instead look at the ecosystem of distributed systems that were built up as standalone efforts outside any major corporate engineering org, like MapR, Riak, Aerospike, ScyllaDB, RabbitMQ, RethinkDB, etc. you immediately see a much different set of technology choices. Overwhelmingly these "outsider" distributed systems are written in Erlang, C/C++, and Go. The Java cohort is a limited minority in that space.