The JVM is not that heavy (opensourcery.co.za)
549 points by khy on Feb 1, 2017 | hide | past | favorite | 367 comments



Also worth noting that the JVM itself only weighs a couple of megabytes. The bulk of the size comes from the Java runtime (i.e. the "standard libraries"), and there are lots of things in there that your app may not need (XML parsing, serialization, etc.)

A couple of years ago I wrote a simple tool (https://github.com/aerofs/openjdk-trim) that allows you to filter out what you don't need. We were able to get the size of OpenJDK from 100MB down to around 10MB.

Note that the work of determining which classes you need is entirely manual. In our case I used strace to check which classes were being loaded.


This is officially supported in JDK 9. It's part of an ongoing project called "jigsaw" which is introducing modules to the JVM. The first step, shipping in 9, is to modularize the JDK [0].

Already today you can build a custom JDK in the early access release.

[0] http://openjdk.java.net/jeps/200
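For anyone playing with the early access builds: the jlink tool is also exposed programmatically through java.util.spi.ToolProvider (JDK 9+), so a build step can assemble a trimmed runtime image from just the modules you list. A minimal sketch -- the flags and java.base-only module list are just an example, and it assumes a full JDK with its jmods directory present:

```java
import java.util.spi.ToolProvider;

public class BuildTrimmedRuntime {
    public static int link(String outputDir) {
        // ToolProvider.findFirst locates the jlink tool bundled with the JDK.
        ToolProvider jlink = ToolProvider.findFirst("jlink")
                .orElseThrow(() -> new IllegalStateException("jlink not available"));
        // With no --module-path, jlink defaults to the current JDK's jmods.
        return jlink.run(System.out, System.err,
                "--add-modules", "java.base",
                "--strip-debug",
                "--no-header-files",
                "--no-man-pages",
                "--output", outputDir);
    }
}
```

A java.base-only image comes out at a fraction of the size of a full JDK install.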


Not that well. The java.base module is still huge. Also, Zulu has nice pre-packaged JDK9 downloads[0] so you don't have to build.

0 - http://zulu.org/zulu-9-pre-release-downloads/


I think jlink can strip at the class level, not just the module level.


> I wrote a simple tool that allows you to filter out what you don't need

It'd be a short hop from here to a tool that basically does for JDK-platform apps what Erlang's releases do for the ERTS platform: builds a new JRE (as a portable executable, not an installer) that actually contains the app and its deps in the JRE's stdlib, such that you just end up with a dir containing "a JRE", plus a runtime "boot config" file that tells the JRE what class it should run when you run the JRE executable.

With such a setup, your Java program could actually ship as an executable binary, rather than a jar and an instruction to install Java. Nobody would have to know Java's involved! :)
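The boot-config part of that idea is surprisingly little code. A toy sketch -- the config file format and the DemoApp class are invented for illustration:

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy launcher: read the entry-point class name from a "boot config" file
// shipped next to the bundled JRE, then invoke its main() reflectively.
public class BootLauncher {
    public static void launch(Path bootConfig, String[] args) throws Exception {
        String mainClass = new String(Files.readAllBytes(bootConfig), StandardCharsets.UTF_8).trim();
        Method main = Class.forName(mainClass).getMethod("main", String[].class);
        main.invoke(null, (Object) args); // cast: pass the whole array as one argument
    }

    // Stand-in for the real application's entry point.
    public static class DemoApp {
        public static volatile String lastArg;
        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : null;
        }
    }
}
```

Wrap that in a small native stub that execs the bundled JRE and nobody ever sees a jar.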


Funny you mention that, that's exactly why we wrote this tool. We were having a ton of support issues with our Windows users having to install Java, incompatible Java versions, needing admin privileges to install the JRE, etc...

We fixed this whole class of issues by doing exactly what you suggest: bundling the JRE and writing our own launcher binary.


Does the Java license interfere with this kind of setup?


Not when you bundle OpenJDK.


https://docs.oracle.com/javase/8/docs/technotes/guides/deplo...

Not only is it a short hop, it already exists :P


Nice. Here's the manual page, the relevant flag is "-native":

http://docs.oracle.com/javase/8/docs/technotes/tools/unix/ja...


The problem with visiting that site is that Oracle have started litigating against users of their JVMs.

In that way the JVM can be "heavy".


With users that use commercial features without paying for them.

It is quite easy to know which features those are when they require a flag named -XX:+UnlockCommercialFeatures; you don't enable them by mistake.


Oracle grants a free license to use commercial features in development[1] (section B). It is very easy to put the flag in some script of JVM startup parameters and get that script deployed in production by mistake.

1. http://www.oracle.com/technetwork/java/javase/terms/license/...
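If you're worried about exactly that failure mode, the JVM can inspect its own startup flags, so a guard like this at boot would catch a dev-only script leaking into production. A sketch; the flag substring is the one from Oracle's docs, everything else is illustrative:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Fail fast if this JVM was started with Oracle's commercial-features flag,
// so a dev-only startup script can't silently make it into production.
public class LicenseGuard {
    public static boolean hasCommercialFlags() {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return jvmArgs.stream().anyMatch(a -> a.contains("UnlockCommercialFeatures"));
    }

    public static void assertNoCommercialFlags() {
        if (hasCommercialFlags())
            throw new IllegalStateException("JVM started with -XX:+UnlockCommercialFeatures");
    }
}
```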


Quite right, but unless I am being too cynical, I doubt those deployments were actually done by mistake, considering how many companies try to "alleviate" their costs.


Avian [0] is an embeddable VM in the way you suggest, I haven't used it though.

In the future, we'll hopefully have the Substrate VM [1] for Java and other Truffle-supported languages. It's embeddable and also does reachability analysis to exclude unused library code. For now, it seems to be closed source.

[0] http://readytalk.github.io/avian/

[1] http://lafo.ssw.uni-linz.ac.at/papers/2015_CGO_Graal.pdf


> what Erlang's releases do for the ERTS platform

That's the difference between an industrial-strength platform like Erlang, and a dev-centric deployment nightmare like Python. Java is normally on the enterprise side of the spectrum, but unfortunately it didn't get deployment right for quite a long time, even though it appears it's getting there lately.


It started with Java 8, which added a packaging story to the JDK tooling, but I don't know how good it is.


Careful there---it's a slippery slope all the way to unikernels.

https://mirage.io/


Isn't that _exactly_ what GoLang does for its binaries, insofar as shipping them statically linked?


.NET Core has it as an option, as well.


It's called "tree shaking", and is a classic option in Lisp delivery tools.


Or virtual image pruning in Smalltalk tools.


.NET Core does have this and allows distributing the runtime and your classes as a standalone executable. It is obviously larger than just the class files and DLLs, but works great for portability.


Proguard is the de facto standard in the Android Java ecosystem; what's different about openjdk-trim? Except that Proguard determines used/unused classes automatically by doing static code analysis, so you need a Proguard config to keep it from removing stuff that is used dynamically, of course.


We were already using Proguard for obfuscation and we tried to use it to reduce the size of the JDK as well. If my memory serves well, the main problem was that the results weren't good enough: Proguard was being conservative and keeping a lot of stuff that we knew we didn't need, because static analysis indicated that it could potentially be used.

Also, Proguard's config is pretty complicated and the results are hard to understand. Our approach (openjdk-trim) is dumb simple: unpack the java runtime jars, use rsync to filter out entire directories we don't need, pack it back.

It's a simple, brute-force approach compared to Proguard's advanced static analysis, but in this case it gives better results. Maybe a good example of "worse is better".
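For the curious, the brute-force idea fits in a few lines. Here's a rough Java equivalent of openjdk-trim's unpack/filter/repack step (the actual tool uses rsync; the whitelist of package directories below is hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.stream.Stream;

public class TrimRuntime {
    // Whitelisted package directories; this particular list is made up.
    static final Set<String> KEEP = Set.of("java/lang", "java/util", "java/io");

    // Copy only files under whitelisted directories from an unpacked runtime
    // jar, skipping everything else -- no static analysis involved.
    public static void copyNeeded(Path src, Path dst) throws IOException {
        try (Stream<Path> files = Files.walk(src)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(p)) continue;
                String rel = src.relativize(p).toString().replace('\\', '/');
                if (KEEP.stream().anyMatch(rel::startsWith)) {
                    Path target = dst.resolve(src.relativize(p));
                    Files.createDirectories(target.getParent());
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Dumb, but you can read the whole thing in a minute and the output contains exactly what you whitelisted.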


That is why I am looking forward to, and VERY excited about, TruffleRuby, SubstrateVM, and Graal, along with the C extension support.

I think once there is an official way to trim the JDK down, making deployment super easy and fast (single executable file), Java will pick up stream again.

The only problem and hesitation we have...is Oracle.


Not to sound rude, but the expression you were aiming to use is maybe "pick up steam". Random example: http://idioms.thefreedictionary.com/pick+up+steam


OT, but this expression seems like a corruption of 'build up steam' and 'pick up speed'. 'Pick up steam' doesn't really make sense in original context- something a steam train would do while stationary and preparing to move :)


The -verbose:class JVM arg could also have sufficed in case you don't want to use strace.


Would it not be possible to scan an app to determine exactly what libraries it could load? I think you can avoid halting-problem issues by simply doing a dumb search for the library-load routines in the code, at the possible (but not very likely) cost of picking up spurious data in a binary blob section.


There are tools that do this (for example, proguard). The issue is that java can dynamically load code, so static analysis isn't sufficient. Typically for areas where code size matters (like on Android), you use static analysis plus a manual list.
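The dynamic-loading problem in a nutshell: the class name below could just as well come from a config file or the network, so no bytecode scanner can prove which classes will be needed. A minimal illustration:

```java
public class DynamicLoad {
    // Load a class whose name is only known at runtime; a static analyzer
    // scanning this bytecode sees only an opaque string flowing into forName.
    public static Object load(String className) throws Exception {
        Class<?> cls = Class.forName(className);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println("Loaded " + load(name).getClass().getName());
    }
}
```

This is exactly why Proguard needs keep rules for reflectively used classes.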


It would be great to have a runtime for JRuby that's really stripped down to just the necessities. The spin-up time of the jruby environment is painful and it seems predominantly the fault of the JVM's baggage.


I don't know. 10MB still sounds really too big.

Why is it so big? What do we gain?

I don't have a system with 10MB of cache, so I imagine Java can't run faster than memory...


Huh? What?

Several points:

* Who said the 10MB is all used at once?

* I don't know your hardware, but there is a very, very good chance you are actually quite wrong about <10MB of cache. These days, most magnetic disks have more cache than that, and if you are using SSDs, there's a boatload more cache than that in there.

* If you were referring strictly to CPU cache, then I'm even more confused, because the entire existence of that stuff is predicated on it being faster than memory, so... (and even still, if your total CPU cache isn't 10MB, it likely isn't that much smaller).

* It's not like the whole package would sit in RAM the whole time anyway. By your same assertion, I could say that one of my CPU registers is only 64-bits wide, so I imagine all programs larger than 64-bits can't run faster than L3 cache...

I'm not sure why you'd say it is too big. The article page is 1.4 MB alone... and it still needs to leverage a general purpose runtime/JIT that is orders of magnitude larger to do its single fixed purpose.


> Who said the 10MB is all used at once?

The parent was suggesting that this was all that was actually needed out of the 100mb or so downloadable. If you think the JVM is smaller, how small is it exactly?

> If you were referring strictly to CPU cache, then I'm even more confused, because the entire existence of that stuff is predicated on it being faster than memory, so... (and even still, if your total CPU cache isn't 10MB, it likely isn't that much smaller).

http://www.intel.co.uk/content/www/uk/en/processors/core/cor...

I don't have anything with 10MB cache.

> It's not like the whole package would sit in RAM the whole time anyway. By your same assertion, I could say that one of my CPU registers is only 64-bits wide, so I imagine all programs larger than 64-bits can't run faster than L3 cache...

If you get into L1, you get about 1000x faster.

http://tech.marksblogg.com/benchmarks.html

> I'm not sure why you'd say it is too big.

Maybe I have a different perspective? If a 600kb runtime is 1000x faster, I want to know what I get by being 10x bigger. I'm quite surprised that there are so many responders defending it given that these benchmarks were just on Hacker News a few days ago.


Unless you linearly scan the whole binary all the time, your CPU makes sure that only the stuff you're currently using is in the cache, so only the data your hot loop is touching.

You could easily see that your assumption is wrong by observing that a typical C application is not 1000 times faster than a typical Java application.


> Unless you linearly scan the whole binary all the time, your CPU makes sure that only the stuff you're currently using is in the cache, so only the data your hot loop is touching.

Cache fills optimize for linear scans, and have nothing to do with eviction.

> You could easily see that your assumption is wrong by observing that a typical C application is not 1000 times faster than a typical Java application.

What assumption are you talking about?

Where do you find your typical applications? Spark is supposed to be one of the fastest Java implementations of a database system, and it's 1000x slower than the fastest C-implementation database systems, but this is clearly a problem limited by memory.

What about problems that are just CPU-bound? C is at least 3x faster than Java for those[1], so just by being "a little bit faster" (if 3x is a "little" faster) then as soon as we introduce latency (like memory, or network, or disk, and so on) this problem magnifies quickly.

[1]: https://benchmarksgame.alioth.debian.org/u64q/compare.php?la...


> Spark is supposed to be one of the fastest Java implementations of a database system, and it's 1000x slower than the fastest C-implementation database systems, but this is clearly a problem limited by memory.

Wow.. so much wrong, I'm not sure how to unpack it all.

a) Spark is Scala, not Java, though both do use the JVM, so I'll give you that.

b) Spark is not a database system, though it is a framework for manipulating data

c) Spark is generally considered to be much faster than Hadoop, and does its job well, but I'm not sure it qualifies as the fastest anything.

d) By any reasonable interpretation, the fastest Java database system is definitely not Spark. You will find that benchmarks of Java database systems generally don't even include Spark (as an example https://github.com/lmdbjava/benchmarks/blob/master/results/2...)

e) Fast is an ambiguous term... usually you are looking at things like latency, throughput, efficiency, etc. I'm not sure which you mean here.

f) If you know anything at all about runtimes, you'd know that if you've found a Java based system that is 1000x slower than a C based system, either your benchmark is extremely specialized, broken, or you are comparing apples & oranges.

Look, Java certainly has some overhead to it, and sometimes it significantly impacts performance. Before you get too excited about attributing it to runtime size, you might want to look at the size of glibc...


> By any reasonable interpretation, the fastest Java database system is definitely not Spark

What database would you recommend for solving the taxi problem using the JVM?

> Spark is Scala, not Java, though both do use the JVM, so I'll give you that.

What does JVM stand for? I was under the impression that we were talking about its size (10MB vs. 100MB).

> You will find that benchmarks of Java database systems generally don't even include Spark

And? What are we talking about here?

> If you know anything at all about runtimes, you'd know that if you've found a Java based system that is 1000x slower than a C based system, either your benchmark is extremely specialized, broken, or you are comparing apples & oranges.

Why?

We're talking about business problems, not about microbenchmarks.

If this is a business problem, and I solve it in 1/1000th the time, for roughly the same cost, then what exactly is your complaint?

> Fast is an ambiguous term... usually you are looking at things like latency, throughput, efficiency, etc. I'm not sure which you mean here.

It's not ambiguous. I'm pointing to the timings for a specific, and realistic business problem.

> Look, Java certainly has some overhead to it, and sometimes it significantly impacts performance. Before you get too excited about attributing it to runtime size, you might want to look at the size of glibc...

Does Java include glibc?

What exactly is your point here?


> What database would you recommend for solving the taxi problem using the JVM?

You have me at a disadvantage here... The only taxi problem that comes to mind is a probability problem that I'd not likely use a database for at all...

> If this is a business problem, and I solve it in 1/1000th the time, for roughly the same cost, then what exactly is your complaint?

If you came to the conclusion that your business problem runs 1000x faster because of differences in the runtime... you've made a mistake. It is far more likely your benchmark is flawed, or there are significant differences in the compared solutions beyond just the runtimes.

Seriously, I've spent a career dealing with situations exactly like that: "hey, this is 1000x slower than what we are doing before... can you fix that?". Once you are dealing with optimized runtimes, while there can be important differences between them, there just isn't that much room left for improvement.

> It's not ambiguous. I'm pointing to the timings for a specific, and realistic business problem.

The problem is perhaps not ambiguous to you, but you haven't described it in terribly specific terms. More importantly though, you haven't described what you mean by "faster"? That's the ambiguity.

> Does Java include glibc?

> What exactly is your point here?

C programs do. Lots of very efficient, high performance C programs.


> The only taxi problem that comes to mind

It's the problem that I linked to previously.

http://tech.marksblogg.com/benchmarks.html

Finding good benchmarks is hard: business problems are good ones because these are the ways experts will solve problems using these tools, and we can discuss the choice of tooling, whether this is the right way to solve the problem, and even what the best tools for this problem are -- in this case, GPU beats CPU, but what's amazing is just how close a CPU-powered solution gets by turning it into a memory-streaming problem (which the GPU needs to do anyway).

> If you came to the conclusion that your business problem runs 1000x faster because of differences in the runtime...

I haven't come to any conclusion.

There are a lot of differences between a JVM-powered business solution and a KDB-powered business solution, however one striking difference is the cache-effect.

However the question remains: What exactly do we get by having a big runtime? That we get to write loops?


> what's amazing is just how close a CPU-powered solution gets by turning it into a memory-streaming problem (which the GPU needs to do anyway).

Yes, it turns out the algorithmic approach you use to solve the problem tends to dwarf other factors.

> There are a lot of differences between a JVM-powered business solution and a KDB-powered business solution, however one striking difference is the cache-effect.

Wait, you looked at those benchmarks and came to the conclusion that the language runtimes were the key to the differences?

> However the question remains: What exactly do we get by having a big runtime? That we get to write loops?

There is absolutely no intrinsic value in a big runtime.

Now, one can trivially make a <1KB read-eval-print runtime. So I'll answer your question with a question: why do people not use <1KB runtimes?


> Wait, you looked at those benchmarks and came to the conclusion that the language runtimes were the key to the differences?

At the risk of repeating myself: I don't have any conclusions.

> There is absolutely no intrinsic value in a big runtime.

And yet there is cost. It is unclear if that cost is a factor.

> Now, one can trivially make a <1KB read-eval-print runtime. So I'll answer your question with a question: why do people not use <1KB runtimes?

Because they are not useful.

We are looking at a business problem, think about the ways people can solve that problem, and cross-comparing the tooling used by those different solutions.

Is there really nothing to be gained here?

That the memory-central approach wins out so heavily (and the fact that we can map-reduce across cores or machines as our problem gets bigger) is a huge advantage of the KDB-powered solution. It's also the obvious implementation for a KDB-powered solution.

Is this Spark-based solution not the typical way Spark is implemented?

Could a 10mb solution do the same if it can't get into L1? Is it worth trying to figure out how to make Spark work correctly if the JVM has a size limit? Is that a size limit?

There are a lot of questions here that require more experiments to answer, but one thing stands out to me: Why bother?

If I've got a faster tool, that encourages the correct approach, why should I bother trying to figure these things out? Or put perhaps more clearly: What do I gain with that 10mb?

That CUDA solution is exciting... There is stuff to think about there.


> At the risk of repeating myself: I don't have any conclusions.

For someone who doesn't have any conclusions, you're making a lot of assertions that don't jibe with reality.

> And yet there is cost. It is unclear if that cost is a factor.

It's a factor... just not the factor you think it is.

> Because they are not useful.

I think you grokked it.

> The memory-central approach clearly wins out so heavily (and the fact we can map-reduce across cores or machines as our problem gets bigger) is a huge advantage in the KDB-powered solution. It's also the obvious implementation for a KDB-powered solution.

KDB is a great tool, but you are sadly mistaken if you think the trick to its success is the runtime. That its runtime is so small is impressive, and a reflection of its craftsmanship, but it isn't why it is efficient. For most data problems, the runtime is dwarfed by the data, so the efficiency that the runtime organizes and manipulates the data dominates other factors, like the size of the runtime. This should be obvious, as this is a central purpose of a database.

> There are a lot of questions here that require more experiments to answer, but one thing stands out to me: Why bother?

Yes, you almost certainly shouldn't bother.

Spark/Hadoop/etc. are intended for massively distributed compute jobs, where the runtime overhead on an individual machine is trivial compared to the inefficiencies you might encounter from failing to orchestrate the work efficiently. They're designed to tolerate cheap heterogeneous hardware that fails regularly, so they make a lot of trade-offs that hamper getting to anything resembling peak hardware efficiency. You're talking about a runtime fitting in L1, but these are distributed systems that orchestrate work over a network... Your compute might run in L1, but the orchestration sure as heck doesn't. Consequently, they're not terribly efficient for smaller jobs. There is a tendency for people to use them for tasks that are better addressed in other ways. It is unfortunate and frustrating.

Until you are dealing with such a problem, they're actually quite inefficient for the job... but that inefficiency is not a function of the JVM.

Measuring the JVM's efficiency with Spark is like measuring C++'s efficiency with Firefox.

> If I've got a faster tool, that encourages the correct approach, why should I bother trying to figure these things out? Or put perhaps more clearly: What do I gain with that 10mb?

If you read the documentation, the gains should be clear. If you are asking the question, likely the gains are irrelevant to your problem. I would, however, caution you to worry less about the runtime size and more about the runtime efficiency. The two are often at best tenuously related.


If your assumption that a 10MB JVM kills the cache were true, then the alioth benchmarks you have posted wouldn't show a speed difference of ~3. I suggest you learn a bit more about how CPUs work and what benchmarks mean before posting bold claims.


Why not? Those problems fit into cache.


Because they are 333x slower than you'd expect.


> I don't have anything with 10MB cache.

The link you provided was to three distinct models of i7 processors... all with 8MB of L3 cache. I would argue that 8MB isn't much smaller than 10MB, but I will understand if you disagree. However, even the slowest of those processors also has 1MB of L2 cache and 256KB of L1 cache, not to mention other "cache-like" memory in the form of renamed registers, completion queues, etc. At most, we're talking <800KB shy of 10MB in cache.

> If you get into L1, you get about 1000x faster.

I think you are making my point for me.

> Maybe I have a different perspective? If a 600kb runtime is 1000x faster, I want to know what I get by being 10x bigger.

You are assuming that at all times all of that 10MB must be touched by the processor at once. You can have a 10MB runtime where most of the cycles are being spent on a hot spot of <4KB... Having a hot spot that is orders of magnitude smaller than the full runtime is totally unsurprising. It's particularly true when your runtime has a JIT in it. With a JIT, most of the time, the bytes that are being executed aren't part of that 10MB, but rather are generated by it. Are you going to penalize your 600KB runtime for the size of the source code? ;-)


10MB for a platform that allows you to run code on all three major operating systems without too much trouble and in a performant way is a huge win, in my opinion. Not many alternatives come close to that.


Actually all languages with a rich runtime and standard library that isn't just a thin POSIX layer like C or C++ (although C++ has been improving their library story).

Still your point holds.


Q/KDB is 600kb, also runs code on all three major operating systems (and a few minor ones). It's also about 1000x faster than Java/Spark[1].

1000x slower doesn't sound like a huge win to me; it sounds like a huge cost, so my question is what do we gain by making our programs 1000x slower?

[1]: http://tech.marksblogg.com/benchmarks.html


That benchmark is literally comparing Apples to Oranges. It's not even the same hardware.


You don't need the entire executable file in cache in order to run the program.

For comparison, a C++ wxWidgets 3.0 application isn't going to be much smaller than 10MB in release mode if you statically link it. Much as I hate to admit it, 10MB just isn't that big in an age of terabyte SSDs and systems with 32GB of RAM.


Not that the majority of users have that. Even excluding those outside of wealthy countries, most users are on mobile devices (laptops, tablets, cell phones). Of those who have desktop PCs, very few have terabyte SSDs and even fewer have 32GB of RAM. For most people, RAM is probably somewhere between 4-8GB.


10MB fits comfortably within 4GB of RAM.


I don't think it's healthy to think of that 4GB module as "RAM".

It's connected to your CPU by a serial communications interface so access is not uniform or timely, and if the CPU needs any of it, it stops what it's doing while it waits.

The "cache ram" (L1 and to a lesser extent L2) actually acts like the RAM that we learn about in Knuth, so that when we discuss algorithms in terms of memory/time costs, this is the number we should be thinking about. Algorithms that are performant on disk/drum are modern solutions for what you're calling "RAM".
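The cache-vs-RAM gap is easy to observe for yourself: traverse the same array sequentially and in a random order and compare timings. A crude sketch (a serious measurement would use JMH; the array size is just picked to exceed typical L3):

```java
import java.util.Random;

public class CacheDemo {
    static final int N = 1 << 24; // 16M ints = 64MB, larger than typical L3 cache

    // Sum data[] in the order given by order[]; the access pattern, not the
    // amount of work, is what makes the two traversals differ in speed.
    public static long traverse(int[] data, int[] order) {
        long sum = 0;
        for (int i : order) sum += data[i];
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[N];
        int[] seq = new int[N];
        int[] rnd = new int[N];
        Random r = new Random(42);
        for (int i = 0; i < N; i++) { data[i] = i; seq[i] = i; rnd[i] = r.nextInt(N); }

        long t0 = System.nanoTime();
        traverse(data, seq);
        long t1 = System.nanoTime();
        traverse(data, rnd);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %d ms, random: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

The random traversal is typically several times slower, because nearly every access misses cache and waits on RAM.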


> I don't know. 10MB still sounds really too big.

Can't tell if sarcastic or... o_O.

In the unlikely case you're actually serious, you really need to rethink your perception of memory costs in 2017.


No, I'm really quite serious.

KDB[1] is about 1000x faster than Spark[2], and is only about 600kb (and most of that is shared library dynamic linker stuff that makes interfacing with the rest of the OS easier). A big part of why it's fast is because it's small -- once you're inside cache memory everything gets faster.

That's the real cost of memory in 2017. So what did we gain for paying it?

[1]: https://news.ycombinator.com/item?id=13481824

[2]: http://tech.marksblogg.com/billion-nyc-taxi-rides-spark-2-1-...


You're comparing completely, utterly different results here, and it's really hurting any point you're trying to make.

You're comparing KDB running on 4x Intel Xeon Phi 7210 CPUs, totaling 256 physical CPUs.

Compared to the best result for Java/Spark, which was running on 11x m3.xlarge instances on AWS. That's only 44 CPUs, plus it's running on AWS, not 100% dedicated hardware, so it's tough to tell what sort of an impact the virtualization + EBS has on performance. Plus, from the AWS page: "Each vCPU is a hyperthread of an Intel Xeon core except for T2 and m3.medium", which does not do anything good for the results.

Yes, technically, KDB was 199.80x faster (not 1000!) than Java/Spark, when it was given vastly superior, dedicated hardware without virtualization, and when tackling a problem that the hardware setup is optimized for. Note that the author calls this out by saying "This isn't dissimilar to using graphics cards" when talking about the setup he was using for the KDB benchmarks.

To get a sensible idea of the relative difference in performance, you would have to compare KDB and Java/Spark both running on the Xeon Phis, and/or running both on 11x m3.xlarge AWS instances - and even then, if Java/Spark does poorly on the Xeon Phi test, that might just mean that the Java/Spark developers haven't optimized for that particular setup.



> You're comparing completely, utterly different results here, and it's really hurting any point you're trying to make.

Then argue with the point you think I could be making instead of the point that you think I'm making[1]

[1]: http://philosophy.lander.edu/oriental/charity.html

> you would have to compare KDB and Java/Spark both running on the Xeon Phis, and/or running both on 11x m3.xlarge AWS instances - and even then, if Java/Spark does poorly on the Xeon Phi test...

If Spark can solve the business problem in less real-time in another way, I think that would be worth talking about, but it's my understanding that a bunch of mid/large machines connected to shared storage is the typical Spark deployment, and the hardware costs are similar to the Phi solution.

So my larger question still stands: What is the value in this approach, if it's not faster or cheaper?


If "this approach" is using Java/Spark, instead of something that is a smaller binary, then there are some easy answers to your questions:

- people don't want to write C (or K, or whatever yields a small binary)

- the cost of switching languages is not worth the speed-up

- it's already fast enough

I don't think you're wrong, overall, that, specifically, kdb can be much faster than an equivalently sized Spark cluster, but simply being faster does not invalidate other approaches, which is what you seem to be arguing for.


I'm not arguing for anything: I'm asking what do we get for this cost.

It sounds like you're suggesting we get:

* Not having to write in SQL (note KDB supports SQL92)

Maybe something else? I'm not sure I understand.


Locales? Timezones? Unicode? There's a lot of stuff that is there to be used from time to time; that does not mean it hits your processor cache often.

BTW, libruby-2.3 is 2.5MB, just the shared object file, and it tries to use all the aforementioned stuff from the underlying UNIX.


Java 8 already has compact profiles; Java 9 will allow customizing further. I'm not a Java guy, but with node.js, the base can easily balloon to hundreds of MB after an npm install. I somehow now feel Java (or whatever) is better organized and more manageable. After learning node.js for a product for a few months, I'm actually returning to PHP7, which has nearly identical OOP to Java.


Couple of megabytes without a run-time is huge. What's hiding in there; a VirtualBox image with a Linux kernel + initrd? Or maybe a high definition splash screen in PNG form?


I led our teams to switch from Java to Go because of development productivity, but then noticed deployment was simpler and faster, memory usage was slashed (for comparable applications), request/response times were much more consistent, and startup was practically instant; as a result we started aggressively rewriting Java applications to Go and saw a notable difference in the number of machines we needed to run in AWS.

So in my situation, the JVM is heavier by every single measure listed, and for each by a considerable margin.


> aggressively rewriting Java applications to Go and saw a notable difference in the number of machines we needed to run in AWS.

This is the easy trap to fall into though. What if you aggressively rewrote the Java apps from crappy legacy frameworks to well developed Java apps?

A rewrite ALMOST always is faster. So the new language seems faster. Except if you would then rewrite the rewrite back in the original language... you could even still be faster.

Very hard to split apart what is faster because the rewrite got rid of lots of bloat, and what is faster because it is legit faster. Java is legit fast when it is written well. Also very easy to make a bloat fest.


These apps are mostly microservices and the Java ones are mostly only a year or two old. None of them use things like spring. Some use Dropwizard. Would you consider dropwizard modern? If not, what would you use instead?


Take a look at the TechEmpower benchmarks:

https://www.techempower.com/benchmarks/

DropWizard is modern, but it isn't fast. Go and even Node.js are significantly faster. If you want performance, you cut layers out of the stack - check out the numbers for raw servlets or even just straight Jersey annotations in that benchmark. If I were doing JSON-over-HTTP microservices in Java, I'd likely use straight Jersey + Jackson, or if performance was really a problem, Boon over straight servlets.

What framework did your Go rewrite use? The standard libs?


Boon is not that fast. It only appears fast on some poorly constructed benchmarks due to some lazy benchmarketing optimizations.

https://github.com/fabienrenaud/java-json-benchmark


On first glance the dropwizard test app appears to be doomed to mediocrity via reliance on hibernate.

Call me crazy, but I like my dropwizard with Spring DI for (singleton) resource setup, a micro-ORM to get work done, and HikariCP datasources at runtime.


What's wrong with hibernate? The only thing I can think of is that you're not using "JOIN FETCH entity.relation" when accessing collections and end up with the N+1 select problem but that is because you're using any ORM incorrectly.

Entity Framework has Include and Active Record has includes, which do the same thing. The Qt ORM also has something similar.

The only ORM I have seen that lacks this critical feature is odb. It doesn't allow setting the fetching strategy on a per query basis. You have to either always use eager loading or lazy loading which basically makes it useless for my purposes.


Well, for benchmarking the essential framework, which does not mandate any ORM, I would want to use something for data access that takes the question of time spent on type reflection, internal caching, and the like, out of the picture. Hibernate and EMF have their place, but not as part of benchmarking the thing that hosts 'em. Core Dropwizard performance is all about how it uses Jetty, Jackson, and maps requests to resources and operations.


> DropWizard is modern, but it isn't fast. Go and even Node.js are significantly faster.

Any benchmarks to provide in order to support this wild claim?


The ones I just linked to above.


Use vertx if you want lean REST micro-services. I so wish vertx was part of the standard library.

The main advantages that Go has over Java are that the standard library is brilliant, which obviates the need for folks to create monstrous frameworks (and lose performance), and that Go has better memory utilization because of value types (structs) and because it is AOT compiled. Unfortunately, the JIT as designed by the JVM devs takes a lot of memory.

In raw performance, I would still give the edge to Java over Golang though.


There's a lot more to an app than an MVC framework. I realize Dropwizard tries to be everything for the app, but at its core it's MVC with some bundled libs.


> This is the easy trap to fall into though.

Indeed.

It's a typical honeymoon phase, with very little regard for 1-2-5 years in the future. The cost of having picked Go will be fully apparent then.


I really want to see a comparison of a language like Go to something much more in the functional sphere when it comes to maintainability of a large codebase.

I really feel like that's one of the big issues we as programmers want to get a better handle on, but there isn't much to go on that isn't based on opinion (which can be hard to validate).


Yes, the limitation is rarely the programming language, it is the programmer.

Also, when you do the rewrite you have already solved the domain problem that you did not fully understand when implementing it the first time.


"Plan to throw one away; you will, anyhow." First version to understand the problem, second version to solve it.

But deployment, gc pauses and startup time (jvm vs go) are orthogonal to program quality. I would also expect go to have less memory usage.

> deployment was simpler and faster, memory usage was slashed..., request/response times were much more consistent, startup was practically instant


Orthogonal to quality, but critical to velocity.

At the end of the day, despite Go's failings, it's a good (maybe the best?) language for large projects and teams because it compiles fast, is easy for anyone to run anywhere, tests run quickly, programs execute quickly, and there is already good tooling/editor support.

Nothing beats efficient workflow for improving velocity.


I could use the exact same arguments but for PHP.


Not quite; I omitted that it is also statically typed and a future-proof language, mainly because these are properties already shared with Java. However, this is not true of PHP.

PHP is a great velocity language, provided you have a small(er) team or are willing to commit to additional controls on how you write your PHP (document types/structure of arguments mainly) to ensure that your PHP code is able to be read quickly by other developers.

Personally I prefer Go here because it enforces good readability by default and therefore scales better with team size.


Readability is not a problem in PHP either. Follow PSR-1, PSR-2 and PSR-4 and use a command-line tool like CodeSniffer in your build step to guarantee the code standard on each commit (or use Upsource).

And in PHP 7.x you have even more type hinting than before, and with an IDE like PhpStorm refactoring is a breeze.

And with the release of PHP 7, PHP is future-proof. The community will continue to improve it with major features; they have shown that. Interest in the language has increased. More RFCs are contributed to the language than before. https://wiki.php.net/rfc

Multiple teams on a large code base is not really a problem in modern PHP. I do it every day. We follow modern design patterns, do code reviews, and keep code coverage over 80% of the system (old as well as new code). New code is probably over 95% coverage. We deploy multiple times every week.

Almost all (>95%) of my problems stem from design decisions made in the past, not the language itself.

I'm not saying that you should not use Go (or Java). Both are fine languages. Use the right tool for the job. If you don't do a realtime stock trading system or some embedded system, but some web stack, I can't really see that the majority of the problems stem from language choice (whatever you choose). It is in the team, the culture, the understanding of the domain. There should be your focus.

Personally, the two most important things I look for in a language/platform are tooling and community.


Which is why I still often use PHP.

For my usages its a reasonable language.


You're still limited by the JVM technology, regardless of how you write your app: a large heap and big tail latencies (the JVM's GC is designed to be throughput-optimised, whereas Go's is latency-optimised).


Just use one of the other JVMs that have latency-optimised GCs, such as Zing from Azul, Metronome from IBM, or OpenJDK with Shenandoah from Red Hat.

The power of Java is that there is more than one JVM, and that can really save you a lot of money/developer time if the world changes under your ass ;) e.g. we had a JVM-based graph database, ran it on HotSpot -> big GC pauses; moved to Zing -> no more pauses. All we needed to do was run a different VM and the problem went away (the new problem, of course, was that Zing costs money, though not much; with Shenandoah coming for free we could probably have moved to that).

With Go you can't do that yet. If your app is not latency-bound but throughput-bound, there is nowhere to switch to other than a rewrite. That flexibility of deployment on JVM tech gives us a lot of insurance at no cost, until we need it.


Actually, the new G1 collector deals very well with latency-sensitive workflows. I'd say it's comparable to Go if you adjust your heap size to the working set. You can try running the benchmarks here - https://gitlab.com/gasche/gc-latency-experiment.
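
FWIW you can verify which collector a given set of flags actually selects from inside the process. A quick sketch (the class name is mine; the -XX flags are standard HotSpot options):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class GcInfo {
    // Names of the collectors the running VM is actually using.
    static List<String> gcNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Launch with e.g.: java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 GcInfo
        // Under G1 the names typically include "G1 Young Generation" and "G1 Old Generation".
        gcNames().forEach(System.out::println);
    }
}
```

MaxGCPauseMillis is only a target, not a guarantee, which is why measuring against your actual working set (as the linked benchmark does) matters.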


Exactly, that's how some not very bright people were tricked into thinking that Node.js is actually fast.


And at work, we're now rewriting all those NodeJS services in Go or Java.

We hired some Node maintainer(s) a long time ago, rumor has it, who got us on the Node train.


How do you do async in Java? While it does have CompletableFuture now, none of the libraries (specifically database drivers) seem to support it, so I always end up with a blocked thread per request.


Java has had non-blocking IO for some time. https://en.wikipedia.org/wiki/Non-blocking_I/O_(Java)

Unfortunately it seems difficult to use (to me at least), but frameworks like Netty are built on top of it to provide incredible performance.

However, the fact that Java provides real threading means that blocking IO is not a performance problem if you use the right patterns.
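
To the async question upthread: even when the driver itself only offers a blocking API, you can off-load the call to a pool with CompletableFuture so the request thread isn't tied up. A minimal sketch (class and method names are mine; blockingQuery is a stand-in for a real driver call):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncSketch {
    // Stand-in for a blocking database/driver call.
    static String blockingQuery() {
        return "row-1";
    }

    // Off-load the blocking work to a dedicated pool and compose on the result.
    static CompletableFuture<String> queryAsync(ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(AsyncSketch::blockingQuery, pool)
                .thenApply(row -> "got " + row);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        System.out.println(queryAsync(pool).join()); // prints: got row-1
        pool.shutdown();
    }
}
```

This doesn't make the IO non-blocking, of course; it just bounds the blocking to a pool you size deliberately, which is often good enough given how cheap JVM threads are.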


I've spent a fair bit of time in both, most recently the last couple of years in Go. I think it's a very mixed bag and there is no clear winner.

The JVM tooling, especially for runtime operations, is so much superior to the Go options that it's night and day. I have much more success modeling complex business models in Java with its better type system, low-latency work is much easier to do on the JVM due to the availability of better libraries (which may improve in Go), and the concurrency options are miles better on the JVM.

Go's stack allocation and gc defaults make for easy management in most of my default cases. The ease of adding http endpoints to things is phenomenal. Being able to write easy cli applications in the same language I write daemons in is great.

All told, I think for simple daemons and cli's I'd go golang, for more complex systems I'd go jvm.

I, personally, think the binary deployment thing is overblown. I've never had any problems deploying jvm applications and the automation to do either seems essentially the same to me.

As for the relative "heaviness" I think golang definitely feels lighter, but that is largely because golang apps do less. Once you start having them do more they start to "feel" just as heavy as java apps (for whatever "feel" means).

* [edit] called golang heavier meant lighter


I have a Go website/web app that runs at tens of megabytes per process. A very similar Java web app runs at a few hundred megabytes per process.

I also run these on cloud platforms that auto-scale. The Go processes spin up very quickly, the Java ones not so much.

In these two respects the JVM is heavy compared to golang for my very common scenarios. The heaviness also causes me to spend more money for the JVM solution.


But what max heap size did you set for the JVM?

I have an app that people were complaining took too much memory. A quick look with VisualVM showed that its actual heap usage when idling was only 50 MB, but because we hadn't set any heap size limit, it was reserving hundreds of megabytes from the OS. The idea is that it can run faster if it does that. The fix was simply to use the -Xmx option to tell it to use less memory and GC more often.
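
A quick sketch of checking what the VM thinks its limits are, which makes the effect of -Xmx easy to see (the class name is mine; the numbers depend on your machine and flags):

```java
public class HeapInfo {
    // Convert bytes to whole megabytes.
    static long mb(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // With e.g. `java -Xmx64m HeapInfo`, "max heap" reports roughly 64 MB.
        System.out.println("max heap:  " + mb(rt.maxMemory()) + " MB");
        System.out.println("committed: " + mb(rt.totalMemory()) + " MB");
        System.out.println("used:      " + mb(rt.totalMemory() - rt.freeMemory()) + " MB");
    }
}
```

The gap between "used" and "committed" is exactly the reserved-but-idle memory people complain about when no -Xmx is set.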


The JVM is very inflexible in that respect. If you give it more memory it will keep all of it way beyond the point where it matters for performance. If you give it less memory you need to know exactly how much less you can give it before performance craters.

In other words, JVM deployments need a lot more tuning than Go and they will generally need a lot more memory as well. But you're right, not setting -Xmx at all will make the JVM look worse than it really is.


We have similar experiences as well.

  $ ps -eo rss,cmd,user | grep jenkins
  4928228 /usr/bin/java -Djava.awt.he jenkins

  $ ps -eo rss,cmd,user | grep drone
  12940 /drone agent                root
  19924 /drone server               root
We run the two applications on the same machine. Admittedly Jenkins is much more feature-rich, but we only use its vanilla settings, without any fancy plugins, for a few legacy SVN repos.

P.S. The Drone server and agent are running within docker containers.


I don't know why you are getting downvoted for sharing your real world experience, lately "fanboyism" on HN is getting out of hand. I have similar experience with one of the services that was ported from Java to Go.


"Don't rewrite an application from scratch. There is absolutely no reason to believe that you are going to do a better job than you did the first time." -- Joel on Software, Things You Should Never Do, Part I [1]

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...


I do like that article but have ignored it several times for good reasons.


I think a better interpretation of the title/article is "Don't assume that rewriting from scratch will fix all of your problems"


These types of arguments cause many intelligent people to headdesk. They're hardly an apples to apples comparison.

Of course "Go was Faster". It's because you started with a clean slate!


That's not it. A fairly small http server in go will run in tens of megabytes. The same thing on the JVM requires a couple hundred megabytes at best. The difference in startup time is roughly the same as well.


Are you sure? I've written small HTTP servers in Java that can happily run with a heap of 30-50mb or less. Runtime overheads add some on top of that, but not much.
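
Something along these lines, for example, using only the JDK's built-in com.sun.net.httpserver, no frameworks (the class name is mine; a server like this idles happily even when launched with a cap like -Xmx32m):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TinyServer {
    // Start a one-route server on an ephemeral port and return it.
    static HttpServer start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Fetch "/" from the given server and return the response body.
    static String get(HttpServer server) throws Exception {
        URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/");
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();
        System.out.println(get(server)); // prints: hello
        server.stop(0);
    }
}
```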

I think the perception of Java suffers a lot because it will consume all the RAM on your machine by default if you let it (but not immediately). It's a very poor default because even though there are technical arguments for doing that (goes faster), they aren't well known and people tend to assume "more memory usage == worse design".

There are a lot of myths about the JVM out there. We can see on this thread the idea that it takes 1.5 seconds to start being repeated multiple times, each time someone else points out that it's actually more like tens of milliseconds to start.


> I've written small HTTP servers in Java that can happily run with a heap of 30-50mb or less. Runtime overheads add some on top of that, but not much.

I second that. I have deployed a medium traffic web-server written in Scala backed by a postgresql DB on 128MB VPS, back in 2009!

> I think the perception of Java suffers a lot because it will consume all the RAM on your machine by default if you let it (but not immediately).

I don't think that is true. The default heap size for Oracle and OpenJDK VMs has been bounded as far as I remember. In fact, I would like it if the VM, by default, allowed the heap size to grow up to the available RAM when GC pressure increases, but that doesn't seem to be the case as of now.

Edit: Did you mean non-heap VM arenas grow indefinitely? If so, I am not aware of them.


Must have been Java 6 in 2009. Java memory usage increased with new releases to make it perform better. For a medium-traffic site it would have worked fine because the GC would have ample time to clean up unused objects.


128 MB is a lot compared to Go, which will often run in around 10 MB. Was your JVM back then 32-bit or 64-bit? If it was 32-bit, your memory requirement will be higher on 64-bit.


128MB was the total RAM in the VPS including OS + nginx + JVM + Postgresql. The heap allocated to the JVM process was about 64MB, but bear in mind that this was an actual application. So, it's hard to do a detailed comparison between JVM and Go without standardising on the application. All that I am claiming is that JVM is in the same ball park.


It's not in the same ballpark. I'll throw some code up when I get a chance.

Edit: do you have a twitter or Reddit account? I'll ping you when I have code examples if you want.


> it will consume all the RAM on your machine by default if you let it

I wonder if Oracle documents are plain wrong for JDK 8 docs for maximum heap size[1]:

"Smaller of 1/4th of the physical memory or 1GB. Before Java SE 5.0, the default maximum heap size was 64MB. You can override this default using the -Xmx command-line option."

Also, Oracle has chosen the right defaults: it took Java a long time to shed its reputation for being dog slow, and if they optimized for memory it would start looking worse on performance.

1. https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc...


Try to do it and shoot me a github link. Trust me, I've tried, but I'm no JVM expert so there could be some magic flag I'm unaware of.

I can get it to start around 30-50mb, but as soon as you hit it with traffic the memory usage jumps up.


It was a couple of seconds on my Cyrix 300MHz-ish CPU back in 1998. I would have expected it to get a little better since then.


I have a little server I wrote in Java. Admittedly it is not a HTTP server, but it quite happily handles a thousand simultaneous connections with a memory limit of 200MB. It's currently sitting around 26MB, but I'm sure some of that would disappear if the VM did a GC.


Not correct at all. You can run Tomcat in 5 MB of RAM, and it starts in less than 250 ms.


As another commenter mentioned, I think this is much more the programmer and less the language. Sure, the language may encourage certain approaches which carry across teams differently, but it still often comes down to the app, not the language. I implemented a rudimentary Java AOT targeting Go and the trimmed-down stdlib grew so big Go took hours compiling it (granted, some of that is how I approached OOP and whatnot).


> I implemented a rudimentary Java AOT targeting Go and the trimmed-down stdlib grew so big Go took hours compiling it

Have you reported it to the Go devs? Sounds like an interesting use case.


Yes [0], though I think the issue title is a bit off. They did bring it down from over 7 hours to 30 minutes or so in a recent release, but it still is too long, too much CPU, and too much memory. They are very responsive, of course, which is something I can never say about OpenJDK.

0 - https://github.com/golang/go/issues/18602


> memory usage was slashed

A lot of this has to do with another unmentioned, terrifically annoying property of the JVM: pre-launch min/max heap allocation. Standard operating procedure is to go with the default and bump it up if your needs exceed it. I can't possibly imagine how many petabytes of memory are unnecessarily assigned to JVMs throughout the world as I type: apps consuming 79 MB with a 256 MB/512 MB heap size.


I wonder how much of this is due to a couple differences: in Go you can embed structures in others instead of using a pointer, strings are UTF-8, and arrays (slices) are resizeable by default.

(I'm sure a chunk of the difference is due to a better understanding of the program during rewrites.)


These sorts of comments are (no offense) worse than useless. Benchmarking is one of the most difficult things to do in software, and anecdotes like this just make things confusing for new engineers and feed the perpetual hype train around newer languages.

Please refrain from making statements like this unless you have a reproducible quantifiable analysis.

If you really wanted to demonstrate the effect you describe you'd need to have the same team rewrite the application twice, once Java->Java, once Java->go, making sure to align the program structure as much as possible (making exceptions to take advantage of lang specific features of course).

If you were to do that, then that would be interesting! No one does that of course because it's expensive and wasteful from a business perspective, but it's the only way to determine anything useful.


Microsoft seem to have learned a lot from Java in designing their new .NET Core CLR. It has gotten almost everything right:

* a small and fast CLR (JVM)

* a class library that defaults to almost nothing but primitive classes

* proper and standardized version, platform and package management (NuGet)

* open source and MIT license[0]

* a patent promise[1]

* arguably the best dev IDE available (Visual Studio) and one of the best up-and-coming dev text editors (VS Code)

* Native ORM, templating, MVC, web server so there is one way to do things

* open source middleware standard (OWIN)

* they left out, for now, attempting the hard ugly stuff like x-platform GUI

* all platforms are equal citizens, they acquired Xamarin for dev tools and release their own Docker containers.

* it's already getting good distribution (on RedHat) even tho it's only 6 months out from a 1.0 release.

Java may have missed the window for fixing some of these issues in their platform - I feel that if Android were being developed today, they'd almost certainly take .NET Core as the runtime.

I've yet to commit to using .NET Core anywhere, but from what I know about it so far it is impressive.

[0] https://github.com/dotnet/coreclr

[1] https://raw.githubusercontent.com/dotnet/coreclr/master/PATE...


> * all platforms are equal citizens

This may be true for the Core CLR specifically, but it's not true of real .NET apps that are being built today. The vast, vast majority are strongly tied to the Windows platform, especially because of the lack of a cross-platform GUI like you mention. As a Wine developer, it's a huge pain in our side because we either have to run the entire .NET virtual machine, which is hard, or depend on Mono, which is by design not completely compatible. This results in really souring my opinion of .NET and .NET applications when compared with win32 applications that do tend to work quite well in Wine.


I could more or less agree with most of it apart from

> arguably the best dev IDE available (Visual Studio) and one of the best up-and-coming dev text editors (VS Code)

https://www.jetbrains.com/resharper/documentation/comparison...

Refactoring, Coding assistance, Navigation & search sections being most important.


Yeah, if I had a dollar for every time I had to restart Visual Studio in order to get something to work...especially test debugging. But IntelliJ always works perfectly. Must say that I can't wait for Jetbrains' Rider to come out.


Yeah, same here. I can't count the amount of times I heard statements like that one (also about Eclipse) and was puzzled. I'm starting to think that people saying this just haven't had the curiosity to really explore the alternatives. That said, even though VS causes me to cringe pretty constantly when I use it, you have to give props to MS for the language integration tools they put together for .NET. Some of the tricks they managed to come up with (like moving the instruction pointer in a method while debugging) is pretty impressive. Unfortunately, every time I get amazed by something like this, either some blatantly stupid behavior of VS destroys the magic again, or it outright crashes. Sigh.


I couldn't agree more. I constantly see this claim made about Visual Studio. I find it to be in the way most of the time. It does sound like most of the features that I want are in Resharper; I'll have to try it out.


You will never, ever look back.


Although the last version I used seriously was VS2013, VS on its own is pretty mediocre. With ReSharper though, nothing beats it in my opinion.

On the other hand I've been using Eclipse and IntelliJ for the past year. Eclipse is not even worth talking about but even IntelliJ does not come close to vanilla VS in terms of usability. Again, my opinion.


Yes, it will be interesting to see what the JetBrains C# IDE is like when it's released.


What is better than VS? PHPStorm etc?


I actually can't name anything better. I was just saying that whilst VS is "the best", the best isn't really that great (without a plugin from Jetbrains)...


> I feel that if Android were being developed today, they'd almost certainly take .NET Core as the runtime.

If it were being developed today, rather than racing against the clock (Apple), Google would have written their own runtime and everything.


But when Android was initially developed and acquired by Google, Apple wasn't in the phone business at all, so the base architecture was laid down long before the race started.


Apple was getting into the phone business at that time. The original Android was nothing like the Android users saw when it was released on phones. Google had advance knowledge of Apple's plans, as Eric Schmidt was on Apple's board at the time.


Isn't Dart with Flutter pretty much that?


> Microsoft seem to have learned a lot from Java in designing their new .NET Core CLR

Of course they did. It's no secret that they designed it as a Java clone after the courts ruled they couldn't embrace the original one.

However, they missed something: cross-platform support. So essentially you get a Windows-only Java platform. That's why not everybody finds it impressive or is looking forward to committing to using it everywhere (they wouldn't be able to, though).


> they left out, for now, attempting the hard ugly stuff like x-platform GUI

What is the status of this? Will MS be bringing WPF (XAML) to all platforms?


IMO Microsoft will never bring WPF to any platform but Windows. Core CLR (and web GUIs) are what will be available on other platforms. I think WPF will always remain a Windows thing. For that matter, WPF is not even getting developed much on Windows; it's largely left as-is in favor of putting their effort into web technologies and CoreCLR.


I agree with many points in this article. That being said, there are dimensions of heaviness not captured in the article as far as I can see:

1. The startup times, not so much of the JVM itself, that just takes 1.5 secs, but the startup time of your application gets higher if you have a lot of classes on the classpath. I guess it's the classpath scanning that takes a lot of time (?).

2. Memory usage of Java objects is quite heavy. See this article: http://www.ibm.com/developerworks/library/j-codetoheap/index...

3. The heaviness of the ecosystem in terms of the magnitude of concepts and tools being used and the enterprisey-ness of libraries.


> 1. The startup times, not so much of the JVM itself, that just takes 1.5 secs

Where do you get these numbers from? On my five-year-old MacBook Pro with default JVM options, parsing a 20 MB file:

  real 0m0.248s
  user 0m0.325s
  sys  0m0.043s

> 2. Memory usage of Java objects is quite heavy.

That's IBM's enterprise VM, which uses three-word headers. HotSpot is actually better. If you compare it with other "lightweight" programming languages, it is really, really light.


> real 0m0.248s

A quarter of a second to start up the VM, run some code, and exit again is actually pretty steep compared to typical interpreted and compiled languages. Among other things, this means that you can't really call Java executables from a loop in a shell script.

For comparison purposes, both Ruby and Rust will show between "0.00 elapsed" and "0.02 elapsed" for a simple "Hello, world" program on my laptop.


That's just Hello World, though. He said his app was parsing a 20 MB file.

To do a fair comparison with your example, I just compiled and ran Hello World in Java on my machine and got this:

  real 0.06  user 0.06  sys 0.01
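
If you want to separate pure JVM startup overhead from the program's own work, the runtime MXBean knows when the VM was launched, so uptime at the top of main approximates it. A small sketch (the class name is mine):

```java
import java.lang.management.ManagementFactory;

public class StartupCost {
    public static void main(String[] args) {
        // Milliseconds elapsed between VM launch and reaching main().
        long ms = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("JVM startup overhead: ~" + ms + " ms");
    }
}
```

On a modern JVM this typically reports tens of milliseconds for a bare program; anything beyond that is classpath scanning and framework initialization, not the VM itself.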



The parent did write "parsing a 20 MB file". So not a hello world.


<insert joke about Java being overly verbose>


I'm not arguing with you; those are genuine problems. But there are a few projects in the pipeline to address some of them.

1. Startup time being addressed by precompiling the standard library (or your own library). See "JEP 295: Ahead-of-Time Compilation": http://openjdk.java.net/jeps/295. Also addressed by modularisation of the standard library, "JEP 220: Modular Run-Time Images".

2. Memory usage (and less garbage collection overhead) using value types. See "JEP 169: Value Objects": http://openjdk.java.net/jeps/169.


> The heaviness of the ecosystem in terms of the magnitude of concepts and tools being used and the enterprisey-ness of libraries.

You don't have to use the enterprisey libraries though. Using Dropwizard, for example, gives you a tight and performant set of libraries that have a fairly minimal learning curve and require relatively little boilerplate.


While this is true, in practice it can be hard. You don't always have control over which libs you are using, and finding lightweight alternatives to many libraries is often hard to impossible. It is better once you get outside Java proper, but nearly all the alternative languages on the JVM tout access to the Java ecosystem as a plus, which then brings back all that pain.


This is one thing I've never understood about Clojure - the Java interop. I actually love Clojure but I close my eyes to the fact that it requires an object-oriented VM to work its magic. Clojure is a functional Lisp based on immutable data structures which is about as far from Java OOP as it gets yet we're encouraged to mix Java objects and classes into our Clojure apps as if nothing matters.


Funny, when I code in Clojure there are these things called multi-methods, protocols and multiple dispatch.

I think it was originally designed in a Lisp library called CLOS, which incidentally stands for Common Lisp Object System.

It's very nicely explained how to implement OOP in Lisp in a book called "The Art of the Metaobject Protocol".

Users of Lisp based languages should think twice before criticizing OOP.


Users of Lisp based languages generally think about six times, on average, before criticizing OOP (each time).


> You don't always have control of what libs you are using

Well that's true regardless of the language. If you're not making the decisions on the codebase, there can be all kinds of gnarly dependencies and practices that you have to adhere to. I agree that big legacy corps tend to have over cumbersome setups, but hey, at least it's not cobol. My advice is not to work for big legacy corps.


But some languages have better cultures/eco-systems than others. Java has one of the worst.


Far from it IMO. It depends on which subculture you immerse yourself in. If you subscribe to the IBM/Oracle/Red Hat thought leaders, then yes - you'll encounter enterprisey stuff, because they're all targeting legacy corps.

Believe me that I know where you're coming from -- I have a real aversion the big enterprise side of the Java world. There's a lot of interesting development in Java open source though, and it'd be a shame to throw the baby out with the bathwater.


I mean, even the "enterprisey" stuff like Spring Boot is more than fast enough. I have a little REST service I just deployed to production today, 5 seconds to start up on my laptop's SSD (unfortunately it took about 50 seconds in production because our SAN is dog-slow for some reason).


How is Spring Boot "enterprisey"? It makes modern java programming simpler and more accessible by hiding some of the unnecessary complexity. It enables things like https://jhipster.github.io/ which to me is the Rails equivalent in the java world.


JHipster equivalent to Rails? You have to be joking, surely. I just setup a JHipster site and when I opened it up in IntelliJ it was the same labyrinthine mess I've come to expect from Java frameworks, ie. knee-deep in endless subdirectories and everything abstracted away to the point of incomprehension. Contrast that with the simplicity of Ruby and Rails. Java by its very nature makes it impossible to build simple, easily comprehended frameworks and apps. The trouble is that devs who have spent most of their lives in the Java ecosystem can only think relatively, ie. Java Framework X is simpler than Java Framework Y. Unless they expose themselves to something like Ruby or Clojure they will never experience true simplicity.


This is nonsense. Something like Spring Boot is easy to comprehend; there's no labyrinthine mess whatsoever. You've got the choice to use other simple frameworks like Spark, or plain libs like Jersey + Jackson. All in all it's still the same: you write your controllers and your services, and that's it. Where is that complicated?

And I'm a big fan of Clojure, but Clojure being cool doesn't make Java de facto a big pile of poo. People have been drilled with so much FUD about Javaland that they simply can't bring themselves to try it properly without preconceptions.


I didn't say equivalent in general. I said equivalent in the Java world where we are more used to having to deal with bloat. Just free yourself from your preconceptions and run ./gradlew . Everything will start up fine, no mess, I promise.


> hiding complexity

This makes me shiver in terror.


Our job as programmers is to hide complexity and expose it when needed. Nothing to shiver about.


Spring Boot hides needing to deploy to an application server along with the extra configuration that entails, it doesn't hide Spring from you.

There's still a ton of "enterprise-grade" shit in Spring, you just aren't forced to use it if you don't want - but it's always there, lurking behind the scenes.


Startup time on my late 2014 MPBr for Clojure Hello World is indeed 1.29s, which is what the OP was measuring.

   $ time java -jar target/uberjar/clojure.jar
   Hello, World!
   1.29s user 0.08s system 181% cpu 0.755 total

   $ /usr/sbin/system_profiler -detailLevel full 
      Model Name: MacBook Pro
      Model Identifier: MacBookPro11,3
      Processor Name: Intel Core i7
      Processor Speed: 2.3 GHz
      Number of Processors: 1
      Total Number of Cores: 4
      L2 Cache (per Core): 256 KB
      L3 Cache: 6 MB
      Memory: 16 GB


It is Clojure. It loads an additional runtime by itself. It is unfortunately not usable for CLI applications. Pure Java does the same thing in fraction of time. http://blog.ndk.io/jvm-slow-startup.html


4. Garbage collection. The fact that Java does not have a refcount collector, that can release memory back to the process's pool as soon as something goes out of scope and is no longer referenced, is horrid. Nearly every major software written in Java goes through the worst kind of struggle wherein users have to assign a 4 GB heap size to run a service that only really needs 500 MB. When fatal Out-Of-Memory crashes are the status quo, something is very very wrong.


I'm sorry to call out this comment specifically, but almost everything you said here is not true. Out-Of-Memory crashes are not the status quo. The JVM garbage collector is (generally) a very high performance system that has improved incredibly over the past decade, it's not as simple as saying it's missing reference counting so it's "horrid".

This is the kind of lazy generalization that causes people to make poor technology decisions.


These points are true in my experience (EDIT: 1.5 startup time sounds like too much) and they are enough to debunk the claim "The JVM is not that heavy". I hadn't ever heard anybody considering disk consumption or installation time before, when making that claim.

To add,

4. Garbage collection and lack of value typed records. As far as I know there is currently no way around going full SOA (structures of arrays (of primitive types)) for large data collections.

Object overhead (memory usage) and GC overhead are the reason why only SOA will work (and it's a pain because the language doesn't make it convenient) if you have like >10^7 objects. (That's my personal experience from a 2-month project, and I normally don't use Java).
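As a sketch of what the parent means (class and field names here are illustrative, not from any real codebase): instead of allocating one object per record, a structure-of-arrays layout keeps a handful of flat primitive arrays, avoiding the per-object header and the pointer-chasing on iteration.

```java
// Array-of-structures: at ~10^7 elements this means ~10^7 object
// headers plus a pointer dereference on every access.
class Point {
    double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

// Structure-of-arrays: two flat primitive arrays, no per-element
// header, contiguous in memory so iteration is cache-friendly.
class Points {
    final double[] xs, ys;
    Points(int n) { xs = new double[n]; ys = new double[n]; }

    double sumX() {
        double s = 0;
        for (double x : xs) s += x;
        return s;
    }
}

public class SoaDemo {
    public static void main(String[] args) {
        Points p = new Points(3);
        p.xs[0] = 1; p.xs[1] = 2; p.xs[2] = 3;
        System.out.println(p.sumX()); // prints 6.0
    }
}
```

The pain the parent mentions is visible even in this tiny sketch: the SoA version can't hand out a `Point`-like view of element i without boxing, so call sites end up passing raw indices around.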


> As far as I know there is currently no way around going full SOA

There are if you use language extensions like Packed Objects on the IBM JVM or Object Layouts on Azul.

So just like C, you have C and then GCC C, clang C, ....

Eventually Java 10 will fix this, but for those that like to live on the edge there are already snapshots available.


> Object Layouts on Azul.

https://objectlayout.github.io/ObjectLayout/ does not save you any headers. It just allows you to control where your objects are in memory and compiler optimizations based on this. It does not help you with memory footprint.

Also I'm not sure if it's really implemented on Zing considering that from the outside the project seems dead.

> Eventually Java 10 will fix this

I would not be so sure. The challenges especially regarding primitive generics are not to be underestimated. See

http://cr.openjdk.java.net/~jrose/values/shady-values.html


> It just allows you to control where your objects are in memory and compiler optimizations based on this. It does not help you with memory footprint.

It is already better than what you get on Hotspot.

> The challenges especially regarding primitive generics are not to be underestimated. See

The challenge here is due to how the Java designers chose to build generics in the first place.

Modula-3 and Eiffel are two examples of languages with proper generics, value types and toolchains that do AOT compilation to native code.

So I am still hopeful.

However, like everything, some challenges are technical and some are political.


1.5 secs for the jvm only seems excessive.

    $ time java HelloWorld
    Hello, World

    real    0m0.071s
    user    0m0.053s
    sys     0m0.020s

That is a Linux VM running on an MBA (first run).


Did you test that 1.5 second claim yourself? I literally just wrote a HelloWorld and ran it on my MacBook, the total time for the whole program was <0.2 seconds.


I agree that my claim is false. I kind of wrote that off the top of my head.

In any case, the point that I wanted to make in the parent comment was that the JVM startup time itself was basically fine.

I just checked on my Macbook and a HelloWorld class gives me .13 secs real.


To be fair, 0.2 seconds is still ludicrously long. I can literally say the output of the program in less time than the runtime can.


In my mind, 1½ seconds is huge; that essentially rules out any interactive usage. It's even annoying for rapid development cycles. Only low expectations or heavy orchestration can overcome such a startling disadvantage.


It rules out any interactive usage where you are starting and stopping the jvm, like in a command line context.

There are work arounds for this (things that reuse jvms and such) but until that is overcome the jvm is largely not appropriate for cli tools that start/stop.

But for other kinds of interactive programs, things with long-running sessions and such, it is pretty easy to a) lower that startup time and b) do things that hide it from the user.


I've always thought it'd be nice to build a sort of hybrid between a "ClojureScript for bash", and a Java boot-script + RPC client.

Picture a Clojure macro library just for writing CLI driver programs, where you could call all your Clojure code like normal, and where some of the subcommand-defining methods of the driver program could be annotated with something like "@inline".

The un-annotated subcommands, as a simpler case, would translate into calls to spawn a JRE and feed your ARGV over to it once it gets running. These would be the slow-startup calls, so you'd just use them for the things that need the full "horsepower" of the JVM.

The @inline subcommands, on the other hand, would grab your app, its deps, and the JRE, do a whole-program dead-code-elimination process over them to trim them down to just what that subcommand needs, and then would transpile that whole resulting blob to bash code and shove it into a bash function. (So, something like Emscripten with a different frontend + backend.)


That's completely false in the context of a Lisp.

I boot the JVM once and iterate endlessly in the same process. Same for ClojureScript in the browser or node.js. Lisp is by far the most interactive language there is with the fastest iteration times (AFAIK).

1.5 seconds would be huge if you had to constantly restart your application like you do everywhere outside Lisp. Iterating in Clojure is literally instant.

I wrote applications in dozens of languages, and none come remotely close to Clojure's iteration speed or joy of use.


Lisp is by far the most interactive language there is with the fastest iteration times (AFAIK).

That's Forth. Lisp comes next.


if you had to constantly restart your application like you do everywhere outside Lisp.

This was probably true in the 80s, but hasn't been in a while. Many languages have this, either built-in or as a tool. In the case of the JVM, there's spring-loaded, which works in Java, Groovy, etc.


1.5 is huge, except it is completely wrong. JVM startup time is within fraction of the second.


You are right, I was just going with what the grandparent said. But I think with normal amounts of class scanning and other overhead, 1.5 seconds becomes the practical normal.

Certainly the JVM startup always feels slow, in my experience.


Well, if you create a lot of additional objects on startup then it will take some time. JVM startup is still fast. http://blog.ndk.io/jvm-slow-startup.html


That link says 1.2 seconds for a hello world! 1.2 microseconds is what I would expect to be called fast.


1.2 seconds was hello world in clojure, the Java hello world presented the numbers below, so it's mostly clojure that is slow:

    $ time java Hello
    Hello world
    0.04user 0.01system 0:00.12elapsed 43%CPU (0avgtext+0avgdata 15436maxresident)k
    29672inputs+64outputs (82major+3920minor

While 120ms elapsed is not stellar, it's rarely a problem with how the JVM ecosystem looks.


Please be kind enough to reread it.

Startup time of a simple Java application and therefore also whole JVM is 0.4s (in the linked article).

1.2s is for the implementation in Clojure, which includes its own fairly heavy runtime.


I read it fine, but Clojure is an application of the JVM. The fact that a popular interpreted language on the JVM takes 1.2 seconds for hello world is a problem of the JVM itself, or at least its ecosystem. An interpreted language in C wouldn't take nearly that long.


The way an interpreter is implemented and the language it's written in don't have much to do with each other.

A Clojure interpreter written in C, implemented the same way as the Java one, would run just as slowly, given how it rebuilds Clojure every time the application starts.


That's clojure, not java. JVM startup is much faster.


That 1.5 seconds is FUD anyway.


On my old ThinkPad X201 with HDD and 8GB of RAM, running Fedora 25 and Gnome 3 I can start vanilla WildFly 10.1 in less than 10 seconds, http://imgur.com/a/BCDNP:

    20:43:44,578 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 10.1.0.Final (WildFly Core 2.2.0.Final) started in 6551ms - Started 331 of 577 services (393 services are lazy, passive or on-demand)
Not too shabby IMHO.


There's that word again "enterprisey".


This interview with Bob Lee is really interesting on this topic: https://www.infoq.com/interviews/lee-java-di.

Apparently, Square was first built out on Ruby with the mindset that the JVM is an old clunker.

Fast-forward a few years and they switched to the JVM because it was faster and the language (I know, not related) provided compile-time safety.


Almost anything would be better than the Ruby runtime, which is notoriously bad. JVM performs best with largish heaps (>512Mi) - if your services fit into that model, it's a great piece of technology that is very fast.

But I have to agree w/ others that after using golang, where an equivalent web app would run in <50Mi of RAM with far better tail latencies, the memory cost of the JVM feels very large.


Same with Twitter. Remember the Fail Whale? Those were the non-jvm days before they ran ruby on the jvm.


> Apparently, Square was first built out on Ruby with the mindset that the JVM is an old clunker

Ah yes, this would be the same industry where people lead their teams to switch from Java to Go because they believe it will improve the productivity of development.


If one doesn't switch technologies to improve the efficacy of their team, why ever switch technologies? Should we simply use the first technologies conceived until the end of time?


Java, especially JAR files, can be quite light weight. However, JVM environments, and development with Java and Clojure, can be very heavy _and_ slow.

For Clojure, starting `lein repl`, takes 16 seconds on my 2012 Macbook and 9 seconds on my similarly-aged Dell laptop, both with SSDs and i7 quads.

Regarding memory usage examples, the base memory usage of a Google App Engine instance running the most trivial Hello World Java program is around 140MB. Given that the default F1 instance has a soft memory limit of 128MB, it becomes clear that the JVM is working against you in both cost effectiveness (the price to spin up new instances when your existing ones are already above the soft limit) and latency (since spinning up instances is slow). Add Clojure on top and the problem certainly doesn't get any better. As an added annoyance, which is specific to App Engine but a result of using the JVM, it's impossible to specify JAVA_OPTS, and thus any of the -X flags, without switching to the Flexible environment.

As a result of both of the above, choosing Clojure for developing on App Engine, as my specific example, has had the serious downside of slow development tools and memory issues out of the gate on my instances, forcing me to pay more for a beefier instance class. The REPL is really hard to beat, but the combination of the JVM and Clojure is the biggest pain in the ass with this stack.


Yes, the startup is slow. But nothing afterwards is. The JVM gets bad press because of startup time while in reality that hardly matters.


In production, what matters is the ridiculous memory usage. Both Java and App Engine are to blame about this, but the Python and Go folks aren't running into the same issue.


If memory usage matters to you, have you tried telling the JVM not to use all the memory? Have you tried any of the tuning options? I can't say I've ever used App Engine, but the pure Java applications I've worked on did in fact use a fair bit of memory, much like Windows and OS X will aggressively use spare memory for caching. Then I use -Xmx to tell it not to use all the memory and now it's much better. For a server, I'd say that this is both expected and correct behaviour.
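For reference, the usual knobs look roughly like this (the sizes and `app.jar` are placeholders, not recommendations):

```shell
# Pin the heap: start at 256 MB and never grow past it,
# so the JVM can't balloon beyond what the box can spare.
java -Xms256m -Xmx256m -jar app.jar

# Class metadata lives outside the heap on Java 8+;
# cap it separately if that's where the growth is.
java -Xmx256m -XX:MaxMetaspaceSize=64m -jar app.jar
```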


I have, and gave up after a month of tweaking it every couple of days trying to find the sweet spot. Someone more familiar with JVM internals would probably have succeeded, but somehow the C, Python, Go and even OCaml runtimes don't need this level of tuning in my experience.

JVM deployments tend to assume nothing else happens on the same machine, in my experience.


Your last point is probably true. In every application I have ever deployed I only ever set the -Xmx and -server settings, and I never faced any problems. You need to give the GC some breathing room, though.


> As an added annoyance, which is specific to App Engine but a result of using the JVM, it's impossible to specify JAVA_OPTS, so any of the -X flags, without switching to the Flexible environment

I haven't used App Engine either but I suspect that the flexible environment is more expensive.


I use -Xmx and then the process crashes because it runs out of memory. It's very wasteful with memory, and I wish JEP 169 were implemented to mitigate it. It's unfortunate that it was written in 2012 and remains a draft.


Well, I don't know what you consider "ridiculous", but I run a web/application server with significant complexity (written in Clojure) in a 2GB RAM VM, with the JVM taking up about ~800MB. I'd say that's pretty reasonable and it meets my production/business needs.


Considering that RAM is the primary cost in hosted environments, I'd say that 800MB is anything but reasonable. In fact, as a guy coming originally from a non-Java background, that seems downright opulent. Perhaps that's fine if it's the only thing running on that machine, but is that very realistic? A database, some daemons, cron jobs... It seems uncouth to me that the business logic should take up as much RAM as the data itself.

I run a fairly complex Clojure app and it uses 1.5GB of RAM all told. Factor in cron jobs, caching, DB flushes and other periodic spikes on the machine and whoops! we're over 2GB used. To prevent thrashing and OOMEs, I sized up to a 4GB VM, doubling my monthly costs (there are several of these boxes).

Now yes, I can go through and swap out libraries to slim the beast down. But that would mean rewriting large chunks of it, since so few Clojure libraries are interoperable and most simply wrap "enterprisey" (and heavy) Java libraries in sexier syntax. And if I'm rewriting it, I might as well avoid the whole mess and pick an ecosystem with a more streamlined standard library.

It's tough. I adore Clojure, but the combination of Clojure+JVM has made deployment and management less fun and more expensive than necessary. The JVM is awesome, but, just as it's not the hog so many claim it is, it's also not sleek.


Interesting that all the comparisons were with server frameworks. As if anybody ever cared about a few hundred megabytes of overhead on a server. Hell, even a bloated JVM implementation fits in most server L3 caches.

On the desktop, laptop, phone, or embedded environment, the JVM is heavy. It starts up slow, jars carry around ridiculous amounts of dead dependencies, garbage collectors require immense amounts of tuning, etc. And we shouldn't really expect otherwise. If you can't even keep your VM in cache, how are you supposed to have fast application code?

Specialty closed source JVM vendors have done wonders in terms of improving this problem...but it's still an uphill battle. AOT native compilation down to machine code is becoming more popular because of the proliferation of resource-constrained environments, and it will take time for new languages/compilers to take over, but take over they will.


Haha, AOT native compilation.

This comes back time and time again. AOT native results in slower runtimes for applications with one simple exception: startup time. In every other case a modern JIT compiler like the JVM will win due to gathering information and layered compilation.

Where AOT really makes sense is for an interactive app on a mobile device where you don't care about the last millisecond of performance but startup times and even much more importantly: energy expenditure. (That's why google AOTing the apps on device startup is quite sensible)

Most funnily Microsoft was heavily advertising AOT with .net framework 1.0 but in general switched to dynamic profiling and optimization in later versions of .net. (System assemblies that everybody uses during startup are AOT compiled using ngen, however)

It is just not "one size fits all", depending on your platform and requirements you'd have completely different requirements to your virtual machine. What you want is different between interactive use and "batch processing" and between energy starved devices and big iron.

Java - while ironically advertised for "applets" years ago - is optimized for the latter case, and there it really shines. On a server with long-running processes AOT makes no sense at all.

So what AOT will take over is phones, IoT devices, everything where energy is at a premium and startup times need to be quick. Layered JIT compilation takes over where you want to squeeze out the last bit of total performance. (Even interactively - looking at you, Google V8 and Chakra.)


    AOT native results in slower runtimes for applications with one simple 
    exception: startup time. In every other case a modern JIT compiler like the 
    JVM will win due to gathering information and layered compilation.
That's not necessarily true. JIT compilation is severely constrained in the amount of analysis that it can do for the simple reason that JIT has to be fast. Fast enough to not noticeably slow down the app. Meanwhile, an AOT compiler can take all the sweet time it needs, and roam all over the program in order to discover optimizations.

JIT compilers work very well on untyped languages like Smalltalk because the compiler can discover the type information at runtime, and then pre-compile the types that it sees most often. But that's not really that useful on the JVM, because Java is typed, as are most other JVM languages, with the exception of Clojure.

    Most funnily Microsoft was heavily advertising AOT with .net framework 1.0 
    but in general switched to dynamic profiling and optimization in later 
    versions of .net. (System assemblies that everybody uses during startup are 
    AOT compiled using ngen, however) 
Actually, with .Net Native, Microsoft is back to advertising AOT.

http://blog.metaobject.com/2015/10/jitterdammerung.html


Not all JIT compilation needs to be fast. I think it was HotSpot that first does interpretation, then - when interpretation proves too expensive - switches over to a quick JIT. And when code gets executed lots of times it runs the JIT again, this time with deep optimization settings.
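For what it's worth, HotSpot's tiers are tunable; a hedged sketch of the relevant flags (`app.jar` is a placeholder):

```shell
# Stop at the quick C1 tier: faster warm-up at the cost of peak throughput,
# useful for short-lived tools and tests.
java -XX:TieredStopAtLevel=1 -jar app.jar

# Default behaviour on modern HotSpot: interpret first, then C1,
# then the profile-driven optimizing C2 tier for hot methods.
java -XX:+TieredCompilation -jar app.jar
```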

What I personally don't understand: why don't we cache JIT results between runs? That might be a worthwhile optimization, and even possible in the face of class loading.

It would probably be like running ngen on .NET, just WITH performance statistics of a program. (Enabling specialization of calls for commonly passed types, or eliminating constant expressions while keeping the generic version of a function around - that's hard in AOT, as you need profiling information. I think Sun's C/C++ compiler was able to do that for AOT, resulting in large speedups. But maybe it only used it for branch prediction.)

Edit: What I forgot to add - I like the way you could always AOT things in .NET with ngen but also use a JIT where possible. Now that Java turned out to be owned by the evil empire and .NET the one by the company committed to open source - imagine reading that 10 years ago - I'm really curious in which way things will develop. And with all the new contenders as well. JVM (and .net) is not dead, but a lot of interesting alternatives are getting traction now.


> I think it was hotspot that first does interpretation, then - when interpretation proves too expensive - switches over to a quick jit. And when code gets executed lots of times it runs jit again, this time with deep optimization settings.

This is configurable; HotSpot can JIT right away when the application starts, but then be prepared to wait a bit.

> What I personally don't understand: Why don't we cache JIT results between runs?

They do, just not in OpenJDK, which is the only one many people care about.

All commercial JDKs support code caches between executions and AOT compilation.


I agree with most of what you say...AOT is far more of a competitive advantage for phones and other devices.

But I'd argue that there are very few benefits of JIT that can't be achieved by AOT + PGO. A sound static type system nullifies the need for most of those benefits (like speculative type optimizations and deoptimizations). But it might have the upper hand in cases where profiling can't capture all of the possible optimizable workloads that the binary would see. Databases or other large programs that continuously specialize over the lifecycle of the process. But that is far more niche than most people realize.


Later versions of .NET don't do dynamic profiling; the .NET CLR is much less sophisticated than the JVM. The first time you invoke an IL method it is compiled to machine code, and that same code executes for the rest of the lifetime of that image.


That isn't 100% correct, you can write CLR plugins that control that behaviour.


Don't forget that AOT also means less memory -- no hotspot running and dynamically compiling code.


> On the desktop, laptop, phone, or embedded environment, the JVM is heavy.

I'll half-agree with respect to phones and embedded environments since those are wildly variable and may include extremely low-specification platforms.

But a desktop or laptop? The JVM launches in milliseconds on my desktop and laptop. The monstrous Eclipse IDE launches in about six seconds on my desktop, and about five seconds of that time is Eclipse loading various plugins and what-not, in what looks like a single-threaded manner.

My desktop and laptop can both spin up Undertow and fire up a web-app from a Jar in about two seconds.

I'm fairly sure Eclipse is just using an old clunky CMS garbage collector. I've never tuned it on my desktop or laptop. Maybe Neon is using G1 now. I don't know and don't care because it runs just fine.

Maybe you've done something different with the JVM on desktops and laptops, but in my experience, on desktops and laptops, the JVM behaves more or less the same as it does on servers.


> As if anybody ever cared about a few hundred megabytes of overhead on a server

Kids today! Sit down over here, and Grandpa will tell you about the days when a few hundred megabytes was more than your average server's entire storage capacity. Now, in those days you tied an onion to your servers, which was the style at the time...


The really interesting part of this: Java was actually designed for smaller embedded devices and not for servers.


And there are alternative JVMs that are specifically designed for that use case that don't behave the same way.


For examples:

• the JVM on smart cards, e.g. EMV (chip) credit cards, or GSM cellular SIM cards

• the JVM embedded into the Intel Management Engine coprocessor


> the JVM embedded into the Intel Management Engine coprocessor

First time I've heard that one. Got a source?


Igor Skochinsky did some research into IME and has slides [1]. See slides 32-41.

[1] http://www.slideshare.net/codeblue_jp/igor-skochinsky-enpub


The notion that the JVM is not heavy because it needs less than a GB of disk space seems crazy to me. I consider OpenSSL to be wildly bloated because it is over 1 MB.


A megabyte of disk space is now worth about $0.0000290, according to this site:

http://www.jcmit.com/diskprice.htm

The numbers we're talking about here just aren't a practical consideration any more.


The disk space is cheap, sure.

But the CPU time spent by the dynamic linker resolving thousands upon thousands of symbols? That's actually rather painful.


You must be very busy if two hundred milliseconds is painful.


The actual HotSpot JVM itself is about 10mb, but that includes 4 GCs and 2 JIT compilers.

The Avian JVM can statically link an entire program and widget toolkit with itself and produce a 1mb binary.

It's not that big a deal. The space gets taken up by all the libraries. But then you'd want to compare a JVM against e.g. /usr/lib on a fresh Linux install ...


OpenSSL only does one thing.


Well, LibreSSL devs considered OpenSSL bloated from a security standpoint [1].

[1] - https://en.wikipedia.org/wiki/LibreSSL#Code_removal


OpenSSL relies on system libraries (timezones, locales, etc.) that are built into the JVM, so it's not an apples-to-apples comparison.


Sure it takes a while to load and there's bloat, but the bloat is everywhere now.

On the bad side of the JVM and assorted Java tools is that they are second-class citizens of the Unix world. The command arguments are all messed up, much like a Windows tool ported to Unix, and the interaction with the rest of the Unix stack like sockets and files is solipsistic and off, which leaves an ill stink on everything touched by it.

One of the things I find funny with Java is the once much-touted security model; fast-forward a couple of years to the advent of Android, which uses the Unix security model and none of the Java stuff.


Sure it takes a while to load and there's bloat, but the bloat is everywhere now.

This is exactly the kind of development culture that produces heavyweight, unresponsive tools.


Well, except for the type safety of Java, only allowing native code to be compiled to shared objects for implementing Java native methods, and exposing all OS APIs - outside what is required for graphics and real-time audio - only via JNI.

If it wasn't for the pressure of game developers, the NDK wouldn't even exist.

Remember Brillo? It was supposed to be like Android, but using C++ frameworks instead, as presented at Linux Embedded 2015 conference.

Guess what, when it got recently re-branded as Android Things, it switched to the Java Frameworks instead and it doesn't even allow for the NDK, with the user space device drivers being written in Java.


OP is impressed by running 5 processes at once while claiming that the JVM is not that heavy.

Is this cognitive dissonance? Dishonesty? I don't understand.


I think it is the genuine sense of wonder of a web developer. I have been told many times to update my ancient hardware when I say Java is a memory hog and slow on a 6GB Windows 7 laptop.

Some people don't realize that those massive 16/32GB MBPs etc. with SSDs are not available to everyone.


6 GB? Your laptop is probably not running optimally: a matched pair of memory modules runs dual-channel (128-bit wide), while a mismatched configuration like 4 GB + 2 GB falls back to single-channel (64-bit) for part of the address space.

You should ensure both memory modules are the same size (say, 4 GB or 8 GB each), otherwise performance can suffer noticeably.


It was given to me by a company I was contracting for a few years back. I recall the laptops were 'upgraded' from 4GB to 6GB to give developers better performance.

I think my main point was that we treat luxury as the baseline just because even more luxurious stuff exists.


I was a little surprised that the author disliked "heavy" things, yet kept reaching for the absolute heaviest tools in their respective categories:

- Rails (just about the heaviest web framework ever made for Ruby, despite its wide appeal)

- Ember (I absolutely love ember, but it is by far the heaviest modern JS framework... I don't include things like ExtJS)

Also, I routinely use the heavy/light distinction, but in a completely different way. I almost don't care how heavy something that runs on the server side of a web application is; on the back end, "heavy" generally translates to "contains complexity I'm not willing to deal with". "Heavy" on the front end, for me, means both footprint and complexity.


Start-up times are still an issue with Clojure on the JVM. For instance with Android I've found the initialization times to be pretty much a show-stopper for any application development.


Clojure on the JVM is way worse than java...but the JVM only contributes to a tiny portion of that. The real problem is that every clojure process has to bootstrap the entire language and compile every library before it can begin executing.


I don't know anything about Clojure, but in modern Android development, Instant Run patch files mean installation isn't even done each change any more. Even then, you can avoid a lot of installation time by using an emulator on a fast development machine instead of a real device. For most of Android's life the emulator has been disgustingly slow, but for installation times, it has benefits. Meanwhile restoring from a snapshot instead of booting fresh each time, x86 emulator images, and Intel's virtualization engine mean the speed isn't so bad any more.


It has nothing to do with that and more to do with the fact that, last time I tried it, the start-up time for an app was multiple seconds on top of the base start time.

Given how often Android evicts apps that's something I'm not comfortable with shipping. Would love to use Clojure but it was definitely a show-stopper for us.


What did you use instead?


Stock Android/Kotlin.


Android doesn't run the JVM.


That's a lot different a context than web-application deployment, though.


That's mostly clojure.core being so large and rebuilt each time by default, isn't it?


I've been using these arguments at my company for years. There's definitely FUD surrounding the JVM, and it's pretty ridiculous. Sure, it's not a perfect system, but it's usually disregarded for being "old and bloated".


> I run at least 5 JVM processes on my 2012 MacBook Pro with 8GB of memory. This is all day, every day. I would never have tried to start 5 Rails apps at the same time.

Where have we gone wrong that starting 5 processes with 8GB RAM seems impressive? On my development PC I regularly run:

- two different browsers (Firefox and Chrome),

- an email client that is a slimmed-down browser suite (Thunderbird),

- two chat apps and an IDE that are browsers in disguise (Electron and a Chrome App).

And that is without running or testing any of the applications that I'm actually developing. Oh, and then automated testing starts yet another browser. Even better, my actual application runtime environment runs in its own full OS virtualization, because otherwise the deployment environment would differ from my development environment… If we can't slim down the runtime environments of our day-to-day apps and development tools, how are we going to survive the end of Moore's law, given the ongoing trend to hide everything behind more and more abstraction and virtualization?

And I disagree with many others here, that RAM usage just doesn't matter for server development. At my last project, our setup contained a few macro services and the ELK stack for logging which accumulated to five JVMs and three Node instances. Now this is ok for the live setup, because most of these will run on different machines anyway. But for testing you want to have them all on the same machine, for convenience on the Jenkins machine. And you want separate setups for integration tests, user acceptance tests, and demo purposes. All of these have basically no load, but still consume the full amount of RAM. Of course, you will say, just get more machines, we are in the age of the cloud! But that doesn't come for free: Now every Jenkins job needs different credentials to access the machines, all developers need access to every machine, every new setup needs an additional cloud provisioning step, possibly with approval from management because of the additional fees. And yeah, you can automate most that away, but building and maintaining that automation also carries its own burden. All of that for something that should be essentially free – it's not like my OS doesn't already run a few hundred processes when I have just started my window manager.


Not just heavy, but I really don't like all our processes being named "java" with stupid -D command-line parameters. A nice native binary lets you name the app what you want, with config as you like it.


You can name the process anything you like with exec -a
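For example, a quick sketch of how that looks in practice (the service name and jar name are placeholders; the demo uses `cat` so it runs anywhere with procfs):

```shell
# bash's `exec -a NAME` replaces the shell and sets argv[0] of the new process,
# e.g.: exec -a myservice java -jar app.jar   (app.jar is a placeholder)
# Demo: cat prints its own kernel-visible command line; argv[0] is "myservice"
bash -c 'exec -a myservice cat /proc/self/cmdline' | tr '\0' ' '
```

`ps` and `top` will then show `myservice` instead of `java` (Linux-specific; argv[0] tricks behave differently on other systems).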


Use `jps`


Not to mention that having three horizontal monitors still isn't wide enough to see the entire flag list in top/htop (looking at you, ELK stack)...


There are ahead-of-time compilers for Java! Not free, though.


IIRC, gcj compiles to an executable.


Gcj was deleted from GCC in October 2016[1]. So the only free option left, AFAIK, is to use Mono AOT[2].

[1]: http://tromey.com/blog/?p=911 [2]: http://www.mono-project.com/docs/about-mono/languages/java/


Or a copy of GCC from September 2016. (That's not going to get you Java 9 support, I realize...)


It is not going to get you full Java 5 support either. The project was dead before the OpenJDK even existed and it never got better.


Avian can compile AOT to a single executable.


I feel like I'm missing something. His argument seems to be "the JVM isn't heavy compared to other large and bloated systems". Well... sure. My car isn't heavy when compared to a tank, but if I want light, I'll use my bike.


I've never had a problem with Java the language being heavy. However, many code bases, due to annotations, dependency injection, etc., really aren't Java any longer from a development perspective. You need to understand the code base, the app itself, etc.

This happens with open source projects as well. I contributed to the Azure support for JClouds and the bulk of my ramp up time was understanding how things were done more than writing the code itself.


I could not agree with you more. My most recent job doing Java dev at a major US tech company has really soured me on this style of Java. Annotations beyond null/nonnull seem to obscure the code and make it far less maintainable. Besides being largely superfluous, the DI frameworks I've seen widely used at said company (primarily Dagger) have led to numerous production memory leaks, because they obscured object scope and lifecycle from engineers.


Java does not give memory back to the system very quickly by default. This blows up the memory allocated to the JVM considerably.

start it with:

   java -XX:+UseG1GC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=15 -jar ...
and the jvm will give back to the system. see http://imgur.com/a/m9Qxx


> -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=15

TIL. Why are those XX flags?


I think most people wouldn't say the JVM is heavy compared to Ruby or Python runtimes, but rather compared to Go, Rust or Swift. That is: one-file copy deployment, almost no boot time, and no memory bloat.

The first time I ran a server in Go and realized it took a few kilobytes of RAM when nothing was happening, I was quite shocked.


1 file is the stupidest argument I've ever heard for saying one language is better than another.

Look, here's the 1 file it takes to install a python app I wrote: mycoolapp-0.1.0-1.el7.rpm - how neat is that?!

Sure, pretty much anything is "heavy" compared to go or rust, but those are systems programming languages by design, not something I'd write some huge web application in personally.


> Look, here's the 1 file it takes to install a python app I wrote: mycoolapp-0.1.0-1.el7.rpm - how neat is that?!

Is that one cross-platform file? Is it even portable across different linux distributions? Across different servers running the same distribution but perhaps with different libraries installed? Will another python developer understand how the build process for it is set up and be able to add new dependencies?


> Is that one cross-platform file?

No, and neither is a Go or Rust binary, next question.

> Is it even portable across different linux distributions?

Nothing preventing me from taking an extra 10 minutes to modify the spec to support SUSE, Fedora support usually comes for free if you support el7.

You like Debian derivatives? Let me toss a debian/ directory in there, that'll only take a couple minutes too.

Added benefit, I'm not just throwing files at a server like some hacky Windows developer doing an xcopy deploy to IIS. I can check what version of my application is deployed, and update it along with the rest of the system if I so desire (setting up a yum or apt repository isn't hard).

> Across different servers running the same distribution but perhaps with different libraries installed?

Different libraries? Do you mean different VERSIONS of libraries? Native libraries have figured this shit out for ages with sonames. Python too: different versions of an egg can be installed side by side; if you need to pin a specific version, just use pkg_resources.require() in your main script.
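For illustration, the runtime resolution mentioned above looks like this in a main script (the requirement string is just an example; a real pin would use `==`):

```python
# Resolve an installed distribution at runtime via setuptools' pkg_resources.
# A pin would look like pkg_resources.require("mylib==1.2") -- it raises
# VersionConflict/DistributionNotFound if the requirement can't be satisfied.
import pkg_resources

dists = pkg_resources.require("setuptools")  # "setuptools" used as a safe example
print(dists[0].project_name)  # -> setuptools
```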

> Will another python developer understand how the build process for it is set up and be able to add new dependencies?

I had a Jr. Developer with no experience with Linux or Python pick up building the package and making basic edits to the .spec file in 10 minutes. RPM/DEB packaging isn't magic, you describe your package with metadata, write some shell commands to build/install your package in a buildroot, and then list the files from said buildroot to include in the package. You could make your first package from scratch in under an hour if you read the Fedora or Debian wiki guides on packaging.
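A hypothetical minimal .spec skeleton, just to show the three parts I mean (metadata, shell commands run against a buildroot, file list); all names and paths here are placeholders:

```
Name:           mycoolapp
Version:        0.1.0
Release:        1%{?dist}
Summary:        Example application
License:        MIT

%description
Placeholder description.

%install
mkdir -p %{buildroot}%{_bindir}
install -m 0755 mycoolapp %{buildroot}%{_bindir}/mycoolapp

%files
%{_bindir}/mycoolapp
```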


> No, and neither is a Go or Rust binary

Sure, I'm comparing with the JVM per the article.

> Nothing preventing me from taking an extra 10 minutes

Indeed, but 10 minutes here, 10 minutes there, it all adds up.

> Different libraries? Do you mean different VERSIONS of libraries?

No, I mean some native libraries not installed. Does your package declare what packages it depends on? How do you handle different distributions using different package names for the same libraries.

> RPM/DEB packaging isn't magic, you describe your package with metadata, write some shell commands to build/install your package in a buildroot, and then list the files from said buildroot to include in the package. You could make your first package from scratch in under an hour if you read the Fedora or Debian wiki guides on packaging.

Sure, none of it's hard. But if there's no clear standard everyone ends up doing it slightly differently, and then every project you pick up you have to understand how they've set things up.


> Sure, I'm comparing with the JVM per the article.

Fair enough, though I still use RPM's to deploy Java (Spring Boot even!) applications.

> Indeed, but 10 minutes here, 10 minutes there, it all adds up.

In the grand scheme of software development updating the .spec file or debian control file is peanuts.

> No, I mean some native libraries not installed. Does your package declare what packages it depends on? How do you handle different distributions using different package names for the same libraries.

In the case of RPM-based distributions, some conditional macros in the .spec file that swap out Requires/BuildRequires statements based on the distribution the package is being built for. I never bother with SUSE personally, but there are differences between EL7 and Fedora that I have to keep track of.

> But if there's no clear standard everyone ends up doing it slightly differently, and then every project you pick up you have to understand how they've set things up.

.spec files are more standard than most build tooling. The only thing that complicates them is projects without an adequate build system in the first place, everything else is minor style differences based on who wrote the spec.

Everything has been done before, unless you are using some extremely new or extremely niche language or build tool chances are your spec file will be easy to figure out, since any and all complexity is explicitly linked to how easy it is to build and install your software in the first place.


"For both Node and Ruby you need a C compiler on the system which is hundreds of megabytes alone." Wait, what? What C compiler is hundreds of megabytes?


  $ ls -sh /usr/bin/gcc-5
  896K /usr/bin/gcc-5

  $ ldd /usr/bin/gcc-5 
	linux-vdso.so.1 =>  (0x00007ffc64bf5000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fda3da4f000)
	/lib64/ld-linux-x86-64.so.2 (0x000055f2460a1000)
(so no specific .so dependencies)

And why would you need that in production ? (hint: use fpm)

This article seems strange: why count Xcode as part of npm? Why use a Mac if it's that heavy? Sure, an IDE is nice, but I wouldn't count that as part of the language; otherwise you'd also have to include Eclipse + plugins on the Java side.


I certainly agree that you don't need that for production machines.

However, /usr/bin/gcc is just a frontend, and calls a slew of other binaries (such as cpp, collect2, cc1, normally found somewhere in /usr/libexec/gcc/). You also need binutils and, to be useful, likely headers for at least the standard C library.

The gcc + binutils packages come in at around 75MB on my machine.


I stand corrected, of course I should have counted the called binaries as well. So it seems 100MB is the right ballpark.


You're measuring the size of a program that calls GCC, not GCC itself at all. gcc-5 isn't the compiler, it dispatches to cpp/cc1/ld/etc.

cc1 is 20MB over here. gold is 5MB. Dynamic libraries it depends on (gmp & family, isl) weigh about 4MB. All-in-all the programs required to compile even a simple C program come out around ~32MB.

Clang is much bigger. Clang 3.8 is 59MB (I think that includes the preprocessor though, but GNU's CPP is only around 1MB (which still seems huge for a preprocessor)).


I keep seeing plenty of evidence that the JVM is indeed heavy. When I was working at a startup and we were running Solr on the JVM, it kept running out of memory and crashing. When I tried using Clojure I was irritated by the startup time of the REPL on the JVM. More evidence: http://stackoverflow.com/questions/13692206/high-java-memory....

If you want something light like Clojure and don't need any Java libs, try Pixie. https://github.com/pixie-lang/pixie

If you want something light not like Clojure, Go is a great choice. It's fast to compile, fast to run and doesn't gobble up your memory unless you force it to.


If you ran a game written in C++ that kept running out of memory and crashing, would you say C++ is "heavy"? Solr (and Lucene internally) maintain large in-memory data structures as well as memory map (usually) on-disk segment files. How much memory they use is determined by many, many configuration options within Solr. This has nothing to do with the JVM.

Same thing with Clojure. It basically bootstraps the entire Clojure environment on each process start. The JVM itself starts up in tens of milliseconds.


So the JVM is not heavy compared to even heavier stuff. Compared to Go, however, it is quite heavy in terms of disk, RAM usage, and deployment.


Exactly. I see where the author is going with his article (and I think some of the points he raises are actually quite valid), but he's playing both sides of the "heavier than what?" point.

Yes, the JVM is slimmer than many interpreted languages with external dependencies (Python, Ruby, etc), and yes it can be slimmed down through manual labor, but no it is not something you can call "not heavy" with a straight face.


Interestingly, the author does not touch startup times.

Frankly, for server-side loads, no-one cares how much your runtime weighs, be it disk space, download size or even memory (heck, my last project required developer workstations with at least 96GB).

There are all sorts of environments where that matters. And that's one of the reasons why Clojure didn't catch on on Android.


Wow, 96GB of RAM? I don't even... Would you care to elaborate?


We have applications that require 100GB of RAM to run.

The result being that they're not run locally. At least we have alright tests.


Well, I sure wish I knew how to make our Clojure dev server (`lein ring server`) take less than 2 minutes to start up on a 4 core i7 16GB MacBook Pro.

Fortunately I typically only need to restart it when switching branches.


There's maybe just too much stuff required in user namespace? We were able to address many problems including this one when we switched to boot-clj


Have you considered looking at it with a profiler? Some relevant tools: jconsole, YourKit, jmh...


I must admit my experience is a few years old, but one thing that the whole text doesn't mention and that has contributed a lot to the negativity I feel towards the java ecosystem: It seemed not well integrated in the Linux ecosystem.

What do I mean? I often experienced that Java dependencies were not readily available in Linux distros. Packaging Java stuff was - weird and complicated, not sure how to better phrase it.

For Python/Ruby/PHP you usually can rely on the fact that major libraries are properly packaged. For Java not so much.


Because Java distributes applications as bundles including the libraries (fat JARs, WAR files and so on), libraries need to be available to the builder, not the installer. That means it's perfectly sound that they're in Maven Central (and competitors), and not in the OS's package manager. It's the same with packages in JavaScript, Ruby, Go, Erlang, etc.

The only things you might need to install through OS packages are the JVM, and perhaps an application server if you're doing things that way, which fewer and fewer people are.

When i first started working with Java on unix, i felt the same way as you - i wanted the libraries to come through the OS package manager, the same way native libraries do, and spent ages trying to get my deployed applications to use them. Eventually i realised i was just doing it completely wrong.


Python, Ruby, JavaScript etc all have their own dependency repositories and packaging systems. Java does too. It's not Java that's the problem here, it's the Linux distros that insist on packaging everything themselves (almost always badly).

You could if you really wanted to just auto-convert Maven Central to DEBs. The metadata is there. The problem is that "dependency hell" would then visit you in the same way it does for Linux apps. Upgrading libraries is something that should either be done by developers, or by OS vendors very carefully, not by having some random part time packager run a script and push a new version that immediately propagates down to everyone else without any app-compat testing.


I think that's a fair point. I bet part of it is due to licensing. The choice on whether or not to use OpenJDK is not trivial, and I surmise that distro managers just avoid the issue completely (I would).


You can rely on ruby packages being properly packaged? Ahahahaahaha. They package what is necessary for the end user apps (like redmine) that they support, and a very bare minimum of other gems. Then you are stuck trying to match their release of updated versions with the ones you use. That seldom matches the update speed you will want, for neither ruby versions nor gem versions.


I used to think this too, until I came across http://www.scylladb.com/

It is a fork of cassandra written in the Seastar c++ framework and is drop-in compatible with cassandra. Claims 10x increase in performance.

I always thought there were a few percentage points of difference, never a 10x performance difference, between Java and C++. And that for a project with as many man-hours and Facebook-scale tuning as Cassandra.


I don't believe their claims. Many benchmarks (including those done by ScyllaDB) are done badly. They'll take a database built to operate on larger than memory data (e.g. 10x) and run on a dataset that can fit entirely in memory. So whoever optimized for in memory wins. But run on an appropriately sized dataset or reduce system memory and you see little difference.

This might seem like a good thing (ScyllaDB gives you extra performance when you have the memory for it), but it does mean that if your dataset grows, performance falls off a cliff. Something to keep in mind.


"it does mean that if your dataset grows, performance falls off a cliff."

Are you saying you know ScyllaDB does not handle larger datasets and Cassandra is better in this respect? Or are you saying that their benchmarks are not yet conclusive?


I am saying that when you go from fully in memory (due to having a small dataset) to having to move things to and from disk, disk increasingly becomes your bottleneck rather than memory. And disk is much slower than memory.


I thought a main point of Cassandra was to be distributed so the working dataset could stay in memory across the cluster. And the smaller memory footprint you typically get when you're not in the JVM means more of your working dataset can be cached in memory. So I would expect superlinear speedups compared to Java for exactly the reason you describe (depending on the request distribution).

But yeah, I'm always up for poring over more benchmarks. :)

Here are more details on benchmarks here:

https://qconsf.com/system/files/presentation-slides/avikivit...

The YCSB benchmark suite they use is the same one as used in this paper from the Cassandra homepage:

http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf


The choice of C++ is responsible for only a very small part of the performance difference. ScyllaDB uses different low-level algorithms, many of which could have been done in Java as well. That the Cassandra data model works well with the sequential processing approach of Seastar makes the effort of implementing in C++ manageable. In general, concurrent data structures in C++ require significantly more effort than in Java, and rarely yield performance improvements that are worth it, unless you're memory-bound. In sequential code it's easier to surpass Java's performance, but even that difference is diminishing (and expected to be dramatically reduced when value types are added to the JVM). Usually, the only significant overhead you must be prepared to pay is in RAM, and in exchange you get better performance-per-effort.


If something sounds too-good-to-be-true, it probably is.

I don't love Cassandra, but it's not because it's written on the JVM (full disclosure, we roll our own JVM key-value datastore https://github.com/liveramp/hank).

If your random-access keystore is limited by anything except network and disk latency, you have bigger problems.


If you investigate what they actually do to achieve those numbers, it's much less simple than just rewriting Cassandra in C++. For example, they use their own TCP stack and make use of vector intrinsics.


Hey.. "vector intrinsics" looks very cool. Thanks for mentioning that!

so what you mean is that, even after throwing facebook scale resources at java.. it is possible for a <10 people team to get 10X performance over java using the features that you mention.

That's a huge loss of face for java IMHO


> so what you mean is that, even after throwing facebook scale resources at java.. it is possible for a <10 people team to get 10X performance over java using the features that you mention.

That is not what I mean. Writing a userspace TCP stack isn't a feature of C++.


Explicit use of vector intrinsics are not the source of any 10x performance boost, nor anything to do with C++ in particular. Again, only a small (but probably not minuscule) portion of the difference has to do with Java vs. C++. The bulk of the difference is due to all sorts of optimizations, most of them could have been done in Java as well. But the ScyllaDB people are more experienced in C++ than in Java, and as they use sequential code anyway, there isn't a big downside for using C++ -- certainly not for them -- so it was the better choice. From what little I know, the reasons why such optimizations weren't done in Cassandra are because 1. the people working on it aren't low-level optimization experts, but more importantly, 2. because the performance was good enough.


You aren't comparing the same program written in two languages. Seastar stuff is written by C++ performance experts who are fanatical about tuning, and does all kinds of unusual far-out things that Cassandra doesn't do to get high performance.


> That's a huge loss of face for java IMHO

Because the JVM JIT, like every other compiler, sucks at developing vectorized algorithms ad-hoc; a task usually carried out by human experts in that?


They also have never caught up feature wise, and actually have gotten further behind since initial release. Also the benchmarks are lies (tbf, all benchmarks are lies).


Characterizing what the performance difference between C++ and Java is, or will normally be, is really hard.

Naive translations from Java to C++ will normally result in only a small percentage difference.

But with clever rewrites that leverage control of memory locality and SIMD intrinsics (either via pragmas to induce them automatically, or by hand), plus a good understanding of compiler settings for given architectures, etc., the differences can get quite large, depending on the problem domain.

Then again, there are ways around some of the performance limitations in the JVM, but they often involve very painful coding styles. Even so, you could narrow the gap a bit with that effort. (But if you are going to add effort, maybe just do it in C++?)


We run an algo trading system using 10-20 JVMs depending on exchange. It's not so bad.


If you're installing a compiler on your production server, you're doing it wrong.


And yet we do that every time we install the JVM (and Ruby, and Python, and...)


No, when you install the JVM (you probably meant the JRE) on the server, there's no `javac` (the compiler) installed, only `java`, the JVM. I never needed `javac` on production server.


The JIT is a runtime compiler, so yes, there's still a compiler present when you put the JVM on a machine (whatever form you use).

Of course, I know of a lot of developers who just put the whole JDK in a Docker image to save on the complexity of having to manage two different installs or containers.

> I never needed `javac` on production server.

Good; that's how it should be; but not how it always is.


JIT is part of a platform's runtime environment, not part of its build environment. Since you appear to be a super-pedant, I'll revise my initial statement to:

> If you're installing any tooling or programs on a production server or opening ports other than those strictly necessary for running your production application (as a pre-built binary or package whenever applicable) conforming to industry standards for that application/server environment, you're doing it wrong.


That's nothing - every time you serve some JavaScript, you're relying on a compiler being installed on your users' machines!


Jars are just awesome for cross-platform work or for when you need easy deployment. I oscillate between Java and Go now for similar tasks. They are both great for cross-platform work and easy deploys/frequent updates. Go is better at systems stuff (that's expected, as it is the new C IMO). Java is better if you want a boatload of rich data structures at your fingertips (more than slices, structs and maps... Go feels bare-bones here, but I like it that way, just as I like C that way). They both perform about as well as C and C++ for most things, and are safe and fun to use. I like C and C++ too, but I'm getting too old to use them.


The reputation of "heavy" was hard-earned back in the bad-old-days (1.3 era) when EVERYTHING NEEDED to be written in java and the JVM still sucked.

I, literally, had a party at work when we got our web app to three days of continuous uptime without an OOM error; nevermind that the early JIT (1.4 era) took 12+ hours to get "warmed up" and give peak performance.

Don't misunderstand me- java's horribleness has given me a pretty nice career so, for that, I love it. For the years and years of broken promises- (OSGi without running out of PermGen? lol) I hate it.

Java is not lightweight by any means; computers are just faster and have more memory.


I think topic poster point was that _other_ ecosystems (JS and Ruby) became so heavyweight that JVM now looks light in comparison.


It's true. node.js issues also pay the bills.


Ruby is perl-minded folks reinventing python without understanding it. The result was: ruby is python-done wrong.

Now compare Python Flask + uWSGI apps to JVM Jetty apps: the comparison favors Python on every metric.

Starting the Django elephant takes no time; compare that to starting your less capable framework of choice in the Java world.

To be fair, Java can be much, much faster than Python. But usually you don't care, because Python is not the bottleneck, e.g. the bottleneck can be in the SQL query.


Ruby is perl-minded folks reinventing python without understanding it. The result was: ruby is python-done wrong.

This is something said by someone who has no clue about the Smalltalk influence on both Python and Ruby. (Though Guido was very critical of certain things Smalltalk did, and made a point to do certain things very differently.) Ruby is very much Perl redone by someone who very much wanted a Smalltalk-like object system, but with much more syntactic sugar.


Speaking of the Ruby community: thread safety came far too late; it's only usable as of Ruby 2.x and Rails 4.

And even with Rails 4, many apps (like Canvas LMS) are not thread-safe, because of the long tail of Rails add-ons that aren't thread-safe, and the common bad practice of using static properties to store non-shareable state.


Perl > Ruby > any other language > Python


Work on an enterprise-sized "app" and the JVM starts seeming "heavy". It feels super slow (and memory-greedy, and you have to make sure it has enough of the right 'type' of memory) on the whole, and so does all the tooling around it, even though once it gets going it can do limited, specific things quite fast. Unless perhaps you're Google, with server farms constantly compiling things so devs don't notice as much. Most of the negative feeling is probably the fault of having a ginormous app in the first place; I'd likely feel similarly about C++. Still, even on a merely somewhat-large app (like a database, or an IDE) I've felt it in Java, whereas working on a similarly large app in other languages doesn't feel the same. I've never really felt it when just writing a REST/SOAP API that's effectively a thin wrapper around some DB calls, like a blog could be, and there are many other similarly small things. But those sorts of small problems can be done effectively in almost any language, and 'heaviness' is probably very low on one's priority list there...

I think there's an incentive with more dynamic languages to decompose your software into smaller bits. When the language gives you the ability to "script" that helps even more. You don't need to bundle everything into an uberjar, you can run things independently, and therefore you don't even need all of the sources locally, just what you need to do your particular bit of data processing. It's possible to do this with Java and multiple JVMs, but it's hard, the incentives aren't there. Unfortunately I don't have experience with Clojure in the large to say whether it helps make the JVM feel lighter, from my side projects it seems like it could but I don't know if the community is going that way. Lispy languages have different incentives since they let you build from the bottom up so well.


I think a part of the reputation of Java being big and slow comes from its popularity in the enterprise. This is partly because in that space you're usually writing apps for a captive audience (your fellow employees) so there isn't much incentive to optimise. The business managers understand things like "the app has feature X by date Y" but don't really understand or care about "it is productive to develop on this app" or "the app uses half as much memory as last month". Those things make the lives of developers nicer, but don't change much about the business.

Moreover, I think enterprise software managers often don't really have a grip on how much work needs to be done or how long it should take. So you can get situations where a team of say 10 people is staffed up to build an in house app, they deliver it, there are some improvements that can be made, they deliver the improvements, etc. After a few years the app is largely in maintenance mode and doesn't need much done to it, but ... who wants to fire the loyal employees who understand the app? Unless there is another app of roughly the same size and type waiting in the wings, what can happen is the team starts doing busywork. The managers don't notice because they aren't programmers to begin with and can't tell the difference between "creating a new in house framework because no reasonable alternative exists" and "creating a new in house framework because we're bored".

So over time enterprise software can bloat to extreme levels. Combined with Java's verbosity, and the fact that a long time ago it really was very slow, you get a platform with a reputation for ponderousness that is only partly deserved.


Yes, it's not my preferred platform at all, but we all inherit projects. By leaning too heavily on the _dynamic_ aspects of Java (like reflection) it is possible to get Python-like speeds while constantly running the risk of OOM. But with better tooling than Python, at least. (For small languages that interop with the JVM, I must say I had a lot of fun with LuaJava; Swing was actually pleasant.)


Way back when I ran a comment engine that was a custom Java servlet running on an embedded (i.e. mostly interpreted) JVM on a NSLU2: a 266MHz ARM with 32MB RAM. The servlet container was Winstone. Can't remember what the JVM was, sorry.

Load wasn't exactly high, but it worked absolutely fine.


Ah, it was JamVM.

http://jamvm.sourceforge.net/

Alas, the last release looks like it was 2014, although it does claim to support Java 8. The actual interpreter core is under 100kB (class libraries extra, of course).

It plugs into the OpenJDK! Look for the openjdk-8-jre-jamvm package in Debian.


What most people seem to miss is that if you use Clojure, the startup time hardly matters, because you do not restart your development JVM often.

I find this similar to discussions about boot time. It doesn't matter -- I reboot my computer once a month, if even that.


Note that you can also use IKVM and then Mono AOT if you don't want a JIT at all.


I see no mention of the additional cognitive load a complex runtime adds. This to me is the 'heaviest' part of the JVM. Though I admit it is not necessarily the same class of things the author is discussing.


Clojure is heavy, or at least it has high start times. Some of that is the JVM start, but parsing and compiling a large amount of Clojure code system-wide is more of a burden than, say, CPython.


You know it's not heavy because there are blog posts about it being not heavy. I also know Python's GIL isn't a problem because there are numerous posts about it not being a problem.


It's not the JVM, it's the tooling.


Which tooling specifically? Going from Java to Python a couple years ago I was in shock about how immature the tooling in the Python world is by comparison.


Good question, I still haven't found something that trades blows with maven in any other programming language. I've made my peace with setuptools for python, but .Net/Ruby/Go/Rust/etc. are all lacking in one way or another.


What could Cargo do for you that you miss from Maven?


I also hate the Java tooling personally; it's my #1 complaint with the language and one of the reasons I avoid it. I would say that nothing really follows Unix principles, everything seems over-engineered, and it's all quite complex to grasp before you become proficient.


I don't know; when I used Haskell after using Scala, the first thing I missed was the JVM tooling. There is nothing I have seen ever that comes near.


Which specifically? I'm curious which tools you object to.


If you don't think the JVM is heavy, try running it on a t2.micro. It can never get enough memory to start. I haven't run into any other language runtime that won't run on a t2.micro.
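For what it's worth, default heap sizing is the usual culprit on small instances; a minimal sketch of pinning the JVM's memory down so it fits in ~1 GB of RAM (the flag values and `app.jar` are illustrative, not tuned recommendations):

```shell
# Cap the initial/max heap and per-thread stack size so the JVM
# starts comfortably on a memory-constrained box (values illustrative)
java -Xms128m -Xmx256m -Xss512k -jar app.jar
```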


Huh? I ran Atlassian Stash (now Bitbucket Server) on a micro while trialing it.

That's a many-hundreds-of-MB Spring enterprise behemoth of a Java server.

Have you ever tried running Java on a micro?


I'm running a bunch of Java stuff off a t2.micro, runs fine.


Just out of curiosity: if you decide to use Java (or any other language that runs on the JVM), are the third-party libraries as "nice" (read: as numerous) as npm's?


Java has an incredible range of libraries, often of very high quality. You do have to install the dependency-management piece yourself, in the form of Maven (http://maven.apache.org/), or if you're doing Clojure, Leiningen (https://leiningen.org/). The total number of artifacts in the main archive, Maven Central (http://search.maven.org/#stats), is ~1.8M.


Former node dev, currently Clojure here.

Npm has a greater abundance of good packages. They are typically better documented and easier to get started with, sadly :(.

JVM has some really great stuff - things which are lightyears ahead of what is there in NPM. Much of which started as university projects, as Java is popular at schools.

I wish JVM developers took a hint from others and started making very easy and fun documentation, but have no high hopes.


Can you give some examples?

I've usually found docs to be pretty good in the Java library space. One of the nice surprises about it.


Hi Mike! Big fan of your work! Have been for years, so it's a pleasure seeing you have replied to my comment.

My latest example of this is the Apache Commons-Net Java package [0]. I wanted to make a toy Telnet server as an example for a friend asking how to do so.

I spent a good 20 minutes of my evening at home trying to install the library. Couldn't find a simple "this is how you install it with Maven" snippet, and no "easy 1-2-3 getting started with this library" text either.

After looking around for a while, I decided to check how to do it in Node. It's super easy, you just do

    require('net').createServer((socket) => socket.write('Hello from a toy telnet server!')).listen(8080)
And that is all! Got my task done in 5 minutes, and my friend learned something easy. My takeaway is that for software to be successful, being technically sound is only the first half of the marathon. You also need to make it accessible.

[0] https://commons.apache.org/proper/commons-net/


A telnet server is just a socket connected to a pty, no? I'm not sure there's much to it, which is why it's so easy in node.

I guess in java you'd do the same program something like this:

    // needs java.net.ServerSocket / java.net.Socket imports, and a
    // throws IOException (or try/catch) on the enclosing method
    ServerSocket server = new ServerSocket(8080);
    while (true) {
        try (Socket sock = server.accept()) {
            sock.getOutputStream().write("Hello world!".getBytes());
        }
    }
It's a bit more verbose, but hey, that's Java.

I think maybe the issue there is you got distracted by the idea that you needed a library. Commons-Net doesn't actually provide a telnet server because it doesn't need to.

That said, I agree that Commons-Net doesn't have great docs. I think it's fallen out of use over time. These days if you wanted a powerful non-blocking socket library you'd use Netty or VertX. The docs for Netty are much better:

http://netty.io/wiki/index.html
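And for completeness, pulling Commons-Net in via Maven is a single stanza in pom.xml; the version shown was current circa 2017 and is only illustrative:

```xml
<dependency>
  <groupId>commons-net</groupId>
  <artifactId>commons-net</artifactId>
  <version>3.6</version>
</dependency>
```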


Cheers for the example. The Commons-Net library was where I ended up after some searching - so my experience is of someone who wants to get from 0 to 100 asap.

Cheers and have a good day!


Yes. Maven has been doing package management right for years (and avoids the wasteful repetition that node does, that can often hide silent incompatibilities until runtime).


If you're building jars with a classpath on the filesystem, maybe.

If you're building wars, fat-jars, or anything else that gets you to that "single jar deployment" which is often mentioned as a pro, you're definitely not immune to duplication. Especially when you get to libraries that have changed their package name over the years, like, say, Jackson.


> If you're building wars, fat-jars or anything else that gets you to that "single jar deployment" which is often mentioned as a pro, you're definitely not immune to duplication.

True, though at least it's once per transitive dependency per deployable application rather than once per path to transitive dependency per project.

> Especially when you get to libraries that have changed their package name over the years, like, say, Jackson.

Yeah, you do have to deal with those, though it's a relatively small number IME. It would be nice if maven had some integrated support for saying that library x and y are actually what was previously combined library z or similar, though I'm not aware of any package manager that does that yet.


That cough medicine doesn't taste that bad. If you have to use "that". It's the thing.


I wish I were as good as the developers commenting on how slow Java is to start. They are clearly on a higher plane of enlightenment, where 1.5 seconds to start ruins their day, their deployment process, and their development cycle.

More power to them.


The biggest problem with Java is not the JVM but versioning. Many Java applications run into difficulties when the wrong version of Java is installed on a computer. This leads to many siloed computers to maintain.
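On Linux, at least, this is manageable; a sketch of juggling multiple installed JVMs on a Debian-family box (paths and package names here are illustrative and vary by distro and release):

```shell
# See which JVMs the system knows about
update-alternatives --list java

# Or run one app against a specific JVM without touching the
# system default, by invoking that JVM's binary directly
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -jar app.jar
```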


And it is better than a segfault.


A lot of Java apps are shit at packaging too. They often include an entire Tomcat server with all the setup files you don't even need. I wrote this a while ago:

http://penguindreams.org/tutorial/embed-tomcat-in-your-appli...

...but don't do that. Instead, use something newer like Netty and ditch that decades-old crappy servlet layer you don't need.

You can also use sbt+onejar or sbt-native-packager to make either a single runnable jar or a standard deb/rpm/tar.gz package to run your service.


What the hell is an HCMB? Nothing on DuckDuckGo, and nothing on Urban dictionary. I wish people would stop using acronyms before defining them.


Closest I could think of was a typo on Intercontinental Ballistic Missile, but your comment is on the first page of google results.


Can confirm: googled and landed right here.


Highly Cannibalistic Machine Bytecode


Human Controlled Missile Bomber


High Caliber Mobile Battleship


It's Hercules-Cloud Strife-Megaman-Batman. You don't miss with that guy.


> I, for one, am relieved not to have run apt-get install build-essentials on a production box.

Well, whoever does this is Doing It Wrong.


JV-what? I just installed nuget on ubuntu and what is this?



