The JVM is not that heavy (opensourcery.co.za)
549 points by khy on Feb 1, 2017 | hide | past | favorite | 367 comments



Also worth noting that the JVM itself only weighs a couple of megabytes. The bulk of the size comes from the Java runtime (i.e. the "standard libraries"), and there are lots of things in there that your app may not need (XML parsing, serialization, etc.)

A couple of years ago I wrote a simple tool (https://github.com/aerofs/openjdk-trim) that allows you to filter out what you don't need. We were able to get the size of OpenJDK from 100MB down to around 10MB.

Note that the work of determining which classes you need is entirely manual. In our case I used strace to check which classes were being loaded.


This is officially supported in JDK 9. It's part of an ongoing project called "jigsaw" which is introducing modules to the JVM. The first step, shipping in 9, is to modularize the JDK [0].

Already today you can build a custom JDK in the early access release.

[0] http://openjdk.java.net/jeps/200
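For anyone playing with the early access builds: the jlink tool is also exposed programmatically through java.util.spi.ToolProvider (JDK 9+), so a build step can assemble a trimmed runtime image from just the modules you list. A minimal sketch -- the flags and java.base-only module list are just an example, and it assumes a full JDK with its jmods directory present:

```java
import java.util.spi.ToolProvider;

public class BuildTrimmedRuntime {
    public static int link(String outputDir) {
        // ToolProvider.findFirst locates the jlink tool bundled with the JDK.
        ToolProvider jlink = ToolProvider.findFirst("jlink")
                .orElseThrow(() -> new IllegalStateException("jlink not available"));
        // With no --module-path, jlink defaults to the current JDK's jmods.
        return jlink.run(System.out, System.err,
                "--add-modules", "java.base",
                "--strip-debug",
                "--no-header-files",
                "--no-man-pages",
                "--output", outputDir);
    }
}
```

A java.base-only image comes out at a fraction of the size of a full JDK install.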


Not that well. The java.base module is still huge. Also, Zulu has nice pre-packaged JDK9 downloads[0] so you don't have to build.

0 - http://zulu.org/zulu-9-pre-release-downloads/


I think jlink can strip at the class level, not just the module level.


> I wrote a simple tool that allows you to filter out what you don't need

It'd be a short hop from here to a tool that basically does for JDK-platform apps what Erlang's releases do for the ERTS platform: builds a new JRE (as a portable executable, not an installer) that actually contains the app and its deps in the JRE's stdlib, such that you just end up with a dir containing "a JRE", plus a runtime "boot config" file that tells the JRE what class it should run when you run the JRE executable.

With such a setup, your Java program could actually ship as an executable binary, rather than a jar and an instruction to install Java. Nobody would have to know Java's involved! :)
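The boot-config part of that idea is surprisingly little code. A toy sketch -- the config file format and the DemoApp class are invented for illustration:

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy launcher: read the entry-point class name from a "boot config" file
// shipped next to the bundled JRE, then invoke its main() reflectively.
public class BootLauncher {
    public static void launch(Path bootConfig, String[] args) throws Exception {
        String mainClass = new String(Files.readAllBytes(bootConfig), StandardCharsets.UTF_8).trim();
        Method main = Class.forName(mainClass).getMethod("main", String[].class);
        main.invoke(null, (Object) args); // cast: pass the whole array as one argument
    }

    // Stand-in for the real application's entry point.
    public static class DemoApp {
        public static volatile String lastArg;
        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : null;
        }
    }
}
```

Wrap that in a small native stub that execs the bundled JRE and nobody ever sees a jar.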


Funny you mention that, that's exactly why we wrote this tool. We were having a ton of support issues with our Windows users having to install Java, incompatible Java versions, needing admin privileges to install the JRE, etc...

We fixed this whole class of issues by doing exactly what you suggest: bundling the JRE and writing our own launcher binary.


Does the Java license interfere with this kind of setup?


Not when you bundle OpenJDK.


https://docs.oracle.com/javase/8/docs/technotes/guides/deplo...

Not only is it a short hop, it already exists :P


Nice. Here's the manual page, the relevant flag is "-native":

http://docs.oracle.com/javase/8/docs/technotes/tools/unix/ja...


The problem with visiting that site is that Oracle have started litigating against users of their JVMs.

In that way the JVM can be "heavy".


With users that use commercial features without paying for them.

It is quite easy to know which features those are when they require a flag named -XX:+UnlockCommercialFeatures; you don't enable them by mistake.


Oracle grants a free license to use commercial features in development[1] (section B). It is very easy to put the flag in some script of JVM startup parameters and get that script deployed in production by mistake.

1. http://www.oracle.com/technetwork/java/javase/terms/license/...
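If you're worried about exactly that failure mode, the JVM can inspect its own startup flags, so a guard like this at boot would catch a dev-only script leaking into production. A sketch; the flag substring is the one from Oracle's docs, everything else is illustrative:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Fail fast if this JVM was started with Oracle's commercial-features flag,
// so a dev-only startup script can't silently make it into production.
public class LicenseGuard {
    public static boolean hasCommercialFlags() {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return jvmArgs.stream().anyMatch(a -> a.contains("UnlockCommercialFeatures"));
    }

    public static void assertNoCommercialFlags() {
        if (hasCommercialFlags())
            throw new IllegalStateException("JVM started with -XX:+UnlockCommercialFeatures");
    }
}
```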


Quite right, but unless I am being too cynical, I doubt those deployments were actually done by mistake, considering how many companies try to "alleviate" their costs.


Avian [0] is an embeddable VM in the way you suggest, I haven't used it though.

In the future, we'll hopefully have the Substrate VM [1] for Java and other Truffle-supported languages. It's embeddable and also does reachability analysis to exclude unused library code. For now, it seems to be closed source.

[0] http://readytalk.github.io/avian/

[1] http://lafo.ssw.uni-linz.ac.at/papers/2015_CGO_Graal.pdf


> what Erlang's releases do for the ERTS platform

That's the difference between an industrial-strength platform like Erlang, and a dev-centric deployment nightmare like Python. Java is normally on the enterprise side of the spectrum, but unfortunately it didn't get deployment right for quite a long time, even though it appears it's getting there lately.


It started with Java 8, which added a packaging story to the JDK tooling, but I don't know how good it is.


Careful there---it's a slippery slope all the way to unikernels.

https://mirage.io/


Isn't that _exactly_ what GoLang does for its binaries, insofar as shipping them statically linked?


.NET Core has it as an option, as well.


It's called "tree shaking", and is a classic option in Lisp delivery tools.


Or virtual image pruning in Smalltalk tools.


.NET Core does have this and allows distributing the runtime and your classes as a standalone executable. It is obviously larger than just the class files and DLLs, but works great for portability.


Proguard is the de facto standard in the Android Java ecosystem; what's different about openjdk-trim? Except that Proguard determines used/unused classes automatically by doing static code analysis, so you need a Proguard config to keep it from removing stuff that is used dynamically, of course.


We were already using Proguard for obfuscation and we tried to use it to reduce the size of the JDK as well. If my memory serves well, the main problem was that the results weren't good enough: Proguard was being conservative and keeping a lot of stuff that we knew we didn't need, because static analysis indicated that it could potentially be used.

Also, Proguard's config is pretty complicated and the results are hard to understand. Our approach (openjdk-trim) is dumb simple: unpack the java runtime jars, use rsync to filter out entire directories we don't need, pack it back.

It's a simple, brute-force approach compared to Proguard's advanced static analysis, but in this case it gives better results. Maybe a good example of "worse is better".
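For the curious, the brute-force idea fits in a few lines. Here's a rough Java equivalent of openjdk-trim's unpack/filter/repack step (the actual tool uses rsync; the whitelist of package directories below is hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.stream.Stream;

public class TrimRuntime {
    // Whitelisted package directories; this particular list is made up.
    static final Set<String> KEEP = Set.of("java/lang", "java/util", "java/io");

    // Copy only files under whitelisted directories from an unpacked runtime
    // jar, skipping everything else -- no static analysis involved.
    public static void copyNeeded(Path src, Path dst) throws IOException {
        try (Stream<Path> files = Files.walk(src)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(p)) continue;
                String rel = src.relativize(p).toString().replace('\\', '/');
                if (KEEP.stream().anyMatch(rel::startsWith)) {
                    Path target = dst.resolve(src.relativize(p));
                    Files.createDirectories(target.getParent());
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Dumb, but you can read the whole thing in a minute and the output contains exactly what you whitelisted.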


That is why I am looking forward to, and VERY excited about, TruffleRuby, SubstrateVM, and Graal, along with the C extension support.

I think once there is an official way to trim the JDK down, making deployment super easy and fast (single executable file), Java will pick up stream again.

The only problem and hesitation we have...is Oracle.


Not to sound rude, but the expression you were aiming to use is maybe "pick up steam". Random example: http://idioms.thefreedictionary.com/pick+up+steam


OT, but this expression seems like a corruption of 'build up steam' and 'pick up speed'. 'Pick up steam' doesn't really make sense in original context- something a steam train would do while stationary and preparing to move :)


The -verbose:class JVM arg could also have sufficed in case you don't want to use strace.


Would it not be possible to scan an app to determine exactly what libraries it could load? I think you can avoid halting-problem issues by simply doing a dumb search for the library-load routines in the code, at the possible (but not very likely) cost of picking up spurious data in a binary blob section.


There are tools that do this (for example, proguard). The issue is that java can dynamically load code, so static analysis isn't sufficient. Typically for areas where code size matters (like on Android), you use static analysis plus a manual list.
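The dynamic-loading problem in a nutshell: the class name below could just as well come from a config file or the network, so no bytecode scanner can prove which classes will be needed. A minimal illustration:

```java
public class DynamicLoad {
    // Load a class whose name is only known at runtime; a static analyzer
    // scanning this bytecode sees only an opaque string flowing into forName.
    public static Object load(String className) throws Exception {
        Class<?> cls = Class.forName(className);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println("Loaded " + load(name).getClass().getName());
    }
}
```

This is exactly why Proguard needs keep rules for reflectively used classes.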


It would be great to have a runtime for JRuby that's really stripped down to just the necessities. The spin-up time of the jruby environment is painful and it seems predominantly the fault of the JVM's baggage.


I don't know. 10MB still sounds really too big.

Why is it so big? What do we gain?

I don't have a system with 10MB of cache, so I imagine Java can't run faster than memory...


Huh? What?

Several points:

* Who said the 10MB is all used at once?

* I don't know your hardware, but there is a very, very good chance you are actually quite wrong about <10MB of cache. These days, most magnetic disks have more cache than that, and if you are using SSDs, there's a boatload more cache than that in there.

* If you were referring strictly to CPU cache, then I'm even more confused, because the entire existence of that stuff is predicated on it being faster than memory, so... (and even still, if your total CPU cache isn't 10MB, it likely isn't that much smaller).

* It's not like the whole package would sit in RAM the whole time anyway. By your same assertion, I could say that one of my CPU registers is only 64-bits wide, so I imagine all programs larger than 64-bits can't run faster than L3 cache...

I'm not sure why you'd say it is too big. The article page is 1.4 MB alone... and it still needs to leverage a general purpose runtime/JIT that is orders of magnitude larger to do its single fixed purpose.


> Who said the 10MB is all used at once?

The parent was suggesting that this was all that was actually needed out of the 100mb or so downloadable. If you think the JVM is smaller, how small is it exactly?

> If you were referring strictly to CPU cache, then I'm even more confused, because the entire existence of that stuff is predicated on it being faster than memory, so... (and even still, if your total CPU cache isn't 10MB, it likely isn't that much smaller).

http://www.intel.co.uk/content/www/uk/en/processors/core/cor...

I don't have anything with 10MB cache.

> It's not like the whole package would sit in RAM the whole time anyway. By your same assertion, I could say that one of my CPU registers is only 64-bits wide, so I imagine all programs larger than 64-bits can't run faster than L3 cache...

If you get into L1, you get about 1000x faster.

http://tech.marksblogg.com/benchmarks.html

> I'm not sure why you'd say it is too big.

Maybe I have a different perspective? If a 600kb runtime is 1000x faster, I want to know what I get by being 10x bigger. I'm quite surprised that there are so many responders defending it given that these benchmarks were just on Hacker News a few days ago.


Unless you linearly scan the whole binary all the time, your CPU makes sure that only the stuff you're currently using is in the cache, so only the data your hot loop is touching.

You could easily see that your assumption is wrong by observing that a typical C application is not 1000 times faster than a typical Java application.


> Unless you linearly scan the whole binary all the time, your CPU makes sure that only the stuff you're currently using is in the cache, so only the data your hot loop is touching.

Cache fills optimize for linear scans, and have nothing to do with eviction.

> You could easily see that your assumption is wrong by observing that a typical C application is not 1000 times faster than a typical Java application.

What assumption are you talking about?

Where do you find your typical applications? Spark is supposed to be one of the fastest Java implementations of a database system, and it's 1000x slower than the fastest C-implementation database systems, but this is clearly a problem limited by memory.

What about problems that are just CPU-bound? C is at least 3x faster than Java for those[1], so just by being "a little bit faster" (if 3x is a "little" faster) then as soon as we introduce latency (like memory, or network, or disk, and so on) this problem magnifies quickly.

[1]: https://benchmarksgame.alioth.debian.org/u64q/compare.php?la...


> Spark is supposed to be one of the fastest Java implementations of a database system, and it's 1000x slower than the fastest C-implementation database systems, but this is clearly a problem limited by memory.

Wow.. so much wrong, I'm not sure how to unpack it all.

a) Spark is Scala, not Java, though both do use the JVM, so I'll give you that.

b) Spark is not a database system, though it is a framework for manipulating data

c) Spark is generally considered to be much faster than Hadoop, and does its job well, but I'm not sure it qualifies as the fastest anything.

d) By any reasonable interpretation, the fastest Java database system is definitely not Spark. You will find that benchmarks of Java database systems generally don't even include Spark (as an example https://github.com/lmdbjava/benchmarks/blob/master/results/2...)

e) Fast is an ambiguous term... usually you are looking at things like latency, throughput, efficiency, etc. I'm not sure which you mean here.

f) If you know anything at all about runtimes, you'd know that if you've found a Java based system that is 1000x slower than a C based system, either your benchmark is extremely specialized, broken, or you are comparing apples & oranges.

Look, Java certainly has some overhead to it, and sometimes it significantly impacts performance. Before you get too excited about attributing it to runtime size, you might want to look at the size of glibc...


> By any reasonable interpretation, the fastest Java database system is definitely not Spark

What database would you recommend for solving the taxi problem using the JVM?

> Spark is Scala, not Java, though both do use the JVM, so I'll give you that.

What does JVM stand for? I was under the impression that we were talking about its size (10MB vs. 100MB).

> You will find that benchmarks of Java database systems generally don't even include Spark

And? What are we talking about here?

> If you know anything at all about runtimes, you'd know that if you've found a Java based system that is 1000x slower than a C based system, either your benchmark is extremely specialized, broken, or you are comparing apples & oranges.

Why?

We're talking about business problems, not about microbenchmarks.

If this is a business problem, and I solve it in 1/1000th the time, for roughly the same cost, then what exactly is your complaint?

> Fast is an ambiguous term... usually you are looking at things like latency, throughput, efficiency, etc. I'm not sure which you mean here.

It's not ambiguous. I'm pointing to the timings for a specific, and realistic business problem.

> Look, Java certainly has some overhead to it, and sometimes it significantly impacts performance. Before you get too excited about attributing it to runtime size, you might want to look at the size of glibc...

Does Java include glibc?

What exactly is your point here?


> What database would you recommend for solving the taxi problem using the JVM?

You have me at a disadvantage here... The only taxi problem that comes to mind is a probability problem that I'd not likely use a database for at all...

> If this is a business problem, and I solve it in 1/1000th the time, for roughly the same cost, then what exactly is your complaint?

If you came to the conclusion that your business problem runs 1000x faster because of differences in the runtime... you've made a mistake. It is far more likely your benchmark is flawed, or there are significant differences in the compared solutions beyond just the runtimes.

Seriously, I've spent a career dealing with situations exactly like that: "hey, this is 1000x slower than what we are doing before... can you fix that?". Once you are dealing with optimized runtimes, while there can be important differences between them, there just isn't that much room left for improvement.

> It's not ambiguous. I'm pointing to the timings for a specific, and realistic business problem.

The problem is perhaps not ambiguous to you, but you haven't described it in terribly specific terms. More importantly though, you haven't described what you mean by "faster"? That's the ambiguity.

> Does Java include glibc?

> What exactly is your point here?

C programs do. Lots of very efficient, high performance C programs.


> The only taxi problem that comes to mind

It's the problem that I linked to previously.

http://tech.marksblogg.com/benchmarks.html

Finding good benchmarks is hard: business problems are good ones because these are the ways experts will solve problems using these tools, and we can discuss the choice of tooling, whether this is the right way to solve the problem, and even what the best tools for this problem are -- in this case, GPU beats CPU, but what's amazing is just how close a CPU-powered solution gets by turning it into a memory-streaming problem (which the GPU needs to do anyway).

> If you came to the conclusion that your business problem runs 1000x faster because of differences in the runtime...

I haven't come to any conclusion.

There are a lot of differences between a JVM-powered business solution and a KDB-powered business solution, however one striking difference is the cache-effect.

However the question remains: What exactly do we get by having a big runtime? That we get to write loops?


> what's amazing is just how close a CPU-powered solution gets by turning it into a memory-streaming problem (which the GPU needs to do anyway).

Yes, it turns out the algorithmic approach you use to solve the problem tends to dwarf other factors.

> There are a lot of differences between a JVM-powered business solution and a KDB-powered business solution, however one striking difference is the cache-effect.

Wait, you looked at those benchmarks and came to the conclusion that the language runtimes were the key to the differences?

> However the question remains: What exactly do we get by having a big runtime? That we get to write loops?

There is absolutely no intrinsic value in a big runtime.

Now, one can trivially make a <1KB read-eval-print runtime. So I'll answer your question with a question: why do people not use <1KB runtimes?


> Wait, you looked at those benchmarks and came to the conclusion that the language runtimes were the key to the differences?

At the risk of repeating myself: I don't have any conclusions.

> There is absolutely no intrinsic value in a big runtime.

And yet there is cost. It is unclear if that cost is a factor.

> Now, one can trivially make a <1KB read-eval-print runtime. So I'll answer your question with a question: why do people not use <1KB runtimes?

Because they are not useful.

We are looking at a business problem, think about the ways people can solve that problem, and cross-comparing the tooling used by those different solutions.

Is there really nothing to be gained here?

That the memory-central approach wins out so heavily (and the fact that we can map-reduce across cores or machines as our problem gets bigger) is a huge advantage of the KDB-powered solution. It's also the obvious implementation for a KDB-powered solution.

Is this Spark-based solution not the typical way Spark is implemented?

Could a 10mb solution do the same if it can't get into L1? Is it worth trying to figure out how to make Spark work correctly if the JVM has a size limit? Is that a size limit?

There are a lot of questions here that require more experiments to answer, but one thing stands out to me: Why bother?

If I've got a faster tool, that encourages the correct approach, why should I bother trying to figure these things out? Or put perhaps more clearly: What do I gain with that 10mb?

That CUDA solution is exciting... There is stuff to think about there.


> At the risk of repeating myself: I don't have any conclusions.

For someone who doesn't have any conclusions, you're making a lot of assertions that don't jibe with reality.

> And yet there is cost. It is unclear if that cost is a factor.

It's a factor... just not the factor you think it is.

> Because they are not useful.

I think you grokked it.

> The memory-central approach clearly wins out so heavily (and the fact we can map-reduce across cores or machines as our problem gets bigger) is a huge advantage in the KDB-powered solution. It's also the obvious implementation for a KDB-powered solution.

KDB is a great tool, but you are sadly mistaken if you think the trick to its success is the runtime. That its runtime is so small is impressive, and a reflection of its craftsmanship, but it isn't why it is efficient. For most data problems, the runtime is dwarfed by the data, so the efficiency that the runtime organizes and manipulates the data dominates other factors, like the size of the runtime. This should be obvious, as this is a central purpose of a database.

> There are a lot of questions here that require more experiments to answer, but one thing stands out to me: Why bother?

Yes, you almost certainly shouldn't bother.

Spark/Hadoop/etc. are intended for massively distributed compute jobs, where the runtime overhead on an individual machine is trivial compared to the inefficiencies you might encounter from failing to orchestrate the work efficiently. They're designed to tolerate cheap heterogeneous hardware that fails regularly, so they make a lot of trade-offs that hamper getting to anything resembling peak hardware efficiency. You're talking about a runtime fitting in L1, but these are distributed systems that orchestrate work over a network... Your compute might run in L1, but the orchestration sure as heck doesn't. Consequently, they're not terribly efficient for smaller jobs. There is a tendency for people to use them for tasks that are better addressed in other ways. It is unfortunate and frustrating.

Until you are dealing with such a problem, they're actually quite inefficient for the job... but that inefficiency is not a function of the JVM.

Measuring the JVM's efficiency with Spark is like measuring C++'s efficiency with Firefox.

> If I've got a faster tool, that encourages the correct approach, why should I bother trying to figure these things out? Or put perhaps more clearly: What do I gain with that 10mb?

If you read the documentation, the gains should be clear. If you are asking the question, likely the gains are irrelevant to your problem. I would, however, caution you to worry less about the runtime size and more about the runtime efficiency. The two are often at best tenuously related.


If your assumption that a 10MB JVM kills the cache were true, then the alioth benchmarks you have posted wouldn't show a speed difference of ~3. I suggest you learn a bit more about how CPUs work and what benchmarks mean before posting bold claims.


Why not? Those problems fit into cache.


Because they are 333x slower than you'd expect.


> I don't have anything with 10MB cache.

The link you provided was to three distinct models of i7 processors... all with 8MB of L3 cache. I would argue that 8MB isn't much smaller than 10MB, but I will understand if you disagree. However, even the slowest of those processors also has 1MB of L2 cache and 256KB of L1 cache, not to mention other "cache-like" memory in the form of renamed registers, completion queues, etc. At most, we're talking <800KB shy of 10MB in cache.

> If you get into L1, you get about 1000x faster.

I think you are making my point for me.

> Maybe I have a different perspective? If a 600kb runtime is 1000x faster, I want to know what I get by being 10x bigger.

You are assuming that at all times all of that 10MB must be touched by the processor at once. You can have a 10MB runtime where most of the cycles are being spent on a hot spot of <4KB... Having a hot spot that is orders of magnitude smaller than the full runtime is totally unsurprising. It's particularly true when your runtime has a JIT in it. With a JIT, most of the time, the bytes that are being executed aren't part of that 10MB, but rather are generated by it. Are you going to penalize your 600KB runtime for the size of the source code? ;-)


10MB for a platform that allows you to run code on all three major operating systems without too much trouble and in a performant way is a huge win, in my opinion. Not many alternatives come close to that.


Actually all languages with a rich runtime and standard library that isn't just a thin POSIX layer like C or C++ (although C++ has been improving their library story).

Still your point holds.


Q/KDB is 600kb, also runs code on all three major operating systems (and a few minor ones). It's also about 1000x faster than Java/Spark[1].

1000x slower doesn't sound like a huge win to me; it sounds like a huge cost, so my question is what do we gain by making our programs 1000x slower?

[1]: http://tech.marksblogg.com/benchmarks.html


That benchmark is literally comparing Apples to Oranges. It's not even the same hardware.


You don't need the entire executable file in cache in order to run the program.

For comparison, a C++ wxWidgets 3.0 application isn't going to be much smaller than 10MB in release mode if you statically link it. Much as I hate to admit it, 10MB just isn't that big in an age of terabyte SSDs and systems with 32GB of RAM.


Not that the majority of users have that. Even excluding those outside of wealthy countries, most users are on mobile devices (laptops, tablets, cell phones). Of those who have desktop PCs, very few have terabyte SSDs and even fewer have 32GB of RAM. For most people, RAM is probably somewhere between 4-8GB.


10MB fits comfortably within 4GB of RAM.


I don't think it's healthy to think of that 4GB module as "RAM".

It's connected to your CPU by a serial communications interface so access is not uniform or timely, and if the CPU needs any of it, it stops what it's doing while it waits.

The "cache ram" (L1 and to a lesser extent L2) actually acts like the RAM that we learn about in Knuth, so that when we discuss algorithms in terms of memory/time costs, this is the number we should be thinking about. Algorithms that are performant on disk/drum are modern solutions for what you're calling "RAM".
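The cache-vs-RAM gap is easy to observe for yourself: traverse the same array sequentially and in a random order and compare timings. A crude sketch (a serious measurement would use JMH; the array size is just picked to exceed typical L3):

```java
import java.util.Random;

public class CacheDemo {
    static final int N = 1 << 24; // 16M ints = 64MB, larger than typical L3 cache

    // Sum data[] in the order given by order[]; the access pattern, not the
    // amount of work, is what makes the two traversals differ in speed.
    public static long traverse(int[] data, int[] order) {
        long sum = 0;
        for (int i : order) sum += data[i];
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[N];
        int[] seq = new int[N];
        int[] rnd = new int[N];
        Random r = new Random(42);
        for (int i = 0; i < N; i++) { data[i] = i; seq[i] = i; rnd[i] = r.nextInt(N); }

        long t0 = System.nanoTime();
        traverse(data, seq);
        long t1 = System.nanoTime();
        traverse(data, rnd);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %d ms, random: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

The random traversal is typically several times slower, because nearly every access misses cache and waits on RAM.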


> I don't know. 10MB still sounds really too big.

Can't tell if sarcastic or... o_O.

In the unlikely case you're actually serious, you really need to rethink your perception of memory costs in 2017.


No, I'm really quite serious.

KDB[1] is about 1000x faster than Spark[2], and is only about 600kb (and most of that is shared library dynamic linker stuff that makes interfacing with the rest of the OS easier). A big part of why it's fast is because it's small -- once you're inside cache memory everything gets faster.

That's the real cost of memory in 2017. So what did we gain for paying it?

[1]: https://news.ycombinator.com/item?id=13481824

[2]: http://tech.marksblogg.com/billion-nyc-taxi-rides-spark-2-1-...


You're comparing completely, utterly different results here, and it's really hurting any point you're trying to make.

You're comparing KDB running on 4x Intel Xeon Phi 7210 CPUs, totaling 256 physical CPUs.

Compared to the best result for Java/Spark, which was running on 11x m3.xlarge instances on AWS. That's only 44 CPUs, plus it's running on AWS, not 100% dedicated hardware, so it's tough to tell what sort of an impact the virtualization + EBS has on performance. Plus, from the AWS page: "Each vCPU is a hyperthread of an Intel Xeon core except for T2 and m3.medium", which does not do anything good for the results.

Yes, technically, KDB was 199.80x faster (not 1000!) than Java/Spark, when it was given vastly superior, dedicated hardware without virtualization, and when tackling a problem that the hardware setup is optimized for. Note that the author calls this out by saying "This isn't dissimilar to using graphics cards" when talking about the setup he was using for the KDB benchmarks.

To get a sensible idea of the relative difference in performance, you would have to compare KDB and Java/Spark both running on the Xeon Phis, and/or running both on 11x m3.xlarge AWS instances - and even then, if Java/Spark does poorly on the Xeon Phi test, that might just mean that the Java/Spark developers haven't optimized for that particular setup.



> You're comparing completely, utterly different results here, and it's really hurting any point you're trying to make.

Then argue with the point you think I could be making instead of the point that you think I'm making[1]

[1]: http://philosophy.lander.edu/oriental/charity.html

> you would have to compare KDB and Java/Spark both running on the Xeon Phis, and/or running both on 11x m3.xlarge AWS instances - and even then, if Java/Spark does poorly on the Xeon Phi test...

If Spark can solve the business problem in less real-time in another way, I think that would be worth talking about, but it's my understanding that a bunch of mid/large machines connected to shared storage is the typical Spark deployment, and the hardware costs are similar to the Phi solution.

So my larger question still stands: What is the value in this approach, if it's not faster or cheaper?


If "this approach" is using Java/Spark, instead of something that is a smaller binary, then there are some easy answers to your questions:

- people don't want to write C (or K, or whatever yields a small binary)

- the cost of switching languages is not worth the speed-up

- it's already fast enough

I don't think you're wrong, overall, that, specifically, kdb can be much faster than an equivalently sized Spark cluster, but simply being faster does not invalidate other approaches, which is what you seem to be arguing for.


I'm not arguing for anything: I'm asking what do we get for this cost.

It sounds like you're suggesting we get:

* Not having to write in SQL (note KDB supports SQL92)

Maybe something else? I'm not sure I understand.


Locales? Timezones? Unicode? There's a lot of stuff that is there to be used from time to time; that does not mean it hits your processor cache often.

BTW, libruby-2.3 is 2.5MB, just the shared object file, and it tries to use all the aforementioned stuff from the underlying UNIX.


Java 8 already has compact profiles; Java 9 will allow customizing further. I'm not a Java guy, but with node.js, the base can easily balloon to hundreds of MB after an npm install. I somehow now feel Java (or whatever) is better organized and more manageable. After learning node.js for a product for a few months, I'm actually returning to PHP7, which has nearly identical OOP to Java.


Couple of megabytes without a run-time is huge. What's hiding in there; a VirtualBox image with a Linux kernel + initrd? Or maybe a high definition splash screen in PNG form?


I led our teams to switch from Java to Go because of development productivity, but then noticed deployment was simpler and faster, memory usage was slashed (for comparable applications), request/response times were much more consistent, and startup was practically instant; as a result we started aggressively rewriting Java applications to Go and saw a notable difference in the number of machines we needed to run in AWS.

So in my situation, the JVM is heavier by every single measure listed, and for each by a considerable margin.


> aggressively rewriting Java applications to Go and saw a notable difference in the number of machines we needed to run in AWS.

This is the easy trap to fall into though. What if you aggressively rewrote the Java apps from crappy legacy frameworks to well developed Java apps?

A rewrite ALMOST always is faster. So the new language seems faster. Except if you would then rewrite the rewrite back in the original language... you could even still be faster.

Very hard to split apart what is faster because the rewrite got rid of lots of bloat, and what is faster because it is legit faster. Java is legit fast when it is written well. Also very easy to make a bloat fest.


These apps are mostly microservices and the Java ones are mostly only a year or two old. None of them use things like spring. Some use Dropwizard. Would you consider dropwizard modern? If not, what would you use instead?


Take a look at the TechEmpower benchmarks:

https://www.techempower.com/benchmarks/

DropWizard is modern, but it isn't fast. Go and even Node.js are significantly faster. If you want performance, you cut layers out of the stack - check out the numbers for raw servlets or even just straight Jersey annotations in that benchmark. If I were doing JSON-over-HTTP microservices in Java, I'd likely use straight Jersey + Jackson, or if performance was really a problem, Boon over straight servlets.

What framework did your Go rewrite use? The standard libs?


Boon is not that fast. It only appears fast on some poorly constructed benchmarks due to some lazy benchmarketing optimizations.

https://github.com/fabienrenaud/java-json-benchmark


On first glance the dropwizard test app appears to be doomed to mediocrity via reliance on hibernate.

Call me crazy, but I like my dropwizard with Spring DI for (singleton) resource setup, a micro-ORM to get work done, and HikariCP datasources at runtime.


What's wrong with hibernate? The only thing I can think of is that you're not using "JOIN FETCH entity.relation" when accessing collections and end up with the N+1 select problem but that is because you're using any ORM incorrectly.

Entity Framework has Include and Active Record has includes, which do the same thing. The Qt ORM also has something similar.

The only ORM I have seen that lacks this critical feature is odb. It doesn't allow setting the fetching strategy on a per query basis. You have to either always use eager loading or lazy loading which basically makes it useless for my purposes.


Well, for benchmarking the essential framework, which does not mandate any ORM, I would want to use something for data access that takes the question of time spent on type reflection, internal caching, and the like, out of the picture. Hibernate and EMF have their place, but not as part of benchmarking the thing that hosts 'em. Core Dropwizard performance is all about how it uses Jetty, Jackson, and maps requests to resources and operations.


> DropWizard is modern, but it isn't fast. Go and even Node.js are significantly faster.

Any benchmarks to provide in order to support this wild claim?


The ones I just linked to above.


Use vertx if you want lean REST micro-services. I so wish vertx was part of the standard library.

The main advantages that Go has over Java are that the standard library is brilliant, which obviates the need for folks to create monstrous frameworks (and lose performance), and that Go has better memory utilization because of value types (structs) and because it is AOT compiled. Unfortunately, the JIT as designed by the JVM devs takes a lot of memory.

In raw performance, I would still give the edge to Java over Golang though.


There's a lot more to an app than an MVC framework. I realize Dropwizard tries to be everything for the app, but at its core it's MVC with some bundled libs.


> This is the easy trap to fall into though.

Indeed.

It's a typical honeymoon phase, with very little regard for 1-2-5 years in the future. The cost of having picked Go will be fully apparent then.


I really want to see a comparison of a language like Go to something much more in the functional sphere when it comes to maintainability of a large codebase.

I really feel like that's one of the big issues we as programmers want to get a better handle on, but there isn't much to go on that isn't based on opinion (which can be hard to validate).


Yes, the limitation is rarely the programming language, it is the programmer.

Also, when you do the rewrite you have already solved the domain problem that you did not fully understand when implementing it the first time.


"Plan to throw one away; you will, anyhow." First version to understand the problem, second version to solve it.

But deployment, gc pauses and startup time (jvm vs go) are orthogonal to program quality. I would also expect go to have less memory usage.

> deployment was simpler and faster, memory usage was slashed..., request/response times were much more consistent, startup was practically instant


Orthogonal to quality, but critical to velocity.

At the end of the day, despite Go's failings, it's a good (maybe the best?) language for large projects and teams because it compiles fast, is easy for anyone to run anywhere, tests run quickly, programs execute quickly, and there is already good tooling/editor support.

Nothing beats efficient workflow for improving velocity.


I could use the exact same arguments but for PHP.


Not quite; I omitted that it is also statically typed and a future-proof language, mainly because these are properties already shared with Java. However, this is not true of PHP.

PHP is a great velocity language, provided you have a small(er) team or are willing to commit to additional controls on how you write your PHP (document types/structure of arguments mainly) to ensure that your PHP code is able to be read quickly by other developers.

Personally I prefer Go here because it enforces good readability by default and therefore scales better with team size.


Readability is not a problem in PHP either. Follow PSR-1, PSR-2 and PSR-4 and use a command-line tool like CodeSniffer in your build step to guarantee the code standard on each commit (or use Upsource).

And in PHP 7.x you have even more type hinting than before, and with an IDE like PhpStorm refactoring is a breeze.

And with the release of PHP 7, PHP is future-proof. The community will continue to improve it with major features; they have shown that. Interest in the language has increased. More RFCs are contributed to the language than before. https://wiki.php.net/rfc

Multiple teams on a large code base is not really a problem in modern PHP. I do it every day. We follow modern design patterns, do code reviews, and keep code coverage over 80% of the system (old as well as new code). New code is probably over 95% coverage. We deploy multiple times every week.

Almost all (>95%) of my problems stem from design decisions made in the past, not the language itself.

I'm not saying that you should not use Go (or Java). Both are fine languages. Use the right tool for the job. If you don't do a realtime stock trading system or some embedded system, but some web stack, I can't really see that the majority of the problems stem from language choice (whatever you choose). It is in the team, the culture, the understanding of the domain. There should be your focus.

Personally, the two most important things I look for in a language/platform are tooling and community.


Which is why I still often use PHP.

For my usages its a reasonable language.


You're still limited by the JVM technology, regardless of how you write your app: a large heap and big tail latencies (the JVM's GC is designed to be throughput-optimised, whereas Go's is latency-optimised).


Just use one of the other JVMs that have latency-optimised GCs, such as Zing from Azul, Metronome from IBM, or OpenJDK with Shenandoah from Red Hat.

The power of Java is that there is more than one JVM, and that can really save you a lot of money/developer time if the world changes under your ass ;) e.g. we had a JVM-based graph database, ran it on HotSpot -> big GC pauses; moved to Zing -> no more pauses. All we needed to do was run a different VM and the problem went away (the new problem, of course, was that Zing costs money, though not much; with Shenandoah coming for free we could probably have moved to that).

With Go you can't do that yet. If your app is not latency-bound but throughput-bound, there is nowhere to switch to other than a rewrite. That flexibility of deployment on JVM tech gives us a lot of insurance at no cost, until we need it.


Actually, the new G1 collector deals very well with latency-sensitive workflows. I'd say it's comparable to Go if you adjust your heap size to the working set. You can try running the benchmarks here - https://gitlab.com/gasche/gc-latency-experiment.
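
FWIW you can verify which collector a given set of flags actually selects from inside the process. A quick sketch (the class name is mine; the -XX flags are standard HotSpot options):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class GcInfo {
    // Names of the collectors the running VM is actually using.
    static List<String> gcNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Launch with e.g.: java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 GcInfo
        // Under G1 the names typically include "G1 Young Generation" and "G1 Old Generation".
        gcNames().forEach(System.out::println);
    }
}
```

MaxGCPauseMillis is only a target, not a guarantee, which is why measuring against your actual working set (as the linked benchmark does) matters.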


Exactly, that's how some not very bright people were tricked into thinking that Node.js is actually fast.


And at work, we're now rewriting all those NodeJS services in Go or Java.

We hired some Node maintainer(s) a long time ago, rumor has it, who got us on the Node train.


How do you do async in Java? While it does have CompletableFuture now, none of the libraries (specifically database drivers) seem to support it, so I always end up with a blocked thread per request.


Java has had non-blocking IO for some time. https://en.wikipedia.org/wiki/Non-blocking_I/O_(Java)

Unfortunately it seems difficult to use (to me at least), but frameworks like Netty are built on top of it to provide incredible performance.

However, the fact that Java provides real threading means that blocking IO is not a performance problem if you use the right patterns.
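
To the async question upthread: even when the driver itself only offers a blocking API, you can off-load the call to a pool with CompletableFuture so the request thread isn't tied up. A minimal sketch (class and method names are mine; blockingQuery is a stand-in for a real driver call):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncSketch {
    // Stand-in for a blocking database/driver call.
    static String blockingQuery() {
        return "row-1";
    }

    // Off-load the blocking work to a dedicated pool and compose on the result.
    static CompletableFuture<String> queryAsync(ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(AsyncSketch::blockingQuery, pool)
                .thenApply(row -> "got " + row);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        System.out.println(queryAsync(pool).join()); // prints: got row-1
        pool.shutdown();
    }
}
```

This doesn't make the IO non-blocking, of course; it just bounds the blocking to a pool you size deliberately, which is often good enough given how cheap JVM threads are.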


I've spent a fair bit of time in both, most recently the last couple of years in Go. I think it's a very mixed bag and there is no clear winner.

The JVM tooling, especially for runtime operations, is so much superior to the Go options that it's night and day. I have much more success modeling complex business models in Java with its better type system, low-latency work is much easier to do on the JVM due to the availability of better libraries (which may improve in Go), and the concurrency options are miles better on the JVM.

Go's stack allocation and gc defaults make for easy management in most of my default cases. The ease of adding http endpoints to things is phenomenal. Being able to write easy cli applications in the same language I write daemons in is great.

All told, I think for simple daemons and cli's I'd go golang, for more complex systems I'd go jvm.

I, personally, think the binary deployment thing is overblown. I've never had any problems deploying jvm applications and the automation to do either seems essentially the same to me.

As for the relative "heaviness" I think golang definitely feels lighter, but that is largely because golang apps do less. Once you start having them do more they start to "feel" just as heavy as java apps (for whatever "feel" means).

* [edit] called golang heavier meant lighter


I have a Go website/web app that runs at tens of megabytes per process. A very similar Java web app runs at a few hundred megabytes per process.

I also run these on cloud platforms that auto-scale. The Go processes spin up very quickly, the Java ones not so much.

In these two respects the JVM is heavy compared to golang for my very common scenarios. The heaviness also causes me to spend more money for the JVM solution.


But what max heap size did you set for the JVM?

I have an app that people were complaining took too much memory. A quick look with VisualVM showed that its actual heap usage when idling was only 50 MB, but because we hadn't set any heap size limit, it was reserving hundreds of megabytes from the OS. The idea is that it can run faster if it does that. The fix was simply to use the -Xmx option to tell it to use less memory and GC more often.
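
A quick sketch of checking what the VM thinks its limits are, which makes the effect of -Xmx easy to see (the class name is mine; the numbers depend on your machine and flags):

```java
public class HeapInfo {
    // Convert bytes to whole megabytes.
    static long mb(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // With e.g. `java -Xmx64m HeapInfo`, "max heap" reports roughly 64 MB.
        System.out.println("max heap:  " + mb(rt.maxMemory()) + " MB");
        System.out.println("committed: " + mb(rt.totalMemory()) + " MB");
        System.out.println("used:      " + mb(rt.totalMemory() - rt.freeMemory()) + " MB");
    }
}
```

The gap between "used" and "committed" is exactly the reserved-but-idle memory people complain about when no -Xmx is set.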


The JVM is very inflexible in that respect. If you give it more memory it will keep all of it way beyond the point where it matters for performance. If you give it less memory you need to know exactly how much less you can give it before performance craters.

In other words, JVM deployments need a lot more tuning than Go and they will generally need a lot more memory as well. But you're right, not setting -Xmx at all will make the JVM look worse than it really is.


We have similar experiences as well.

  $ ps -eo rss,cmd,user | grep jenkins
  4928228 /usr/bin/java -Djava.awt.he jenkins

  $ ps -eo rss,cmd,user | grep drone
  12940 /drone agent                root
  19924 /drone server               root
We run the two applications on the same machine. Admittedly Jenkins is much more feature-rich, but we only use its vanilla settings, without any fancy plugins, for a few legacy SVN repos.

P.S. The Drone server and agent are running within docker containers.


I don't know why you are getting downvoted for sharing your real world experience, lately "fanboyism" on HN is getting out of hand. I have similar experience with one of the services that was ported from Java to Go.


"Don't rewrite an application from scratch. There is absolutely no reason to believe that you are going to do a better job than you did the first time." -- Joel on Software, Things You Should Never Do, Part I [1]

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...


I do like that article but have ignored it several times for good reasons.


I think a better interpretation of the title/article is "Don't assume that rewriting from scratch will fix all of your problems"


These types of arguments cause many intelligent people to headdesk. They're hardly an apples to apples comparison.

Of course "Go was Faster". It's because you started with a clean slate!


That's not it. A fairly small http server in go will run in tens of megabytes. The same thing on the JVM requires a couple hundred megabytes at best. The difference in startup time is roughly the same as well.


Are you sure? I've written small HTTP servers in Java that can happily run with a heap of 30-50mb or less. Runtime overheads add some on top of that, but not much.
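
Something along these lines, for example, using only the JDK's built-in com.sun.net.httpserver, no frameworks (the class name is mine; a server like this idles happily even when launched with a cap like -Xmx32m):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TinyServer {
    // Start a one-route server on an ephemeral port and return it.
    static HttpServer start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Fetch "/" from the given server and return the response body.
    static String get(HttpServer server) throws Exception {
        URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/");
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();
        System.out.println(get(server)); // prints: hello
        server.stop(0);
    }
}
```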

I think the perception of Java suffers a lot because it will consume all the RAM on your machine by default if you let it (but not immediately). It's a very poor default because even though there are technical arguments for doing that (goes faster), they aren't well known and people tend to assume "more memory usage == worse design".

There are a lot of myths about the JVM out there. We can see on this thread the idea that it takes 1.5 seconds to start being repeated multiple times, each time someone else points out that it's actually more like tens of milliseconds to start.


> I've written small HTTP servers in Java that can happily run with a heap of 30-50mb or less. Runtime overheads add some on top of that, but not much.

I second that. I have deployed a medium traffic web-server written in Scala backed by a postgresql DB on 128MB VPS, back in 2009!

> I think the perception of Java suffers a lot because it will consume all the RAM on your machine by default if you let it (but not immediately).

I don't think that is true. The default heap size for Oracle and OpenJDK VMs has been bounded as far as I remember. In fact, I would like it if the VM, by default, allowed the heap size to grow up to the available RAM when GC pressure increases, but that doesn't seem to be the case as of now.

Edit: Did you mean non-heap VM arenas grow indefinitely? If so, I am not aware of them.


Must have been Java 6 in 2009. Java memory usage increased with new releases to make it perform better. For a medium-traffic site it would have worked fine because the GC would have ample time to clean up unused objects.


128 MB is a lot compared to Go, which will often run in around 10 MB. Was your JVM back then 32-bit or 64-bit? If it was 32-bit, your memory requirement will be higher on 64-bit.


128MB was the total RAM in the VPS including OS + nginx + JVM + Postgresql. The heap allocated to the JVM process was about 64MB, but bear in mind that this was an actual application. So, it's hard to do a detailed comparison between JVM and Go without standardising on the application. All that I am claiming is that JVM is in the same ball park.


It's not in the same ballpark. I'll throw some code up when I get a chance.

Edit: do you have a twitter or Reddit account? I'll ping you when I have code examples if you want.


> it will consume all the RAM on your machine by default if you let it

I wonder if Oracle documents are plain wrong for JDK 8 docs for maximum heap size[1]:

"Smaller of 1/4th of the physical memory or 1GB. Before Java SE 5.0, the default maximum heap size was 64MB. You can override this default using the -Xmx command-line option."

Also, Oracle has chosen the right defaults: it took Java a long time to shed its reputation for being dog slow, and if they optimized for memory it would start looking worse on performance.

1. https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc...


Try to do it and shoot me a github link. Trust me, I've tried, but I'm no JVM expert so there could be some magic flag I'm unaware of.

I can get it to start around 30-50mb, but as soon as you hit it with traffic the memory usage jumps up.


It was a couple of seconds on my Cyrix 300MHz-ish CPU back in 1998. I would have expected it to get a little better since then.


I have a little server I wrote in Java. Admittedly it is not a HTTP server, but it quite happily handles a thousand simultaneous connections with a memory limit of 200MB. It's currently sitting around 26MB, but I'm sure some of that would disappear if the VM did a GC.


Not correct at all. You can run Tomcat in 5 MB of RAM, and it starts in less than 250 ms.


As another commenter mentioned, I think this is much more the programmer and less the language. Sure, the language may encourage certain approaches which carry across teams differently, but it still often comes down to the app, not the language. I implemented a rudimentary Java AOT targeting Go and the trimmed-down stdlib grew so big Go took hours compiling it (granted, some of that is how I approached OOP and whatnot).


> I implemented a rudimentary Java AOT targeting Go and the trimmed-down stdlib grew so big Go took hours compiling it

Have you reported it to the Go devs? Sounds like an interesting use case.


Yes [0], though I think the issue title is a bit off. They did bring it down from over 7 hours to 30 minutes or so in a recent release, but it still is too long, too much CPU, and too much memory. They are very responsive, of course, which is something I can never say about OpenJDK.

0 - https://github.com/golang/go/issues/18602


> memory usage was slashed

A lot of this has to do with another unmentioned, terrifically annoying property of the JVM: pre-launch min/max heap allocation. Standard operating procedure is to go with the default and bump it up if your needs exceed it. I can't possibly imagine how many petabytes of memory are unnecessarily assigned to JVMs throughout the world as I type: apps consuming 79 MB with a 256 MB/512 MB heap size.


I wonder how much of this is due to a couple differences: in Go you can embed structures in others instead of using a pointer, strings are UTF-8, and arrays (slices) are resizeable by default.

(I'm sure a chunk of the difference is due to a better understanding of the program during rewrites.)


These sorts of comments are (no offense) worse than useless. Benchmarking is one of the most difficult things to do in software, and anecdotes like this just make things confusing for new engineers and feed the perpetual hype train around newer languages.

Please refrain from making statements like this unless you have a reproducible quantifiable analysis.

If you really wanted to demonstrate the effect you describe you'd need to have the same team rewrite the application twice, once Java->Java, once Java->go, making sure to align the program structure as much as possible (making exceptions to take advantage of lang specific features of course).

If you were to do that, then that would be interesting! No one does that of course because it's expensive and wasteful from a business perspective, but it's the only way to determine anything useful.


Microsoft seem to have learned a lot from Java in designing their new .NET Core CLR. It has gotten almost everything right:

* a small and fast CLR (JVM)

* a class library that defaults to almost nothing but primitive classes

* proper and standardized version, platform and package management (NuGet)

* open source and MIT license[0]

* a patent promise[1]

* arguably the best dev IDE available (Visual Studio) and one of the best up-and-coming dev text editors (VS Code)

* Native ORM, templating, MVC, web server so there is one way to do things

* open source middleware standard (OWIN)

* they left out, for now, attempting the hard ugly stuff like x-platform GUI

* all platforms are equal citizens, they acquired Xamarin for dev tools and release their own Docker containers.

* it's already getting good distribution (on RedHat) even tho it's only 6 months out from a 1.0 release.

Java may have missed the window for fixing some of these issues in their platform - I feel that if Android were being developed today, they'd almost certainly take .NET Core as the runtime.

I've yet to commit to using .NET Core anywhere, but from what I know about it so far it is impressive.

[0] https://github.com/dotnet/coreclr

[1] https://raw.githubusercontent.com/dotnet/coreclr/master/PATE...


> * all platforms are equal citizens

This may be true for the Core CLR specifically, but it's not true of real .NET apps that are being built today. The vast, vast majority are strongly tied to the Windows platform, especially because of the lack of a cross-platform GUI like you mention. As a Wine developer, it's a huge pain in our side because we either have to run the entire .NET virtual machine, which is hard, or depend on Mono, which is by design not completely compatible. This results in really souring my opinion of .NET and .NET applications when compared with win32 applications that do tend to work quite well in Wine.


I could more or less agree with most of it apart from

> arguably the best dev IDE available (Visual Studio) and one of the best up-and-coming dev text editors (VS Code)

https://www.jetbrains.com/resharper/documentation/comparison...

Refactoring, Coding assistance, Navigation & search sections being most important.


Yeah, if I had a dollar for every time I had to restart Visual Studio in order to get something to work...especially test debugging. But IntelliJ always works perfectly. Must say that I can't wait for Jetbrains' Rider to come out.


Yeah, same here. I can't count the amount of times I heard statements like that one (also about Eclipse) and was puzzled. I'm starting to think that people saying this just haven't had the curiosity to really explore the alternatives. That said, even though VS causes me to cringe pretty constantly when I use it, you have to give props to MS for the language integration tools they put together for .NET. Some of the tricks they managed to come up with (like moving the instruction pointer in a method while debugging) is pretty impressive. Unfortunately, every time I get amazed by something like this, either some blatantly stupid behavior of VS destroys the magic again, or it outright crashes. Sigh.


I couldn't agree more. I constantly see this claim made about Visual Studio. I find it to be in the way most of the time. It does sound like most of the features that I want are in Resharper; I'll have to try it out.


You will never, ever look back.


Although the last version I used seriously was VS2013, VS on its own is pretty mediocre. With ReSharper though, nothing beats it in my opinion.

On the other hand I've been using Eclipse and IntelliJ for the past year. Eclipse is not even worth talking about but even IntelliJ does not come close to vanilla VS in terms of usability. Again, my opinion.


Yes, it will be interesting to see what the JetBrains C# IDE is like when it's released.


What is better than VS? PHPStorm etc?


I actually can't name anything better. I was just saying that whilst VS is "the best", the best isn't really that great (without a plugin from Jetbrains)...


> I feel that if Android were being developed today, they'd almost certainly take .NET Core as the runtime.

If it were being developed today, rather than racing against the clock (Apple), Google would have written their own runtime and everything.


But when Android was initially developed and acquired by Google, Apple wasn't in the phone business at all, so the base architecture was laid down long before the race started.


Apple was getting into the phone business at that time. The original Android was nothing like the Android users saw when it was released on phones. Google had advance knowledge of Apple's plans, as Eric Schmidt was on Apple's board at the time.


Isn't Dart with Flutter pretty much that?


> Microsoft seem to have learned a lot from Java in designing their new .NET Core CLR

Of course they did. It's no secret that they designed it as a Java clone after the courts ruled they couldn't embrace the original one.

However, they missed something: cross-platform support. So essentially you get a Windows-only Java platform. That's why not everybody finds it impressive or is looking forward to committing to using it everywhere (they wouldn't be able to, though).


> they left out, for now, attempting the hard ugly stuff like x-platform GUI

What is the status of this? Will MS be bringing WPF (XAML) to all platforms?


IMO Microsoft will never bring WPF to any platform but Windows. Core CLR (and web GUIs) are what will be available on other platforms. I think WPF will always remain a Windows thing. For that matter, WPF is not even getting developed much on Windows; it's largely left as-is in favor of putting their effort into web technologies and CoreCLR.


I agree with many points in this article. That being said, there are dimensions of heaviness not captured in the article as far as I can see:

1. The startup times, not so much of the JVM itself, that just takes 1.5 secs, but the startup time of your application gets higher if you have a lot of classes on the classpath. I guess it's the classpath scanning that takes a lot of time (?).

2. Memory usage of Java objects is quite heavy. See this article: http://www.ibm.com/developerworks/library/j-codetoheap/index...

3. The heaviness of the ecosystem in terms of the magnitude of concepts and tools being used and the enterprisey-ness of libraries.


> 1. The startup times, not so much of the JVM itself, that just takes 1.5 secs

Where do you get these numbers from? On my five-year-old MacBook Pro with default JVM options, parsing a 20 MB file:

  real 0m0.248s
  user 0m0.325s
  sys  0m0.043s

> 2. Memory usage of Java objects is quite heavy.

That's IBM's enterprise VM, which uses three-word headers. HotSpot is actually better. If you compare it with other "lightweight" programming languages, it is really, really light.


> real 0m0.248s

A quarter of a second to start up the VM, run some code, and exit again is actually pretty steep compared to typical interpreted and compiled languages. Among other things, this means that you can't really call Java executables from a loop in a shell script.

For comparison purposes, both Ruby and Rust will show between "0.00 elapsed" and "0.02 elapsed" for a simple "Hello, world" program on my laptop.


That's just Hello World, though. He said his app was parsing a 20 MB file.

To do a fair comparison with your example, I just compiled and ran Hello World in Java on my machine and got this:

  real 0.06  user 0.06  sys 0.01
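
If you want to separate pure JVM startup overhead from the program's own work, the runtime MXBean knows when the VM was launched, so uptime at the top of main approximates it. A small sketch (the class name is mine):

```java
import java.lang.management.ManagementFactory;

public class StartupCost {
    public static void main(String[] args) {
        // Milliseconds elapsed between VM launch and reaching main().
        long ms = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("JVM startup overhead: ~" + ms + " ms");
    }
}
```

On a modern JVM this typically reports tens of milliseconds for a bare program; anything beyond that is classpath scanning and framework initialization, not the VM itself.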



The parent did write "parsing a 20 MB file". So not a hello world.


<insert joke about Java being overly verbose>


I'm not arguing with you; those are genuine problems. But there are a few projects in the pipeline to address some of them.

1. Startup time being addressed by precompiling the standard library (or your own library). See "JEP 295: Ahead-of-Time Compilation": http://openjdk.java.net/jeps/295. Also addressed by modularisation of the standard library, "JEP 220: Modular Run-Time Images".

2. Memory usage (and less garbage collection overhead) using value types. See "JEP 169: Value Objects": http://openjdk.java.net/jeps/169.


> The heaviness of the ecosystem in terms of the magnitude of concepts and tools being used and the enterprisey-ness of libraries.

You don't have to use the enterprisey libraries though. Using Dropwizard, for example, gives you a tight and performant set of libraries that have a fairly minimal learning curve and require relatively little boilerplate.


While this is true, in practice it can be hard. You don't always have control over which libs you are using, and finding lightweight alternatives to many libraries is often hard to impossible. It is better once you get outside Java proper, but nearly all the alternative languages on the JVM tout access to the Java ecosystem as a plus, which then brings back all that pain.


This is one thing I've never understood about Clojure - the Java interop. I actually love Clojure but I close my eyes to the fact that it requires an object-oriented VM to work its magic. Clojure is a functional Lisp based on immutable data structures which is about as far from Java OOP as it gets yet we're encouraged to mix Java objects and classes into our Clojure apps as if nothing matters.


Funny, when I code in Clojure there are these things called multi-methods, protocols and multiple dispatch.

I think it was originally designed in a Lisp library called CLOS, which incidentally stands for Common Lisp Object System.

It's very nicely explained how to implement OOP in Lisp in a book called "The Art of the Metaobject Protocol".

Users of Lisp based languages should think twice before criticizing OOP.


Users of Lisp based languages generally think about six times, on average, before criticizing OOP (each time).


> You don't always have control of what libs you are using

Well that's true regardless of the language. If you're not making the decisions on the codebase, there can be all kinds of gnarly dependencies and practices that you have to adhere to. I agree that big legacy corps tend to have over cumbersome setups, but hey, at least it's not cobol. My advice is not to work for big legacy corps.


But some languages have better cultures/eco-systems than others. Java has one of the worst.


Far from it IMO. It depends on which subculture you immerse yourself in. If you subscribe to the IBM/Oracle/Red Hat thought leaders, then yes - you'll encounter enterprisey stuff, because they're all targeting legacy corps.

Believe me that I know where you're coming from -- I have a real aversion the big enterprise side of the Java world. There's a lot of interesting development in Java open source though, and it'd be a shame to throw the baby out with the bathwater.


I mean, even the "enterprisey" stuff like Spring Boot is more than fast enough. I have a little REST service I just deployed to production today, 5 seconds to start up on my laptop's SSD (unfortunately it took about 50 seconds in production because our SAN is dog-slow for some reason).


How is Spring Boot "enterprisey"? It makes modern java programming simpler and more accessible by hiding some of the unnecessary complexity. It enables things like https://jhipster.github.io/ which to me is the Rails equivalent in the java world.


JHipster equivalent to Rails? You have to be joking, surely. I just setup a JHipster site and when I opened it up in IntelliJ it was the same labyrinthine mess I've come to expect from Java frameworks, ie. knee-deep in endless subdirectories and everything abstracted away to the point of incomprehension. Contrast that with the simplicity of Ruby and Rails. Java by its very nature makes it impossible to build simple, easily comprehended frameworks and apps. The trouble is that devs who have spent most of their lives in the Java ecosystem can only think relatively, ie. Java Framework X is simpler than Java Framework Y. Unless they expose themselves to something like Ruby or Clojure they will never experience true simplicity.


This is nonsense. Something like Spring Boot is easy to comprehend; there's no labyrinthine mess whatsoever. You've got the choice to use other simple frameworks like Spark, or plain libs like Jersey + Jackson. All in all it's still the same: you write your controllers and your services, and that's it. Where is that complicated?

And I'm a big fan of Clojure, but Clojure being cool doesn't make Java de facto a big pile of poo. People have been drilled with so much FUD about Javaland that they simply can't bring themselves to try it properly without preconceptions.


I didn't say equivalent in general. I said equivalent in the Java world where we are more used to having to deal with bloat. Just free yourself from your preconceptions and run ./gradlew . Everything will start up fine, no mess, I promise.


> hiding complexity

This makes me shiver in terror.


Our job as programmers is to hide complexity and expose it when needed. Nothing to shiver about.


Spring Boot hides needing to deploy to an application server along with the extra configuration that entails, it doesn't hide Spring from you.

There's still a ton of "enterprise-grade" shit in Spring, you just aren't forced to use it if you don't want - but it's always there, lurking behind the scenes.


Startup time on my late 2014 MPBr for Clojure Hello World is indeed 1.29s, which is what the OP was measuring.

   $ time java -jar target/uberjar/clojure.jar
   Hello, World!
   1.29s user 0.08s system 181% cpu 0.755 total

   $ /usr/sbin/system_profiler -detailLevel full 
      Model Name: MacBook Pro
      Model Identifier: MacBookPro11,3
      Processor Name: Intel Core i7
      Processor Speed: 2.3 GHz
      Number of Processors: 1
      Total Number of Cores: 4
      L2 Cache (per Core): 256 KB
      L3 Cache: 6 MB
      Memory: 16 GB


It is Clojure. It loads an additional runtime by itself. It is unfortunately not usable for CLI applications. Pure Java does the same thing in fraction of time. http://blog.ndk.io/jvm-slow-startup.html


4. Garbage collection. The fact that Java does not have a refcount collector, that can release memory back to the process's pool as soon as something goes out of scope and is no longer referenced, is horrid. Nearly every major software written in Java goes through the worst kind of struggle wherein users have to assign a 4 GB heap size to run a service that only really needs 500 MB. When fatal Out-Of-Memory crashes are the status quo, something is very very wrong.


I'm sorry to call out this comment specifically, but almost everything you said here is not true. Out-Of-Memory crashes are not the status quo. The JVM garbage collector is (generally) a very high performance system that has improved incredibly over the past decade, it's not as simple as saying it's missing reference counting so it's "horrid".

This is the kind of lazy generalization that causes people to make poor technology decisions.


These points are true in my experience (EDIT: 1.5 startup time sounds like too much) and they are enough to debunk the claim "The JVM is not that heavy". I hadn't ever heard anybody considering disk consumption or installation time before, when making that claim.

To add,

4. Garbage collection and lack of value typed records. As far as I know there is currently no way around going full SOA (structures of arrays (of primitive types)) for large data collections.

Object overhead (memory usage) and GC overhead are the reason why only SOA will work (and it's a pain because the language doesn't make it convenient) if you have like >10^7 objects. (That's my personal experience from a 2-month project, and I normally don't use Java).
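As a sketch of what the parent means (class and field names here are illustrative, not from any real codebase): instead of allocating one object per record, a structure-of-arrays layout keeps a handful of flat primitive arrays, avoiding the per-object header and the pointer-chasing on iteration.

```java
// Array-of-structures: at ~10^7 elements this means ~10^7 object
// headers plus a pointer dereference on every access.
class Point {
    double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

// Structure-of-arrays: two flat primitive arrays, no per-element
// header, contiguous in memory so iteration is cache-friendly.
class Points {
    final double[] xs, ys;
    Points(int n) { xs = new double[n]; ys = new double[n]; }

    double sumX() {
        double s = 0;
        for (double x : xs) s += x;
        return s;
    }
}

public class SoaDemo {
    public static void main(String[] args) {
        Points p = new Points(3);
        p.xs[0] = 1; p.xs[1] = 2; p.xs[2] = 3;
        System.out.println(p.sumX()); // prints 6.0
    }
}
```

The pain the parent mentions is visible even in this tiny sketch: the SoA version can't hand out a `Point`-like view of element i without boxing, so call sites end up passing raw indices around.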


> As far as I know there is currently no way around going full SOA

There are if you use language extensions like Packed Objects on the IBM JVM or Object Layouts on Azul.

So just like C, you have C and then GCC C, clang C, ....

Eventually Java 10 will fix this, but for those that like to live on the edge there are already snapshots available.


> Object Layouts on Azul.

https://objectlayout.github.io/ObjectLayout/ does not save you any headers. It just allows you to control where your objects are in memory and compiler optimizations based on this. It does not help you with memory footprint.

Also I'm not sure if it's really implemented on Zing considering that from the outside the project seems dead.

> Eventually Java 10 will fix this

I would not be so sure. The challenges especially regarding primitive generics are not to be underestimated. See

http://cr.openjdk.java.net/~jrose/values/shady-values.html


> It just allows you to control where your objects are in memory and compiler optimizations based on this. It does not help you with memory footprint.

It is already better than what you get on Hotspot.

> The challenges especially regarding primitive generics are not to be underestimated. See

The challenge here is due to how the Java designers chose to build generics in the first place.

Modula-3 and Eiffel are two examples of languages with proper generics, value types and toolchains that do AOT compilation to native code.

So I am still hopeful.

However, like everything, some challenges are technical and some are political.


1.5 secs for the jvm only seems excessive.

    $ time java HelloWorld
    Hello, World

    real    0m0.071s
    user    0m0.053s
    sys     0m0.020s

That is a Linux VM running on an MBA (first run).


Did you test that 1.5 second claim yourself? I literally just wrote a HelloWorld and ran it on my MacBook, the total time for the whole program was <0.2 seconds.


I agree that my claim is false. I kind of wrote that off the top of my head.

In any case, the point that I wanted to make in the parent comment was that the JVM startup time itself was basically fine.

I just checked on my Macbook and a HelloWorld class gives me .13 secs real.


To be fair, 0.2 seconds is still ludicrously long. I can literally say the output of the program in less time than the runtime can.


In my mind, 1½ seconds is huge; that essentially rules out any interactive usage. It's even annoying for rapid development cycles. Only low expectations or heavy orchestration can overcome such a startling disadvantage.


It rules out any interactive usage where you are starting and stopping the jvm, like in a command line context.

There are work arounds for this (things that reuse jvms and such) but until that is overcome the jvm is largely not appropriate for cli tools that start/stop.

But for other kinds of interactive programs, things with long-running sessions and such, it is pretty easy to a) lower that startup time and b) do things that hide it from the user.


I've always thought it'd be nice to build a sort of hybrid between a "ClojureScript for bash", and a Java boot-script + RPC client.

Picture a Clojure macro library just for writing CLI driver programs, where you could call all your Clojure code like normal, and where some of the subcommand-defining methods of the driver program could be annotated with something like "@inline".

The un-annotated subcommands, as a simpler case, would translate into calls to spawn a JRE and feed your ARGV over to it once it gets running. These would be the slow-startup calls, so you'd just use them for the things that need the full "horsepower" of the JVM.

The @inline subcommands, on the other hand, would grab your app, its deps, and the JRE, do a whole-program dead-code-elimination process over them to trim them down to just what that subcommand needs, and then would transpile that whole resulting blob to bash code and shove it into a bash function. (So, something like Emscripten with a different frontend + backend.)


That's completely false in the context of a Lisp.

I boot the JVM once and iterate endlessly in the same process. Same for ClojureScript in the browser or node.js. Lisp is by far the most interactive language there is with the fastest iteration times (AFAIK).

1.5 seconds would be huge if you had to constantly restart your application like you do everywhere outside Lisp. Iterating in Clojure is literally instant.

I wrote applications in dozens of languages, and none come remotely close to Clojure's iteration speed or joy of use.


Lisp is by far the most interactive language there is with the fastest iteration times (AFAIK).

That's Forth. Lisp comes next.


if you had to constantly restart your application like you do everywhere outside Lisp.

This was probably true in the 80s, but hasn't been in a while. Many languages have this, either built-in or as a tool. In the case of the JVM, there's spring-loaded, which works in Java, Groovy, etc.


1.5 is huge, except it is completely wrong. JVM startup time is within fraction of the second.


You are right, I was just going with what the grandparent said. But I think with normal amounts of class scanning and other overhead, 1.5 seconds becomes the practical normal.

Certainly the JVM startup always feels slow, in my experience.


Well, if you create a lot of additional objects on startup then it will take some time. JVM startup is still fast. http://blog.ndk.io/jvm-slow-startup.html


That link says 1.2 seconds for a hello world! 1.2 microseconds is what I would expect to be called fast.


1.2 seconds was hello world in clojure, the Java hello world presented the numbers below, so it's mostly clojure that is slow:

    $ time java Hello
    Hello world
    0.04user 0.01system 0:00.12elapsed 43%CPU (0avgtext+0avgdata 15436maxresident)k
    29672inputs+64outputs (82major+3920minor

While 120ms elapsed is not stellar, it's rarely a problem with how the JVM ecosystem looks.


Please be kind enough to reread it.

Startup time of a simple Java application and therefore also whole JVM is 0.4s (in the linked article).

1.2s is for the implementation in Clojure, which includes its own fairly heavy runtime.


I read it fine, but Clojure is an application of the JVM. The fact that a popular interpreted language on the JVM takes 1.2 seconds for hello world is a problem of the JVM itself, or at least its ecosystem. An interpreted language in C wouldn't take nearly that long.


The way an interpreter is implemented and the language it's written in don't have much to do with each other.

A Clojure interpreter written in C, implemented the same way as the Java one, would run just as slowly, given how it rebuilds Clojure every time the application starts.


That's clojure, not java. JVM startup is much faster.


That 1.5 seconds is FUD anyway.


On my old ThinkPad X201 with HDD and 8GB of RAM, running Fedora 25 and Gnome 3 I can start vanilla WildFly 10.1 in less than 10 seconds, http://imgur.com/a/BCDNP:

    20:43:44,578 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly Full 10.1.0.Final (WildFly Core 2.2.0.Final) started in 6551ms - Started 331 of 577 services (393 services are lazy, passive or on-demand)
Not too shabby IMHO.


There's that word again "enterprisey".


This interview with Bob Lee is really interesting on this topic: https://www.infoq.com/interviews/lee-java-di.

Apparently, Square was first built out on Ruby with the mindset that the JVM is an old clunker.

Fast-forward a few years and they switched to the JVM because it was faster and the language (I know, not related) provided compile-time safety.


Almost anything would be better than the Ruby runtime, which is notoriously bad. JVM performs best with largish heaps (>512Mi) - if your services fit into that model, it's a great piece of technology that is very fast.

But I have to agree w/ others that after using golang, where an equivalent web app would run in <50Mi of RAM with far better tail latencies, the memory cost of the JVM feels very large.


Same with Twitter. Remember the Fail Whale? Those were the non-jvm days before they ran ruby on the jvm.


> Apparently, Square was first built out on Ruby with the mindset that the JVM is an old clunker

Ah yes, this would be the same industry where people lead their teams to switch from Java to Go because they believe it will improve the productivity of development.


If one doesn't switch technologies to improve the efficacy of their team, why ever switch technologies? Should we simply use the first technologies conceived until the end of time?


Java, especially JAR files, can be quite light weight. However, JVM environments, and development with Java and Clojure, can be very heavy _and_ slow.

For Clojure, starting `lein repl`, takes 16 seconds on my 2012 Macbook and 9 seconds on my similarly-aged Dell laptop, both with SSDs and i7 quads.

Regarding memory usage examples, the base memory usage of a Google App Engine instance running the most trivial Hello World Java program is around 140MB. Given that the default F1 instance has a soft memory limit of 128MB, it becomes clear that the JVM is working against you in both cost effectiveness (the price to spin up new instances when your existing ones are already above the soft limit) and latency (since spinning up instances is slow). Add Clojure on top and the problem certainly doesn't get any better. As an added annoyance, which is specific to App Engine but a result of using the JVM, it's impossible to specify JAVA_OPTS, and thus any of the -X flags, without switching to the Flexible environment.

As a result of both of the above, choosing Clojure for developing on App Engine, as my specific example, has had the serious downside of slow development tools and memory issues out of the gate on my instances, forcing me to pay more for a beefier instance class. The REPL is really hard to beat, but the combination of the JVM and Clojure is the biggest pain in the ass with this stack.


Yes, the startup is slow. But nothing afterwards is. The JVM gets bad press because of startup time while in reality that hardly matters.


In production, what matters is the ridiculous memory usage. Both Java and App Engine are to blame about this, but the Python and Go folks aren't running into the same issue.


If memory usage matters to you, have you tried telling the JVM not to use all the memory? Have you tried any of the tuning options? I can't say I've ever used App Engine, but the pure Java applications I've worked on did in fact use a fair bit of memory, much like Windows and OS X will aggressively use spare memory for caching. Then I use -Xmx to tell it not to use all the memory and now it's much better. For a server, I'd say that this is both expected and correct behaviour.
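For reference, the usual knobs look roughly like this (the sizes and `app.jar` are placeholders, not recommendations):

```shell
# Pin the heap: start at 256 MB and never grow past it,
# so the JVM can't balloon beyond what the box can spare.
java -Xms256m -Xmx256m -jar app.jar

# Class metadata lives outside the heap on Java 8+;
# cap it separately if that's where the growth is.
java -Xmx256m -XX:MaxMetaspaceSize=64m -jar app.jar
```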


I have, and gave up after a month of tweaking it every couple of days trying to find the sweet spot. Someone more familiar with JVM internals would probably have succeeded, but somehow the C, Python, Go and even OCaml runtimes don't need this level of tuning in my experience.

JVM deployments tend to assume nothing else happens on the same machine, in my experience.


Your last point is probably true. In every application I have ever deployed I only ever set the -Xmx and -server settings, and I never faced any problems. You need to give the GC some breathing room, though.


> As an added annoyance, which is specific to App Engine but a result of using the JVM, it's impossible to specify JAVA_OPTS, so any of the -X flags, without switching to the Flexible environment

I haven't used App Engine either but I suspect that the flexible environment is more expensive.


I use -Xmx and then the process crashes because it runs out of memory. It's very wasteful with memory, and I wish JEP 169 were implemented to mitigate it. It's unfortunate that it was written in 2012 and remains a draft.


Well, I don't know what you consider "ridiculous", but I run a web/application server with significant complexity (written in Clojure) in a 2GB RAM VM, with the JVM taking up about ~800MB. I'd say that's pretty reasonable and it meets my production/business needs.


Considering that RAM is the primary cost in hosted environments, I'd say that 800MB is anything but reasonable. In fact, as a guy coming originally from a non-Java background, that seems downright opulent. Perhaps that's fine if it's the only thing running on that machine, but is that very realistic? A database, some daemons, cron jobs... It seems uncouth to me that the business logic should take up as much RAM as the data itself.

I run a fairly complex Clojure app and it uses 1.5GB of RAM all told. Factor in cron jobs, caching, DB flushes and other periodic spikes on the machine and whoops! we're over 2GB used. To prevent thrashing and OOMEs, I sized up to a 4GB VM, doubling my monthly costs (there are several of these boxes).

Now yes, I can go through and swap out libraries to slim the beast down. But that would mean rewriting large chunks of it, since so few Clojure libraries are interoperable and most simply wrap "enterprisey" (and heavy) Java libraries in sexier syntax. And if I'm rewriting it, I might as well avoid the whole mess and pick an ecosystem with a more streamlined standard library.

It's tough. I adore Clojure, but the combination of Clojure+JVM has made deployment and management less fun and more expensive than necessary. The JVM is awesome, but, just as it's not the hog so many claim it is, it's also not sleek.


Interesting that all the comparisons were with server frameworks. As if anybody ever cared about a few hundred megabytes of overhead on a server. Hell, even a bloated JVM implementation fits in most server L3 caches.

On the desktop, laptop, phone, or embedded environment, the JVM is heavy. It starts up slow, jars carry around ridiculous amounts of dead dependencies, garbage collectors require immense amounts of tuning, etc. And we shouldn't really expect otherwise. If you can't even keep your VM in cache, how are you supposed to have fast application code?

Specialty closed source JVM vendors have done wonders in terms of improving this problem...but it's still an uphill battle. AOT native compilation down to machine code is becoming more popular because of the proliferation of resource-constrained environments, and it will take time for new languages/compilers to take over, but take over they will.


Haha, AOT native compilation.

This comes back time and time again. AOT native results in slower runtimes for applications with one simple exception: startup time. In every other case a modern JIT compiler like the JVM will win due to gathering information and layered compilation.

Where AOT really makes sense is for an interactive app on a mobile device where you don't care about the last millisecond of performance but startup times and even much more importantly: energy expenditure. (That's why google AOTing the apps on device startup is quite sensible)

Most funnily Microsoft was heavily advertising AOT with .net framework 1.0 but in general switched to dynamic profiling and optimization in later versions of .net. (System assemblies that everybody uses during startup are AOT compiled using ngen, however)

It is just not "one size fits all", depending on your platform and requirements you'd have completely different requirements to your virtual machine. What you want is different between interactive use and "batch processing" and between energy starved devices and big iron.

Java - while ironically advertised for "applets" years ago - is optimized for the latter case, and there it really shines. On a server with long-running processes AOT makes no sense at all.

So what AOT will take over is phones, IoT devices, everything where energy is at a premium and startup times need to be quick. Layered JIT compilation takes over where you want to squeeze out the last bit of total performance. (Even interactively - looking at you, Google V8 and Chakra.)


    AOT native results in slower runtimes for applications with one simple 
    exception: startup time. In every other case a modern JIT compiler like the 
    JVM will win due to gathering information and layered compilation.
That's not necessarily true. JIT compilation is severely constrained in the amount of analysis that it can do for the simple reason that JIT has to be fast. Fast enough to not noticeably slow down the app. Meanwhile, an AOT compiler can take all the sweet time it needs, and roam all over the program in order to discover optimizations.

JIT compilers work very well on untyped languages like Smalltalk because the compiler can discover the type information at runtime, and then pre-compile the types that it sees most often. But that's not really that useful on the JVM, because Java is typed, as are most other JVM languages, with the exception of Clojure.

    Most funnily Microsoft was heavily advertising AOT with .net framework 1.0 
    but in general switched to dynamic profiling and optimization in later 
    versions of .net. (System assemblies that everybody uses during startup are 
    AOT compiled using ngen, however) 
Actually, with .Net Native, Microsoft is back to advertising AOT.

http://blog.metaobject.com/2015/10/jitterdammerung.html


Not all JIT compilation needs to be fast. I think it was HotSpot that first does interpretation, then - when interpretation proves too expensive - switches over to a quick JIT. And when code gets executed lots of times it runs the JIT again, this time with deep optimization settings.
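For what it's worth, HotSpot's tiers are tunable; a hedged sketch of the relevant flags (`app.jar` is a placeholder):

```shell
# Stop at the quick C1 tier: faster warm-up at the cost of peak throughput,
# useful for short-lived tools and tests.
java -XX:TieredStopAtLevel=1 -jar app.jar

# Default behaviour on modern HotSpot: interpret first, then C1,
# then the profile-driven optimizing C2 tier for hot methods.
java -XX:+TieredCompilation -jar app.jar
```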

What I personally don't understand: why don't we cache JIT results between runs? That might be a worthwhile optimization, and even possible in the face of class loading.

It would probably be like running ngen on .NET, just WITH performance statistics of a program. (Enabling specialization of calls for commonly passed types, or eliminating constant expressions while keeping the generic version of a function around - that's hard in AOT, as you need profiling information. I think Sun's C/C++ compiler was able to do that for AOT, resulting in large speedups. But maybe it only used it for branch prediction.)

Edit: What I forgot to add - I like the way you could always AOT things in .NET with ngen but also use a JIT where possible. Now that Java turned out to be owned by the evil empire and .NET the one by the company committed to open source - imagine reading that 10 years ago - I'm really curious in which way things will develop. And with all the new contenders as well. JVM (and .net) is not dead, but a lot of interesting alternatives are getting traction now.


> I think it was hotspot that first does interpretation, then - when interpretation proves too expensive - switches over to a quick jit. And when code gets executed lots of times it runs jit again, this time with deep optimization settings.

This is configurable; HotSpot can JIT right away when the application starts, but then be prepared to wait a bit.

> What I personally don't understand: Why don't we cache JIT results between runs?

They do, just not in OpenJDK, which is the only one many people care about.

All commercial JDKs support code caches between executions and AOT compilation.


I agree with most of what you say...AOT is far more of a competitive advantage for phones and other devices.

But I'd argue that there are very few benefits of JIT that can't be achieved by AOT + PGO. A sound static type system nullifies the need for most of those benefits (like speculative type optimizations and deoptimizations). But it might have the upper hand in cases where profiling can't capture all of the possible optimizable workloads that the binary would see. Databases or other large programs that continuously specialize over the lifecycle of the process. But that is far more niche than most people realize.


Later versions of .NET don't do dynamic profiling; the .NET CLR is much less sophisticated than the JVM. The first time you invoke an IL method it is compiled to machine code, and that same code executes for the rest of the lifetime of that image.


That isn't 100% correct, you can write CLR plugins that control that behaviour.


Don't forget that AOT also means less memory -- no hotspot running and dynamically compiling code.


> On the desktop, laptop, phone, or embedded environment, the JVM is heavy.

I'll half-agree with respect to phones and embedded environments since those are wildly variable and may include extremely low-specification platforms.

But a desktop or laptop? The JVM launches in milliseconds on my desktop and laptop. The monstrous Eclipse IDE launches in about six seconds on my desktop, and about five seconds of that time is Eclipse loading various plugins and what-not, in what looks like a single-threaded manner.

My desktop and laptop can both spin up Undertow and fire up a web-app from a Jar in about two seconds.

I'm fairly sure Eclipse is just using an old clunky CMS garbage collector. I've never tuned it on my desktop or laptop. Maybe Neon is using G1 now. I don't know and don't care because it runs just fine.

Maybe you've done something different with the JVM on desktops and laptops, but in my experience, on desktops and laptops, the JVM behaves more or less the same as it does on servers.


> As if anybody ever cared about a few hundred megabytes of overhead on a server

Kids today! Sit down over here, and Grandpa will tell you about the days when a few hundred megabytes was more than your average server's entire storage capacity. Now, in those days you tied an onion to your servers, which was the style at the time...


The really interesting part of this: Java was actually designed for smaller embedded devices and not for servers.


And there are alternative JVMs that are specifically designed for that use case that don't behave the same way.


For examples:

• the JVM on smart cards, e.g. EMV (chip) credit cards, or GSM cellular SIM cards

• the JVM embedded into the Intel Management Engine coprocessor


> the JVM embedded into the Intel Management Engine coprocessor

First time I've heard that one. Got a source?


Igor Skochinsky did some research into IME and has slides [1]. See slides 32-41.

[1] http://www.slideshare.net/codeblue_jp/igor-skochinsky-enpub


The notion that the JVM is not heavy because it needs less than a GB of disk space seems crazy to me. I consider OpenSSL to be wildly bloated because it is over 1 MB.


A megabyte of disk space is now worth about $0.0000290, according to this site:

http://www.jcmit.com/diskprice.htm

The numbers we're talking about here just aren't a practical consideration any more.


The disk space is cheap, sure.

But the CPU time spent by the dynamic linker resolving thousands upon thousands of symbols? That's actually rather painful.


You must be very busy if two hundred milliseconds is painful.


The actual HotSpot JVM itself is about 10mb, but that includes 4 GCs and 2 JIT compilers.

The Avian JVM can statically link an entire program and widget toolkit with itself and produce a 1mb binary.

It's not that big a deal. The space gets taken up by all the libraries. But then you'd want to compare a JVM against e.g. /usr/lib on a fresh Linux install ...


OpenSSL only does one thing.


Well, LibreSSL devs considered OpenSSL bloated from a security standpoint [1].

[1] - https://en.wikipedia.org/wiki/LibreSSL#Code_removal


OpenSSL relies on system libraries (timezones, locales, etc.) that are built into the JVM, so it's not an apples-to-apples comparison.


Sure it takes a while to load and there's bloat, but the bloat is everywhere now.

On the bad side of the JVM and assorted Java tools is that they are second-class citizens of the Unix world. The command arguments are all messed up, much like a Windows tool ported to Unix, and the interaction with the rest of the Unix stack like sockets and files is solipsistic and off, which leaves an ill stink on everything touched by it.

One of the things I find funny with Java is the once much-touted security model; fast-forward a couple of years to the advent of Android, which uses the Unix security model and none of the Java stuff.


Sure it takes a while to load and there's bloat, but the bloat is everywhere now.

This is exactly the kind of development culture that produces heavyweight, unresponsive tools.


Well, except for the type safety of Java, only allowing native code to be compiled to shared objects for implementing Java native methods, and exposing all OS APIs - outside what is required for graphics and real-time audio - only via JNI.

If it wasn't for the pressure of game developers, the NDK wouldn't even exist.

Remember Brillo? It was supposed to be like Android, but using C++ frameworks instead, as presented at Linux Embedded 2015 conference.

Guess what, when it got recently re-branded as Android Things, it switched to the Java Frameworks instead and it doesn't even allow for the NDK, with the user space device drivers being written in Java.


OP is impressed by running 5 processes at once while claiming that the JVM is not that heavy.

Is this cognitive dissonance? Dishonesty? I don't understand.


I think it is the genuine sense of wonder of a web developer. I have been told many times to update my ancient hardware when I say Java is a memory hog and slow on a 6GB Windows 7 laptop.

Some people don't realize that those massive 16/32GB MBPs etc. with SSDs are not available to everyone.


6 GB? Your laptop is probably not running optimally: a matched pair of memory modules runs dual-channel (128-bit wide), while a mismatched configuration like 4 GB + 2 GB falls back to single-channel (64-bit) for part of the address space.

You should ensure both memory modules are the same size (say, 4 GB or 8 GB each), otherwise performance can suffer noticeably.


It was given to me by a company I was contracting for a few years back. I recall the laptops were 'upgraded' from 4GB to 6GB to give developers better performance.

I think my main point was that we treat luxury as the baseline just because even more luxurious stuff exists.


I was a little surprised that the author disliked "heavy" things, yet kept reaching for the absolute heaviest tools in their respective categories:

- Rails (just about the heaviest web framework ever made for Ruby, despite its wide appeal)

- Ember (I absolutely love ember, but it is by far the heaviest modern JS framework... I don't include things like ExtJS)

Also, I routinely use the heavy/light distinction, but in a completely different way. I almost don't care how heavy something that runs on the server side of a web application is; on the back end, "heavy" generally translates to "contains complexity I'm not willing to deal with". "Heavy" on the front end, for me, means both footprint and complexity.


Start-up times are still an issue with Clojure on the JVM. For instance with Android I've found the initialization times to be pretty much a show-stopper for any application development.


Clojure on the JVM is way worse than java...but the JVM only contributes to a tiny portion of that. The real problem is that every clojure process has to bootstrap the entire language and compile every library before it can begin executing.


I don't know anything about Clojure, but in modern Android development, Instant Run patch files mean installation isn't even done each change any more. Even then, you can avoid a lot of installation time by using an emulator on a fast development machine instead of a real device. For most of Android's life the emulator has been disgustingly slow, but for installation times, it has benefits. Meanwhile restoring from a snapshot instead of booting fresh each time, x86 emulator images, and Intel's virtualization engine mean the speed isn't so bad any more.


It has nothing to do with that and more to do with the fact that, last time I tried it, the start-up time for an app was multiple seconds on top of the base start time.

Given how often Android evicts apps that's something I'm not comfortable with shipping. Would love to use Clojure but it was definitely a show-stopper for us.


What did you use instead?


Stock Android/Kotlin.


Android doesn't run the JVM.


That's a lot different a context than web-application deployment, though.


That's mostly clojure.core being so large and rebuilt each time by default, isn't it?


I've been using these arguments at my company for years. There's definitely FUD surrounding the JVM, and it's pretty ridiculous. Sure, it's not a perfect system, but it's usually disregarded for being "old and bloated".


> I run at least 5 JVM processes on my 2012 MacBook Pro with 8GB of memory. This is all day, every day. I would never have tried to start 5 Rails apps at the same time.

Where have we gone wrong that starting 5 processes with 8GB RAM seems impressive? On my development PC I regularly run:

- two different browsers (Firefox and Chrome),

- an email client that is a slimmed-down browser suite (Thunderbird),

- two chat apps and an IDE that are browsers in disguise (Electron and a Chrome App).

And that is without running or testing any of the applications that I'm actually developing. Oh, and then automated testing starts yet another browser. Even better, my actual application runtime environment runs in its own full OS virtualization, because otherwise the deployment environment would differ from my development environment… If we can't slim down the runtime environments of our day-to-day apps and development tools, how are we going to survive the end of Moore's law, given the ongoing trend to hide everything behind more and more abstraction and virtualization?

And I disagree with many others here, that RAM usage just doesn't matter for server development. At my last project, our setup contained a few macro services and the ELK stack for logging which accumulated to five JVMs and three Node instances. Now this is ok for the live setup, because most of these will run on different machines anyway. But for testing you want to have them all on the same machine, for convenience on the Jenkins machine. And you want separate setups for integration tests, user acceptance tests, and demo purposes. All of these have basically no load, but still consume the full amount of RAM. Of course, you will say, just get more machines, we are in the age of the cloud! But that doesn't come for free: Now every Jenkins job needs different credentials to access the machines, all developers need access to every machine, every new setup needs an additional cloud provisioning step, possibly with approval from management because of the additional fees. And yeah, you can automate most that away, but building and maintaining that automation also carries its own burden. All of that for something that should be essentially free – it's not like my OS doesn't already run a few hundred processes when I have just started my window manager.


Not just heavy, but I really don't like all our processes being named "java" with stupid -D command-line parameters. A nice native binary lets you name the app what you want, with config as you like it.


You can name the process anything you like with exec -a
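For example, a quick sketch of how that looks in practice (the service name and jar name are placeholders; the demo uses `cat` so it runs anywhere with procfs):

```shell
# bash's `exec -a NAME` replaces the shell and sets argv[0] of the new process,
# e.g.: exec -a myservice java -jar app.jar   (app.jar is a placeholder)
# Demo: cat prints its own kernel-visible command line; argv[0] is "myservice"
bash -c 'exec -a myservice cat /proc/self/cmdline' | tr '\0' ' '
```

`ps` and `top` will then show `myservice` instead of `java` (Linux-specific; argv[0] tricks behave differently on other systems).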


Use `jps`


Not to mention that having three horizontal monitors still isn't wide enough to see the entire flag list in top/htop (looking at you, ELK stack)...


There are ahead-of-time compilers for Java! Not free, though.


IIRC, gcj compiles to an executable.


Gcj was deleted from GCC in October 2016[1]. So the only free option left, AFAIK, is to use Mono AOT[2].

[1]: http://tromey.com/blog/?p=911 [2]: http://www.mono-project.com/docs/about-mono/languages/java/


Or a copy of GCC from September 2016. (That's not going to get you Java 9 support, I realize...)


It is not going to get you full Java 5 support either. The project was dead before the OpenJDK even existed and it never got better.


Avian can compile AOT to a single executable.


I feel like I'm missing something. His argument seems to be "the JVM isn't heavy compared to other large and bloated systems". Well... sure. My car isn't heavy when compared to a tank, but if I want light, I'll use my bike.


I've never had a problem with Java the language being heavy. However, many code bases, due to annotations, dependency injection, etc., really aren't Java any longer from a development perspective. You need to understand the code base, the app itself, etc.

This happens with open source projects as well. I contributed to the Azure support for JClouds and the bulk of my ramp up time was understanding how things were done more than writing the code itself.


I could not agree with you more. My most recent job doing Java dev at a major US tech company has really soured me on this style of Java. Annotations beyond null/nonnull seem to obscure the code and make it far less maintainable. Besides being largely superfluous, the DI frameworks I've seen widely used at said company (primarily Dagger) have led to numerous production memory leaks, because they obscured object scope and lifecycle from engineers.


Java does not give memory back to the system very quickly by default. This blows up the memory allocated to the JVM considerably.

start it with:

   java -XX:+UseG1GC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=15 -jar ...
and the jvm will give back to the system. see http://imgur.com/a/m9Qxx


> -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=15

TIL. Why are those XX flags?


I think most people wouldn't say the JVM is heavy compared to Ruby or Python runtimes, but rather compared to Go, Rust or Swift. That is: one-file copy deployment, almost no boot time, and no memory bloat.

The first time I ran a server in Go and realized it took a few kilobytes of RAM when nothing was happening, I was quite shocked.


1 file is the stupidest argument I've ever heard for saying one language is better than another.

Look, here's the 1 file it takes to install a python app I wrote: mycoolapp-0.1.0-1.el7.rpm - how neat is that?!

Sure, pretty much anything is "heavy" compared to go or rust, but those are systems programming languages by design, not something I'd write some huge web application in personally.


> Look, here's the 1 file it takes to install a python app I wrote: mycoolapp-0.1.0-1.el7.rpm - how neat is that?!

Is that one cross-platform file? Is it even portable across different linux distributions? Across different servers running the same distribution but perhaps with different libraries installed? Will another python developer understand how the build process for it is set up and be able to add new dependencies?


> Is that one cross-platform file?

No, and neither is a Go or Rust binary, next question.

> Is it even portable across different linux distributions?

Nothing preventing me from taking an extra 10 minutes to modify the spec to support SUSE, Fedora support usually comes for free if you support el7.

You like Debian derivatives? Let me toss a debian/ directory in there, that'll only take a couple minutes too.

Added benefit, I'm not just throwing files at a server like some hacky Windows developer doing an xcopy deploy to IIS. I can check what version of my application is deployed, and update it along with the rest of the system if I so desire (setting up a yum or apt repository isn't hard).

> Across different servers running the same distribution but perhaps with different libraries installed?

Different libraries? Do you mean different VERSIONS of libraries? Native libraries have figured this shit out for ages with sonames. Python too: different versions of an egg can be installed side by side; if you need to pin a specific version, just use pkg_resources.require() in your main script.
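For illustration, the runtime resolution mentioned above looks like this in a main script (the requirement string is just an example; a real pin would use `==`):

```python
# Resolve an installed distribution at runtime via setuptools' pkg_resources.
# A pin would look like pkg_resources.require("mylib==1.2") -- it raises
# VersionConflict/DistributionNotFound if the requirement can't be satisfied.
import pkg_resources

dists = pkg_resources.require("setuptools")  # "setuptools" used as a safe example
print(dists[0].project_name)  # -> setuptools
```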

> Will another python developer understand how the build process for it is set up and be able to add new dependencies?

I had a Jr. Developer with no experience with Linux or Python pick up building the package and making basic edits to the .spec file in 10 minutes. RPM/DEB packaging isn't magic, you describe your package with metadata, write some shell commands to build/install your package in a buildroot, and then list the files from said buildroot to include in the package. You could make your first package from scratch in under an hour if you read the Fedora or Debian wiki guides on packaging.
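A hypothetical minimal .spec skeleton, just to show the three parts I mean (metadata, shell commands run against a buildroot, file list); all names and paths here are placeholders:

```
Name:           mycoolapp
Version:        0.1.0
Release:        1%{?dist}
Summary:        Example application
License:        MIT

%description
Placeholder description.

%install
mkdir -p %{buildroot}%{_bindir}
install -m 0755 mycoolapp %{buildroot}%{_bindir}/mycoolapp

%files
%{_bindir}/mycoolapp
```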


> No, and neither is a Go or Rust binary

Sure, I'm comparing with the JVM per the article.

> Nothing preventing me from taking an extra 10 minutes

Indeed, but 10 minutes here, 10 minutes there, it all adds up.

> Different libraries? Do you mean different VERSIONS of libraries?

No, I mean some native libraries not installed. Does your package declare what packages it depends on? How do you handle different distributions using different package names for the same libraries.

> RPM/DEB packaging isn't magic, you describe your package with metadata, write some shell commands to build/install your package in a buildroot, and then list the files from said buildroot to include in the package. You could make your first package from scratch in under an hour if you read the Fedora or Debian wiki guides on packaging.

Sure, none of it's hard. But if there's no clear standard everyone ends up doing it slightly differently, and then every project you pick up you have to understand how they've set things up.


> Sure, I'm comparing with the JVM per the article.

Fair enough, though I still use RPM's to deploy Java (Spring Boot even!) applications.

> Indeed, but 10 minutes here, 10 minutes there, it all adds up.

In the grand scheme of software development updating the .spec file or debian control file is peanuts.

> No, I mean some native libraries not installed. Does your package declare what packages it depends on? How do you handle different distributions using different package names for the same libraries.

In the case of RPM-based distributions, some conditional macros in the .spec file that swap out Requires/BuildRequires statements based on the distribution the package is being built for. I never bother with SUSE personally, but there are differences between EL7 and Fedora that I have to keep track of.

> But if there's no clear standard everyone ends up doing it slightly differently, and then every project you pick up you have to understand how they've set things up.

.spec files are more standard than most build tooling. The only thing that complicates them is projects without an adequate build system in the first place, everything else is minor style differences based on who wrote the spec.

Everything has been done before, unless you are using some extremely new or extremely niche language or build tool chances are your spec file will be easy to figure out, since any and all complexity is explicitly linked to how easy it is to build and install your software in the first place.


"For both Node and Ruby you need a C compiler on the system which is hundreds of megabytes alone." Wait, what? What C compiler is hundreds of megabytes?


  $ ls -sh /usr/bin/gcc-5
  896K /usr/bin/gcc-5

  $ ldd /usr/bin/gcc-5 
	linux-vdso.so.1 =>  (0x00007ffc64bf5000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fda3da4f000)
	/lib64/ld-linux-x86-64.so.2 (0x000055f2460a1000)
(so no specific .so dependencies)

And why would you need that in production ? (hint: use fpm)

This article seems strange: why count Xcode as part of npm? Why use a Mac if it's that heavy? Sure, an IDE is nice, but I wouldn't count that as part of the language; otherwise you'd also have to include Eclipse + plugins on the Java side.


I certainly agree that you don't need that for production machines.

However, /usr/bin/gcc is just a frontend, and calls a slew of other binaries (such as cpp, collect2, cc1, normally found somewhere in /usr/libexec/gcc/). You also need binutils and, to be useful, likely headers for at least the standard C library.

The gcc + binutils packages come in at around 75MB on my machine.


I stand corrected, of course I should have counted the called binaries as well. So it seems 100MB is the right ballpark.


You're measuring the size of a program that calls GCC, not GCC itself at all. gcc-5 isn't the compiler, it dispatches to cpp/cc1/ld/etc.

cc1 is 20MB over here. gold is 5MB. Dynamic libraries it depends on (gmp & family, isl) weigh about 4MB. All-in-all the programs required to compile even a simple C program come out around ~32MB.

Clang is much bigger. Clang 3.8 is 59MB (I think that includes the preprocessor though, but GNU's CPP is only around 1MB (which still seems huge for a preprocessor)).


I keep seeing plenty of evidence that the JVM is indeed heavy. When I was working at a startup and we were running Solr on the JVM, it kept running out of memory and crashing. When I tried using Clojure I was irritated by the startup time of the REPL on the JVM. More evidence: http://stackoverflow.com/questions/13692206/high-java-memory....

If you want something light like Clojure and don't need any Java libs, try Pixie. https://github.com/pixie-lang/pixie

If you want something light not like Clojure, Go is a great choice. It's fast to compile, fast to run and doesn't gobble up your memory unless you force it to.


If you ran a game written in C++ that kept running out of memory and crashing, would you say C++ is "heavy"? Solr (and Lucene internally) maintain large in-memory data structures as well as memory map (usually) on-disk segment files. How much memory they use is determined by many, many configuration options within Solr. This has nothing to do with the JVM.

Same thing with Clojure. It basically bootstraps the entire Clojure environment on each process start. The JVM itself starts up in tens of milliseconds.


So the JVM is not heavy compared to even heavier stuff. Compared to Go, however, it is quite heavy in terms of disk, RAM usage, and deployment.


Exactly. I see where the author is going with his article (and I think some of the points he raises are actually quite valid), but he's playing both sides of the "heavier than what?" point.

Yes, the JVM is slimmer than many interpreted languages with external dependencies (Python, Ruby, etc), and yes it can be slimmed down through manual labor, but no it is not something you can call "not heavy" with a straight face.


Interestingly, the author does not touch startup times.

Frankly, for server-side loads, no-one cares how much your runtime weighs, be it disk space, download size or even memory (heck, my last project required developer workstations with at least 96GB).

There are all sorts of environments where that matters. And that's one of the reasons why Clojure didn't catch on on Android.


Wow, 96GB of RAM? I don't even... Would you care to elaborate?


We have applications that require 100GB of RAM to run.

The result being that they're not run locally. At least we have alright tests.


Well, I sure wish I knew how to make our Clojure dev server (`lein ring server`) take less than 2 minutes to start up on a 4 core i7 16GB MacBook Pro.

Fortunately I typically only need to restart it when switching branches.


There's maybe just too much stuff required in user namespace? We were able to address many problems including this one when we switched to boot-clj


Have you considered looking at it with a profiler? Some relevant tools: jconsole, YourKit, jmh...


I must admit my experience is a few years old, but one thing that the whole text doesn't mention and that has contributed a lot to the negativity I feel towards the java ecosystem: It seemed not well integrated in the Linux ecosystem.

What do I mean? I often experienced that Java dependencies were not readily available in Linux distros. Packaging Java stuff was - weird and complicated, not sure how to better phrase it.

For Python/Ruby/PHP you usually can rely on the fact that major libraries are properly packaged. For Java not so much.


Because Java distributes applications as bundles including the libraries (fat JARs, WAR files and so on), libraries need to be available to the builder, not the installer. That means it's perfectly sound that they're in Maven Central (and competitors), and not in the OS's package manager. It's the same with packages in JavaScript, Ruby, Go, Erlang, etc.

The only things you might need to install through OS packages are the JVM, and perhaps an application server if you're doing things that way, which fewer and fewer people are.

When i first started working with Java on unix, i felt the same way as you - i wanted the libraries to come through the OS package manager, the same way native libraries do, and spent ages trying to get my deployed applications to use them. Eventually i realised i was just doing it completely wrong.


Python, Ruby, JavaScript etc all have their own dependency repositories and packaging systems. Java does too. It's not Java that's the problem here, it's the Linux distros that insist on packaging everything themselves (almost always badly).

You could if you really wanted to just auto-convert Maven Central to DEBs. The metadata is there. The problem is that "dependency hell" would then visit you in the same way it does for Linux apps. Upgrading libraries is something that should either be done by developers, or by OS vendors very carefully, not by having some random part time packager run a script and push a new version that immediately propagates down to everyone else without any app-compat testing.


I think that's a fair point. I bet part of it is due to licensing. The choice on whether or not to use OpenJDK is not trivial, and I surmise that distro managers just avoid the issue completely (I would).


You can rely on ruby packages being properly packaged? Ahahahaahaha. They package what is necessary for the end user apps (like redmine) that they support, and a very bare minimum of other gems. Then you are stuck trying to match their release of updated versions with the ones you use. That seldom matches the update speed you will want, for neither ruby versions nor gem versions.


I used to think this too, until I came across http://www.scylladb.com/

It is a fork of cassandra written in the Seastar c++ framework and is drop-in compatible with cassandra. Claims 10x increase in performance.

I always thought there were a few percentage points of difference, never a 10x performance difference, between Java and C++. And that for a project with as many man-hours and Facebook-scale tuning as Cassandra.


I don't believe their claims. Many benchmarks (including those done by ScyllaDB) are done badly. They'll take a database built to operate on larger than memory data (e.g. 10x) and run on a dataset that can fit entirely in memory. So whoever optimized for in memory wins. But run on an appropriately sized dataset or reduce system memory and you see little difference.

This might seem like a good thing (ScyllaDB gives you extra performance when you have the memory for it), but it does mean that if your dataset grows, performance falls off a cliff. Something to keep in mind.


"it does mean that if your dataset grows, performance falls off a cliff."

Are you saying you know ScyllaDB does not handle larger datasets and Cassandra is better in this respect? Or are you saying that their benchmarks are not yet conclusive?


I am saying that when you go from fully in memory (due to having a small dataset) to having to move things to and from disk, disk increasingly becomes your bottleneck rather than memory. And disk is much slower than memory.


I thought a main point of Cassandra was to be distributed so the working dataset could stay in memory across the cluster. And the smaller memory footprint you typically get when you're not in the JVM means more of your working dataset can be cached in memory. So I would expect superlinear speedups compared to Java for exactly the reason you describe (depending on the request distribution).

But yeah, I'm always up for poring over more benchmarks. :)

Here are more details on benchmarks here:

https://qconsf.com/system/files/presentation-slides/avikivit...

The YCSB benchmark suite they use is the same one as used in this paper from the Cassandra homepage:

http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf


The choice of C++ is responsible for only a very small part of the performance difference. ScyllaDB uses different low-level algorithms, many of which could have been done in Java as well. That the Cassandra data model works well with the sequential processing approach of Seastar makes the effort of implementing in C++ manageable. In general, concurrent data structures in C++ require significantly more effort than in Java, and rarely yield performance improvements that are worth it, unless you're memory-bound. In sequential code it's easier to surpass Java's performance, but even that difference is diminishing (and expected to be dramatically reduced when value types are added to the JVM). Usually, the only significant overhead you must be prepared to pay is in RAM, and in exchange you get better performance-per-effort.


If something sounds too-good-to-be-true, it probably is.

I don't love Cassandra, but it's not because it's written on the JVM (full disclosure, we roll our own JVM key-value datastore https://github.com/liveramp/hank).

If your random-access keystore is limited by anything except network and disk latency, you have bigger problems.


If you investigate what they actually do to achieve those numbers, it's much less simple than just rewriting Cassandra in C++. For example, they use their own TCP stack and make use of vector intrinsics.


Hey.. "vector intrinsics" looks very cool. Thanks for mentioning that!

so what you mean is that, even after throwing facebook scale resources at java.. it is possible for a <10 people team to get 10X performance over java using the features that you mention.

That's a huge loss of face for java IMHO


> so what you mean is that, even after throwing facebook scale resources at java.. it is possible for a <10 people team to get 10X performance over java using the features that you mention.

That is not what I mean. Writing a userspace TCP stack isn't a feature of C++.


Explicit use of vector intrinsics are not the source of any 10x performance boost, nor anything to do with C++ in particular. Again, only a small (but probably not minuscule) portion of the difference has to do with Java vs. C++. The bulk of the difference is due to all sorts of optimizations, most of them could have been done in Java as well. But the ScyllaDB people are more experienced in C++ than in Java, and as they use sequential code anyway, there isn't a big downside for using C++ -- certainly not for them -- so it was the better choice. From what little I know, the reasons why such optimizations weren't done in Cassandra are because 1. the people working on it aren't low-level optimization experts, but more importantly, 2. because the performance was good enough.


You aren't comparing the same program written in two languages. Seastar stuff is written by C++ performance experts who are fanatical about tuning, and does all kinds of unusual far-out things that Cassandra doesn't do to get high performance.


> That's a huge loss of face for java IMHO

Because the JVM JIT, like every other compiler, sucks at developing vectorized algorithms ad-hoc; a task usually carried out by human experts in that?


They also have never caught up feature wise, and actually have gotten further behind since initial release. Also the benchmarks are lies (tbf, all benchmarks are lies).


Characterizing what the performance difference between C++ and Java is, or will normally be, is really hard.

Naive translations from Java to C++ will normally result in only a small percentage difference.

But with clever rewrites that leverage control of memory locality and SIMD intrinsics (either via pragmas to induce them automatically, or by hand), plus a good understanding of compiler settings for given architectures, etc., the differences can get quite large, depending on the problem domain.

Then again, there are ways around some of the performance limitations in the JVM, but they often involve very painful coding styles. Even so, you could narrow the gap a bit with that effort. (But if you are going to add effort, maybe just do it in C++?)


We run an algo trading system using 10-20 JVMs depending on exchange. It's not so bad.


If you're installing a compiler on your production server, you're doing it wrong.


And yet we do that every time we install the JVM (and Ruby, and Python, and...)


No, when you install the JVM (you probably meant the JRE) on the server, there's no `javac` (the compiler) installed, only `java`, the JVM. I never needed `javac` on production server.


The JIT is a runtime compiler, so yes, there's still a compiler present when you put the JVM on a machine (whatever form you use).

Of course, I know of a lot of developers who just put the whole JDK in a Docker image to save on the complexity of having to manage two different installs or containers.

> I never needed `javac` on production server.

Good; that's how it should be; but not how it always is.


JIT is part of a platform's runtime environment, not part of its build environment. Since you appear to be a super-pedant, I'll revise my initial statement to:

> If you're installing any tooling or programs on a production server or opening ports other than those strictly necessary for running your production application (as a pre-built binary or package whenever applicable) conforming to industry standards for that application/server environment, you're doing it wrong.


That's nothing - every time you serve some JavaScript, you're relying on a compiler being installed on your users' machines!


Jars are just awesome for cross-platform work or for when you need easy deployment. I oscillate between Java and Go now for similar tasks. They are both great for cross-platform work and easy deploys/frequent updates. Go is better at systems stuff (that's expected, as it is the new C IMO). Java is better if you want a boatload of rich data structures at your fingertips (more than slices, structs and maps... Go feels bare-bones here, but I like it that way, just as I like C that way). They both perform about as well as C and C++ for most things, and are safe and fun to use. I like C and C++ too, but I'm getting too old to use them.


The reputation of "heavy" was hard-earned back in the bad-old-days (1.3 era) when EVERYTHING NEEDED to be written in java and the JVM still sucked.

I, literally, had a party at work when we got our web app to three days of continuous uptime without an OOM error; nevermind that the early JIT (1.4 era) took 12+ hours to get "warmed up" and give peak performance.

Don't misunderstand me- java's horribleness has given me a pretty nice career so, for that, I love it. For the years and years of broken promises- (OSGi without running out of PermGen? lol) I hate it.

Java is not lightweight by any means; computers are just faster and have more memory.


I think topic poster point was that _other_ ecosystems (JS and Ruby) became so heavyweight that JVM now looks light in comparison.


It's true. node.js issues also pay the bills.


Ruby is perl-minded folks reinventing python without understanding it. The result was: ruby is python-done wrong.

Now compare Python Flask + uWSGI apps to JVM Jetty apps: the comparison favors Python on every metric.

Starting the Django elephant takes no time; compare that to starting your less capable framework of choice in the Java world.

To be fair, Java can be much, much faster than Python. But usually you don't care, because Python is not the bottleneck, e.g. the bottleneck can be in the SQL query.


Ruby is perl-minded folks reinventing python without understanding it. The result was: ruby is python-done wrong.

This is something said by someone who has no clue about the Smalltalk influence on both Python and Ruby. (Though Guido was very critical of certain things Smalltalk did, and made a point to do certain things very differently.) Ruby is very much Perl redone by someone who very much wanted a Smalltalk-like object system, but with much more syntactic sugar.


Speaking of the Ruby community: thread safety came far too late; it's only usable as of Ruby 2.x and Rails 4.

And even with Rails 4, many apps (like Canvas LMS) are not thread-safe, because of the long tail of Rails add-ons that aren't thread-safe, and the common bad practice of using static properties to store non-shareable state.


Perl > Ruby > any other language > Python


Work on an enterprise-sized "app" and the JVM starts seeming "heavy". It feels super slow (and memory-greedy, and you have to make sure it has enough of the right 'type' of memory) on the whole, and so does all the tooling around it, even though once it gets going it can do limited, specific things quite fast. Unless perhaps you're Google, with server farms constantly compiling things so devs don't notice as much. Most of the negative feeling is probably the fault of having a ginormous app in the first place; I'd likely feel similarly about C++. Still, even on a merely somewhat-large app (like a database, or an IDE) I've felt it in Java, whereas working on a similarly large app in other languages doesn't feel the same. I've never really felt it when just writing a REST/SOAP API that's effectively a thin wrapper around some DB calls, like a blog could be, and there are many other similarly small things. But those sorts of small problems can be done effectively in almost any language, and 'heaviness' is probably very low on one's priority list there...

I think there's an incentive with more dynamic languages to decompose your software into smaller bits. When the language gives you the ability to "script" that helps even more. You don't need to bundle everything into an uberjar, you can run things independently, and therefore you don't even need all of the sources locally, just what you need to do your particular bit of data processing. It's possible to do this with Java and multiple JVMs, but it's hard, the incentives aren't there. Unfortunately I don't have experience with Clojure in the large to say whether it helps make the JVM feel lighter, from my side projects it seems like it could but I don't know if the community is going that way. Lispy languages have different incentives since they let you build from the bottom up so well.


I think a part of the reputation of Java being big and slow comes from its popularity in the enterprise. This is partly because in that space you're usually writing apps for a captive audience (your fellow employees) so there isn't much incentive to optimise. The business managers understand things like "the app has feature X by date Y" but don't really understand or care about "it is productive to develop on this app" or "the app uses half as much memory as last month". Those things make the lives of developers nicer, but don't change much about the business.

Moreover, I think enterprise software managers often don't really have a grip on how much work needs to be done or how long it should take. So you can get situations where a team of say 10 people is staffed up to build an in house app, they deliver it, there are some improvements that can be made, they deliver the improvements, etc. After a few years the app is largely in maintenance mode and doesn't need much done to it, but ... who wants to fire the loyal employees who understand the app? Unless there is another app of roughly the same size and type waiting in the wings, what can happen is the team starts doing busywork. The managers don't notice because they aren't programmers to begin with and can't tell the difference between "creating a new in house framework because no reasonable alternative exists" and "creating a new in house framework because we're bored".

So over time enterprise software can bloat to extreme levels. Combined with Java's verbosity, and the fact that a long time ago it really was very slow, you get a platform with a reputation for ponderousness that is only partly deserved.


Yes, it's not my preferred platform at all, but we all inherit projects. By leaning too heavily on the _dynamic_ aspects of Java (like reflection) it is possible to get Python-like speeds while constantly running the risk of OOM. But with better tooling than Python, at least. (For small languages that interop with the JVM, I must say I had a lot of fun with LuaJava; Swing was actually pleasant.)


Way back when I ran a comment engine that was a custom Java servlet running on an embedded (i.e. mostly interpreted) JVM on a NSLU2: a 266MHz ARM with 32MB RAM. The servlet container was Winstone. Can't remember what the JVM was, sorry.

Load wasn't exactly high, but it worked absolutely fine.


Ah, it was JamVM.

http://jamvm.sourceforge.net/

Alas, the last release looks like it was 2014, although it does claim to support Java 8. The actual interpreter core is under 100kB (class libraries extra, of course).

It plugs into the OpenJDK! Look for the openjdk-8-jre-jamvm package in Debian.


What most people seem to miss is that if you use Clojure, the startup time hardly matters, because you do not restart your development JVM often.

I find this similar to discussions about boot time. It doesn't matter -- I reboot my computer once a month, if even that.


Note that you can also use IKVM and then Mono AOT if you don't want a JIT at all.


I see no mention of the additional cognitive load a complex runtime adds. This to me is the 'heaviest' part of the JVM. Though I admit it is not necessarily the same class of things the author is discussing.


Clojure is heavy, or at least it has high start times. Some of that is the JVM start, but parsing and compiling a large amount of Clojure code system-wide is more of a burden than, say, CPython.


You know it's not heavy because there are blog posts about it being not heavy. I also know Python's GIL isn't a problem because there are numerous posts about it not being a problem.


It's not the JVM, it's the tooling.


Which tooling specifically? Going from Java to Python a couple years ago I was in shock about how immature the tooling in the Python world is by comparison.


Good question, I still haven't found something that trades blows with maven in any other programming language. I've made my peace with setuptools for python, but .Net/Ruby/Go/Rust/etc. are all lacking in one way or another.


What could Cargo do for you that you miss from Maven?


I also hate the Java tooling personally; it's my #1 complaint with the language and one of the reasons I avoid it. I would say that nothing really follows Unix principles, everything seems over-engineered, and it's all quite complex to grasp before you become proficient.


I don't know; when I used Haskell after using Scala, the first thing I missed was the JVM tooling. There is nothing I have seen ever that comes near.


Which specifically? I'm curious which tools you object to.


If you don't think the JVM is heavy, try running it on a t2.micro. It can never get enough memory to start. I haven't run into any other language runtime that won't run on a t2.micro.
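For what it's worth, default heap sizing is the usual culprit on small instances; a minimal sketch of pinning the JVM's memory down so it fits in ~1 GB of RAM (the flag values and `app.jar` are illustrative, not tuned recommendations):

```shell
# Cap the initial/max heap and per-thread stack size so the JVM
# starts comfortably on a memory-constrained box (values illustrative)
java -Xms128m -Xmx256m -Xss512k -jar app.jar
```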


Huh? I ran Atlassian Stash (now Bitbucket Server) on a micro while trialing it.

That's a many-hundreds-of-MB Spring enterprise behemoth of a Java server.

Have you ever tried running Java on a micro?


I'm running a bunch of Java stuff off a t2.micro, runs fine.


Just out of curiosity: if you decide to use Java (or any other language that runs on the JVM), are the third-party libraries as "nice" (read: as numerous) as npm's?


Java has an incredible range of libraries, often of very high quality. You do have to install the dependency-management piece yourself, in the form of Maven (http://maven.apache.org/), or if you're doing Clojure, Leiningen (https://leiningen.org/). The total number of artifacts in the main archive, Maven Central (http://search.maven.org/#stats), is ~1.8M.


Former node dev, currently Clojure here.

Npm has a greater abundance of good packages. They are typically better documented and easier to get started with, sadly :(.

JVM has some really great stuff - things which are lightyears ahead of what is there in NPM. Much of which started as university projects, as Java is popular at schools.

I wish JVM developers took a hint from others and started making very easy and fun documentation, but have no high hopes.


Can you give some examples?

I've usually found docs to be pretty good in the Java library space. One of the nice surprises about it.


Hi Mike! Big fan of your work! Have been for years, so it's a pleasure seeing you have replied to my comment.

My latest example of this is the Apache Commons-Net Java package [0]. I wanted to make a toy Telnet server as an example for a friend asking how to do so.

I spent a good 20 minutes of my evening at home trying to install the library. Couldn't find a simple "this is how you install it with Maven" snippet, and no "easy 1-2-3 getting started with this library" text either.

After looking around for a while, I decided to check how to do it in Node. It's super easy, you just do

    require('net').createServer((socket) => socket.write('Hello from a toy telnet server!')).listen(8080)
And that is all! Got my task done in 5 minutes, and my friend learned something easy. My takeaway is that for software to be successful, being technically sound is only the first half of the marathon. You also need to make it accessible.

[0] https://commons.apache.org/proper/commons-net/


A telnet server is just a socket connected to a pty, no? I'm not sure there's much to it, which is why it's so easy in node.

I guess in java you'd do the same program something like this:

    // needs java.net.ServerSocket / java.net.Socket imports, and a
    // throws IOException (or try/catch) on the enclosing method
    ServerSocket server = new ServerSocket(8080);
    while (true) {
        try (Socket sock = server.accept()) {
            sock.getOutputStream().write("Hello world!".getBytes());
        }
    }
It's a bit more verbose, but hey, that's Java.

I think maybe the issue there is you got distracted by the idea that you needed a library. Commons-Net doesn't actually provide a telnet server because it doesn't need to.

That said, I agree that Commons-Net doesn't have great docs. I think it's fallen out of use over time. These days if you wanted a powerful non-blocking socket library you'd use Netty or VertX. The docs for Netty are much better:

http://netty.io/wiki/index.html
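And for completeness, pulling Commons-Net in via Maven is a single stanza in pom.xml; the version shown was current circa 2017 and is only illustrative:

```xml
<dependency>
  <groupId>commons-net</groupId>
  <artifactId>commons-net</artifactId>
  <version>3.6</version>
</dependency>
```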


Cheers for the example. The Commons-Net library was where I ended up after some searching - so my experience is of someone who wants to get from 0 to 100 asap.

Cheers and have a good day!


Yes. Maven has been doing package management right for years (and avoids the wasteful repetition that node does, that can often hide silent incompatibilities until runtime).


If you're building jars with a classpath on the filesystem, maybe.

If you're building wars, fat-jars, or anything else that gets you to that "single jar deployment" which is often mentioned as a pro, you're definitely not immune to duplication. Especially when you get to libraries that have changed their package name over the years, like, say, Jackson.


> If you're building wars, fat-jars or anything else that gets you to that "single jar deployment" which is often mentioned as a pro, you're definitely not immune to duplication.

True, though at least it's once per transitive dependency per deployable application rather than once per path to transitive dependency per project.

> Especially when you get to libraries that have changed their package name over the years, like, say, Jackson.

Yeah, you do have to deal with those, though it's a relatively small number IME. It would be nice if maven had some integrated support for saying that library x and y are actually what was previously combined library z or similar, though I'm not aware of any package manager that does that yet.


That cough medicine doesn't taste that bad. If you have to use "that". It's the thing.


I wish I were as good as the developers commenting on how slow Java is to start. They are clearly on a higher plane of enlightenment, where 1.5 seconds to start ruins their day, their deployment process, and their development cycle.

More power to them.


The biggest problem with Java is not the JVM but versioning. Many Java applications run into difficulties when the wrong version of Java is installed on a computer. This leads to many siloed computers to maintain.
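On Linux, at least, this is manageable; a sketch of juggling multiple installed JVMs on a Debian-family box (paths and package names here are illustrative and vary by distro and release):

```shell
# See which JVMs the system knows about
update-alternatives --list java

# Or run one app against a specific JVM without touching the
# system default, by invoking that JVM's binary directly
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -jar app.jar
```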


And it is better than a segfault.


A lot of Java apps are shit at packaging too. They often include an entire Tomcat server with all the setup files you don't even need. I wrote this a while ago:

http://penguindreams.org/tutorial/embed-tomcat-in-your-appli...

...but don't do that. Instead, use something newer like Netty and ditch that decades-old crappy servlet layer you don't need.

You can also use sbt+onejar or sbt-native-packager to make either a single runnable jar or a standard deb/rpm/tar.gz package to run your service.


What the hell is an HCMB? Nothing on DuckDuckGo, and nothing on Urban dictionary. I wish people would stop using acronyms before defining them.


Closest I could think of was a typo on Intercontinental Ballistic Missile, but your comment is on the first page of google results.


Can confirm: googled and landed right here.


Highly Cannibalistic Machine Bytecode


Human Controlled Missile Bomber


High Caliber Mobile Battleship


It's Hercules-Cloud Strife-Megaman-Batman. You don't miss with that guy.


> I, for one, am relieved not to have run apt-get install build-essentials on a production box.

Well, whoever does this is Doing It Wrong.


JV-what? I just installed nuget on ubuntu and what is this?



