Vert.x – JVM Polyglot Alternative to Node.js

andrewvc · on May 4, 2012

I'm really excited about this. While Node.js is a great project, it still doesn't have the awesome instrumentation and tooling around it that the JVM does. Additionally, real threading is damn nice, and the JVM definitely has that.

Combining this with languages like Ruby, Clojure, and Scala seems like a definite win.

Zak · on May 5, 2012

I'm not sure.

Async-by-default doesn't seem like a great model for programming the server-side part of most things that are done over HTTP. Three use cases jump out at me:

1. Javascript doesn't have threads, but you want to write a usable HTTP server in it.

2. Threads are scary.

3. You want to make certain actions asynchronous over HTTP (i.e. client says "start", then server maybe says "done" later).
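Use case 3 is essentially a start/poll protocol. A minimal Node.js sketch of the server side, assuming the work is a function returning a Promise; `startJob` and `jobStatus` are hypothetical names, and the HTTP routing that would expose them (e.g. POST to start, GET to poll) is left out:

```javascript
// In-memory job table: id -> { status, result }.
const jobs = new Map();
let nextId = 1;

// "start": kick off `work` (a function returning a Promise) and hand back an id at once.
function startJob(work) {
  const id = String(nextId++);
  jobs.set(id, { status: 'running', result: undefined });
  Promise.resolve()
    .then(work)
    .then(
      (result) => jobs.set(id, { status: 'done', result }),
      (err) => jobs.set(id, { status: 'failed', result: String(err) })
    );
  return id;
}

// "done?": the client polls this later, e.g. from a GET handler.
function jobStatus(id) {
  return jobs.get(id) || { status: 'unknown' };
}
```

The client gets its id immediately; whether the work itself is done with callbacks or threads is invisible to it.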

Now consider Clojure:

1. Clojure has threads (on the JVM, at least) and one typically uses an existing Java web server when a web server is called for. Unlike Javascript when node.js came out, Clojure isn't lacking options for running or writing a web server.

2. Clojure makes a lot of threaded operations pretty non-scary. Its native data structures are all immutable and it has constructs for concurrent state. These are not any harder to work with than the async/callback model. I find it more natural, but I'm used to Clojure so that could be bias.

3. Async-when-desired is already easy in Clojure. Futures provide a very easy way to do stuff in a thread pool without blocking. Agents provide the same thing for state changes. It really is as easy as (future do-blocking-thing) and (send-off an-agent do-blocking-thing some-args).

I can imagine why I might want this sort of thing in certain languages in addition to Javascript, but why would I want it in Clojure?

Uchikoma · on May 4, 2012

Finagle works fine. Thank you. Love it. But it hasn't taken off. Why?

What all the JVM Node.js clones are missing and what Node.js sets apart are async libraries. There is no async (MySQL) JDBC driver for starters. If your IO drivers are not async, your async container is not very useful in real life.

kyleburton · on May 4, 2012

This. The biggest difference is in the two cultures: blocking is anathema to the Node.js community - they will literally reject libraries or code that blocks because it destroys the entire model; the JVM community does not value non-blocking code - most of the core (JDBC, Networking in general, File system operations) is all written in a blocking style - the JVM community accepts this with the implicit assumption that threads will help assuage those issues.

Python, Ruby and Perl all have the same cultural tolerance for blocking code. The Node.js community has a complete lack of tolerance for blocking code.

I work with the JVM every day (Clojure) and wish it was different wrt the common use of non-blocking code, but it's going to be a long road to get there on the JVM.

Kyle

pron · on May 4, 2012

Java Executors combined with Guava's ListenableFutures easily turn any blocking operation to an asynchronous one.

Netty's entire model is asynchronous, and Java 7 now has AsynchronousChannels for IO which, I assume, Netty will make use of.

All in all, the JVM has a much more solid and performant foundation than anything Node can provide. The whole difference will come down to a programming style preference. I am not entirely sure why Vert.x adopted the Node style rather than the proven servlet container, as I'm sure both styles provide comparable performance. I guess each may shine under different loads/usage patterns (my guess is that Vert.x/Node can squeeze more performance from a single thread, but servlets are more scalable).

Uchikoma · on May 4, 2012

There is no async MySQL JDBC driver. If you encapsulate it in an async layer, you need to keep a thread for the connection.

pron · on May 4, 2012

Is that supposed to be bad? The programming style will be the same. If threads are done right and the JVM can manage their affinity well (especially on NUMA architectures), it's best to use them and pass relatively small amounts of data between them; they can then provide much better performance than many threads accessing the same large piece of RAM (which is what happens if you simply replicate a single event-loop thread with asynchronous IO).

tensor · on May 4, 2012

Maybe I am missing something, but how can you possibly have an async SQL driver without threads like this? This sounds like a case of your Node.js database driver hiding the exact same behaviour described here within C code.

purplefox · on May 4, 2012

If the wire protocol for the driver is published, then you can write a 100% async driver for it. I.e. no threads blocking, ever. In fact, I already did this for redis and vert.x (I will dig out the code for this some time).

If you are dealing with something where you don't know what the wire protocol is and you just have a blocking client library to play with (e.g. JDBC; JDBC is, by definition, blocking, see the JDBC API), then you can't do much but wrap the blocking API in an async facade and limit the number of threads that block at any one time. This is exactly what we do in vert.x. We accept the fact that many libraries in the Java world are blocking (e.g. JDBC), so we allow you to use them by running them as a worker. This is one area where we differ from node.js. Node.js makes you run everything on an event loop. This is just silly for some things, e.g. long-running computations (remember the Fibonacci number affair?), or calling blocking APIs. With vert.x you run "event-loopy" things on the event loop, but you can run "non event-loopy" things on a worker. It's a hybrid.
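The "async facade that limits how many things block at once" idea can be sketched generically. This is a Node.js analogue, not vert.x's worker API: the hypothetical `boundedRunner` admits at most `limit` tasks at a time and queues the rest in FIFO order, which is the same back-pressure a fixed-size worker pool gives you on the JVM:

```javascript
// Returns run(task), where task is a function returning a Promise.
// At most `limit` tasks are in flight; the rest wait in a FIFO queue.
function boundedRunner(limit) {
  let active = 0;
  const queue = [];

  function next() {
    if (active >= limit || queue.length === 0) return;
    const { task, resolve, reject } = queue.shift();
    active++;
    Promise.resolve()
      .then(task) // also turns a synchronous throw into a rejection
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // a slot freed up: admit the next queued task
      });
  }

  return function run(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
  };
}
```

Here the limit caps concurrent async tasks rather than OS threads; on the JVM the same shape would sit in front of a fixed-size thread pool.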

Uchikoma · on May 5, 2012

A limited number of threads will not scale the way real async wake-on-data connections do. If demand exceeds your thread pool (for example, when your web response is built from async backend requests), your site will go down.

d503 · on May 4, 2012

PostgreSQL's libpq supports nonblocking asynchronous operation, and node-postgres takes advantage of that.

http://www.postgresql.org/docs/9.1/static/libpq-async.html

https://github.com/brianc/node-postgres/blob/master/src/bind...

(notice the Connect method on line 325 of binding.cc)

At some level a client-server database driver isn't all that different from any other network client; you send a request over a socket and wait for a result. There's no reason you have to block while waiting.
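To make that concrete: a client for a request/response wire protocol can write its request, register a callback on the socket, and return to the event loop. A minimal Node.js sketch using the `net` module; the newline-delimited protocol (the server answers each line with `OK <line>`) is made up for illustration and implies no real database protocol:

```javascript
const net = require('net');

// Toy server standing in for a database: answers each newline-terminated
// request with "OK <request>".
function startToyServer() {
  const server = net.createServer((sock) => {
    sock.setEncoding('utf8');
    sock.on('error', () => {}); // ignore client disconnect errors in this toy
    let buf = '';
    sock.on('data', (chunk) => {
      buf += chunk;
      let nl;
      while ((nl = buf.indexOf('\n')) >= 0) {
        const request = buf.slice(0, nl);
        buf = buf.slice(nl + 1);
        sock.write('OK ' + request + '\n');
      }
    });
  });
  return new Promise((resolve) => server.listen(0, '127.0.0.1', () => resolve(server)));
}

// Non-blocking query: write the request, register a callback, return immediately.
// The Promise settles when the reply line arrives; no thread sits blocked meanwhile.
function query(port, text) {
  return new Promise((resolve, reject) => {
    const sock = net.connect(port, '127.0.0.1', () => sock.write(text + '\n'));
    sock.setEncoding('utf8');
    let buf = '';
    sock.on('data', (chunk) => {
      buf += chunk;
      const nl = buf.indexOf('\n');
      if (nl >= 0) {
        sock.end();
        resolve(buf.slice(0, nl));
      }
    });
    sock.on('error', reject);
  });
}
```

Several such queries can be in flight at once on a single thread, e.g. `Promise.all([query(port, 'SELECT 1'), query(port, 'SELECT 2')])`.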

Moreover some databases (like Postgres) let you receive asynchronous notifications signaled by transactions on other connections; that's how trigger-based replication systems like Bucardo do their thing.

http://www.postgresql.org/docs/9.1/static/sql-notify.html

kodablah · on May 4, 2012

Because it would be based on asynchronous socket responses. So you wouldn't iterate like you currently do with a ResultSet, but would rather have a simple "RowHandler" of sorts. However, you still run into the same trouble you do with Node if you decide to do a lot of blocking work in there instead of just sending the row to some ExecutorService thread to get worked on.
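A sketch of the "RowHandler" shape in Node.js terms: instead of pulling rows from a `ResultSet`, the caller registers handlers and rows are pushed as they are decoded. `streamRows` is a hypothetical driver call that fakes the socket with `setImmediate`; any heavy per-row work would be handed off rather than done inside the handler:

```javascript
const { EventEmitter } = require('events');

// Hypothetical driver call: emits 'row' for each decoded row, then 'end' with the count.
// Rows arrive asynchronously (simulated with setImmediate), as they would from a socket.
function streamRows(rows) {
  const emitter = new EventEmitter();
  let i = 0;
  function pushNext() {
    if (i < rows.length) {
      emitter.emit('row', rows[i++]);
      setImmediate(pushNext); // yield to the event loop between rows
    } else {
      emitter.emit('end', i);
    }
  }
  setImmediate(pushNext); // start only after the caller has attached handlers
  return emitter;
}

streamRows([{ id: 1 }, { id: 2 }])
  .on('row', (row) => console.log('row', row.id))
  .on('end', (count) => console.log('rows:', count));
```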

batista · on May 4, 2012

Yes, but the lib can do it for you, and you won't have to know a thing.

Locke1689 · on May 4, 2012

What are you talking about? libevent invented everything Node.js uses and Python's Twisted had and has everything Node.js could dream of.

Node.js is just a reinvention of old technologies in Javascript.

Edit: Removed flame about Javascript because I don't want to have this debate again.

kyleburton · on May 4, 2012

I am talking about the cultures surrounding these languages and frameworks. Node's community rejects blocking libraries. Java's does not. I've used the non-blocking frameworks in Perl (POE and my own), C (select, and some of the poll variants), and Ruby (EventMachine), and they are fine if you can avoid blocking libraries; in these communities it is generally acceptable to write blocking libraries. I don't see it as a technical hurdle, I see it as a cultural one.

Locke1689 · on May 4, 2012

You need to start backing up your claims with actual data. What networking libraries are blocking in Node.js that are not blocking in Twisted? Moreover, what can't you do with a Twisted Deferred that you can do in Node.js?

There are two main things that block in computing:

* I/O

* CPU

You better believe that Node blocks on CPU, so what I/O does Node not block on that Twisted does?
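The CPU case is easy to demonstrate: a timer scheduled for 0 ms cannot fire until synchronous work hands the event loop back. A minimal Node.js sketch; `busyWait` is only an illustrative stand-in for any CPU-bound computation:

```javascript
// Spin the CPU for `ms` milliseconds without yielding to the event loop.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // no I/O, no callbacks: the single JS thread is fully occupied
  }
}

// Measure how late a 0 ms timer fires when CPU work runs right after scheduling it.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    busyWait(blockMs);
  });
}

measureTimerDelay(100).then((delay) => console.log(`0 ms timer fired after ~${delay} ms`));
```

Every other callback (incoming requests, I/O completions) is delayed the same way for as long as the loop spins.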

batista · on May 4, 2012

>I am talking about the cultures surrounding these languages and frameworks. Node's community rejects blocking libraries. Java's does not.

So what?

You get tons of Java libs to use, a majority of blocking ones and lots of non blocking on one hand, or you restrict yourself to the fewer non-blocking libs of varying quality available for Node.

With Java you can also turn blocking libs into non-blocking ones with a wrapper and threads, whereas with Node.js, if it's blocking, you're screwed, because the JS engine is single-threaded.

ricardobeat · on May 4, 2012

Node is nearing 10k published libraries. Is there a comprehensive site listing Java libraries?

It's trivial to offload blocking operations to other processes in node too, it's just not the preferred option.
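One way to offload CPU-heavy work to another process, sketched with Node's `child_process` module: a naive Fibonacci runs in a separate Node process while the parent only waits on I/O. `fibInChild` is a hypothetical helper; spawning a process per call is expensive, so a real setup would more likely keep a pool of long-lived workers:

```javascript
const { execFile } = require('child_process');

// Script run in the child: naive recursive Fibonacci of the last CLI argument.
const CHILD_SCRIPT = `
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  const n = Number(process.argv[process.argv.length - 1]);
  process.stdout.write(String(fib(n)));
`;

// Run fib(n) in a separate Node process; the parent's event loop stays free meanwhile.
function fibInChild(n) {
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ['-e', CHILD_SCRIPT, String(n)], (err, stdout) => {
      if (err) reject(err);
      else resolve(Number(stdout));
    });
  });
}

fibInChild(25).then((v) => console.log('fib(25) =', v));
```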

moe · on May 4, 2012

>Node is nearing 10k published libraries.

And Ruby has 38,237 gems, 99.9% of which are garbage.

Library-count is a terrible metric.

juiceandjuice · on May 4, 2012

Have you ever programmed in Java? That's a strange question to ask if you have.

http://mvnrepository.com/

Whatever libraries node has is a drop in the bucket compared to java.

ricardobeat · on May 6, 2012

I haven't, that's why I asked. Node is just 2 years old. You'll get 90% garbage on any open package manager.

soc88 · on May 4, 2012

You make it sound like "rejecting blocking libraries" was some noble principle. Due to the severe limitations of JavaScript, there is no other choice.

Uchikoma · on May 4, 2012

We run a major site on Java, had some thread trouble years ago in the very beginning, works very well now when tuned. Threads work.

BUT: I assume people will move to backend services with REST, combining REST backend results into a page. This increases IO a lot and will kill your latency and the default thread models if your code is synchronous. You'd need async IO and composable futures to manage latency and thread count. And if you do async backend REST, why not do async JDBC etc.? But there are no libraries.
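The fan-out described here (one page built from several backend REST calls) is where composable futures pay off. A minimal Node.js sketch with Promises: `fetchUser` and `fetchOrders` are hypothetical stand-ins for backend calls, simulated with timers, and `withTimeout` bounds how long the page waits. Because both calls are issued at once, page latency tracks the slowest call rather than the sum:

```javascript
// Simulated backend call: resolves with `value` after `ms` milliseconds.
const backend = (value, ms) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Hypothetical backend services standing in for REST calls.
const fetchUser = (id) => backend({ id, name: 'alice' }, 50);
const fetchOrders = (id) => backend([{ orderId: 7, userId: id }], 60);

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Issue both calls concurrently and compose the results into one page model.
function buildPage(userId) {
  return withTimeout(Promise.all([fetchUser(userId), fetchOrders(userId)]), 500).then(
    ([user, orders]) => ({ title: `Orders for ${user.name}`, orders })
  );
}
```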

batista · on May 4, 2012

>This. The biggest difference is in the two cultures: blocking is anathema to the Node.js community - they will literally reject libraries or code that blocks because it destroys the entire model

Really? So they reject any kind of library that does anything except call a callback? Because everything else, from calculating 2+2 to creating a template blocks. And it doesn't matter when it happens, when it happens it blocks.

spullara · on May 4, 2012

Twitter is sponsoring a Summer of Code developer to create an async library for MySQL: http://engineering.twitter.com/2012/05/summer-of-code-at-twi...

Uchikoma · on May 5, 2012

This can't be voted high enough.

sehugg · on May 4, 2012

Your async container still can be useful, since you can have worker threads executing blocking operations. This works fine as long as your JDBC queries are fast enough to feed the event loop without creating tons of threads.

What async containers like Netty (and by proxy, Vert.x) solve is high-latency cases like long polling and file upload/download. Netty for one does have support for zero-copy file transfers: http://docs.jboss.org/netty/3.2/xref/org/jboss/netty/example...

In my experience, these are a small percentage of cases so the majority of the app can use blocking I/O -- even long-polling code, as long as you are using the async framework for the client connection.

andrewvc · on May 4, 2012

So, I googled 'finagle java' and all the results were terrible. There was no concise description with a short example. Nothing I saw in the first few results suggested a 'node.js' clone. In fact, the first result was this: http://engineering.twitter.com/2011/08/finagle-protocol-agno...

"Protocol-agnostic RPC System" does not sound like a node clone at all.

Uchikoma · on May 4, 2012

(as a side note, what do you think REST PUT calls over HTTP on Node are?)

Finagle is about composing async IO calls, e.g. here

http://twitter.github.com/scala_school/finagle.html

This works much better in Scala, as Java is not very good at composing Futures: it needs a lot of boilerplate code because it lacks noise-free closures.

tlrobinson · on May 4, 2012

There was no MySQL driver for Node when it was first created. You have to start somewhere.

batista · on May 4, 2012

>What all the JVM Node.js clones are missing and what Node.js sets apart are async libraries. There is no async (MySQL) JDBC driver for starters.

Would you call Node.js's MySQL drivers "production quality"?

ricardobeat · on May 4, 2012

Being non-existent and being production quality or not are very different things. BTW there are hundreds of node apps in production.

batista · on May 4, 2012

>Being non-existent and being production quality or not are very different things.

As far as production is concerned, they might as well be the same thing.

>BTW there are hundreds of node apps in production.

Do they use any of Node's MySQL drivers, though? Because those drivers (or their lack) were the topic of this subthread.

ricardobeat · on May 4, 2012

The topic was the non-existence of drivers in other platforms, you brought quality up. Node itself isn't 1.0 yet. I don't see how not having anything is better than having something in development.

transloadit.com, from the module maintainers, has been using it for over a year. I don't have any magic insight into what modules sites use, but judging from the activity in their repo it's quite popular (~1000 watchers, 100+ forks): https://github.com/felixge/node-mysql

scubaguy · on May 4, 2012

Finagle looks like it is a bit lower level framework compared to vert.x. While Finagle aims to be a framework that supports services using multiple protocols, vert.x appears to be much more tailored for HTTP web services.

gcampbell · on May 4, 2012

finagle-http (https://github.com/twitter/finagle/tree/master/finagle-http) provides pretty much everything you need to build an HTTP web service.

scubaguy · on May 4, 2012

But take a look at the Finagle example from Heroku and compare it to the example from vert.x. There's a lot of boilerplate in the Finagle version because Finagle is a general purpose async service framework, which was my entire point.

jhspaybar · on May 4, 2012

Maybe I'm one of the weird ones, but I absolutely love types and would probably write everything in JS if it had typing similar to Java or C++. As it is, I'm using Java on Jetty instead for my current web application, but I would love to see a really solid, typed, Node.js-style evented framework. With that said, at this point I'm not sure I'd give up my servlets, frameworks (like CometD) for doing WebSockets, and the other niceties that a true servlet container gives me. I can't wait to see where this goes though!

stcredzero · on May 4, 2012

There's only two kinds of weirdos in this world. Those who love types, and those who don't.

Animus7 · on May 4, 2012

Impressive. It's basically a kinda-sorta-rewrite of Node.js APIs on the JVM.

Looking through the docs, the main difference I see is that this is opting for a comparatively heavy-core approach which contrasts with Node's ruthless minimalism + third-party modules.

For example:

-file system access is convoluted with HTTP handling: req.response.sendFile()

-pieces of web framework functionality by default, but no full solution (RouteMatcher)

-integrated WebSockets with a novel but unconventional accept/reject API

-heavyweight SockJS integration

It will be interesting to see how all of this plays out. And I'm definitely interested in hearing evidence for claims such as

> a run-time with real concurrency and unrivalled performance

stephen · on May 4, 2012

> -file system access is convoluted with HTTP handling: req.response.sendFile()

I haven't looked into it, but I'm guessing this is so that it can do zero-copy sending of files (i.e. it doesn't have to buffer/stream the contents through the JVM).

purplefox · on May 4, 2012

That's right. If you use the sendFile() method and you're on an OS that supports it, then the kernel will do the copying directly from file to socket for you.

You can also serve files in the more conventional "node.js-style" way (i.e. pump the buffers manually from file to socket) if you like. It's just slower than getting the kernel to do the work for you.
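For comparison, the "pump the buffers manually" approach looks roughly like this in Node.js: every chunk is copied through user space, and `write()` returning `false` is the signal to pause the file reader until the socket drains. `pumpFile` is a hypothetical helper; in practice `src.pipe(dest)` or `stream.pipeline` does the same job with better error handling:

```javascript
const fs = require('fs');

// Copy a file to a writable stream (e.g. a socket) one chunk at a time,
// pausing the reader whenever the writer's buffer is full.
function pumpFile(filePath, dest, done) {
  const src = fs.createReadStream(filePath);
  src.on('data', (chunk) => {
    // write() returns false once dest has buffered too much: wait for 'drain'.
    if (!dest.write(chunk)) {
      src.pause();
      dest.once('drain', () => src.resume());
    }
  });
  src.on('end', () => dest.end());
  src.on('error', done);
  dest.on('finish', () => done(null)); // all data handed to the OS
}
```

Either way each byte passes through the process's own buffers, which is the copy that a kernel-level sendfile avoids.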

dap · on May 4, 2012

But you have to dedicate a thread to sendFile() (by nature).

jbooth · on May 4, 2012

Not really, typically you'd have one thread handling all sendFiles through a single selector over the destination sockets. As a matter of fact, it's actually really painful to do sendFile in Java from a single thread, because when the channel isn't ready for sending, rather than returning EAGAIN and letting you busy-loop or wait/retry or whatever, it throws an exception. So you have to use a selector to do sendfile, and in that case, why not use multiple tasks with the same selector?

dap · on May 4, 2012

(Response to jbooth, but for some reason I can't reply to that directly.)

The whole point of sendfile is to make one system call to send all the data in one stream to another, which in general may block. If you're polling and sending only small chunks at a time (whatever you can write without blocking), is it really that much of an advantage over read/write on the same poll? (If you're not doing that, then you have to block, and you have to dedicate a thread to it.)

jbooth · on May 4, 2012

If the socket you're writing to has been set to nonblocking, then sendfile exhibits the behavior I described, sending EAGAIN sometimes (check man sendfile). This means typically you want to put a selector in front of it and poll the selector, then send to any sockets that are writable, loop back and poll again.

It's still an advantage over read/write because you're getting the 0-copy behavior.

bascule · on May 4, 2012

"real concurrency" is a silly term, but I assume he means threads and therefore a multicore concurrency model vis-a-vis thread-level parallelism, allowing a single VM to utilize all cores in the system.

This is opposed to Node, which must run at least one VM for each CPU in order to utilize all of the cores in a system, unless your only use of threads is ThreadPoolExecutor-style pools, in which case you can use the horrible hack that is threads-a-gogo.

purplefox · on May 4, 2012

Yes, I meant threads ;) E.g. A web server using node.js on a 32 core server. You would have to manually manage 32 instances of node, and use a load balancer or the cluster module in order to route requests to the instances. With vert.x you just start one instance and from the command line you tell it how many instances to start. It then scales over your cores, no glue code or cluster module to write. (There's an example of this on the front page of the website).

mbq · on May 5, 2012

Having VM per core may be quite beneficial -- you get more fault tolerance, immunity to GC glitches and one tier less when scaling over several machines. And there are nice tools to manage multiple processes.

abeatnik · on May 4, 2012

The install prerequisites recommend that Windows users install a Linux VM, but I found the beta 11 version works directly.

purplefox · on May 4, 2012

Yes, vert.x should work directly on Windows. I shall update the wiki accordingly :)

sausagefeet · on May 4, 2012

Frustrating to see a library like this be called "Next generation" when the code structure is a step backwards as far as I can tell. We have had green threads, and more than that green threads that can multiplex over multiple cores for a long time. Let's move on, people.

dap · on May 4, 2012

From the article:

> InfoQ: What about running a real-time app on the JVM vs. on Node.js, with respect to debugging, monitoring and operations?

> Answer: I'd say monitoring and operations are really the concern of the environment in which you deploy vert.x than vert.x itself. e.g. if you deployed vert.x in a cloud, the cloud provider would probably provide monitoring for you.

This makes it sound like a toy. How can I deploy something to production when I have no way of seeing what it's doing? How is a cloud provider supposed to provide debugging/introspection for JavaScript running on the JVM (by means of a brand new facility)?

treenyc · on May 4, 2012

also look at http://ringojs.org

qznc · on May 4, 2012

also look at http://vibed.org/

loftsy · on May 4, 2012

also look at Apache AWF http://incubator.apache.org/awf/

sehugg · on May 4, 2012

Looks like a user-friendly interface to Netty and Hazelcast with some special sauce sprinkled in. I love Netty, and Hazelcast is, er, interesting and hopefully getting more reliable. Should be fun.

ww520 · on May 4, 2012

Netty and Hazelcast are amazing pieces of software. I've used them with good success. Hazelcast is another unique enabler that makes the impossible or the difficult trivial.

dotborg · on May 5, 2012

This is not anything new; this functionality has been part of Apache Cocoon for almost 10 years and is called Flowscript: http://cocoon.apache.org/2.1/userdocs/flow/api.html

You can use and create Java objects from your JavaScript.

emblemparade · on May 5, 2012

There are other alternatives for polyglot JVM goodness, for example this one based on Restlet:

http://threecrickets.com/prudence/

Plus a whole framework based around MongoDB:

http://threecrickets.com/savory/

pan69 · on May 4, 2012

Trying to install this on Ubuntu. Is it me or is the "binary" download link [1] not really a binary download link?

[1] https://github.com/purplefox/vert.x/downloads

purplefox · on May 4, 2012

Works fine here:

https://gist.github.com/2594983

Are you sure you're not looking at the github tags, rather than the downloads?

pan69 · on May 4, 2012

OK. I think I know what I did wrong. On GitHub (via the download link on the website) I clicked "Download .tar.gz", assuming I was getting the latest version. I might have got the latest version, but it seems to be the latest source. I should have chosen one of the packages.

mattgreenrocks · on May 4, 2012

I was hoping it provided an alternative ideology to node.js in the form of fibers.

purplefox · on May 4, 2012

Fibers (or equivalent constructs) aren't supported by all the languages that Vert.x supports (e.g. Java) so we can't really support something like that until we can do it in all the langs.

I know fibers/green threads are all the rage right now, and it is certainly something to keep an eye on, but I am not entirely convinced that roll-your-own threading is going to be any more performant than what the kernel can do.

If we can find a way of implementing fibers efficiently, that supports millions of fibers on a single JVM instance, I would be interested.

almost · on May 4, 2012

    req.response.sendFile('webroot/' + file);

!!!!

I'll have file='../../secret.txt' thanks

purplefox · on May 4, 2012

Yes, of course in a real web server you'd make sure you do the checks ;) The documentation actually mentions this point explicitly :)
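The check in question can be sketched in Node.js with the `path` module: resolve the requested name against the web root and refuse anything that lands outside it. `resolveUnderRoot` is a hypothetical helper, not a vert.x API, and it assumes the request path has already been URL-decoded:

```javascript
const path = require('path');

// Map a requested file name to an absolute path inside `root`,
// or return null if it would escape the root (e.g. '../../secret.txt').
function resolveUnderRoot(root, requested) {
  if (requested.includes('\0')) return null; // reject NUL bytes outright
  const base = path.resolve(root);
  const target = path.resolve(base, requested); // also normalizes '..' segments
  return target.startsWith(base + path.sep) ? target : null;
}

// A handler would serve the file only when this returns a path,
// and answer 404 (or 403) otherwise.
console.log(resolveUnderRoot('webroot', 'index.html'));       // absolute path under webroot
console.log(resolveUnderRoot('webroot', '../../secret.txt')); // null
```

Absolute requests such as `/etc/passwd` are caught by the same prefix check, since `path.resolve` returns them unchanged.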

almost · on May 4, 2012

I'm pretty sure I don't want to use a web server by people who think a 5-line demo that gives unrestricted access to the host's file system is the best way to show off its capabilities. Sorry, but that's just stupid.

purplefox · on May 4, 2012

http://vertx.io/core_manual_js.html#serving-files-directly-d...

almost · on May 4, 2012

I'm unsure what you're trying to say there. Yes, I am aware that it's possible to serve files without exposing your whole file system. Did you think that was something that might be in doubt?

anuraj · on May 4, 2012

This is most welcome. Hope they continue to support this effort. We need more light weight approaches in established languages like Java.

salimmadjd · on May 5, 2012

I went through their tutorial and it seems very enticing. hopefully it'll get some real traction and broader support.

wiradikusuma · on May 4, 2012

I wonder how it compares to a servlet container (e.g. Tomcat) and the servlet spec?

hanswesterbeek · on May 4, 2012

it ignores those, for good reason

ExpiredLink · on May 4, 2012

which is?

tsewlliw · on May 4, 2012

Every concurrent request gets its own thread, whereas this seems to be one thread per core.

Also, the servlet API is crazy, partly because of all its baggage. There seems to be a lot of exploration right now around what the right API is, but it's pretty clear the world wants something new.

pron · on May 4, 2012

> Every concurrent request gets its own thread, whereas this seems to be one thread per core.

No. Servlet containers use a thread pool (that grows and shrinks dynamically).

pivo · on May 4, 2012

There's really no conflict between the statement "every request gets its own thread" and the use of thread pools.

The point is that a given request in a servlet container is handled in its own thread. That thread will probably come from a pool and be reused to handle another request, of course, but that's sort of immaterial.

lucian1900 · on May 4, 2012

So it's more like Twisted (a library) than Node (a runtime + a library).

tantalor · on May 4, 2012

Node is more like a library than a runtime. Its runtime is V8. The node libs could be ported to another JavaScript engine such as jsc, narwhal, or ringo.

> Node.js consists of Google's V8 JavaScript engine plus several built-in libraries.

http://en.wikipedia.org/wiki/Nodejs

lucian1900 · on May 6, 2012

Node ships with both runtime and async library, unlike Twisted, which doesn't ship Python.