Vert.x – JVM Polyglot Alternative to Node.js (infoq.com)
162 points by lnmx on May 4, 2012 | 81 comments



I'm really excited about this. While node-js is a great project, it still doesn't have the awesome instrumentation and tooling around it the JVM does. Additionally, real threading is damn nice, and the JVM definitely has that.

Combining this with languages like ruby, clojure, and scala seems like a definite win.


I'm not sure.

Async-by-default doesn't seem like a great model for programming the server-side part of most things that are done over HTTP. Three use cases jump out at me:

1. Javascript doesn't have threads, but you want to write a usable HTTP server in it.

2. Threads are scary.

3. You want to make certain actions asynchronous over HTTP (i.e. client says "start", then server maybe says "done" later).

Now consider Clojure:

1. Clojure has threads (on the JVM, at least) and one typically uses an existing Java web server when a web server is called for. Unlike Javascript when node.js came out, Clojure isn't lacking options for running or writing a web server.

2. Clojure makes a lot of threaded operations pretty non-scary. Its native data structures are all immutable and it has constructs for concurrent state. These are not any harder to work with than the async/callback model. I find it more natural, but I'm used to Clojure so that could be bias.

3. Async-when-desired is already easy in Clojure. Futures provide a very easy way to do stuff in a thread pool without blocking. Agents provide the same thing for state changes. It really is as easy as (future do-blocking-thing) and (send-off an-agent do-blocking-thing some-args).

I can imagine why I might want this sort of thing in certain languages in addition to Javascript, but why would I want it in Clojure?


Finagle works fine. Thank you. Love it. But it does not take off. Why?

What all the JVM Node.js clones are missing, and what sets Node.js apart, is async libraries. There is no async (MySQL) JDBC driver for starters. If your IO drivers are not async, your async container is not very useful in real life.


This. The biggest difference is in the two cultures: blocking is anathema to the Node.js community - they will literally reject libraries or code that blocks because it destroys the entire model; the JVM community does not value non-blocking code - most of the core (JDBC, Networking in general, File system operations) is all written in a blocking style - the JVM community accepts this with the implicit assumption that threads will help assuage those issues.

Python, Ruby and Perl all have the same cultural tolerance for blocking code. The Node.js community has a complete lack of tolerance for blocking code.

I work with the JVM every day (Clojure) and wish it was different wrt the common use of non-blocking code, but it's going to be a long road to get there on the JVM.

Kyle


Java Executors combined with Guava's ListenableFutures easily turn any blocking operation to an asynchronous one.
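A minimal sketch of that wrapping, assuming Java 8's CompletableFuture as a standard-library stand-in for Guava's ListenableFuture. The blockingLookup method is a hypothetical placeholder for any blocking call (a JDBC query, say), and the pool size is arbitrary:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class AsyncFacade {
    // Hypothetical stand-in for a blocking library call such as a JDBC query.
    static String blockingLookup(String key) {
        try {
            Thread.sleep(50); // simulate blocking I/O
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "value-for-" + key;
    }

    // Hands the blocking call to the pool and returns a future immediately.
    static <T> CompletableFuture<T> async(ExecutorService pool, Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<String> result = async(pool, () -> blockingLookup("user:42"))
                    .thenApply(String::toUpperCase); // runs once the value is available
            System.out.println(result.join());
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller never blocks, but a pool thread still does for the duration of the call. The wrapper changes the programming style, not the underlying I/O.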

Netty's entire model is asynchronous, and Java 7 now has AsynchronousChannels for IO which, I assume, Netty will make use of.

All in all, the JVM has a much more solid and performant foundation than anything Node can provide. The whole difference will come down to a programming style preference. I am not entirely sure why Vert.x adopted the Node style rather than the proven servlet container, as I'm sure both styles provide comparable performance. I guess each may shine under different loads/usage patterns (my guess is that Vert.x/Node can squeeze more performance from a single thread, but servlets are more scalable).


There is no async MySQL JDBC driver. If you encapsulate it in an async layer, you need to keep a thread for the connection.


Is that supposed to be bad? The programming style will be the same. If threads are done right and the JVM manages their affinity well (especially on NUMA architectures), it's best to use them and pass a relatively small amount of data between them. They can then provide much better performance than accessing the same large piece of RAM from many threads (which is what happens if you simply replicate a single event-loop thread with asynchronous IO).


Maybe I am missing something, but how can you possibly have an async SQL driver without threads like this? This sounds like a case of your Node.js database driver hiding the exact same behaviour described here within C code.


If the wire protocol for the driver is published, then you can write a 100% async driver for it. I.e. no threads blocking, ever. In fact, I already did this for redis and vert.x (I will dig out the code for this some time).

If you are dealing with something where you don't know what the wire protocol is and you just have a blocking client library to play with (e.g. JDBC, which is by definition blocking; see the JDBC API), then you can't do much but wrap the blocking API in an async facade and limit the number of threads that block at any one time. This is exactly what we do in vert.x. We accept the fact that many libraries in the Java world are blocking (e.g. JDBC), so we allow you to use them by running them as a worker. This is one area where we differ from node.js. Node.js makes you run everything on an event loop. This is just silly for some things, e.g. long-running computations (remember the Fibonacci number affair?) or calling blocking APIs. With vert.x you run "event-loopy" things on the event loop but you can run "non event-loopy" things on a worker. It's a hybrid.


A limited number of threads will not scale the way real async wake-on-data connections do. If demand exceeds your thread pool, and your web response builds on async backend requests, your site will be down.
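A rough way to see the effect, as a sketch only: the 100 ms "backend call" is a hypothetical sleep, and the worker and request counts are made-up numbers. With W workers and N blocking calls of t ms each, the last caller waits about ceil(N / W) * t.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolSaturation {
    // Hypothetical stand-in for one blocking backend call.
    static void blockFor(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs `requests` blocking calls on a pool of `workers` threads and
    // returns the wall-clock time until the last one has finished.
    static long elapsedMillis(int workers, int requests, long blockMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long start = System.nanoTime();
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                pending.add(CompletableFuture.runAsync(() -> blockFor(blockMillis), pool));
            }
            for (CompletableFuture<Void> p : pending) {
                p.join();
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 6 requests of 100 ms on 2 workers: three rounds, so roughly 300 ms.
        System.out.println("2 workers: " + elapsedMillis(2, 6, 100) + " ms");
        // The same load on 6 workers: one round, so roughly 100 ms.
        System.out.println("6 workers: " + elapsedMillis(6, 6, 100) + " ms");
    }
}
```

Once the pool is full, every extra request queues behind a blocked thread, which is the failure mode described above.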


PostgreSQL's libpq supports nonblocking asynchronous operation, and node-postgres takes advantage of that.

http://www.postgresql.org/docs/9.1/static/libpq-async.html

https://github.com/brianc/node-postgres/blob/master/src/bind...

(notice the Connect method on line 325 of binding.cc)

At some level a client-server database driver isn't all that different from any other network client; you send a request over a socket and wait for a result. There's no reason you have to block while waiting.

Moreover some databases (like Postgres) let you receive asynchronous notifications signaled by transactions on other connections; that's how trigger-based replication systems like Bucardo do their thing.

http://www.postgresql.org/docs/9.1/static/sql-notify.html


Because it would be based on asynchronous socket responses. So you wouldn't iterate like you currently do w/ a ResultSet but rather have a simple "RowHandler" of sorts. However, you still run into the same trouble you do w/ node if you decide to do a lot of blocking work in there instead of just sending the row to some ExecutorService thread to get worked on.


Yes, but the lib can do it for you, and you won't have to know a thing.


What are you talking about? libevent invented everything Node.js uses and Python's Twisted had and has everything Node.js could dream of.

Node.js is just a reinvention of old technologies in Javascript.

Edit: Removed flame about Javascript because I don't want to have this debate again.


I am talking about the cultures surrounding these languages and frameworks. Node's community rejects blocking libraries. Java's does not. I've used the non-blocking frameworks in Perl (POE and my own), C (select, and some of the poll variants), Ruby (event machine) and they are fine if you can avoid blocking libraries -- in these communities it is generally acceptable to write blocking libraries. I don't see it as a technical hurdle, I see it as a cultural one.


You need to start backing up your claims with actual data. What networking libraries are blocking in Node.js that are not blocking in Twisted? Moreover, what can't you do with a Twisted Deferred that you can do in Node.js?

There are two main things that block in computing:

* I/O

* CPU

You better believe that Node blocks on CPU, so what I/O does Node not block on that Twisted does?


> I am talking about the cultures surrounding these languages and frameworks. Node's community rejects blocking libraries. Java's does not.

So what?

On one hand you get tons of Java libs to use, a majority of them blocking and lots of them non-blocking; on the other you restrict yourself to the fewer non-blocking libs of varying quality available for Node.

With Java you can also turn blocking libs into non-blocking ones with a wrapper and threads, whereas with Node.js, if it's blocking you're screwed, because the JS engine is single-threaded.


Node is nearing 10k published libraries. Is there a comprehensive site listing Java libraries?

It's trivial to offload blocking operations to other processes in node too, it's just not the preferred option.


> Node is nearing 10k published libraries.

And ruby has 38237 gems. 99,9% of which are garbage.

Library-count is a terrible metric.


Have you ever programmed in Java? That's a strange question to ask if you have.

http://mvnrepository.com/

Whatever libraries node has is a drop in the bucket compared to java.


I haven't, that's why I asked. Node is just 2 years old. You'll get 90% garbage on any open package manager.


You make it sound like "rejecting blocking libraries" was some noble principle. Due to the severe limitations of JavaScript, there is no other choice.


We run a major site on Java, had some thread trouble years ago in the very beginning, works very well now when tuned. Threads work.

BUT: I assume people will move to backend services with REST, combining REST backend results into a page. This increases IO a lot and will kill your latency and default thread models when you write sync code. You'd need to use async IO and composable futures to manage latency and thread count. And if you do async backend REST, why not do async JDBC etc.? But there are no libraries.
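To make "composable futures" concrete, here is a minimal sketch using Java 8's CompletableFuture (which postdates this thread). The fetch method is a hypothetical non-blocking backend call faked with a timer; the payloads and latencies are invented:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ComposeBackends {
    // Hypothetical non-blocking backend call: a timer completes the future
    // later, so no thread sits blocked while the "response" is in flight.
    static CompletableFuture<String> fetch(ScheduledExecutorService timer,
                                           String body, long latencyMillis) {
        CompletableFuture<String> response = new CompletableFuture<>();
        timer.schedule(() -> { response.complete(body); }, latencyMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    // Starts both backend calls at once and builds the page when both answer.
    static CompletableFuture<String> renderPage(ScheduledExecutorService timer) {
        CompletableFuture<String> user = fetch(timer, "alice", 80);
        CompletableFuture<String> orders = fetch(timer, "3 orders", 120);
        return user.thenCombine(orders, (u, o) -> u + " has " + o);
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            System.out.println(renderPage(timer).join()); // prints "alice has 3 orders"
        } finally {
            timer.shutdown();
        }
    }
}
```

Page latency here tracks the slower of the two calls rather than their sum, and one timer thread serves both. That only holds if the backend client underneath is genuinely non-blocking, which is the missing-libraries point.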


>This. The biggest difference is in the two cultures: blocking is anathema to the Node.js community - they will literally reject libraries or code that blocks because it destroys the entire model

Really? So they reject any kind of library that does anything except call a callback? Because everything else, from calculating 2+2 to creating a template blocks. And it doesn't matter when it happens, when it happens it blocks.


Twitter is sponsoring a Summer of Code developer to create an async library for MySQL: http://engineering.twitter.com/2012/05/summer-of-code-at-twi...


This can't be voted high enough.


Your async container still can be useful, since you can have worker threads executing blocking operations. This works fine as long as your JDBC queries are fast enough to feed the event loop without creating tons of threads.

What async containers like Netty (and by proxy, Vert.x) solve is high-latency cases like long polling and file upload/download. Netty for one does have support for zero-copy file transfers: http://docs.jboss.org/netty/3.2/xref/org/jboss/netty/example...

In my experience, these are a small percentage of cases so the majority of the app can use blocking I/O -- even long-polling code, as long as you are using the async framework for the client connection.


So, I googled 'finagle java' and all the results were terrible. There was no concise description with a short example. Nothing led me to believe 'node.js' clone in anything I saw in the first few results. In fact, the first result was this: http://engineering.twitter.com/2011/08/finagle-protocol-agno...

"Protocol-agnostic RPC System" does not sound like a node clone at all.


(as a side note, what do you think REST PUT calls over HTTP on Node are?)

Finagle is about composing async IO calls, e.g. here

http://twitter.github.com/scala_school/finagle.html

This works much better in Scala, as Java is not very good at composing Futures. It needs a lot of boilerplate code as it lacks noise-free closures.


There was no MySQL driver for Node when it was first created. You have to start somewhere.


>What all the JVM Node.js clones are missing and what Node.js sets apart are async libraries. There is no async (MySQL) JDBC driver for starters.

Would you call Node.js's MySQL drivers "production quality"?


Being non-existent and being production quality or not are very different things. BTW there are hundreds of node apps in production.


>Being non-existent and being production quality or not are very different things.

As far as production is concerned, they might as well be the same thing.

>BUT there are hundreds of node apps in production.

Do they use some of Node's MySQL drivers though? For they (or the lack thereof) were the topic of this subthread.


The topic was the non-existence of drivers in other platforms, you brought quality up. Node itself isn't 1.0 yet. I don't see how not having anything is better than having something in development.

transloadit.com, from the module maintainers, has been using it for a year+. I don't have any magic insight into what modules sites use, but judging from the activity in their repo it's quite popular (~1000 watchers, 100+ forks): https://github.com/felixge/node-mysql


Finagle looks like it is a bit lower level framework compared to vert.x. While Finagle aims to be a framework that supports services using multiple protocols, vert.x appears to be much more tailored for HTTP web services.


finagle-http (https://github.com/twitter/finagle/tree/master/finagle-http) provides pretty much everything you need to build an HTTP web service.


But take a look at the Finagle example from Heroku and compare it to the example from vert.x. There's a lot of boilerplate in the Finagle version because Finagle is a general purpose async service framework, which was my entire point.


Maybe I'm one of the weird ones, but I absolutely love types and would probably write everything in JS if it had typing similar to Java or C++. As it is, I'm using Java on Jetty instead for my current web application but would love to see a really solid, evented, Node.js-style typed framework. With that said, at this point I'm not sure I'd give up my Servlets, frameworks (like CometD) for doing WebSockets, and the other niceties that a true servlet container gives me. I can't wait to see where this goes though!


There's only two kinds of weirdos in this world. Those who love types, and those who don't.


Impressive. It's basically a kinda-sorta-rewrite of Node.js APIs on the JVM.

Looking through the docs, the main difference I see is that this is opting for a comparatively heavy-core approach which contrasts with Node's ruthless minimalism + third-party modules.

For example:

-file system access is convoluted with HTTP handling: req.response.sendFile()

-pieces of web framework functionality by default, but no full solution (RouteMatcher)

-integrated WebSockets with a novel but unconventional accept/reject API

-heavyweight SockJS integration

It will be interesting to see how all of this plays out. And I'm definitely interested in hearing evidence for claims such as

> a run-time with real concurrency and unrivalled performance


> -file system access is convoluted with HTTP handling: req.response.sendFile()

I haven't looked into it, but I'm guessing this is so that it can do 0-copy sending of files (i.e. it doesn't have to buffer/stream the contents through the JVM).


That's right. If you use the sendFile() method and you're on an OS that supports it, then the kernel will do the copying directly from file to socket for you.

You can also serve files in the more conventional "node.js-style" way (i.e. pump the buffers manually from file to socket) if you like. It's just slower than getting the kernel to do the work for you.


But you have to dedicate a thread to sendFile() (by nature).


Not really, typically you'd have one thread handling all sendFiles through a single selector over the destination sockets. As a matter of fact, it's actually really painful to do sendFile in Java from a single thread, because when the channel isn't ready for sending, rather than returning EAGAIN and letting you busy-loop or wait/retry or whatever, it throws an exception. So you have to use a selector to do sendfile, and in that case, why not use multiple tasks with the same selector?


(Response to jbooth, but for some reason I can't reply to that directly.)

The whole point of sendfile is to make one system call to send all the data in one stream to another, which in general may block. If you're polling and sending only small chunks at a time (whatever you can write without blocking), is it really that much of an advantage over read/write on the same poll? (If you're not doing that, then you have to block, and you have to dedicate a thread to it.)


If the socket you're writing to has been set to nonblocking, then sendfile exhibits the behavior I described, sending EAGAIN sometimes (check man sendfile). This means typically you want to put a selector in front of it and poll the selector, then send to any sockets that are writable, loop back and poll again.
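In Java terms, a sketch of that poll-then-send loop might look like the following. It assumes FileChannel.transferTo as the sendfile equivalent (its documented return value is the number of bytes actually moved, possibly zero), uses one selector per call to keep the demo short, and sends a throwaway temp file over loopback. The class and method names are made up:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

final class SelectorSendFile {
    // Sends the whole file to a non-blocking socket, parking on the selector
    // whenever the socket cannot accept more bytes. Returns the bytes sent.
    static long sendFile(Path file, SocketChannel socket) throws IOException {
        socket.configureBlocking(false);
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
             Selector selector = Selector.open()) {
            socket.register(selector, SelectionKey.OP_WRITE);
            long position = 0;
            long size = in.size();
            while (position < size) {
                selector.select();               // wait until the socket is writable
                selector.selectedKeys().clear();
                // May transfer fewer bytes than requested, so advance and loop.
                position += in.transferTo(position, size - position, socket);
            }
            return position;
        }
    }

    // Client side of the demo: connect and count every byte until end-of-stream.
    static long drain(SocketAddress address) {
        try (SocketChannel client = SocketChannel.open(address)) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
            long total = 0;
            int n;
            while ((n = client.read(buffer)) != -1) {
                total += n;
                buffer.clear();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temp file of the given size, sends it over loopback, and
    // returns how many bytes the client received.
    static long roundTrip(int fileBytes) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            Path file = Files.createTempFile("sendfile-demo", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[fileBytes]);
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            SocketAddress address = server.getLocalAddress();
            CompletableFuture<Long> received = CompletableFuture.supplyAsync(() -> drain(address));
            try (SocketChannel peer = server.accept()) {
                sendFile(file, peer);
            } // closing the socket signals end-of-stream to the client
            return received.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client received " + roundTrip(1 << 20) + " bytes");
    }
}
```

A real server would register many sockets on one shared selector and keep a file position per connection instead of opening a selector per transfer.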

It's still an advantage over read/write because you're getting the 0-copy behavior.


"real concurrency" is a silly term, but I assume he means threads and therefore a multicore concurrency model vis-a-vis thread-level parallelism, allowing a single VM to utilize all cores in the system.

This is opposed to Node, which must run at least one VM for each CPU in order to utilize all of the cores in a system, unless your only use of threads is ThreadPoolExecutor-style pools, then you can use the horrible hack that is threads-a-gogo


Yes, I meant threads ;) E.g. A web server using node.js on a 32 core server. You would have to manually manage 32 instances of node, and use a load balancer or the cluster module in order to route requests to the instances. With vert.x you just start one instance and from the command line you tell it how many instances to start. It then scales over your cores, no glue code or cluster module to write. (There's an example of this on the front page of the website).


Having VM per core may be quite beneficial -- you get more fault tolerance, immunity to GC glitches and one tier less when scaling over several machines. And there are nice tools to manage multiple processes.


The install prerequisites recommend that Windows users install a Linux VM, but I found the beta 11 version works directly.


Yes, vert.x should work directly on Windows. I shall update the wiki accordingly :)


Frustrating to see a library like this be called "Next generation" when the code structure is a step backwards as far as I can tell. We have had green threads, and more than that green threads that can multiplex over multiple cores for a long time. Let's move on, people.


From the article:

> InfoQ: What about running a real-time app on the JVM vs. on Node.js, with respect to debugging, monitoring and operations?

> Answer: I'd say monitoring and operations are really the concern of the environment in which you deploy vert.x than vert.x itself. e.g. if you deployed vert.x in a cloud, the cloud provider would probably provide monitoring for you.

This makes it sound like a toy. How can I deploy something to production when I have no way of seeing what it's doing? How is a cloud provider supposed to provide debugging/introspection for JavaScript running on the JVM (by means of a brand new facility)?


also look at http://ringojs.org


also look at http://vibed.org/


also look at Apache AWF http://incubator.apache.org/awf/


Looks like a user-friendly interface to Netty and Hazelcast with some special sauce sprinkled in. I love Netty, and Hazelcast is, er, interesting and hopefully getting more reliable. Should be fun.


Netty and Hazelcast are amazing pieces of software. I've used them with good success. Hazelcast is another unique enabler that makes the impossible or difficult to do trivial.


This is nothing new; this functionality has been part of Apache Cocoon for almost 10 years and is called Flowscript: http://cocoon.apache.org/2.1/userdocs/flow/api.html

You can use and create Java objects from your JavaScript.


There are other alternatives for polyglot JVM goodness, for example this based on Restlet:

http://threecrickets.com/prudence/

Plus a whole framework based around MongoDB:

http://threecrickets.com/savory/


Trying to install this on Ubuntu. Is it me or is the "binary" download link [1] not really a binary download link?

[1] https://github.com/purplefox/vert.x/downloads


Works fine here:

https://gist.github.com/2594983

Are you sure you're not looking at the github tags, rather than the downloads?


OK, I think I know what I did wrong. On GitHub (the download link on the website) I clicked "Download .tar.gz", assuming I was getting the latest version. I may have got the latest version, but it seems to be the latest source. I should have chosen one of the packages.


I was hoping it provided an alternative ideology to node.js in the form of fibers.


Fibers (or equivalent constructs) aren't supported by all the languages that Vert.x supports (e.g. Java) so we can't really support something like that until we can do it in all the langs.

I know Fibers/Green threads are all the rage right now, and it is certainly something to keep an eye on, but I am not entirely convinced that roll your own threading is going to be any more performant than what the kernel can do.

If we can find a way of implementing fibers efficiently, that supports millions of fibers on a single JVM instance, I would be interested.


    req.response.sendFile('webroot/' + file);
!!!!

I'll have file='../../secret.txt' thanks


Yes, of course in a real web server you'd make sure you do the checks ;) The documentation actually mentions this point explicitly :)
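For illustration, one common shape for such a check, sketched with java.nio.file. This is not taken from the vert.x docs; resolveUnderRoot is a hypothetical helper, and it does not account for symlinks inside the web root:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class SafeStaticFiles {
    // Resolves a requested path against the web root and rejects anything
    // that ends up outside it once "." and ".." segments are collapsed.
    static Optional<Path> resolveUnderRoot(Path root, String requested) {
        Path base = root.toAbsolutePath().normalize();
        Path candidate = base.resolve(requested).normalize();
        return candidate.startsWith(base) ? Optional.of(candidate) : Optional.empty();
    }

    public static void main(String[] args) {
        Path webroot = Paths.get("webroot");
        System.out.println(resolveUnderRoot(webroot, "index.html").isPresent());       // true
        System.out.println(resolveUnderRoot(webroot, "../../secret.txt").isPresent()); // false
    }
}
```

The handler would call sendFile only when the Optional is present and answer with a 404 (or 403) otherwise.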


I'm pretty sure I don't want to use a web server by people who think a 5-line demo that gives unrestricted access to the host's file system is the best way to show off its capabilities. Sorry, but that's just stupid.



I'm unsure what you're trying to say there. Yes, I am aware that it's possible to serve files without exposing your whole file system. Did you think that was something that might be in doubt?


This is most welcome. Hope they continue to support this effort. We need more light weight approaches in established languages like Java.


I went through their tutorial and it seems very enticing. Hopefully it'll get some real traction and broader support.


I wonder how it compares to a servlet container (e.g. Tomcat) and the servlet spec?


it ignores those, for good reason


which is?


Every concurrent request gets its own thread, whereas this seems to be one thread per core.

Also, the servlet API is crazy, partly because of all its baggage. There seems to be a lot of exploration around what the right API is right now, but it's pretty clear the world wants something new.


> Every concurrent request gets its own thread, whereas this seems to be one thread per core.

No. Servlet containers use a thread pool (that grows and shrinks dynamically).


There's really no conflict between the statement "every request gets its own thread" and the use of thread pools.

The point is that a given request in a servlet container is handled in its own thread. That thread will probably come from a pool and be reused to handle another request of course, but that's sort of immaterial.
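A toy sketch of that distinction using plain java.util.concurrent, not a real servlet container; the request count and pool size are arbitrary:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PooledRequestThreads {
    // Handles `requests` toy requests on a fixed pool and returns the names
    // of the threads that did the work.
    static Set<String> threadsUsed(int requests, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<String>> handled = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                // Each "request" runs start to finish on a single pool thread.
                handled.add(CompletableFuture.supplyAsync(
                        () -> Thread.currentThread().getName(), pool));
            }
            Set<String> names = new TreeSet<>();
            for (CompletableFuture<String> h : handled) {
                names.add(h.join());
            }
            return names;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Ten requests, at most three distinct threads: threads are reused
        // across requests, yet no request is ever split between threads.
        System.out.println(threadsUsed(10, 3));
    }
}
```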


So it's more like Twisted (a library) than Node (a runtime + a library).


Node is more like a library than a runtime. Its runtime is V8. The node libs could be ported to another JavaScript engine such as jsc, narwhal, or ringo.

> Node.js consists of Google's V8 JavaScript engine plus several built-in libraries.

http://en.wikipedia.org/wiki/Nodejs


Node ships with both runtime and async library, unlike Twisted, which doesn't ship Python.



