Implementing your own language on the Parrot VM

icey · on April 8, 2009

In my opinion, if someone were to want to port Clojure to something other than the JVM, Parrot would be an amazing place to go.

The idea of having Clojure + CPAN makes my pants tighter.

I know that there is someone working on Clojure on .Net, but I don't think that really adds much vs the JVM; especially since it seems like some of the best libraries for the .Net stack have come from Java.

I'm sure I'm at risk of sounding like yet another Clojure fanboy, but to me this really seems like one of the few times where I've seen an article and that's the first thing that popped into my head.

old-gregg · on April 8, 2009

In my opinion, anything on JVM isn't going to become a mainstream language because VMs refuse to share code between OS processes.

.NET and Mono solve this with AOT-precompiled modules. Ruby and Python kind of solve this by heavily reusing C runtime/libraries, so at least there aren't 25 copies of printf() and qsort() wasting space in RAM.

JVM doesn't solve this at all. JVM-based programs act like pigs, showing zero respect for the environment they're in. They're good at "one program = one computer" tasks, i.e. only for server-side or perhaps developer's workstation scenarios.

Flash/Air is even worse: not only does it not share anything, but it also brings in its own graphics, font rasterization, hot key bindings, scroll bar behavior, etc., making every piece of software built on top of it look and feel like a poorly designed console game running on your PC under a software emulator.

This is why I never touch anything like JVM/AIR: I don't believe the cloud is "the answer" and I want my code to run on servers, desktops, netbooks, routers and cell phones.

nradov · on April 8, 2009

I find it hard to believe that sharing libraries in memory makes a significant difference. Most libraries are small relative to RAM sizes. The only exception is on cell phones, which only run a few processes anyway.

old-gregg · on April 8, 2009

You don't need to "believe", you just have to know it. CS is a discipline, not a church.

As an exercise, get a full list of all your processes and do the math by replacing them by hypothetical JVM instances [one per process] and I suggest paying attention to "shared pool" size for each process too. I guarantee you'll be shocked by total RAM consumed by an "empty" OSX or Vista. Just imagine: every little tiny process, even simple background services, will have a full copy of your entire userspace layer in its address space.

Moreover, with 12-core CPUs becoming a reality very soon, the issue becomes even more important. With proper code sharing at the OS level, 12 instances of a process should (theoretically) take not much more RAM than just one. But because of VM overhead you'll essentially be duplicating it 12 times.
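To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. The sizes are made-up placeholders, not measurements of any real JVM or OS, and the function and parameter names are purely illustrative.

```python
def ram_with_sharing(n_procs, code_mb, data_mb):
    """Code pages are mapped once and shared; each process keeps private data."""
    return code_mb + n_procs * data_mb


def ram_without_sharing(n_procs, code_mb, data_mb):
    """Every process carries its own copy of the code as well as its data."""
    return n_procs * (code_mb + data_mb)


if __name__ == "__main__":
    # Hypothetical figures: 12 processes, 40 MB of runtime code, 5 MB of private data each.
    n, code, data = 12, 40, 5
    print("shared:  ", ram_with_sharing(n, code, data), "MB")
    print("unshared:", ram_without_sharing(n, code, data), "MB")
```

With these placeholder figures the shared case comes to 100 MB and the unshared case to 540 MB. How big the gap is on a real machine depends entirely on the actual code and data sizes, which is the point under dispute in this thread.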

More code duplication also means more I/O hits on disk and system bus and CPU L1/L2 cache drain.

This is why Java is for enterprise/web software only. It is simply retarded to build on it in a true multi-process environment.

OS-level code sharing is hugely important. Your hard drive is full of code, gigabytes of it, and you can't just clone big chunks of it into every process that needs it. Without code sharing you'd have to wait a few more years for an iPhone. He-he, actually because of the lack of it, you aren't reading this in a JVM-powered browser and probably never will.

nradov · on April 8, 2009

Have you actually tested this on a complex system of multiple applications running on JVMs to measure the effects, or are you just guessing that it would be a problem? I don't dispute that sharing libraries across multiple processes would be a nice optimization, but for typical use cases it's way down on the priority list.

Most Java applications are written to use multiple threads (rather than multiple processes), which all execute on the same JVM and share all memory. Web apps typically run in some sort of J2EE container and those can be configured to share libraries across multiple applications; the OS doesn't come into play. Even if you're talking about running multiple separate desktop applications those will mostly only have the J2SE runtime library in common; for Java 1.6 the rt.jar file is 42MB, and only a fraction of it even needs to be loaded into memory.

old-gregg · on April 8, 2009

> Have you actually tested this on a complex system?

What? Have you actually tested the multiplication table with rocks or buttons? That's simple math. Memory consumption is in front of you: I already pointed you to tools and told you what numbers to use.

Your last paragraph only restates what I already said: Java is for few-processes-per-machine use case. And cloning 42MB of code into every process is insane, I can't believe you're actually proposing this. BTW, that's compressed 42MB, you seem to actually be using J2EE download size to estimate your RAM consumption. Nice.

I am amazed by the damage done by the Java lobotomy at US schools and have very little hope for progress moving forward: UNIX clones from the 70s are here to stay. Not only can we not hope to advance in that area (OS design), but I'm afraid we're losing the skill to even comprehend existing systems.

One more time: right now, as you're reading this, nearly all processes on your machine have much, much larger code segments than data segments. Most of your RAM, young man, is consumed by code, not data. The code-vs-data ratio becomes even more ridiculous if you're running a few instances of VMWare. Most of the time you're waiting for your Macbook Pro to do something, it's pushing code around. And the reason Vista is so bloated and so much slower is that it loads a lot more code into RAM than XP did. So yes, code sharing is hugely important. A server machine, however, is very different: there are very few applications and lots of data, so using something as inefficient as the JVM makes sense if it speeds up development/deployment.

nradov · on April 9, 2009

You missed the point. Code sharing in memory only helps if you have multiple processes all using the exact same library. That problem is already solved for server applications, and is mostly irrelevant for mobile platforms, so I guess you're mostly worried about the desktop environment. But typical desktops only run a few user-mode processes at a time, and wouldn't be able to share much code regardless of whether the JVM supported it or not.

While it would certainly be possible to implement that feature, it seems most of us just don't care. RAM is cheap and I would rather see the Sun and IBM developers focus on more important issues. Dynamic method invocation and tail-call optimization are a lot higher on my list.

The reason you're not seeing much progress on OS design is not because we don't comprehend the systems, but rather because what we have now is good enough for what most people want to do. The OS has largely ceased to be an obstacle and quietly faded into the background. Disruptive OS innovation will have to wait until someone comes up with a killer app that requires fundamental capabilities which current OS designs can't support.

Most of the time I'm waiting for my Vista machine to do something it's not pushing code around. It's idling, waiting for a response from a server.

old-gregg · on April 9, 2009

> But typical desktops only run a few user-mode processes at a time

No. Run ps -A, goddammit. A "typical" desktop, especially a UNIX-derived variant, runs a lot of processes. And servers in the near future will too.

> and wouldn't be able to share much code

No, there is a ton of userspace code that is shared between processes. The code that allocates memory, the code that draws lines and buttons on your screen, the code that sorts strings, the code that implements threading, the code that renders bitmaps out of TTF glyphs, reads files and opens sockets: each process needs megabytes of code you seem to have no idea about. It's not even about libraries, it's about everything that isn't drivers, do you understand now? There are good reasons why JVM programs are memory pigs: they are, essentially, running their own OS in complete isolation.

> Disruptive OS innovation will have to wait until someone comes up with a killer app that requires fundamental capabilities which current OS designs can't support.

The problem is that programmers are cheap. There is no market pressure to evolve and leave technology from the 70s in the past. Armies of incubated Java code monkeys cost less in the short run than going against the stream of mediocrity. This is why new platforms (iPhone) are so exciting and refreshing: they let us leave obsolete/inherited junk behind. The iPhone has neither a JVM nor Flash/AIR for very good reasons: the OS is your VM. Always has been. It's just the right thing to do.

> The OS has largely ceased to be an obstacle and quietly faded into the background.

You have demonstrated enough ignorance regarding what an OS actually is to allow me to safely ignore this comment. There are a few folks at Apple and Microsoft who are still capable of understanding these issues; this is why Java, despite 10+ years of availability and all its beautiful promises, still lives in the obscurity of server rooms.

nradov · on April 9, 2009

This is silly. The "typical" desktop runs MS Windows, with only a few user processes running at a time. Some of those processes may have a bunch of threads going, but the process count is small. Of course all of those processes share the low-level OS libraries. And guess what: each JVM process shares those OS libraries, too.

The Java standard library does duplicate some of the higher-level OS features, which is the only practical way to achieve cross-platform compatibility. Any cross-platform solution is going to impose some overhead. Of course you can write native iPhone apps with no extra overhead, but then they can't be used by the 95% of mobile device users who have other platforms.

I'm not sure what you mean about code running in server rooms being obscure. That's what drives web applications and network services which are more visible and critical to many of us than our desktop applications.

By the way, anyone who wants to use the JVM for writing desktop applications should take a look at the Eclipse RCP. I haven't had a need to develop for that platform myself, but the results I've seen from other companies look pretty good.

mbreese · on April 8, 2009

I think that the biggest problem with not sharing libraries is with the initial spin up of the JVM. If you shared even just the standard java libraries, startup times would be better.

If I remember correctly, the Apple JVM does/did this. I'm not sure about the others though (IBM, BEA?).

old-gregg · on April 8, 2009

Well... spin up would improve, sure, but the much bigger issue is that a system quickly becomes bloated by code, cloned for a significant number of java processes. And the number of processes will only be increasing because of multiplying CPU cores.

I think the biggest innovation in systems programming would be to merge a VM with Linux kernel, i.e. the kernel should have a garbage collector and byte code compiler built-in. This would bring tremendous performance gains and programmer's productivity improvements.

Microsoft is slowly moving in that direction: .NET is already tightly integrated into the OS, or at least all standard .NET libraries are pre-compiled and shared across processes, so using C or C++ on Windows makes absolutely no sense right now. By introducing GC into Obj-C, Apple is doing the same thing. The next step for them would be to bring in LLVM, and suddenly the JVM on OSX will look even weirder than it does now.

icey · on April 8, 2009

Help me understand your point.. You're saying Java will never become mainstream because it runs on a VM?

Edit: The more I read your response the less it makes sense to me. You say that you hate VMs because they don't respect the OS, but then you go on to say you want to write code that will run everywhere... That's the point of running in a VM, no?

old-gregg · on April 8, 2009

Thank you for proof-reading, I edited my post above: I guess I am complaining about the implementation of JVM/AIR. Yes, I want my code to run everywhere, but I want to use more lightweight VMs for that, ones that blend into the native OS more easily. Code sharing between processes is a big part of it, i.e. I want to be able to write my application as a module which can be shared between 20 processes without being copied around 20 times.

Basically the only parts of VMs I like are the bytecode and garbage collection. JVM/AIR do a lot more than that.

andreyf · on April 8, 2009

> I want to use more lightweight VMs for that, those that blend into native OS more easily

Well, it's a tradeoff, right? With the JVM, because of the separation from the OS, you have that write-once-run-anywhere interoperability?

Xichekolas · on April 8, 2009

Not that I totally understand it either, but I think he was railing against the JVM in particular, not VMs in general.

andreyf · on April 8, 2009

A lot of parts of Clojure seemed to be created specifically with the JVM in mind. Not sure if porting it is a good idea.

celoyd · on April 8, 2009

Yep. Clojure is designed around the JVM and libraries written in Java-the-language. If you ported it, by the time you took out the Java-y stuff and added things like TCO and call/cc, it would be a new lisp.

And I think new mid-sized lisps are what we need. If Parrot brings a wave of lisps bigger than Scheme but smaller than CL, designed for Unix and the internet, competing with Clojure and Arc, everyone wins. If they can call each other's libraries, it'll be wonderful.

We should be looking way beyond Clojure.

icey · on April 8, 2009

Are you sure it is designed specifically around the JVM? Rich used to maintain side-by-side versions that ran on .Net and the JVM.

celoyd · on April 8, 2009

Tail call optimization is an example of something that Clojure doesn't have because the JVM makes it hard, although it's easy (I'm told) on the .NET system.

chancho · on April 8, 2009

Can you give some examples? (Not doubting. I don't know much about Clojure and I'm curious.)

andreyf · on April 9, 2009

My only experience with Clojure is reviewing Stuart Halloway's books on it, so this is certainly not the most qualified opinion you'll get, but as others have mentioned, tail-call optimization (lack thereof in the JVM, that is) significantly affected the language. It's not possible on the JVM, so Clojure relies heavily on the 'recur' keyword, so that recursive code can be translated into non-recursive bytecode.
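As a rough illustration of that translation (in Python rather than Clojure, and not a description of what the Clojure compiler actually emits), here is a tail-recursive function next to the loop it can be rewritten into. Python also has no tail-call elimination, which makes it a convenient stand-in. The function names are made up for the example.

```python
def sum_to_recursive(n, acc=0):
    # The recursive call is in tail position, but without tail-call
    # elimination every call still adds a stack frame.
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)


def sum_to_loop(n, acc=0):
    # The recur-style rewrite: rebind the parameters and jump back to
    # the top instead of making a new call, so stack depth stays constant.
    while True:
        if n == 0:
            return acc
        n, acc = n - 1, acc + n


if __name__ == "__main__":
    print(sum_to_recursive(10))
    print(sum_to_loop(100000))
```

Both versions compute the same result for small inputs. For large n the recursive one will typically hit the interpreter's recursion limit in CPython, while the loop version does not grow the stack.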

Another language feature that could be ignored is the trampoline, which is made easy with the #(foo 'bar %1) notation, but can be ignored if you're not worrying about the JVM.
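For anyone who hasn't met the trampoline idea: instead of calling the next function directly, each step returns a zero-argument function (a thunk), and a small driver loop keeps calling thunks until it gets a plain value. Below is a minimal Python sketch of the general technique. The names `trampoline`, `is_even` and `is_odd` are chosen for the example, and it is not meant to mirror Clojure's implementation.

```python
def trampoline(f, *args):
    # Call f, then keep calling whatever comes back for as long as it
    # is callable. The stack never grows beyond a couple of frames.
    result = f(*args)
    while callable(result):
        result = result()
    return result


def is_even(n):
    # Return a thunk for the next step instead of calling is_odd directly.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


if __name__ == "__main__":
    print(trampoline(is_even, 100000))
```

One caveat of this simple driver is that it cannot return a function as the final result, since any callable is treated as another step to run.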

Aside from that, I would be very interested in seeing a Parrot Scheme based on ideas from Arc and Clojure both.

MrRage · on April 8, 2009

> I know that there is someone working on Clojure on .Net, but I don't think that really adds much vs the JVM

The .NET CLR supports tail recursion, whereas the JVM doesn't. Don't think it's worth porting simply for that, though.

maximilian · on April 8, 2009

I'm planning a summer "write my own language" project for fun, and I wonder how well the Parrot compiler tools would work for this. I imagine it's a nice way to get my language into an AST, even if I don't target the Parrot VM.

Xichekolas · on April 8, 2009

Yeah I'm doing the same 'summer project'. I was writing my own parser/lexer by hand, but vaguely remembered hearing that Parrot had hit 1.0, so figured it'd be a good platform to experiment on. No idea if it'll support everything I want to try, but it at least seems easy to get started.

maximilian · on April 8, 2009

I tried doing that too, but writing a parser was a pain in the ass in C. I was going a little crazy trying..

I do a lot of numerical things for my master's, and I want to write a simple numerics language that is heavily JIT compiled to speed up the code. Most numerics is done in pretty tight loops (at least everything I see), so I'm hoping to get pretty good performance. It's also just fun to read about VMs, and armed with a parser, it won't be as hard to target different ones and compare performance.

chancho · on April 8, 2009

You should check out LLVM. It generates really fast code and has a JIT compiler. I don't know much about Parrot, but it seems heavily skewed toward making dynamic languages fast whereas I believe LLVM aims more toward static languages. Not that you couldn't implement any language on top of either, but take for example the fact that Apple's support of LLVM was (I think) in part so they could use it as a JIT compiler for OpenGL shaders in their software-fallback drivers. It has good support for cross-platform SIMD instructions and such, which will benefit numerical computation greatly.

-------------

Also, why don't you both use something like flex and bison? Is writing a lexer/parser by hand still that much of a nerd rite-of-passage? That's the most banal part of implementing a language. You could spend your summer writing an optimizer instead.
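For a sense of scale, this is roughly what the hand-written route looks like for a toy arithmetic grammar: a regex tokenizer plus a recursive-descent parser that builds a tuple-based AST. It is only a sketch in Python; the grammar, the AST shape and all names are invented for the example, and it says nothing about how PCT, flex or bison work internally.

```python
import re


def tokenize(src):
    # Numbers and single-character operators; anything else is silently skipped.
    return re.findall(r"\d+(?:\.\d+)?|[-+*/()]", src)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            node = (op, node, self.term())
        return node

    def term(self):
        # term := factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | "(" expr ")"
        tok = self.next()
        if tok == "(":
            node = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("num", float(tok))


def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("1 + 2 * 3"))
```

Each grammar rule becomes one method, which is why this approach stays manageable for small languages and gets tedious as the grammar grows.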

krokas · on May 2, 2009

I'd rather suggest using libJIT instead of LLVM, which is overkill. libJIT is faster, platform independent, and much easier to use than LLVM (or GNU lightning).

Also, LLVM is not really suitable for JIT compilation. For instance, LLVM compilation time is just huge, and not suitable for real-world JIT compilers. Maybe only for those small toy programs LLVM developers show at conferences. This was one of many reasons we use libJIT for our Just-In-Time compiler. It has been well tested in embedded systems, in industrial lasers by the TRUMPF Laser division, in the Portable.NET Just-In-Time compiler, and in a managed implementation of Windows.Forms.

Moreover, LLVM does not easily support many features needed for implementation of ECMA-335, Microsoft Common Language Infrastructure, Common Language Runtime, Common Intermediate Language, and Common Type System:

For instance, these are just a few of them:

    * the whole spectrum of ECMA 335 for CLR types and operations 

    * async exception handling 

    * precise stack marking 

    * multiple custom call conventions 

    * fast Just-In-Time compilation

Try to Google for libJIT. You can have a look for more information here:

http://code.google.com/p/libjit-linear-scan-register-allocat...

Also have a look at the libJIT tutorial:

http://www.gnu.org/software/dotgnu/libjit-doc/libjit.html

Xichekolas · on April 8, 2009

I wasn't doing it to prove anything or even to get a working language to use... it just seemed like a fun problem to solve. I've done a bit with yacc and lex before, and had a lot of fun back in the day with the Metacircular Evaluator in SICP.

chancho · on April 8, 2009

Ah sorry. You said "I wrote it by hand" and he said "it's a pain in the ass" so I mistakenly thought you felt the same way. One person's banal is another's fun.

maximilian · on April 8, 2009

I'm also planning on targeting the LLVM. I am just interested in using the PCT to minimize annoying parser work...