Jacobin: A more than minimal JVM written in Go (jacobin.org)
295 points by yla92 10 months ago | 178 comments



> An important factor in reducing the size of the codebase and executable is that Jacobin relies on Go’s built-in memory management to perform garbage collection, and so it contains no GC code.

This breaks my brain thinking about it. A lot of what the JVM does is interpreting/JITing bytecode and ensuring it links/executes correctly, and writing that logic itself in Go is one thing. But how does Go's GC help you garbage collect objects in the JVM you're implementing?

For example, you have objects in the JVM heap, tracked by the code you're writing in Go. You need to do a GC run. How does the Go GC know about the objects you're managing in the JVM? Do you just... write wrapper objects in Go around each of them, and code up a destructor so that freeing the Go object frees the tracked JVM object? How do you inform the Go VM about which pointers exist to your JVM objects?

I realize I'm way out of my depth here, and only have a "user's" understanding of what GCs do, having never implemented one myself, but it seems crazy to me that Go's GC can Just Work with a VM you're writing in Go itself.


I suspect every JVM heap alloc is implemented by doing an alloc in Go. The JVM references to the object are pointers in the Go VM. So no special magic is needed. When the Go VM stops referencing an object, the Go GC will collect it.
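
Very roughly, something like the sketch below (the type and field names are invented for illustration; this is not Jacobin's actual representation):

    package main

    import "fmt"

    // javaObject is a hypothetical JVM object: just an ordinary Go value,
    // so Go's GC traces the pointers inside it like any other allocation.
    type javaObject struct {
        class  string
        fields map[string]*javaObject // JVM references are plain Go pointers
    }

    func newJavaObject(class string) *javaObject {
        // A JVM "new" becomes a Go heap allocation.
        return &javaObject{class: class, fields: map[string]*javaObject{}}
    }

    func main() {
        a := newJavaObject("MyClass")
        b := newJavaObject("java/lang/String")
        a.fields["name"] = b // b is reachable through a
        a.fields["name"] = nil
        // Once the interpreter holds no more pointers to b, Go's GC can reclaim it.
        fmt.Println(a.class)
    }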


Does this mean that code running inside Jacobin might be vulnerable to memory exhaustion issues[1], whereas in JVM they might have gotten an OutOfMemoryError instead because JVM heap size is fixed at startup time?

[1] For example https://pkg.go.dev/vuln/GO-2023-1704


Interesting idea. That sounds like it could run Java programs with only the memory they actually need, instead of dividing a server up into pieces just because a Java program "might" some day, possibly, maybe, ever need that much.

Which tends to get cargo-culted into "use these arguments when running Java programs", and thus a "hello world" responder gets allocated 128 GB of RAM.


Java leans much more heavily on its GC than Go does so it will be interesting to see whether that's really an approach that works.


Not too familiar with Go, but my first instinct is how will it handle non-vanilla references, of the weak, soft, or phantom variety?


Given how primitive the Go GC is, I doubt it'll work at all.


I don't think primitive is a good description of the Go GC. It's definitely got different design constraints (vs. the various Java GCs, for example), particularly around the philosophy of minimising knobs, but within those constraints it's pretty highly optimised.


It will work, but since it’s not a moving GC you may end up with a lot of heap fragmentation, and as I don’t think it is generational it may get into a state where it stops collecting or has quite long pause times (can’t remember if it limits its pause times).


It is kind of interesting to look at some of the differences in GC approach in the JVM vs Go - the different goals, different tradeoffs, different approaches, etc. Go's is definitely simpler in that there is a single implementation, it doesn't have nearly as many tuning knobs, and it is focused on one thing, vs the JVM GC implementations that give you a lot of control (whether that is good or not...) over tuning knobs, where it is a pretty explicit goal to support the different GC-related use cases (i.e., low-latency vs long-running jobs where you only care about throughput).

One of the things I really like about Go is that a lot of the designs and decisions, along with their rationales, are pretty well documented. Here are the GC docs, for example - https://go.dev/doc/gc-guide.

For example, Go doesn't move data around on the heap, so to combat fragmentation it breaks the heap up into a bunch of different arenas based on fixed sizes. So a 900-byte object goes into the 1K arena, etc. This wastes some heap space, but saves the overhead and complexity of moving data around.
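
A toy illustration of the size-class idea (the class boundaries below are made up; Go's runtime has its own table):

    package main

    import "fmt"

    // Toy size-class rounding: each allocation is bumped up to a fixed class so
    // freed slots are interchangeable and fragmentation stays bounded. These
    // numbers are invented, not Go's actual size classes.
    var sizeClasses = []int{16, 32, 64, 128, 256, 512, 1024, 2048}

    func classFor(n int) int {
        for _, c := range sizeClasses {
            if n <= c {
                return c
            }
        }
        return n // larger objects get dedicated spans
    }

    func main() {
        fmt.Println(classFor(900)) // 1024: a 900-byte object lands in the 1K class
    }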



Arenas are still experimental (as of go v1.21).


Sorry, I used the wrong terminology. They are called “spans” in Go’s GC. There are different sizes of spans that allocations end up in, which helps avoid fragmentation.


It will "work". It won't be as fast and precise as the JVM.


As noted by someone in a sibling thread, it's possible it might yield a smaller total memory footprint, which if true could be interesting and worthwhile by itself as a tradeoff to consider on an app-by-app basis.


?

Go has a complete GC. It's not like Go relies on reference counting.

The biggest problem is that it's non-moving, so fragmentation is an issue. But that's true of many languages, e.g. C/C++.


A “complete”—i.e. functioning—tracing GC is a weekend project. (Mark-sweep, mark-compact, or stop-and-copy, take your pick.) Perhaps not as simple as basic unoptimized reference counting, but still not hard.

The hard part, the one that has occupied JVM engineers for almost three decades now, comes afterwards: when you try to make things not freeze when memory is low, or when you have multiple threads mutating the same heap, or ultimately when you’re adapting the GC to the particulars of your language. (E.g. Haskell has an awesome concurrent GC that’d work like crap for Java, because it assumes tons of really short-lived, really small garbage and almost no mutation. The other way around is also bound to be problematic—I don’t know how the Scala people do it.)

So a GC being tracing and not refcounting is not really a useful benchmark. And Go’s GC is undeniably less advanced than OpenJDK’s, simply because almost every other GC is. It can still suit Go’s purposes, but it does mean running Java on top of it is bound to yield interesting results.

(And can we please stop pretending C and C++ are in any way close as languages? Even if the latter reuses some parts from the former’s runtime.)


> Go’s GC is undeniably less advanced than OpenJDK’s

Java relies very heavily on its GC and tends to generate a lot more short lived objects which need collection than Go. Go's approach to memory management learns from this and focuses on creating fewer short-lived memory objects and providing much shorter GC pauses than Java. It's definitely less complex than Java's GC but it's also very performant and a lot less trouble than Java's GC in my experience.


Can you elaborate more on this?

> E.g. Haskell has an awesome concurrent GC that’d work like crap for Java, because it assumes tons of really short-lived, really small garbage and almost no mutation. The other way around is also bound to be problematic—I don’t know how the Scala people do it

I don't know a ton about Haskell's GC, but at surface level it seems very similar to several of the JVM GC implementations - a generational GC with a concept of a nursery. Java GC is very heavily designed around the weak generational hypothesis (ie, most objects don't live long) and very much optimizes for short-lived object lifecycles, so most GC implementations have at least a few nursery-type areas before anything gets to the main heap where GC is incredibly cheap, plus some stuff ends up getting allocated on the stack in some cases.

The only big difference is that in Haskell there are probably some optimizations you can do if most of your structures are immutable since nothing in an older generation can refer to something in the nursery. But it isn't super clear to me that alone makes a big enough difference?


One major simplification you can make is that due to purity, older values _never_ point to newer values. This means when doing generational GC, you don’t have to check for pointers from older generations into newer generations.
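
For contrast, a collector for a mutating language typically needs a write barrier that records old-to-young pointers in a remembered set; here is a generic sketch of that idea (not GHC's or any JVM's actual code):

    package main

    // Generic sketch of a generational write barrier, not any real runtime's code.
    // When a field of an old-generation object is set to point at a young object,
    // the old object is recorded so a minor GC can treat it as an extra root.
    type gcObject struct {
        old    bool
        fields []*gcObject
    }

    var rememberedSet = map[*gcObject]bool{}

    func writeField(obj *gcObject, i int, val *gcObject) {
        if obj.old && val != nil && !val.old {
            rememberedSet[obj] = true // old-to-young pointer: remember it
        }
        obj.fields[i] = val
    }

    func main() {
        parent := &gcObject{old: true, fields: make([]*gcObject, 1)}
        child := &gcObject{}
        writeField(parent, 0, child) // parent ends up in the remembered set
        _ = rememberedSet
    }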


This feels wrong. Specifically, doesn't laziness bite you in this scenario? If I make a stream that is realized over GC runs, I would expect that a node in an old generation could point to a realized data element from a newer generation. Why not?


It does: "Nevertheless, implicit mutation of heap objects is rife at runtime, thanks to the implementation of lazy evaluation (which, we admit, is somewhat ironic)." says <https://www.microsoft.com/en-us/research/wp-content/uploads/...>


Sure, you're saying that it won't be as performant.

And that's true. IDK if you noticed, but there's no JIT either.

---

> can we please stop pretending C and C++ are in any way close?

If we also pretend we don't know why it was named C++.


> Sure, you're saying that it won't be as performant.

I mean, I expect it won’t be, but that wasn’t really my point, no.

What I wanted to say is that I expect the comparison to be interesting: I might not find Go’s particular brand of simplicity attractive, but I like simple designs in general, and Go’s GC is much less involved than OpenJDK’s while still having received some tuning—it’s neither a weekend toy nor a multi-programmer-century monster. And it’d be interesting to see how much the simpler design really loses to the scariest monster of them all.

> And that's true. IDK if you noticed, but there's no JIT either.

That might have been interesting in a general comparison of Java VMs, but I’m concerned with GCs and in that light it’s not. It could be that a slow VM is so much slower that the GC difference gets lost in the noise, but given an actually bad GC situation can lock up the mutator for literal seconds I expect there will be a meaningful comparison independent of the rest of the VMs.

>> can we please stop pretending C and C++ are in any way close?

> If we also pretend we don't know why it was named C++.

Marketing gimmick? I’m absolutely fine ignoring people who try to suggest things which are not true through manipulative branding. I don’t feel guilty about that.

To be clear, there absolutely is C-ish C++ in the world, and even if it’s not a lot relatively speaking it’s still a lot of code just because of how much C++ there is overall. And if C-ish code was the mainstream of the language, I’d be fine with this commingling. But it’s not, and neither is it the style the language’s designers are using as their benchmark. That’s been the case for at least a decade. So, no, I don’t think C/C++ is any more justified than, I don’t know, C/C#.

Finally, the name was chosen not only very early in C++ time but actually fairly early in C time as well. When C++ was named, C didn’t even have function prototypes! (Necessarily, as it copied those from C++.) I just don’t see why it matters what Stroustrup’s intentions were when he chose the name in 1982. A lot has changed in forty years.


> it seems crazy to me that Go's GC can Just Work with a VM you're writing in Go itself.

Far from it, it is more natural to do that than anything else.

Simplified example:

  type Array struct {
    items *any[]
  }

  type Object struct {
    fields map[string]*any
  }
These are the JVM values, and when the references to them disappear, the JVM values they reference can be GC'd as well.


your example is not valid Go code:

syntax error: unexpected ], expected type argument list

and it's also just poor style in general. "any" is already a pointer, so you would rarely design a pointer to any. For example:

https://godocs.io/encoding/json#Marshal


It’s just pseudo code, relax. You’re not a compiler. You know what they meant.


[flagged]


This comment (and the subsequent follow-up reply) violates the HN comment guidelines:

- Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

- When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."


Lighten up. Someone answered a question in a concise manner with some pseudo code, probably in a language they haven’t context switched to in a couple weeks. The pedantry you’re displaying doesn’t help the conversation. Do you have substantive insight to provide about how the go GC works with Java’s different JVM improvements?

If not, you can just rate the post up or down


What of it, though? I noticed, but didn't think much of it because pseudocode doesn't need to be valid and usually isn't. What is the problem with these syntax errors that you are trying to address by calling attention to them?


Since the VM controls allocation of Java objects, just implement the VM to allocate the Java objects into Go's heap using Go's native allocator thereby allowing the native Go GC to clean those up when they become unreferenced.


How would one allocate Java objects using Go’s allocator, as a program written in Go? Does go provide such primitives?

Naively, something like:

    // Called when the hosted JVM code wants to allocate
    func makeJVMObject() (*jObj, error) {
        var obj = new(jObj) // on go’s heap
        // do stuff
        return obj, nil
    }
would make sense, except how do we keep track of who’s referencing it? JVM objects have fields which tell the GC how to crawl the object tree in the mark phase (and so do Go objects), but how do we make the Go GC aware of the fields the JVM knows about? A map maybe?

Hmm, I guess a map could work… the jObj struct could have a map of fields it knows about, keys being the field name and values being where they point to…

Now that I think of it this probably must be how all GC’s work, they can’t rely on static information to know the fields of each type they’ve compiled, it’s gotta be something like a map somewhere.

I guess I may have answered my own question here.


Yep :)

In fact, I would go so far as to say that it's harder to implement those without Go's GC just working.

https://news.ycombinator.com/edit?id=37254746


Most languages have “precise garbage collectors” that always know from runtime type info which bits of an object are and are not heap pointers.

Sometimes you see add-on “conservative garbage collectors” that have to assume any word might be a pointer if it looks like an aligned address in an allocated page. They can’t move objects to do compaction because they’re never sure which words are not pointers.

Jacobin stores an object with a slice of its field values (each boxed as “any”) and types, so Go’s precise GC would be able to trace them: https://github.com/platypusguy/jacobin/blob/main/src/object/...
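
In simplified form that representation might look something like this (names and layout approximated from the description above, not copied from the repo):

    package main

    // Simplified sketch of an object laid out as typed, boxed fields; the real
    // definitions in Jacobin's object package may differ.
    type field struct {
        ftype  string // descriptor, e.g. "I" for int, "Ljava/lang/String;" for a reference
        fvalue any    // boxed value; reference fields hold ordinary Go pointers
    }

    type object struct {
        klass  string
        fields []field
    }

    func main() {
        s := &object{klass: "java/lang/String"}
        o := &object{
            klass:  "MyClass",
            fields: []field{{ftype: "Ljava/lang/String;", fvalue: s}},
        }
        // Because fvalue is an interface holding a *object, Go's precise GC can
        // trace from o to s without the VM doing any bookkeeping of its own.
        _ = o
    }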


Leveraging the millions of man-hours that go into these runtimes' subsystems is starting to become a "thing" I've noticed -- especially when running code not meant for them. For example, there's a Nintendo Switch emulator that I believe just uses the C# runtime's JIT instead of trying to roll their own. Lo and behold, it works, and they've saved themselves thousands upon thousands of hours writing and debugging their own.

It's kind of cool actually.

I wonder if there's a future where somebody can just pick and choose language and runtime component parts to create the environment they want before even writing a line of code. We sort of do it a level lower with VMs and containers, and then pick and choose language features we want to use (e.g. C++), but I don't know of a good way to use Java's JVM, C#'s JIT, somebody else's memory profiler, another team's virtual memory subsystem etc. without writing a bunch of different pieces in different languages to get those benefits.


I would be very curious to see side-by-side performance benchmarks between this, GraalVM, and the vanilla JDK. My gut tells me (with no data to back this up) that the vanilla JVM will inch ahead once it's paid the cost of starting up, but I would be interested to see how wrong I am.


Lead dev here. We've run a couple of benchmarks internally just for kicks. To create a fair comparison, you have to run the Hotspot JVM with the -Xint flag, which says interpret only. Right now our performance is anywhere from 15-25% of the speed of Hotspot with -Xint on small benchmarks. We figure that the use of Go alone creates some important portion of that overhead when compared with the C++ of the Hotspot JVM. We're guessing that a well-optimized Jacobin interpreter will eventually get to 50-60% of Hotspot's -Xint speed.

But we first want to get feature parity, before pivoting to performance. When we have feature parity, we'll run the Computer Language Benchmarks and post the results. That'll be fun to see!


Is that fair? The whole point of the tradeoffs made by HotSpot is that it's optimized for JITing.

Imho there is no way to do a fair comparison, as both implementations have completely different goals.

One thing that would be interesting is comparing the performance of some really simple command line program (e.g. `ls` or `cp` for small files) between Hotspot in interpreter mode, Jacobin and GraalVM native.


Do you have a direct-threaded interpreter (can you even have one in Go)? That alone may give a 2x performance difference, last time I checked, or is that no longer the case?


I don't think you can even implement indirect threading in Go. Go supports neither computed goto nor mutually recursive tail-call optimization (only self-recursive).

Jump tables for switch statements were only implemented last year. If you squint that's close to indirect threading, but still with at least one unnecessary conditional per op.

For the curious, here's their giant switch: https://github.com/platypusguy/jacobin/blob/c508ec50f55ef381... In practice compilers have always been finicky when it comes to coaxing them to emit jump tables from switch statements, and I bet this is especially true for Go.
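
For reference, the usual shape of that loop in Go is just a for loop over a switch; a toy sketch (opcodes invented, nothing like Jacobin's real dispatch table):

    package main

    import "fmt"

    // Toy bytecode interpreter loop: a switch in a hot loop is the usual shape
    // when a language offers neither computed goto nor guaranteed tail calls.
    const (
        opPush = iota
        opAdd
        opPrint
        opHalt
    )

    func run(code []int) {
        var stack []int
        for pc := 0; pc < len(code); {
            switch code[pc] {
            case opPush:
                stack = append(stack, code[pc+1])
                pc += 2
            case opAdd:
                n := len(stack)
                stack = append(stack[:n-2], stack[n-2]+stack[n-1])
                pc++
            case opPrint:
                fmt.Println(stack[len(stack)-1])
                pc++
            case opHalt:
                return
            }
        }
    }

    func main() {
        run([]int{opPush, 2, opPush, 3, opAdd, opPrint, opHalt}) // prints 5
    }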


Yeah, I think any bytecode interpreter ends up with a giant switch in the critical path at some point :)

Around the time that change was made to Go, Andrew and I were looking at this and wondering how big of a performance hit it was and if there were a better way to structure that. I had a hunch that the compiler should be smart enough to not compile that as a switch/giant if block, and a quick trip to a disassembler showed it using binary search. This commit: https://github.com/golang/go/commit/1ba96d8c0909eca59e28c048... added the jump table and has some nice analysis on where it makes sense to do binary search vs jump tables.

As far as I can tell, with certain restrictions (that are fine in this case), it is pretty reliable at optimizing giant if/else blocks and switches.


That's fair; I thought maybe the use of the Go GC might make the results a bit more interesting.


Go has a much more primitive GC, so I wouldn’t expect a positive result from that itself.


I'm not the author, but I have contributed a few things to the project over the past year. The performance isn't anywhere close to the vanilla JVM - several times slower at best. It is entirely interpreted and there aren't any of the optimizations that have made their way into the JVM over the decades it has been around.

It has been a fun project to play around with for someone like me who thinks this kind of stuff is fun and interesting but will probably never get a chance to work on it full time. Cool to see it noticed, though!


This JVM has no JIT, so OpenJDK and Graal will rather more than inch past it. It's not even a given that it will start up faster, since I don't believe it has been optimised nearly as hard.


You'd probably get better results writing a Go compiler in Java ...


Wrote a Go compiler in Java, used it to compile this JVM implementation written in Go


Great, now just use this JVM to run your compiler :)


Ouroboros


Lambda the ultimate


Well, you'd probably get pretty good perf with GraalVM Truffle, but certainly the building blocks of golang shouldn't be that problematic to implement in Java, given that the Go language model is more straightforward IMHO. I wouldn't assume it'd beat pure golang compilation but who knows.


The startup cost is dominated by class loading, which Jacobin will have to do more or less the same as OpenJDK [*]. GraalVM of course profits from AOT compilation.

[*] until the latter will implement condensers: https://openjdk.org/projects/leyden/notes/03-toward-condense...


Didn't they introduce some new format with Java 9 and jlink to store modules more efficiently? Why hasn't that fixed class loading performance?


The issue isn’t the format. Class loading includes bytecode verification (verifying the type soundness of whatever code the class files contain), and then executing the initialization code defined by each class (static code blocks etc.).

The optimizations in JDK 9 are for reducing the file size of the JDK (in terms of number of classes) when distributed with a standalone application, but that by itself doesn’t significantly affect startup time I believe, because classes are lazy-loaded only as required in any case.


The jimage format does optimize startup a bit because it builds a perfect hashtable to go from class name to classfile bytes. On the classical classpath that requires an O(N) scan of the JARs in the app.


Does a JVM really need to re-verify the standard library it ships with?


It’s actually the default to not do that (-Xverify:remote). So I was probably wrong about bytecode verification being a relevant part of startup time, unless you count the application itself.


The modules system is for better encapsulation and the ability to produce a JVM without features you don't need.

Faster startup time is in the works with projects like Coordinated Restore at Checkpoint, which will let you start a JVM up in an already "warmed" state.

https://github.com/CRaC/docs


If I understand the README correctly, CRaC is an abstract API that delegates to an implementation (CRIU) which is Linux-specific, described in its README as "the never-ending story, because we have to always keep up with the Linux kernel".

So this wouldn't help if I'm deploying a GUI app for Windows, for example.


Correct, it might, someday, become part of Java SE. If that ever happens, I would expect working on every platform Java supports to be a prerequisite.

Right now it's primarily useful for reducing start up times in AWS Lambdas and auto-scaling workloads.


It improves it a bit, and AppCDS improves it a lot (~30% startup win), but neither feature is widely used as it changes deployment workflows.


I see it more as an niche opportunity to integrate bits of Java code in Go applications. The start-up costs can be "paid" when the app starts. Performance will be bad, but it might save some rewrites (provided they can get things like db connections going).


Yeah, that makes some sense; there are probably millions of lines of Java code out there where performance doesn't really matter and that aren't worth rewriting. Being able to embed a JVM just to run those from Go projects isn't a bad idea.

Tangential, but a long time ago, I wanted to reuse a node.js slug library in .NET. I thought about trying to port it at first, but then I realized that this actual job didn't need to be terribly fast, so I instead embedded the Jurassic JS library into my code and was able to load in the slug library that way directly. It wasn't especially fast (but actually faster than I thought it would be!), but it was certainly fast enough, and I didn't have to worry about not having feature parity.


Another good example was IKVM, which basically transpiled Java bytecode into .NET IL. Saxon XSLT used to build on top of that to provide .NET libraries from the same codebase as the original Java version.

(Saxon has since moved to transpiling the source code instead - the languages are close enough that this is fairly straightforward.)


Why not just do traditional IPC, or even a batch process starting up a JVM from time to time when needed?


There are always other solutions. This one might just work in some cases. As I wrote: it's (probably) niche.


Exactly.

The cool thing is that since this uses the Go garbage collector, you can create a very nice Go-native interface to that JVM code.


Clearly this is not going to be a high performance thing. I'd be surprised if it is even close.

It seems the goals for this are purely academic and about figuring out how Java works. They'll probably just do a simple interpreter and not a JIT compiler. That would be good enough for a POC. Additionally, they already indicated that they'll use Go's garbage collector, which won't be setting speed records with this either. And Java's typical usage of memory might actually stress it out a bit. Then there is the standard library which is going to need plenty of support for things like Threads, IO, various synchronization primitives and locks, etc. Doing that in Go is going to be a bit interesting but probably doable. Alternatively, they might just interface with native code directly and bypass the Go ecosystem. They might even reuse some things from openjdk for that. Speaking of which, native code and JNI would need to be implemented anyway.


It looks like a cool research project. But realistically I would be surprised if it could beat OpenJDK in any benchmark (though I don't think that's the purpose of the project).

Writing something which is compatible with a specification is one thing, making it performant is another thing. For example it's not that hard to create a webserver from scratch but making it performant takes quite some effort.


This is so cool. Can’t wait to go digging around through the code this weekend. Good luck with the project!


I know the link stresses that correctness and code clarity are the primary goals of this project, but I’m curious if performance is a goal as well? Do you hope that people will run “real” workloads with this or use it to embed other software in their Go applications?


Author here: Right now we're entirely focused on getting parity of functionality so that anything that runs on the Hotspot JVM runs similarly on Jacobin. There's still a lot of work to do to get there. However, once we do get there, we'll start working on performance.


What was the reason for starting this project? What is the end goal? Why is the current JVM not good enough?..


Thanks for asking. I've always thought of the JVM as magical technology--I'm certainly not alone in that view. But in trying to learn more about it, I was greatly frustrated by the difficulty of reading the code base.

As you likely know, the Hotspot JVM is open source. But reading the code is very difficult, in part because it grew organically and in part because of its unusual design, in which many actions are buried deep at the end of a long series of function calls involving unexpected classes and unusual methods, etc.

This led me to thinking there would be value in a JVM written as a single cohesive codebase. And given that there is a 300+ page JVM specification and a reference implementation, I thought to myself, how long could this take? Two years later, and with help from two major contributors, we're still finding out! ;-)

Eventually, we hope, it will be a fun/interesting experience for users to pop open their Go IDE and watch a Java program execute--which is why we're intent on making sure it's written in 100% Go.

In a larger context, Jacobin might eventually be useful as an embeddable JVM.


Since they emphasize cohesiveness and clear code, the goal seems to be more on the educational side. It doesn’t look like they’d want to implement JIT bytecode compilation.


Ummm, excuse me, but where the f&$k has this been hiding? I've been looking for ways to extend my Go applications with scripting support. I started with Lua (worked), then Python (worked but hacky), then JavaScript using Otto [1]. However, it lacks ES6 support, so having pretty OOP JS code is a non-starter. I would love to have Java as a runtime that can be executed from goroutines.

[1] https://github.com/robertkrimen/otto


Have you had a look at Starlark? It's a Python-like scripting language built specifically for embedding into applications. Originally it was written in Java to be used in Bazel, but it's been re-implemented in Go and Rust and has found use in a bunch of other places.

https://github.com/google/starlark-go


Yeah, it’s cool but doesn’t fit my use case. I’m not scripting configuration. I’m scripting functionality. I need to be able to enforce interfaces and signatures easily all while having near-go-like performance.

I’ll play around with it some more and see if I can’t get some OOP pydantic-like stuff going with it. Otherwise it’s a non-starter.


If you want near-go-like performance, Jacobin doesn't help you either.


I can handle <10 orders of magnitude performance loss, I can't handle 100+ as I get with Lua and other scripting engines in go.


What was your experience with Lua if it worked? What were the disadvantages that kept you looking for other extension languages?


Not the parent but there are several high quality native (meaning no CGO) Lua implementations for Go and it's a great choice if you want an embedded scripting language:

https://github.com/yuin/gopher-lua

https://github.com/Shopify/go-lua

Unless you specifically need a JVM either of these will be a much more practical and mature choice for embedded scripting.

Alternatively if you prefer JS then Otto is a good choice: https://github.com/robertkrimen/otto


I used Shopify's go-lua. It was the simplest to integrate. Otto is what I'm on now but with a "lot" of shims to get ES6 classes to work.


Advantages: Lua is old and simple, so implementing it was a breeze. Disadvantages: I kept running into scenarios where OOP and extending an interface made more sense. I could have done that through convention, but it's not for me. I need to "enforce" the interface more than just a try/catch. That led me to Python, which also worked and suited that need for interface contracts, but it introduced new problems. How can I enforce memory guards and sandboxing and execute parallel Python contexts? I ran into roadblocks and gave Otto a try; as it's written in Go, I figured I would have more luck making the necessary changes if I needed a feature.


I am a fan of the Jacobin project! For your uses, you may also want to consider wazero [1], a pure-go WebAssembly runtime. Full disclosure: I am on the team :)

[1]: https://wazero.io/


I’m doing distributed backends, not webassembly, how does wazero fit into that scenario?


I don't know what kind of use case you have in mind specifically, but wherever you would use a scripting engine, you can embed a small Wasm runtime instead and let your end-users pick their favorite* language

(*) as long as there is a toolchain that supports using it on Wasm :P


CUE is another interesting language to use from within Go, and is rather natural, given CUE is implemented in Go, but you can also do way more cool things with CUE via the Go API.

We're using CUE to validate and transform data, as input to code gen, the basis for a DAG task engine, and more

https://cuelang.org | https://pkg.go.dev/cuelang.org/go@v0.6.0/cue | https://cuetorials.com/go-api (learn about CUE)

https://github.com/hofstadter-io/hof (where we are doing these things)


Not remotely close to my use case. Data filtering and extraction seems to be its niche, a problem you’re solving using it. My use case is different.


Indeed, I've been looking for something more scripty to use from Go programs as well. Curious about the Go+Lua option, but haven't had the time to try it

Generally, this subthread became several comments about using lang X in Go


Have you considered just using go?

https://github.com/traefik/yaegi


What didn’t you like about Lua for your problem domain? It’s used quite a lot in computer games, and even nginx supports it.


I’m familiar with it from my many years using it in game engines (and implementing it in them). My problem is it’s fluid by nature. The concern of validation and interface semantics are dependent on the implementation, rather than as an engine driving the process. I could expose some functions and write a bunch of docs on which ones to call first, or I could switch to a more OO approach. I chose the latter.

If I want to script behavior, Lua is great. Python as well. But I’m not scripting behavior - I’m providing an engine.


"I have spent the last eight months researching the JVM--reading the docs and articles and doing exploratory coding in various languages with which to write the Jacobin JVM."[0]

I wish I had the discipline to decide I wanted to create a long-term pet project, spend months researching what kind of project and which programming language to implement it in, and still be motivated enough to keep actively updating it 2 years later. Also, the code comments are educational, and the write-ups are inspiring. Great stuff.

[0] from http://binstock.blogspot.com/2021/08/a-whole-new-project-jvm... linked from article


Discipline gets a lot of attention around here.

My suggestion is to find something that piques your interest for a long enough time. Thought around the "Flow" concept [1] suggests to pick something that is just within reach of your capabilities.

And perhaps try meditation to exercise building up discipline.

It also helps to ask yourself several times per day: "What would be the smartest thing to do now?", until it becomes a habit. You will then accidentally think that phrase, which gives some freedom to override other (possibly) bad habits.

[1] https://en.m.wikipedia.org/wiki/Flow_(psychology)


All of these have one problem, they're fickle. Intrinsic motivation, flow.. things most (myself included) seek and love, are not guaranteed.

The experience of adjusting difficulty and investment (you feel tired, you have emergencies.. do less, but do something, have rest days), going slow and steady versus inspiration driven is key IMO.

What is hard is accepting the near invisible small steps that don't seem to bring you closer to your goal.. that's where motivation dies.


Motivation is fickle, discipline is about doing the action when you’re not motivated and don’t want to.

Motivation comes and goes. Discipline shows up for work every day, rain or shine.

I’ve worked on long term projects, and it’s the discipline that lets you power through those tasks you don’t want to do. It’s required to finish anything of substance.

One thing that’s really helped me build discipline is the gym. It’s hard, I usually don’t want to go. But even on the days I don’t feel it, I go and get a workout in, even if it’s not a great workout. Even if I just go for a walk in the park or on a treadmill some days.

Once you do something consistently enough, it becomes habit, and develops its own inertia, to the point where it can become harder not to do it. I think habit and consistency are important in maintaining discipline.


And the experience I was mentioning is well summed up by "develops its own inertia". You become fine-tuned at adjusting activity to keep inertia in some range: avoid zero, which would cost you a long time to recover that inertia, but don't overwork yourself. Because a life of overworking is dreadful, the key is finding the sweet spot. I even think we're built for that; the right kind and amount of stimulation keeps life interesting.


Discipline is just another word for motivation. You are motivated to start, and are motivated to keep going. The only reason you call it discipline is to give yourself the illusion that you can’t lose it, but you can


I disagree that they’re the same.

When you have motivation, you don’t need discipline. The work comes naturally because you want to do it. When that fades, and it will, discipline is the only thing keeping you from quitting.


I understand what you think the difference is but I think you’re wrong


I could say the same to you.


Can anyone comment on the intersection of discipline and flow? I've always seen it as, flow is when you're doing something you enjoy, and discipline is when you force yourself to do something whether you enjoy it or not. But I've often found that just putting in enough time (about 45min-1hr for me) will trigger a flow state, or "evaporate" most resistance I had to a task.

So paradoxically I found that it's easier to work for 2 hours than for just 1, since the whole first hour is just getting settled in. This seems to go against the pomodoro stuff (working in 25 minute increments).


I think it's important to note these approaches aren't one size fits all.

What you're saying rings true for me: it's easier to work for 2 hours vs. 1, and the pomodoro method absolutely has not worked for me the few times I've tried.

I have friends and coworkers who are the opposite though. Their barrier to the "flow state" is much lower than mine but they seem to fizzle out faster.

IMO it's all about experimenting with different approaches and finding which blend is to your taste.


I've been looking into the field of reinforcement learning lately, and that gives an interesting framework to think about topics like these.

It seems that flow is a state of being in which you are almost constantly rewarded for your actions. Discipline might be a way to get there.

I suppose that some people accidentally stumble into a state of flow, and others don't, but feel an urge to get there. In order to get there, they try things that give an instant reward (drugs, social media), or they try things that they think will give a big reward in the future (things you read in books or hear from wise people).

Now, the question is, how are these rewards set? Is this something you can fiddle with? Will discipline actually change what you like? I guess some people enjoy discipline by itself, which makes all this highly complicated stuff to investigate properly.


I'd love to make a highly integrated webserver. At this point it seems like spending tons of time getting caught up on what other people have done, why they have done it. Trying different techniques to see what actually seems to work. Where abstractions need to be and what provides the nicest transition between accessing different portions of stack. At this point I'm getting paid to work in tons of languages and integrating different stacks.

I wish somebody would unify these things but I'm running out of time in the day .

People that are able to commit time to projects like that are truly amazing.


Reading the sibling responses so far, my only advice is to not take advice from random strangers on the internet about this sort of thing. You will just get bad takes tainted by survivorship bias.


May not be the best advice, but a BSc/MSc thesis (or a PhD for the even more masochistic) may give one just the right amount of pressure to get through a bigger project. At least it did help me.


have you tried 'redbull' ?


I had no idea classes were allowed in a Jacobin system


slow clap


Good job sir, now please leave


To the guillotine? Talk about mark-and-sweep garbage collection.


Very cool!

Hey, you know a feature which I would love to see (and maybe it's already in there) -- would be the ability to "orchestrate" Java code. In other words, to be able to add external event hooks from functions/procedures/methods -- at runtime...

These event hooks, when encountered, would be able to do such things as:

* Print/pipe a debug message to an external program or log or log viewer;

* The debug printer should contain the function/procedure/method name, parameter names, and parameter values automatically (there should be functionality to have them in both the invocation and after the function/procedure/method's code is complete, prior to returning to the caller, so two possible places per function/procedure/method...);

* The ability to selectively turn on/off such events at runtime;

* The ability to add additional code which could be evaluated for every such event (also at runtime), and make the determination if the event should be processed or skipped;

* The ability to do all of the above programmatically, via API...

* Some sort of GUI which automatically imports all function/procedure/method names of a running Java program at runtime, then gives the user the ability to track/log whichever ones they want, by simply selecting them to a secondary listbox of tracked function/procedure/methods...

Now, maybe some or all of that -- is already baked in there. I didn't look at the codebase long enough to know...

But if it is in there, that's awesome! And if one or more of those features are missing, then maybe a future version maintainer or forked version maintainer might be persuaded to add them in there...

(Side note: It would be nice to have the above functionality for all programming languages!)

Anyway, as said previously, looks very cool!


That sounds like what the Java world calls aspect-oriented programming.

Also, GraalVM/Truffle has that sort of ability today, supported for any Truffle language:

https://www.graalvm.org/latest/tools/graalvm-insight/


Lead dev here. Some very cool ideas in there. Will copy/paste this into our task tracker on YouTrack for further consideration. Much appreciated.


Does "capable of running Java 17 classes" imply Java >= 17 or Java <= 17?

I tried to run a Java 11 jar on my M1 Mac with $JAVA_HOME pointed at a temurin-11.0.20 JVM but no such luck:

    $ ./jacobin -jar my.jar
    Class Format Error: Class  has two package names: apple/security and com/sun/crypto/provider
      detected by file: cpParser.go, line: 241
    ParseAndPostClass: error parsing classes/module-info.class. Exiting.
    Class Format Error: Invalid access flags of MethodParameters attribute #1 in main
      detected by file: methodParser.go, line: 333
    Class Format Error:
      detected by file: methodParser.go, line: 109
    ParseAndPostClass: error parsing my.Main. Exiting.


Thanks for your note. The package notes on Jacobin say that we strongly discourage folks from running it in its present form. There are enough features still to be implemented that, for anything but trivial classes, you won't have a good experience. TBH, we're about a year out (we think) from having a version we can solicit users to test.

Nonetheless, if you'd be kind enough to post the above error and the class you used into the GitHub Issues tracker [0], we'll definitely include it in our test suite and make sure whatever the problem is, it'll be corrected.

[0] https://github.com/platypusguy/jacobin/issues


While they seem to be "git commit && exit", there are sibling repos that are interesting, too:

https://github.com/platypusguy/jacobin-dart (A JVM written in Dart)

https://github.com/platypusguy/jacobin-swift (Jacobin JVM written in Swift)


Author here: Yeah, those were first attempts in other languages.


Just curious, what's the purpose of this? Are there any production use cases it's targeting, or is it just for fun?


Lead dev here. I gave a detailed reply earlier. Grep for "I've always thought of the JVM as magical technology" on this page and you'll find it.


I grepped and did not find


Here's the text I believe that you're looking for:

> I've always thought of the JVM as magical technology--I'm certainly not alone in that view. But in trying to learn more about it, I was greatly frustrated by the difficulty of reading the code base. As you likely know, the Hotspot JVM is open source. But reading the code is very difficult, in part because it grew organically and in part because of its unusual design, in which many actions are buried deep at the end of a long series of function calls involving unexpected classes and unusual methods, etc.

> This led me to thinking there would be value in a JVM written as a single cohesive codebase. And given that there is a 300+ page JVM specification and a reference implementation, I thought to myself, how long could this take? Two years later, and with help from two major contributors, we're still finding out! ;-)

> Eventually, we hope, it will be a fun/interesting experience for users to pop open their Go IDE and watch a Java program execute--which is why we're intent on making sure it's written in 100% Go.

> In a larger context, Jacobin might eventually be useful as an embeddable JVM.


Thanks!


I heard the project repository suffers from a detached head.


heh


I wonder if it uses Go routines as the virtual threads implementation.


If I am reading the code correctly, and as a parachute code reviewer I may very well not be, I am not sure this supports threading at all, or at least, not user-level threading.

I can find references to a thread module and it clearly creates a "main" thread, but I actually can't find anywhere where it creates anything other than a main thread, nor do I see any code for task switching manually/cooperatively, or anything else like that. There's a global thread table but I can only find test code manipulating it. And the main core VM loop doesn't seem to have any concurrency in it to me.

Corrections welcome; posted fully in the spirit of Cunningham's Law as I'm curious too.


Author here. No threading currently. Once we have all the bytecodes executing, we will perforce be obliged to add threading. Starting with the heavier threading that Java pre-Loom uses.


Can you just use green threads (goroutines) from the get-go? Or, by Hyrum's law, is the fact that legacy Java threads are really OS threads part of Java's de facto public API?


You have to distinguish between the implementation and the interface from the Java point of view. In Go, you'd use goroutines. Even if you want to use an OS thread you use a goroutine and then pin it to an OS thread, and there's no reason to do that in this case.

However, inside Java it may look like the original Threading API. In that case you'd have a JVM running the Thread abstraction in Java on top of green threads in Go, while not supporting Java green threads, and there's no contradiction or even anything weird going on. Perfectly normal.
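
A minimal sketch of that layering (jvmThread and its methods are hypothetical, not Jacobin's actual API):

    package main

    import (
        "fmt"
        "sync"
    )

    // jvmThread is a made-up stand-in for the VM-side state behind java.lang.Thread.
    type jvmThread struct {
        name string
        done sync.WaitGroup
    }

    // start launches the thread's run() body on a goroutine; from the Java side
    // it would still look like Thread.start().
    func (t *jvmThread) start(run func()) {
        t.done.Add(1)
        go func() {
            defer t.done.Done()
            run()
        }()
    }

    // join blocks until the goroutine backing this "thread" has finished.
    func (t *jvmThread) join() { t.done.Wait() }

    func main() {
        t := &jvmThread{name: "main-worker"}
        t.start(func() { fmt.Println("interpreting Java bytecode here") })
        t.join()
    }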


Right right, but I wonder if there is some weird edge case where some Java program relies on threads being real OS threads...like, for sure Java threads being OS threads is an observable feature of a running Java program (you can see it in htop!). And maybe someone has built a Java based system somewhere that uses that...


I recall seeing some really hacky Java implementations that used thread references for state management... Mostly in that the same devs expected the same experience as the C# ASP.NET lifecycle, and it led to a lot of weird race conditions in practice.


That's a good question. I don't know enough about the new green threads to give you a definitive answer, but it's certainly something we'll examine closely.


I concur. Native methods have hardcoded implementations, wired up here, and there is nothing for java.lang.Thread:

https://github.com/platypusguy/jacobin/blob/18c541820cb73bda...

I couldn't figure out what happens if you invoke a native method that has not been set up like this.


They’ll have to implement java.lang.Thread somehow, or not a lot will run on that JVM.


Lead dev here. Yup, that's right. There are a few classes that we will be forced to implement in Go. We're fixated on keeping this number as low as possible, but Thread is one that is inescapable. Class is another inescapable one--if we're to support reflection, etc.


Correct, implementing the core libraries is actually a massive effort. You can't just take the Hotspot one because it is a massive cumbersome codebase and it would force your JVM to work like Hotspot internally. (Let's ignore for a moment the question whether that would even be legal)


Project Loom is not included in JDK17.


Interesting name choice


>The goal is to provide a more-than-minimal implementation of the JVM that can run most class files and JARs

As with everything, the final 10% is 90% of the work.


Lead dev here. It's been an amazing amount of work. I'd change the above adage to: Every 10% is 90% of the work!


I can't help but wonder if this means Java might work on Plan 9/9front with this.

But I'm also not sure that's something I'll ever need.


I don't know much about this project, or about JVMs in general. But could this run Minecraft? :)


So the garbage-collected language is written in a garbage-collected language? How does that affect the system?


Lead dev here. A significant portion of the JDK and the JVM is written in Java. There is much that happens in the JVM where GC is a good thing. Classes are constantly being instantiated and then discarded. Having those discarded classes GC'd is a benefit indeed.


The real question is, can it run Clojure JARs, and does that improve Clojure start time?


Will it run Clojure?


Initially I was very confused; why would a socialist magazine have an article about the JVM?



Author here: Long ago, when I got the domain, the project was going to be JAva COmpiler to BINary = Jacobin. And as the Jacobins were revolutionaries, it sort of fit the intended project. I don't recall whether Jacobin Magazine was in publication then or not TBH.


Really? Wouldn't your initial thought be that 18th century revolutionaries created a JVM?


So Go GC + Java GC?



The Go GC could be the Java GC.


Then it would be hard to write this in Go. Can't see how you'd be able to hook into the Go GC in an application written in Go.


Every Java reference to an object can be a reference to the allocated Go object for that object. The Go objects are the Java objects with no extra indirection. The Go object would be bytes, with each reference field being a struct with a pointer. There should be a low-level way to use these bytes for 'allocating' this object.


Thanks. That makes a lot of sense and isn't as complicated as I thought it would be.


From TFA:

> An important factor in reducing the size of the codebase and executable is that Jacobin relies on Go’s built-in memory management to perform garbage collection, and so it contains no GC code.

I don't see how you can't see it.


I can see how using Go's GC reduces the codebase.

But I don't see how that reduces binary size.

Compiled Go programs include a Go runtime in their binary, correct? (statically linked).

So if Go program uses Go's builtin GC, that GC wouldn't be part of the program code, but would be part of the runtime that it's packed with in the binary, right?

In essence replacing [GC in Go program part of the binary] with [GC in Go runtime part of the binary].

I mean, I can see the advantage(s) there. But reducing binary size doesn't seem to be one of them. Unless you count [Go program using its own GC + the one in Go runtime] vs. [only the one in Go runtime].

It's still a (read: at least 1) GC included in the binary.


>But reducing binary size doesn't seem to be one of them. Unless you count [Go program using its own GC + the one in Go runtime] vs. [only the one in Go runtime]

Yes, of course that's what they mean. Not that there's zero code for GC in the binary altogether, but that they just use the GC they already have, which is required overhead anyway, as opposed to implementing another and shipping both.


You're right, but that's what's meant - using the Go GC vs also implementing a custom one.


From the project page: GC is handled by the golang runtime, which has its own GC

This is pretty cool, imo.


You couldn’t necessarily “hook into” it from the Java side, as it would be an opaque interface. But the Go side would make the Java GC process “just work”.


I believe it's a quite common practice to delegate the GC needs of a GC language implemented on top of another GC language to the latter's GC.

As long as the language is GC agnostic, and just cares for stuff being collected, and not for some particular exposed semantics of the GC, it should be ok.


Go GC is already written in Go.


It'd be hard not to hook into the Go GC in an application written in Go...


Inception GC


Only Go's GC. There's no GC code, if I read correctly.


I once said that "just as there's no founder like Elon, there's no Java expert like Andrew (project author)"

Andrew's Java skills are at a different level. I only briefly met him once, but he used to be a core contributor to Oracle's Java magazine. 80% of the "Unix greybeard" type advanced Java knowledge I have is due to his articles.

Happy to see he's still building. Thanks Andrew!


Thank you for such kind words!


Ok, why are you doing this? To see if some end-around to another language's GC is faster?


I can see it being used to bootstrap OpenJDK, particularly on platforms other than Windows, macOS or Linux.


So, where is Andrew's x/twitter?



seems his blogspot is also full of stuff http://binstock.blogspot.com/


For a second I wondered why the Jacobin would publish a story about a JVM…



