I discovered this project when I was taking a look at Wizard, a fast and low-resource Wasm interpreter [1].
One of the things that excites me about Virgil is that it is completely self-hosted (the runtime is fully implemented in Virgil, and it can compile itself).
Curiously enough, the person who created both Virgil (the programming language) and Wizard (the Wasm runtime) is Ben L. Titzer, who worked on the V8 team at Google and co-created Wasm. I'm pretty excited about what he can bring to the table :)
Update: I'm trying to get Virgil to self-compile to Wasm/WASI, and will try to upload it to WAPM [2] so everyone can use it easily (as the first compilation of v3c currently requires the JVM)... stay tuned!
> (as the first compilation of v3c currently requires the JVM)... stay tuned!
It will bootstrap using whichever stable binary appears to work on your platform.
The compiler does have support for specifying arbitrary Wasm imports and I wrote most of the "System" module against WASI. The only thing missing is 'chmod' to flip the permissions on a just-generated executable, but other than that, it should be possible to bootstrap on WASI.
I'm glad to see that this language has a garbage collector, at a time when "lightweight" languages increasingly forego GC and memory safety. Even on the Harvard architecture of WebAssembly, which mitigates many of the security problems that a lack of memory safety causes, memory safety is the right choice.
Thanks. I feel I have gotten tons more done over the years by having GC. It's just so much more productive. For the really nitty-gritty low-level details of a particular platform, like directly calling the kernel to set up signal handling and memory protection, do I/O, etc., Virgil has unsafe pointers. On targets like the JVM or the built-in interpreter where there are no pointers, there is no Pointer type.
I am also working on a new Wasm engine, and I have a neat trick (TM) where the proposed Wasm GC features are implemented by just reusing the Virgil GC. So the engine is really a lot simpler: it doesn't need a handle mechanism, doesn't have a GC of its own, and the one Virgil GC has a complete view of the entire heap, instead of independent collectors that need to cooperate.
Not GP, but the following come to mind: Jai, Zig, Odin, and Hare, all of which aspire in one way or another to be a modernized take on C. There is also a larger class of languages I call "safe-ish" as they are generally safe for single-threaded programs but can exhibit undefined behavior from data races; this includes Swift and Go, and likely other newer languages inspired by those.
> There is also a larger class of languages I call "safe-ish" as they are generally safe for single-threaded programs but can exhibit undefined behavior from data races; this includes Swift and Go, and likely other newer languages inspired by those.
Go does have a garbage collector though, so maybe the conflation of GC with safety (not by you, but earlier in the thread) is a bit misleading.
Not only is he a "heavyweight of the Rust community", he was literally one of the main designers of the language at Mozilla (he's still contributor #6 by commits[1] despite not having worked on it for the past 7 years!)
This is becoming a pointless meta-discussion, but the parent comment didn't indicate in any way that I was talking to a "professor". The comment said it's great that there are more languages with GC. I disagree, no matter who says it.
I'm not a "professor", but as a software engineer with 35 years in this industry I can say that new languages should avoid GCs (as in, generational and related) and stick to either ARC or Rust-like compile-time memory management.
Just because the original comment is by, let's say, a prominent figure, doesn't make it right.
P.S. I rarely downvote out of disagreement, only for comment quality.
> I'm not a "professor", but as a software engineer with 35 years in this industry I can say that new languages should avoid GCs
With respect, and with much less experience than you, I really don't think so. I believe the majority of languages are better off being managed. Low-level languages do have their place, and I am very happy that Rust brings some novel ideas to the field. But that low-level detail is very much not needed for the majority of applications. Also, ARC is much, much slower than a decent GC, so from a performance perspective as well it would make sense to prefer GC'd runtimes.
ARC is in fact faster than GC, and even more so on M1/M2 chips with the Swift runtime. There were benchmarks circulating here on Hacker News; unfortunately I can't find those posts now. GC requires more memory (normally double that of an ARC runtime) and is slower even with the extra memory.
How can more, synchronous work be faster than a plain old pointer bump plus some asynchronous, amortized work done on another thread? Sure, it does take more memory, but in most cases (OpenJDK, for example) allocation is simply thread-local arena allocation where it is literally an integer increment, plus an eventual copy of live objects to another region. You couldn't make it any faster; malloc and ARC are both orders of magnitude slower.
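For illustration, here is a minimal C sketch of the kind of thread-local bump allocation being described (hypothetical names, not from any real VM; a real collector adds alignment, zeroing, safepoints, and a slow path that triggers collection):

    #include <stddef.h>
    #include <stdint.h>

    /* A thread-local allocation buffer: the fast path is one pointer bump. */
    typedef struct {
        uint8_t *top;    /* next free byte */
        uint8_t *limit;  /* end of this thread's buffer */
    } Tlab;

    void *tlab_alloc(Tlab *t, size_t size) {
        if ((size_t)(t->limit - t->top) < size)
            return NULL;   /* slow path: get a new buffer or trigger a collection */
        void *obj = t->top;
        t->top += size;    /* the entire fast path is this one increment */
        return obj;
    }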
ARC, while it can be elided in certain cases, will still in most cases have to issue atomic increments/decrements, which are among the slowest operations on modern processors. And on top of that, it doesn't even solve the problem completely (circular references), mandating a very similar solution to a tracing GC (ref counting is in fact a form of GC: tracing looks at the live edges between objects, ref counting at the dead edges).
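For contrast with the sketch above, a minimal C sketch (illustrative only; retain/release are generic names, and real ARC implementations also elide redundant pairs and handle weak references) of the atomic read-modify-write that shared reference counting implies:

    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct {
        atomic_long refcount;
        /* ... payload ... */
    } Object;

    void retain(Object *o) {
        /* an atomic RMW on a possibly contended cache line */
        atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
    }

    void release(Object *o) {
        if (atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_acq_rel) == 1)
            free(o);  /* last reference: destruction runs inline, on this thread */
    }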
I'm not familiar with the details, but it is said that Swift's ARC is several times faster than ObjC's; it somehow doesn't always require atomic inc/dec. It also got even better specifically on the M1 processors. As for GCs, each cycle carries the overhead of going over the same live objects that can't be disposed of.
Someone also conducted tests: for the same tasks and on equivalent CPUs, Android requires 30% more energy and 2x the RAM compared to iOS. Presumably the culprit is the GC.
That's a very strong "presumably", based on the very niche use case of mobile devices.
It is not an accident that on powerful server machines all FAANG companies use managed languages for their critical web services, and there is no change on the horizon.
It might be because on the server side they usually don't care much about energy or RAM. The StackOverflow dev team has an interesting blog post somewhere where they explain that at one point they figured out C#'s GC was the bottleneck, and they had to do a lot of optimization, at the expense of extra code complexity, to minimize the GC overhead.
It is actually quite rare that companies think about their infrastructure costs; they're usually just taken for granted. Plus, there aren't many ARC languages around.
Anyway I'm now rewriting one of my server projects from PHP to Swift (on Linux) and there's already a world of difference in terms of performance. For multiple reasons of course, not just ARC vs. GC, but still.
With all due respect, (big) servers care about energy costs a lot, at least as much as mobile phones do. By the way, of the managed languages, Java has the lowest energy consumption. And RAM takes the same energy whether filled or not.
Just because a GC can be a bottleneck doesn't mean it is bad or that alternatives wouldn't have an analogous bottleneck. Of course one should try to decrease the number of allocations (the same way you have to with RC as well), but there are certain allocation patterns that simply have to be managed. For those, a modern GC is the best choice in most use cases.
Skimming the docs, I was surprised to see that there appears to be no built-in support for foreign function calls to C libraries ("It's a bold strategy Cotton...")
This makes more sense in light of a comment by the author in a previous discussion about C ABIs [1]:
> Virgil compiles to tiny native binaries and runs in user space on three different platforms without a lick of C code, and runs on Wasm and the JVM to boot. [...] No C ABI considerations over here.
Indeed, I've wanted to see how far I can get rebuilding userspace from the ground up. Virgil does that by exposing a raw syscall primitive on each target architecture, so when you target x86-linux you get "Linux.syscall<T>(int, T)". The compiler knows the kernel's calling convention on each target platform, so it just puts things in the right registers and does "int 0x80" or "syscall". So library and runtime code that implements I/O, signals, etc. just dials up the syscalls it wants to get off the ground. There's no need to resort to assembly; I view asm as the compiler's job.
Doesn't that break on many OSes, since they treat libc as the stable ABI surface and make no guarantees about the syscall interface? If I recall correctly, that's what gave Go so many headaches on macOS; they chose a similar strategy until they abandoned it in recognition that it wasn't tenable.
For the subset that Virgil uses to get off the ground, I haven't been broken by the kernel changing system calls. macOS has been a pain for a number of other reasons though, not the least of which is deprecating 32-bit altogether... with a student's help I finally got around to generating x86-64 Mach-O binaries. That works on x86 Macs again. But something is still wonky and they don't run under Rosetta 2.
Linux is rock solid though. I've never been broken by the kernel.
That's because Linux is nearly unique in treating the syscall boundary as the stable ABI. I'm not sure which other OSes do that. Maybe Windows? Not sure. Certainly none of the BSDs.
Linux is the only operating system that guarantees kernel binary interface stability. You can discard the entire user space and write dependency-free software that interfaces with Linux directly. You can even boot Linux straight into your own program and bring up the system yourself.
This single function is all you need to do literally anything on x86_64 Linux from writing to a file descriptor to graphics card ioctls:
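The function itself didn't survive in this thread, but it was presumably something along the lines of the classic wrapper below, sketched with GCC/Clang inline asm (my_syscall is a hypothetical name; on x86_64 Linux the syscall number goes in rax, arguments in rdi, rsi, rdx, r10, r8, r9, the result comes back in rax, and the syscall instruction clobbers rcx and r11):

    long my_syscall(long n, long a1, long a2, long a3,
                    long a4, long a5, long a6) {
        long ret;
        register long r10 __asm__("r10") = a4;
        register long r8  __asm__("r8")  = a5;
        register long r9  __asm__("r9")  = a6;
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "a"(n), "D"(a1), "S"(a2), "d"(a3),
                            "r"(r10), "r"(r8), "r"(r9)
                          : "rcx", "r11", "memory");
        return ret;
    }

    /* e.g. my_syscall(1, 1, (long)"hi\n", 3, 0, 0, 0) is write(1, "hi\n", 3). */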
In all others, interfacing with the kernel directly eventually leads to breakage because the people in charge change the system calls. Go binaries broke on OS X because of this. We're not meant to bypass their system libraries.
It's not that crazy when you consider that C has been the lingua franca of OS development for 50 years, with a very thin library interface that's been ported to every single platform under the sun. That's a pretty sane maintenance choice all things considered, since every other language has linked against libc for those 50 years, and especially since the underlying hardware and OS have evolved (it's easier to maintain a highly standardized C library with the high-level semantics you wish to standardize than magic values in registers that are fixed for the lifetime of that platform, and sometimes even across platforms).
I think Go was the first language that tried to buck that trend and it did not go well (aside from Linux).
I almost skipped over this because of WASM in the title. For me, WASM is neat but not actually something I need, just an extra layer on top of running a program on a computer. So it's good to see it supports other targets. Now it just needs ARM support ;).
Having no code samples without digging into links in the documentation doesn't seem very inviting, especially when the project itself is a programming language.
Documentation could definitely use a nicer coat of paint. Not sure how to do syntax highlighting in raw markdown, and GitHub won't accept a new language definition unless it is in "hundreds" of projects. So maybe an image?
I've used a Markdown-to-HTML converter to turn my blog posts into HTML with very nice, customizable code samples... in my case I used Go's Blackfriday library with bfchroma[1] doing syntax highlighting via Chroma[2]. To add your language to Chroma you have to provide a lexer, which in turn is written in Pygments[3] syntax.
Once you have that, you can post your docs on GitHub Pages (or something like Netlify[4] or Cloudflare[5]); they can all run a command to build your website (from Markdown to HTML) every time you push to a branch, and then serve the generated HTML as a static site.
Before that though, your language seems similar enough to others (maybe Java or C#?) that if you tell the converter to use one of those languages, you'll get decent enough highlighting. I did this to highlight Zig code before it became supported, by telling the converter it was TypeScript code (coincidentally, many keywords aligned well enough)!
Language implementations that share a compile target share the same limit on how fast they can be, but have unbounded and unrelated potential for how slow they can be.
Let's say there's a Python-to-C compiler (there is), and let's say there's a C++-to-C compiler (there was; it may still exist, I don't know). The performance characteristics of programs compiled via the two won't be very similar at all. Neither can be any faster than C. Both can be much slower than it. And they'll differ wildly from each other.
No, they don't perform the same, just like different languages that compile to native machine code don't all perform the same. It depends a lot on the compiler, which in turn depends on both the language and on other goals like compilation speed, memory/storage usage, the effort put into the compiler, etc.
Wasm doesn't yet have a GC. So when you compile languages requiring GC, like Python or Go, they have to include their own GC in the resulting Wasm binary, and that Wasm GC implementation might not be as performant as the native one either. On the other hand, languages like C or Rust don't have this overhead. This makes Wasm binaries from GC'd languages big, and it affects their performance.
AFAIR, the JVM used to not have a u64 type, only i64; does the JVM support it now? Or how does Virgil handle it? If there's custom overhead added to simulate u64 semantics on the JVM, that should be clearly documented for a language purporting to be fast, yet I didn't see it. Did I miss it?
I'm also asking because if the compiler supports the JVM as a target, it shouldn't be hard to add support for a Dalvik target; however, u64 is also a problem there.
One more question I have is about FFI; I didn't find a mention of it after a quick skim. Can I call functions from third-party JARs or the JS/Web APIs?
You're right that the "long" type in Java is signed. The JVM uses two's complement representation, so for many operations unsigned arithmetic is bit-equivalent to signed arithmetic. For the remaining ones, like less-than, divide, shift, etc., Virgil generates more complicated code to do the unsigned arithmetic or comparison manually, e.g. by first checking whether an input is negative.
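For illustration, a C sketch of that two's-complement trick (not Virgil's actual generated code): add, subtract, and multiply are already bit-identical between signed and unsigned, and unsigned less-than can be done with signed operations by flipping the sign bits, which maps unsigned order onto signed order:

    #include <stdint.h>
    #include <stdbool.h>

    /* u64 less-than using only signed 64-bit operations. */
    bool unsigned_less_than(int64_t a, int64_t b) {
        return (a ^ INT64_MIN) < (b ^ INT64_MIN);
    }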
> One more question I have is about FFI - I didn't find a mention of it after a quick skim; can I call functions from some thirdparty JARs or JS/WebAPI?
For the Wasm target, Virgil allows you to write an imported component, so the module that gets generated has imports with the signatures that you want. You can then load the module in JS and supply Web bindings and such. That latter process is quite clunky, but in theory it gives you access to any API that can be expressed in terms of Wasm (before externref).
The JVM doesn't have u64, but the standard library does have functions like Long.divideUnsigned and Long.compareUnsigned that view their i64 arguments as unsigned and will reliably get JIT-compiled down to the correct machine code.
The resulting code in Java would not be the most readable, but it is absolutely fine as a target.
You cannot bootstrap any language entirely from source; you always need a compiler or interpreter written in another language.
Virgil bootstrapped for the first time using an interpreter I wrote in Java, back around 2009. When the new compiler could finally compile itself well enough to be stable, I checked in the first bootstrap binary, a jar file. Since then, every once in a while (41 times so far), when a major set of bugfixes or new features is done, I've rev'd stable by checking in binaries that are generated by first compiling the existing code with the stable compiler, and then compiling the compiler with that compiler. I generally wait several stable revisions before using new features in the compiler. What that means is that you can always compile the source in the repo with the stable binary in the repo, and that compiler can compile itself again too, and both should behave identically. You can usually even go back a revision, but I've never had to do that.
There is a full interpreter built into the compiler as well, so if there is a bug in the stable compiler's codegen that is a showstopper, it can be fixed in the source and then the new source run in the interpreter of the old compiler in order to get a new stable binary. I've never had that happen, though.
I'd encourage you to write a smaller bootstrap implementation of Virgil in a couple of other languages, so that folks who don't trust binaries can still use your language.
The Bootstrappable Builds folks are working on this for every part of a modern Linux distro. The main approaches are alternative smaller implementations written in other languages (including interpreters) and also compiling with chains of older versions that were written in other languages. They are also working on a full bootstrap from ~512 bytes of machine code plus a ton of source all the way up to a full Linux distro. They have gotten quite far in that and are continually improving the situation everywhere.
Do covariant return types obviate the need for double dispatch when implementing Visitor (in statically typed languages)? If so, that'd close at least one ergonomic gap between dynamic and static languages.
--
Whinging:
Virgil is practical. It uses modern techniques to address programmer's actual needs.
I love both functional and imperative programming. Separately. I do not want multiparadigm. I do not want metaprogramming in my bog standard data processing code. I do not want exquisite puzzle boxes (inspired by Haskell and ML).
For most of my code (done in anger), I want data centric, I want composition, I want static typing. To noob me, it appears Virgil is on the right path.
For just one example, Java jumped the shark. Specifically annotations, optionals, and lambdas. (I grudgingly accept the rationale behind type erasure for generics; it was a different time, when backward compatibility reigned supreme.)
We need features, often syntactic sugar, for the 98% of our daily work: string intrinsics (finally!), intrinsic regexes, intrinsic null-safe path expressions (not LINQ), tuples, multiple return values (destructuring), etc.
I want concision without magic.
In Java's defense, specifically, I love many of the JEPs of the last decade. Project Loom is a game changer. Switch expressions are great. Ditto value types and records. (There's more, but you get the idea.)
Also, the Zig project embraces the practicality vibe. And a shout-out to the D language.
It's sad to see a relatively new language replicating the mistake that is the class/inheritance model. That, plus a lack of modern convenience features like if/else and block expressions, leaves me struggling to think of who might use this.
In Virgil you can write in a very object-oriented style, or not. There are first-class functions, partial application, ADTs, proper type parameters, and so on. Heck, you can write C-style with just functions and basic structs if you want; nothing requires you to use classes or methods, so you're not forced into any particular paradigm.
My own style has evolved as I've learned over the years. I use classes less and make them smaller. I use enums and ADTs a lot more these days. But I rarely go whole-hog functional unless it is in tests.
> Languages are becoming increasingly multi-paradigm. Subtype polymorphism in statically-typed object-oriented languages is being supplemented with parametric polymorphism in the form of generics. Features like first-class functions and lambdas are appearing everywhere. Yet existing languages like Java, C#, C++, D, and Scala seem to accrete ever more complexity when they reach beyond their original paradigm into another; inevitably older features have some rough edges that lead to nonuniformity and pitfalls. Given a fresh start, a new language designer is faced with a daunting array of potential features. Where to start? What is important to get right first, and what can be added later? What features must work together, and what features are orthogonal? We report on our experience with Virgil III, a practical language with a careful balance of classes, functions, tuples and type parameters. Virgil intentionally lacks many advanced features, yet we find its core feature set enables new species of design patterns that bridge multiple paradigms and emulate features not directly supported such as interfaces, abstract data types, ad hoc polymorphism, and variant types. Surprisingly, we find variance for function types and tuple types often replaces the need for other kinds of type variance when libraries are designed in a more functional style.
> Features like first-class functions and lambdas are appearing everywhere.
Smalltalk, the proverbial object-oriented language, already had first-class functions, a.k.a. lambdas (called BlockClosures in Smalltalk). So it's not like you have to choose between OOP classes OR lambdas. Having both is better than having just one of them.
Ironically, Smalltalk's syntax for lambdas/closures remains my favorite. No "trailing closure" hack that doesn't scale and looks ambiguous (looking at you, Swift). No one-line limitation (looking at you, Python). No ambiguity between function/method bodies and closures (looking at all of you members of the curly-brace Algol descendants). No "you have to reference all of the arguments or you can't use it" (looking at you, Elixir).
There were two capital ironies with Smalltalk's free functions (closures). Smalltalk USED them. It may have been "objects all the way down", but it drank closures the whole way, in both the extensive class library and your own code. I coded and invoked more closures in Smalltalk for a given unit of functionality than I have in any other language to date.
What’s even more ironic to me is that closures/functions in most compiled languages are a semantic facade. You see them in your mental model, but the language/library doesn’t model them for you to interact with. In Smalltalk, you could send messages to closures, AND you could add your own.
I have always found it ironic that in Smalltalk, the ultimate "only objects" language, I was actually exposed to closures/functional patterns more than I have been in many other systems that were supposedly more about just that.
Right, in a language where "everything is an Object" it only makes sense that closures should be Objects as well, and that you can interact with them by sending messages to them. Then add a little bit of syntactic sugar to make the code more succinct.
Another innovation of Smalltalk, which may not be familiar to everybody, is that all control structures in Smalltalk are implemented by passing closure objects as arguments to methods like ifTrue:ifFalse:.
Good point. But that's a technical implementation optimization, which the programmer does not need to be aware of.
It's like saying that functional languages implement tail-recursion as iteration in order to "get rid of" recursion.
But the programmer can still think in terms of recursive calls in their program, and can reason about the correctness of the program by assuming it works by recursion.
I think Python does allow you to play with the closure itself by accessing __closure__. Although it is immutable by default, it is still easy to construct another function directly from a code object and a closure.
I don't know this language but I assume that using the class/inheritance model is optional, like in JavaScript. If you think it helps with your problem domain then use it.
I'm not sure "compiles to WASM" is a selling point anymore. Are there any serious languages that don't? You'd need a language with no way of being expressed as C code or LLVM IR, etc.
[1]: https://github.com/titzer/wizard-engine
[2]: https://wapm.io/