Hacker News new | past | comments | ask | show | jobs | submit login
I’m porting the TypeScript type checker tsc to Go (kdy1.dev)
450 points by msoad on Jan 25, 2022 | hide | past | favorite | 241 comments



I'm the author of esbuild. I think this is a very smart, practical choice and is one of the approaches that has the best chance of succeeding (the other being a semi-automated porting tool instead of a purely manual port). I'm very excited to see someone take this approach. This is how I would do it if I were to do it. Web developers around the world are hoping this works out :)

The hard part about this project isn't doing the port but keeping up with the constant changes from the TypeScript team, especially if the TypeScript team isn't invested in helping you out. Staying as close as possible to the source language like this will be a big win there.

One of the big benefits of a native language like Go aside from lack of a JIT is the ability to easily add shared-memory parallelism. I haven't studied the TypeScript compiler internals so I'm not sure how easy it is to parallelize their type checker. However, I assume that it would be relatively straightforward to parallelize the parsing of JavaScript files, so I'd hope for at least be a big speedup there.


Unfortunately TypeScript is doing all of type-checking synchronously mostly due how it needs to build a global symbols list first. There are some hints that they might move towards more parallel processing but it requires a lot of major changes

https://github.com/microsoft/TypeScript/issues/30235


Yeah I was afraid of that. Ideally it would be possible to do type checking in parallel at the function level or something, but TypeScript allows you to omit the return type and then infers the return type from the function body which adds additional dependencies and makes that more complicated to parallelize. I wonder if it would still be possible though. Maybe just for functions where the return type and all argument types are present? Then the approach would be sort of outside-in where you type check the unparallelizable global stuff serially first and then type check the remaining isolated function body stuff in parallel afterward.


Love ESBuild! Convert a relatively large project from webpack to it today, really easy to do & much faster build times now!


> the ability to easily add shared-memory parallelism

I’m honestly not sure if there is any popular language where this isn’t true?


The problem is that you can't use shared-memory parallelism in TypeScript. If you could, then the most straightforward way of speeding up the TypeScript compiler would be to parallelize the compiler, not to port it.


I just don’t see how Go is more uniquely suited to do this than any other language with the same feature and admittedly I may have read too much into your comment.


That comment wasn't about Go exclusively. My point was that porting single-threaded code to a native language doesn't necessarily cause huge order-of-magnitude speed gains, especially if you plan on sticking close to the original structure of the code to make porting future changes easy, but that in this specific case a port sounds likely to realize significant gains (which is very exciting!).

Shared-memory parallelism is necessary but not sufficient. Ideally you'd be able to only make some small changes in a few spots to greatly increase parallelism. I'm hopeful that TypeScript's AST parsing is a good candidate since ASTs can typically be constructed independently from the rest of the program state. Lowering and printing ASTs is also likely parallelizable. It would also be great if it was also possible to parallelize parts of the type checker somehow, but that sounds harder. I had some ideas about that in another comment. Anyway I think this means that it's likely that a port of the TypeScript compiler could realistically lead to significant speedups and may even lead to extreme speedups.

As far as language choice, it sounds like due to maintenance reasons the decision space is narrowed to native, garbage collected languages similar in object model to TypeScript but with parallelism primitives. Some candidates could be at least Go, JVM-based languages, or .NET-based languages. I think those can all be ahead-of-time compiled to a static binary? Another consideration is how easy/compact/fast/cross-platform binaries are. Go makes this trivial but I haven't used JVM or .NET AOT builds myself so I don't know how they compare in e.g. executable size or startup overhead. One advantage of picking Go is that esbuild has already demonstrated that Go is an effective choice for this problem domain, which may mean less work/research/unknowns. A disadvantage of Go here might be that the object model is different than TypeScript vs. something like Java or C#. But all else being equal the choice comes down to personal preference.


node.js Workers + SharedArrayBuffers?

I'm not sure how well supported they are / haven't used them, but I think they enable this?


Yes that allows for shared memory. But the TypeScript compiler is written in TypeScript, and you can't put JavaScript objects (such as the TypeScript AST) in a SharedArrayBuffer. So you'd have to port the TypeScript compiler to another language that can target multithreaded WASM to do this.

You could also try sending JS objects between workers instead of using SharedArrayBuffer. But that copies the memory, which can take a significant amount of time, and of course also excludes certain kinds of advanced parallelism that require shared memory (instead of copied memory). Another drawback of using the JS worker model like this is that having multiple JS VMs probably means fewer CPU cores available due to there being multiple independent garbage collectors. I observed an effect like this when I tried to parallelize esbuild's JavaScript plugins. Running them in separate workers seemed to cause the gains from parallelism to stop when there were half as many workers as CPUs. I'm guessing is because half of the CPUs are busy collecting garbage for the other half. Using a language like Go with real shared memory presumably makes it easier for fewer resources to be spent on garbage collection.


Folks at Parcel wrote serializers that uses SharedArrayBuffer for their core stuff like graph[1]. I agree with parent comment that this didn't have to happen in Go, JS would've been fine if they were willing to depart from the source enough.

[1] https://github.com/parcel-bundler/parcel/tree/v2/packages/co...


Python? JavaScript? Ruby?


love esbuild, ditched webpack and rollpack since then. been a fan of golang too since then.


This reminds me of when people try to make a new Ruby implementation. They get the basics down and think "this isn't so bad! I can do this!". Fast forward a few months and they're stuck on some nasty little edge case around eval and method_missing that they need to run Rails. Or some weird aspect of C extensions.

TypeScript's compiler is an unholy, unsound mess, because JavaScript is an unholy, unsound mess. Sure the basics of type checking structs and variables is somewhat easy. But what about conditional types? What about weird generic constraints? What about indexing types? There's some really confusing stuff going on underneath the hood.

It's also a mess that *does not have a spec*. Even if you somehow manage to make a type checker that implements most of the features, there's no way to ensure it has 1 to 1 parity with TypeScript. Plus that parity can be broken at any point.

The only way I could see this happening is if the author somehow convinced Anders Hejlsberg or a significant portion of the TypeScript team to work on this new implementation. I'm not sure that's gonna happen.


I am absolutely blown away at how good TypeScript's type checker is, given how much of an unholy mess JavaScript is, and that the TypeScript language cannot just deprecate all the awful parts.

More than once I've yelled at my screen, "HOW DO YOU KNOW THIS?!" about some inferred or narrowed type.


However, more than once I've also yelled "HOW DO YOU NOT KNOW THIS?!" at something or other :)


Just today I was triggered that this compiles.

declare foo: never[] // it can only be an empty array

foo[1] // compiles

TypeScript is a nice type system, it's surprisingly expressive and flexible, like e.g. in Haskell I was surprised that there's this strange notion that for a given type you can only have one `Eq` or `Ord` instance for some type A.

But in TypeScript you can have as many Ord interfaces as you want. I may want to order some `User` type by last login, first login or whatever I want, but in Haskell you need to overcomplicate it and create a newtype.


> you can only have one `Eq` or `Ord` instance for some type A.

This is a good thing - it stops you from using one Ord instance when putting As into a Map or Set, and then trying to pull them back out using another. Newtypes let you provide "additional" instances in a safe way.


Why shouldn't I be able to use as many Ord instances as I want?

I'll give you a trivial example I have some data type representing a TvShow, why should there be just one way to sort a collection of TvShow? Maybe I want to sort by year, maybe by length, maybe by category.

It's trivial to do in fp-ts, you can have as many Ord instances as you wish.


Ord is for the natural (obvious and hopefully uncontroversial) ordering of a given type.

If your type doesn't have a natural ordering (as your TvShow doesn't) then you would use sortBy to supply an arbitrary compare function (just like what you would define in your Ord instance).


You seem to be a bit confused because indeed there's no such thing as "natural, uncontroversial, obvious ordering".

Ordering (and more specifically a total ordering) in mathematics is a set and a binary relation that has the transitive, irreflexive and connected properties.

Not even natural numbers have "uncontroversial natural ordering", as I guess even you can think of at least two different binary relations (<= and >=) that form two different orderings for N.

There's a lot of orderings more, here's one that sorts first odd and even numbers: "0 < 2 < 4 < 6 < ... < 1 < 3 < 5 < 7 < ...". And there's others like this.

https://en.wikipedia.org/wiki/Gray_code

Natural numbers in fact have (or better: form) an aleph one types of orderings. None is "special, uncontroversial or natural" just because we're used to think about the default <=.

The concept of ordering requires two things: the data type and the binary relation. It's a pair of things we can represent as (X, compare) but which compare you choose isn't implicit, natural or magical.

When talking about Haskell, it has the concept of type classes, where types and their behavior are bundled together. That's a design choice of Haskell which has pros and cons, but there's no mathematical foundation for a data type to have one preferred ordering.

Ocaml, Scala and most pure fp libraries I know do not have the limitations of Haskell when it comes to the concepts of ordering and equality, in fact in most of those when asking for an ordering ask you for a data type and a compare function. Exactly what the mathematical definition requires you to.

Not a typeclass where those are bundled together for some design decision.

Hope I clarified you that there's no such thing as "uncontroversial natural obvious" ordering, because there's no such things in mathematics.


the argument is that usually you do not want that, it is a Haskell typeclasses vs ocaml modules discussion


They just said why it's important to explicitly track which instance is in effect and newtypes do that.


I wouldn't expect never[] to mean 0 length array, tbf. It's a variable length array of _type_ never. It should be something akin to:

    declare foo: never[0];


I think the correct solution is:

    declare foo: []; // fixed-length array (tuple) without any items


JS is a strict language, so an array containing one or more bottoms is a bottom. The only legal value is the empty array, at least in theory.


foo[1] is undefined (literally the undefined symbol), in JavaScript index-out-of-bounds is ok.

Although i don’t know if TypeScript asserts that x[i] is T | undefined if x is T[] and the length is non-trivial. In strict mode this would be really useful as you can just do x[i]!, but it requires you to think a bit about index-out-of-bounds or at least throw on the undefined case.


> i don’t know if TypeScript asserts that x[i] is T | undefined if x is T[] and the length is non-trivial

There's a flag for that: https://www.typescriptlang.org/tsconfig#noUncheckedIndexedAc...


Just a note that TypeScripts type checking & compilation are two separate systems. It's never a compilation error because it always compiles.


In TS this is controllable. If noEmitOnError is true, then any error (including type errors) will stop it emitting JS output.


I believe the type you're looking for is `[]` (a tuple with no elements) instead of `never[]`.

https://www.typescriptlang.org/play?#code/CYUwxgNghgTiAEYD2A...


Yes I know, but `never[]` should also be unaccessible by index since it can never have any value.


variable length arrays are accessible at all indexes,

const a = foo[1]

should result in a bring of type `never|undefined` or `undefined` depending on flow analysis


> But in TypeScript you can have as many Ord interfaces as you want. I may want to order some `User` type by last login, first login or whatever I want, but in Haskell you need to overcomplicate it and create a newtype.

I don't think that's accurate. Typescript only has one instance of Ord per type and it's inconveniently implemented through the method toString.

To usefully sort anything other than strings in Typescript, you are using a function where you provide your own comparator at runtime. That's more analogous to Haskell's `sortBy :: (a -> a -> Ordering) -> [a] -> [a]` or `sortOn :: Ord b => (a -> b) -> [a] -> [a]` which don't put any type bounds on the type you're sorting.


I've managed to get the compiler to typecheck

  const x: Thing = exprA;
  const y: Thing = exprB;
  const z = [x, y];
but somehow not:

  const z: Thing[] = [exprA, exprB];
although exprA and exprB were pretty nasty typing wise.


> but somehow not: const z: Thing[] = [exprA, exprB];

Doesn't

    const z = [exprA, exprB] as const;
    type GnarlyThingTuple = typeof z;
or something like this help?


Where you trying to create a tuple? Does `const z: [Thing, Thing] = [exprA, exprB];` work?


I'm the opposite. I was continually perplexed by trivial stuff that TypeScript just couldn't figure put. Maybe that has changed in the last year but I doubt it.


In my 5+ years introducing TypeScript everywhere that my team has to use JavaScript, this is almost always because JavaScript is an insane language that allows many counterintuitive things.

TS is sometimes weird because it is a superset of JS and has to be compatible.


Do you have examples?


> This reminds me of when people try to make a new Ruby implementation. They get the basics down and think "this isn't so bad! I can do this!". Fast forward a few months and they're stuck on some nasty little edge case around eval and method_missing that they need to run Rails. Or some weird aspect of C extensions.

I think that's exactly why the author wrote this:

> Eventually, I realized that rewriting a massive project such as tsc would be extremely difficult to continue. But the performance gains were exciting. This led me to try another route: porting instead of a complete rewrite. So, I started looking at the TypeScript source code.

It will require ongoing work to keep up with changes to upstream, but by matching upstream's structure as closely as possible (given the language difference), the path to supporting these nasty little edge cases should be relatively clear.


Interesting, in theory you could make a rough js to golang compiler given the relatively strict subset of typescript that I recall the typescript compiler using (not that the strictness means a fully sound implementation is easy, but that you can use the regularity to cheat text transforms)


What is the difference between porting and complete rewrite? I thought they are the same?


> What is the difference between porting and complete rewrite? I thought they are the same?

As with so many terms in computing, there aren't universally agreed definitions. But in the author's post, I read "porting" as switching languages while changing as little as possible vs "complete rewrite" as redesigning/rearchitecting it along the way.


To add to that, I would say that simple forms of "porting" can be done without really understanding the overall architecture. You just need to understand each line (or a few lines around it) at a time. I understood the post to mean something like that, which sounds like a good plan here, to minimize the risk of adding bugs.

And in comparison, a "complete rewrite" would require a full understanding of the big picture design of the original code, come up with a perhaps different one in the rewrite, and keep the differences in your head as you go (or, forget about the original design entirely and solve it from scratch - and keep that in your head as you go). Far more work and risk, and it would have been a bad choice here.


If that's the case, it seems unlikely that they'll be able to get the hoped for performance improvements


Porting can involve a spectrum of changes. If you are porting to a different language, often for bug-for-bug compatibility you try to maintain the same rough code and class/object structure as the source codebase. This might give you some non-idiomatic code at first, but once you have good test coverage you can refactor. In this case you might maintain some of the upstream code structure to better match future changes.


I think here “rewrite” means completely changing the internal structure of the code (while preserving the functionality) whereas “port” means copy the structure of the internal code with as little modification as possible. And it’s hard to do the latter in Rust since Rust is not garbage collected.


If they focus on the core features and make a significant enough improvement, perhaps the core TS team could be convinced to pull it in as a native extension and make calls to it for the operations it covers? If something like 62x improvement is really achievable, it's hard to imagine the team not being interested at all.

I would personally be very happy to use a significantly slimmed down subset of TypeScript if it provided a 62x performance improvement. Slow type checking is currently the main drawback of TypeScript development in large projects.


I'm not saying this is impossible but I think you'd have to work rather carefully to not have to parse the code twice and effectively run two type checkers in parallel. Which, on second though, you could run the fast one then the slow one. Perhaps that's an option.

> I would personally be very happy to use a significantly slimmed down subset of TypeScript if it provided a 62x performance improvement. Slow type checking is currently the main drawback of TypeScript development in large projects.

We all think that until the slimmed down subset doesn't work with our favorite libraries. Sure, there may be people who use no dependencies, but I'd bet that there's sophisticated types in some very popular libraries that even low dependency programmers use.


To this point, many teams are already using subsets of TypeScript to improve compilation times. Often times it’s things like always declaring return types on functions to avoid the compiler having to infer it (especially between module boundaries). Calling this a “subset of typescript” might be strong wording, but it does suggest there is some desire in that general area.


It seems like depending on how you configure typescript (e.g, in tsconfig), typescript is already something like an ensemble of mini-languages with different dialects and semantics. Much more so than other languages. But I agree, restricted typescript that has to do less work (data or control flow analysis) would probably be on the orders of magnitude faster.


> To this point, many teams are already using subsets of TypeScript to improve compilation times. Often times it’s things like always declaring return types on functions to avoid the compiler having to infer it (especially between module boundaries).

I hadn't heard of this trick - how much improvement does it make? Seems like it might be good for readability, too?


I haven't found any large scale analyses but here's an example of a simple type annotation halving compilation time: https://stackoverflow.com/questions/36624273/investigating-l...

The Typescript compiler performance wiki might also be of interest: https://github.com/microsoft/TypeScript/wiki/Performance


I haven’t personally benchmarked it, but I know it’s available as an eslint rule (all exported functions must have declared return types).


This makes sense for semantic reasons though. If you accidentally return a `number` when you returned a `string` in some earlier branch, Typescript will happily unify them and now your function silently returns `string | number`. The linting rule prevents this.


I always thought this is because some people prefer this style.


One of the main requirements of TypeScript is that the compiler has to work in a browser. The monaco editor is the prince jewel of TypeScript applications.

So the only possibility I see is a webassembly alternative.


Not saying that the TypeScript team should actually do this, but Rust does have best-in-class WASM support.


> I would personally be very happy to use a significantly slimmed down subset of TypeScript if it provided a 62x performance improvement.

I'm just guessing but I think a lot of Typescript's really complex stuff is to support third party libraries with really complex types - at least they often refer to libraries they've improved support for when they discuss complex seeming additions. So I wonder how much of the JS ecosystem you'd have to cut off to get to such a subset. A particular example would be that I've often seen a chain of 20+ dependencies leading to Lodash, and I'd guess that Lodash is really complex to type. If you lose everything that's eventually dependent on something that's really complex to type, do you have much of NPM left?


As a maintainer for Redux, Redux Toolkit, and React-Redux, I can tell you that we would _love_ to have several additional features in TS that would make our library TS types much easier to maintain. Like, say, partially specified generics.

Tanner Linsley (author of React Table, React Query, and other similar libs) has expressed similar concerns.

(I'm actually starting to work on a talk for the upcoming TS Congress conference about "lessons learned maintaining TS libs", and this will be one of the things I point to.)


The issue here is that if you're running in the JS ecosystem you'll definitely want to use other people code (npm package or internal lib), if the subset breaks JS compatibility then you can break a significant amount of code without realising, if it is "only" a TS subset, then you need to make sure that each lib/package you import are compatible. Anyway this does not seem like a good solution.


to be fair, even within the TypeScript world that can be a problem. If typescript versions don't line up, or if your various strict options are too strict, you can get into really weird situations. It -generally- works, but when it doesn't, it's hell.


> I would personally be very happy to use a significantly slimmed down subset of TypeScript if it provided a 62x performance improvement. Slow type checking is currently the main drawback of TypeScript development in large projects.

Isn’t that already available by running in “transpile only” and having the type checking happen in the background?


62x improvement is not achievable. 6.2x might be.

edit: I see they are running it with 8 threads, in which case yes 50x is achievable. In any sizable codebase though, you should probably split into modules and run with some parallelism via e.g. `wsrun` or `turborepo`


esbuild is > 125x faster than Webpack, doing the exact same job. It's not some theoretical microbenchmark, you'll see it when converting a project from one to the other. If a software hasn't been designed with speed in mind, and I don't think tsc has, there can be massive performance improvements possible if you create a new implementation with performance as a top goal.


Webpack does way more than esbuild, including running a typechecking compiler instead of just transpiling, running compilers able to downlevel emit to ES5 and providing a deep plugin architecture allowing you to hook into any bit you like. But yes, it hasn't been designed with speed in mind - it has been designed for maximum extensibility instead. Its the same reason why Babel is slow compared to sucrase (written in JS, currently faster than SWC and esbuild but doing somewhat less - https://github.com/alangpierce/sucrase)

tsc has in fact been designed with speed in mind (I've been following the project since before it moved to GitHub, which is when they redesigned it for speed). Going beyond 1 order of magnitude performance improvement while implementing the full feature set is highly unlikely.


62X is still one order of magnitude.

I would bet on a port eventually being done and hitting at least 10X.


Well we have some intersection then, I think it will be 10x at most :)


Even if it’s not official i could use a slimmed-down, faster TypeScript. Even if it has slightly different semantics. I always use skipLibCheck so TS only checks my code.

It’s a form of linting, not semantics, so it doesn’t have to be exact.


No spec, but the golden files that check that the compiler is working as intended are very comprehensive. If you can generate the same ~38,000 files in this directory from the test cases, you're pretty much on par with tsc: https://github.com/microsoft/TypeScript/tree/main/tests/base...


But that only works for a bug-for-bug compiler, no? If you want to build something like a linter, you couldn't use them to validate you are covering the fully syntactic range. Though I guess it would help look at coverage if your linter threw any errors, but it wouldn't validate correctness.


> This reminds me of when people try to make a new Ruby implementation. They get the basics down and think "this isn't so bad! I can do this!". Fast forward a few months and they're stuck on some nasty little edge case around eval and method_missing that they need to run Rails. Or some weird aspect of C extensions.

This is true of many programming languages projects. It's not uncommon for someone to write a language in a weekend, and then marvel at how their very simple toy language is faster than C! And look how quickly it compiles! Thus begins a several years-long journey, where with every feature added the runtime and compiler get slower and slower and slower.


There is no specs so TypeScript's own impelemtption might not be accurate to what the team imagined on how things should work.

That means if this implementation passes all TypeScript's tests, it is as good as TypeScript.

Each new issue that is opened in either repo that shows discrepancy can be a new unit test added to the TypeScript codebase.


Good thing is that the only thing it does is type-checking. So if you miss some edge case it won't influence at all how your ts code works.

The only thing is it may not catch some type error, which JS already does none of and TS can skip wherever you want or even implicitly and people are fine with that.

If I get 98% of TS typechecking 10 times faster. It's a huge win.


You're vastly underestimating the advanced type system features TypeScript requires to typecheck even the most basic JavaScript code. It would be more like 10% rather than 98%. Intersection and union types are no joke, and are notoriously difficult to make performant.


I just saw an example where the upcoming 4.6 will improve the clever use of recursive types done by some libraries.

Give the people a turing complete type system and they will write a complete application with it.


My experience with a project like this is unless you do reach fabled 1:1 parity, the new tool is orders of magnitude less useful or even actively harmful.

Saving a minute every cycle sounds very attractive, but when a person spends a morning hacking and then finds out the normal tool chain doesn't accept their work when the optimized one did, they will quickly distrust it in the future. It'll also make them very reluctant to share it with their team.

The only reason my project was successful was we ultimately reached a set of compromises with the team that owned the woefully underspecified tool chain and upstreamed large swaths of changes. I would probably never do something like that again.


I think it's okay not to get near 1:1 parity given the amount of unchecked type-casts (foo as unknown as X) and ts-ignores that happen anyway in most code bases. The other advantage is that the typechecker is advisory, meaning the code will run (and might explode) but it will run even if the types are wrong since compiling typescript is mostly just throwing away the type info.


The good thing is you have a ton of code on the Internet to use to test it. Unlike a dynamic language reimplementation, a static language transpiler can check if the results produced are identical to `tsc`


Yeah, the lack of specification for Typescript is quickly turning into the Achilles heel for the web tooling ecosystem. Several projects got the parsing part done, but a lack of specification and the relatively fast release cycle means it isn't really feasible to invest into actually implementing the meat of the TS compiler, so everyone is stuck with that last remnant of critical tooling running slowly.


> It's also a mess that does not have a spec. Even if you somehow manage to make a type checker that implements most of the features, there's no way to ensure it has 1 to 1 parity with TypeScript. Plus that parity can be broken at any point.

You need a test suit for this not a spec.


One way to collaborate would be to write a spec as you go, along with an associated test suite that can be run against multiple implementations.

Presumably TypeScript has a test suite already that might serve as a starting point?


I really like dart actually. It's typed, close enough to JavaScript, and yet I don't think is an unholy mess ?


But in the meantime he gets to work on a fun project, learn a lot and get paid for it at the same time ;)


> tsc depends on shared mutability and has a cyclical mutable reference. Rust is designed to prevent this behavior. Having two references to the same data (shared mutability) is undefined behavior in Rust.

This problem has numerous solutions in Rust - (A)Rc, RefCell etc. I don't understand how this caused the author of the article to stop considering Rust.


Non-GC languages are bad in general at handling cyclical references (mutable or not), and Rust is especially ill suited for it because of the borrow checker. For example should an arbitrary reference in tsc become an rc::Rc or an rc::Weak? You have to tease out the implicit ownership model and one may not even exist.

tsc was designed expecting GC and it makes perfect sense to port it to another fast GC language.


I'm not a Rust fanboy (in fact, I rather despise the RESF, especially given Rust's mediocre tooling), but: I think that "You have to tease out the implicit ownership model and one may not even exist" suggests a problem with the program itself.

I can't think of any reason why a program must have an ill-formed ownership model (unlike, say, the fact that many interesting programs must have an embedded dynamic type system (tagged unions etc) to do their thing).

Now. The Rust language and existing tooling doesn't give you any help in figuring out the ownership model, but I can easily imagine a language and tools that would help with that.

This isn't meant to refute anything you've said (you said "[existing] non-GC languages are bad in general" not that non-GC languages cannot be good) - just add clarity.


A program doesn't have to have an ill-formed ownership model. But if it does and you're trying to translate from one language to another, you should probably pick the language that makes the job easier.

It reminds me of a programming assignment I had 20 years ago. We had to take our teacher's old FORTRAN program and "convert" it to Java or C++. I knew both equally well at the time - and I picked C++ because it had goto whereas Java didn't.

This meant that I didn't really have to unravel the meaning of the program. Whereas with Java, I would have to.


Speaking as someone who has written non-trivial compilers in both Rust and Go...

Compilers tend to use a bit thornier data structures and graphs than you might see in other realms. It is very convenient to just represent these graphs as objects connected by pointers, and know that you don't have to worry about objects that become unreachable.

For example, the values in an SSA-based IR are all full of pointers to other values. Each value represents the computation to produce that value, like "x12 / 4". A strength-reduction pass goes in and mutates this node, replacing it with "x12 >> 2" (if such a change is valid). It knows the change is valid because it follows pointers from the node to its inputs. The change is available to all upstream values immediately because it is mutated. These changes may make other values unreachable, at which point, you don't care about them. You only care about the values which are reachable from computations with side effects. These nodes definitely have cycles and IMO, the notion of weak references between them doesn't make sense.

I'm sure there's a way to do that in Rust. Please, nobody try to explain how. I could figure it out. The thing is... it's not obvious how to do it in Rust, and figuring out how to "explain ownership to the Rust compiler" without creating Rc<RefCell<T>> cycle leaks is a non-trivial task that sucks away time I could be spending writing the compiler.

This reminds me of a paper I saw go by years back about writing a compiler in Haskell. Pure Haskell code doesn't let you mutate objects and doesn't let you do pointer comparisons, so it creates problems if you're trying to implement an algorithm which is conceptually pure (at a high level, a function with inputs and outputs) but where you're taking the implementation from a paper which constructs the output iteratively, piece by piece. The problem is that your data structures have to be annotated to show which parts are mutable (IORef, STRef, or TVar), but "which parts are mutable" may have a different answer depending on whether you are inside some function transforming or constructing a data structure.

It was obvious to me, reading the paper, that while I love functional purity, and functional purity solves all sorts of problems and makes it easier to write code in general, it sometimes gets in the way of getting shit done. The same is true of Rust's notion of ownership... super useful when it lets me write fast, efficient code without having to think about ownership (when it just works and gets out of the way), and super annoying when you have to figure out how to explain your program to Rust.


Thank you for explaining - now that you've pointed it out, I can see that this is just another form of the "graphs are hard in Rust" problem that I've encountered before.

However, I still stand by my point - Rust might be bad at graphs, but I believe that a well-designed language with a borrow checker (maybe something closer to Lobster, which automatically inserts RC cells when you try to multiply mutable borrow[1] - something obviously more correct than what Rust does, I'm not sure how they messed that one up) wouldn't necessarily have to be.

[1] https://aardappel.github.io/lobster/memory_management.html


The problem is that Rc<> is not replacement for GC, in GC language you just don't care about having multiple mutable references at all because it's mem-safe (so you don't need to panic). The only thing you can get is concurrency issue but that's often fine for isolated algorithms (where you do mutex at the top and don't care from there or you just send the whole thing to the worker using channel). Rust is placing restrictions on your code not because it's bad code but because it cannot prove it's safe. Yeah so you can mark it as unsafe but you can also switch to a different language and have a happy rest of life.


Do you mean a Rust Rc<> or a generic "borrow-checked language" Rc<>? In the former case, yes, I absolutely agree that Rust is not ideal. However, my argument is about a theoretical language with a borrow-checker, in which case - why isn't something like Lobster sufficient, which automatically detects when you have multiple owners and inserts reference counting under the hood?

https://aardappel.github.io/lobster/memory_management.html


Among other things, the article you linked explicitly mentions that it does not solve one of the major problems with using Rc<T>, which is that it does not handle cycles.

It sounds like a more ergonomic system that has the same technical shortcomings.


yeah I mean rust Rc<>, I don't have experience with lobster, does it allow mutation from multiple places or does it panic just like rust?


The latter - although as a sibling commentator pointed out, it doesn't do cycles - so it's better, but not a panacea.


Ah, ok, having cycles is another problem, also hard, but my point was about multiple mutable references which is not a safety problem if you have GC. You only get race conditions.

And I think it could/should work also with Rc<> but for some reason, rust authors decided it's important to write everything as if it was multi-threaded even if it's single-threaded now (so it can be easily turned to multi-threaded later).

And that's cool but it adds a lot of burden to developers. I am probably missing something, there is likely a good reason for this, as always, but it doesn't change the point, rust is super-complicated for hobby stuff and you have to think about irrelevant things, hence for me, it is low-level language (with generics and macros and very cool builtin unit testing but it's still low-level)

And by irrelevant I don't mean mem-safety, that IS important for me, I rather mean worrying about race-conditions in a single-threaded code :)


If your program is an interpreter, then I think the ownership model is going to be dominated by the program being interpreted, not the intepreter itself... That is, has to be general enough to interpret any program, and many of those programs will have cyclic data structures.

TSC is not an interpreter, but its type system is Turing complete, so I wouldn't be surprised if it has some similar issues.


> I rather despise the RESF, especially given Rust's mediocre tooling

Which tooling are you talking about?


Copying from a previous comment[1]:

The Rust compiler is both dog-slow and massive (both from a source and binaries perspective), and doesn't have a working incremental compilation mode yet, or support in-process hot-patching. There's no Rust REPL (hacks like papyrus don't count). Poor structural editing support. Integration with various editors/IDEs is lacking (e.g. there's no support for reporting possible performance issues with code constructs in any editor that I'm aware of, nor is there in-editor per-function disassembly or LLVM IR inspection, or borrow-checker lifetime annotation, no extract-function refactoring).

[1] https://news.ycombinator.com/item?id=29925783


The compiler is relatively fast for recompilation but you are right, this is a problem that's actively being worked on.

> support in-process hot-patching

I assume that we are comparing this with say C++ compilers. I think that Visual Studio added this recently but idk if saying this is a valid criticism of the compiler.

> There's no Rust REPL

Are you comparing it with C++ compilers?

The Rust tooling is so much better than C++ tooling that it's not even funny. You are right, there are places where this could be improved but just cargo makes my life so so so much easier than any other package manager.


> The compiler is relatively fast for recompilation

Relative to Rust when doing a cold-compile, maybe, but certainly not relative to other languages like C, sane C++, Go, or Common Lisp.

I'm comparing Rust tooling with existing tooling in other languages in general, that is feasible to implement.

Common Lisp has multiple compilers (SBCL, Allegro, Clasp) that compile code nearly instantly (within a second for my modest 2kloc codebases, and individual functions in milliseconds), have REPLs (that also do native-code compilation under the hood using the compiler proper - no hacks), and support hotpatching.

Visual Studio implements in-process hot-patching, too, so between C++'s highly static nature, and the fact that SBCL is run by a tiny group of unpaid contributors, there's no reason for the Rust compiler to not have it.

Meanwhile, GHC (compiling Haskell, a statically-typed language) has a REPL, so it's clear that being dynamically-typed is not a requirement.

From a user's perspective, having either long compilation times or lacking a REPL is OK, but not having either is unacceptable.

Having a good package manager is necessary for having good tooling, but not sufficient.


Rust does have a third-party `Gc` type (which used to be in the stdlib).


Is that the one where you need to derive() for every struct you want to use and which is, in general, not compatible with the rest of rust ecosystem?


I honestly like the explicit weak ref model and it’s usually trivial (child objects are strong references, parents and siblings are weak), and rust has Gc as a last resort.

But if the author’s more familiar with Go, performance in Go is likely good enough


While I hate Go and love Rust, there is also the issue that RefCell/Gc/Rc are very different in terms of how expensive a memory allocation is compared to node. Historically there were very similar problems in trying to write C++ like one would write Java. Memory allocators today are much better and closer to GC languages, but I suspect it will still be more expensive for very allocation heavy code.

One way you could get this to work at the cost of some safety (depending on how you do it) would be to use arena style allocation. Obviously there are ways to do this safely in Rust, but it is probably easier to do it unsafely for a port.

However, if you just use a high performance language with a GC like Go/C#/Java, the port will be easier and you won't have to worry about doing too many allocations.


I think the point is valid. As a rust beginner I tried to make a graph data structure and it was difficult. Using RefCell is not very nice because it moves the problem from compile time to runtime and you still can’t mutate the same data structure more than once at the same time.


Rust encourages you to use an adjacent list/matrix for your graph representation, which is preferable almost all of the time for graph algorithms. It's a little unwieldy if the graph needs to be mutable during the core algorithms, but almost all algorithms in practice assume it will be stable.


This is interesting but I don't know Rust. Presumably your graph data structure grows dynamically, meaning you have to add nodes at runtime. When. you say you can't mutate the same data structure more than once at runtime, what does it mean? That you can't set both a parent and a child pointer in the same function, or what?


I’m very far from being a rust expert but it was difficult to build a bi-directional graph because it’s difficult to connect the nodes together. A simple recursive algorithm will not work. I ended up using a crate.

I found the following article interesting on the topic : https://aminb.gitbooks.io/rust-for-c/content/graphs/


It shouldn't be difficult:

1. You have a graph struct that has ownership of all nodes (in Rc containers).

2. Nodes have Weak references to connected nodes


In the case of RefCell, when you begin modifying its data, you’re not allowed to begin modifying its data a second time without declaring “I’m done” with the first modification. This makes doing several modifications in different places that are interspersed with each other tricky because it’s hard to reason about when each spot is actually done so that a modification can be performed elsewhere. (If your program is single threaded it’s possible you don’t even care about mutation happening in multiple places, but Rust cares because in general Rust programs need not be single threaded.)


Rust doesn't allow you to relax mutation constraints for a single-threaded program?


Given that ESBuild [0] is a way better replacement for Webpack already (and similarly written in Go) I can see how we could pretty soon have an entire toolchain written in Go for dealing with compiling JS.

Hopefully the whole toolchain will be embeddable, too. Being able to do configuration-in-code and produce an executable that does the whole build step in milliseconds would be cool.

[0] https://esbuild.github.io/


SWC (by the same author as this tentative tsc port) is a part of more and more people's typescript toolchain, and is written in Rust. Having all your ecosystem in a single language feels nice, but pragmatically different tools might be better in different languages.


The point of having it all embeddable in Go is that we could write a single executable that handles the entire process, with the toolchain embedded and the configuration written as code.


You can embed Rust code in Go, unless there's something about Go that prevents static linking across FFI boundaries.


It's not easy and not very common.


The author of esbuild also started the implementation of esbuild in rust, found it painful too and switched to build it in go.


For me the ideal use case for Rust is really for places where automatic memory management is a hard sell, even if technically possible, kernels, drivers, hypervisors, firmware, GPGPU, ...

For everything else I rather enjoy the productivity of automatic memory management.


My general inclination is to go straight to Rust if I already tried a different reasonably performant language (like Node.js) and it wasn't fast enough—because only Rust (and C and C++ and a few more obscure languages) guarantees that basically any performance optimization that I might want to do will at least be possible, even if it might be inelegant or unsafe. I don't want to run the risk of hitting another performance wall that I can't climb over.

That being said, porting this kind of codebase to Rust would indeed be really rough—definitely possible, but by no means pleasant. I like to think the platonic ideal Rust, which we're hopefully getting closer and closer to over time, is capable of elegantly handling cyclic data structures and the like even while encouraging a single-ownership model for the 95% of cases where that's the right design. I don't think the Rust we have today is there yet.


I wouldn't put node.js and performance on the same sentence.

Better than other dynamic languages, definitely. That is about it.

Plenty of languages that offer AOT compilation and automatically managed memory to chose from.


The "why not port to Rust" part boils down to these paragraphs:

> tsc uses a lot of shared mutability and many parts depend on garbage collection. Even though I’m an advocate and believer in Rust, it doesn’t feel like the right tool for the job here. Using Rust required me to use unsafe too much for this specific project.

> tsc depends on shared mutability and has a cyclical mutable reference. Rust is designed to prevent this behavior. Having two references to the same data (shared mutability) is undefined behavior in Rust.

I'd be curious to hear more. In particular:

* Two references to the same data: were cells too awkward?

* Actual garbage collection: my first instinct would be to use an arena for each compilation, destroying the whole thing at once rather than doing reference counting or garbage collection. I gather that's not cool in the age of language servers and incremental compilation, though. Does tsc support those?


All of that is valid for a rewrite, but not efficient if you’re porting the current implementation.

As I understand it, in this case, Rust would be able to have a line to line port from JavaScript.

I do still think your overall point holds. Why not use a library in the Rust ecosystem to help with this problem e.g. some GC<T> type.


> All of that is valid for a rewrite, but not efficient if you’re porting the current implementation.

Help me see it?

I can easily see how getting rid of the cyclical structure crosses from "line-to-line port" into "rewrite" territory. I'm not saying you're wrong, but I'm having more trouble seeing how "wrap some types in cells and pass along a lifetime and arena parameter" crosses that threshold. It might not be feasible at all, though, if tsc compilations are long-lived for language server + incremental compilation support.

> Why not use a library in the Rust ecosystem to help with this problem e.g. some GC<T> type.

iirc I've seen some experimental GC library mentioned. I think something like that is at least off the beaten path enough that I can see how the author might decide it's better to just use Go instead.


TSC supports both a watch mode and is used heavily as a language server.


It used to be in std.


I see a few people wanting to rewrite tsc in something else than Typescript because tsc itself is too slow. I don't remember seeing any "proof" or analysis that Typescript is the limiting factor in tsc. The basis often used is the performance of esbuild compared to other bundlers. From the esbuild FAQ, "Why is esbuild fast?":

> Several reasons:

> It's written in Go and compiles to native code.

> Parallelism is used heavily.

> Everything in esbuild is written from scratch.

> Memory is used efficienty.

Using another language will solve 1 and will maybe solve 2. From this FAQ too:

> For example, many bundlers use the official TypeScript compiler as a parser. But it was built to serve the goals of the TypeScript compiler team and they do not have performance as a top priority. Their code makes pretty heavy use of megamorphic object shapes and unnecessary dynamic property accesses (both well-known JavaScript speed bumps).

tsc itself seems like it could be optimized. By how much, I have no idea. But it sounds like it would be easier than writing something from scratch. esbuild, while a great tool, is limited in scope. Typechecking Typescript on the other hand is a moving target, and a fast one at that. All in all, I'm not convinced by this approach compared to "regular performance work" on tsc itself. I think people talk about that because it's easier to start your project than to change tsc. In that case, the problem sound political rather than technical.

There's the x62 speed from the article, but there's no info on what features are supported. I feel like this is something where the devil is in the details.

I'm happy being proven wrong on that subject, I fear that we're going increase the number of tools in the frontend ecosystem for no good reasons, and make every a bit harder again.


I had the same thought.

A direct port of tsc to a "faster" language doesn't guarantee a faster type checker. Sure it might be a bit faster, but my guess is that the large performance wins come from a different design that takes performance into consideration from the start.

Maybe the thought process is that a 1:1 rewrite will handle compatibility and feature completeness? I suppose you could optimise from there?


How can you avoid megamorphizing an ast hierarchy?


Just from a week ago on HN:

I'm making a Typescript type-checker in Rust: https://news.ycombinator.com/item?id=30033119


Next week it will be Zig then



Here's a relevant GitHub thread where some conversation that has been happening: https://github.com/swc-project/swc/issues/571


TSC doesn't need to "stick around", right? Just a run-once and the program is over?

In those cases, https://github.com/fitzgen/bumpalo works amazingly as an arena. You can pretty much forget about reference counting and have direct references everywhere in your graph. The disadvantage is that it's hard to modify your tree without leaving memory around.

We use it extensively in http://github.com/dioxusLabs/dioxus and don't need to worry about Rc anywhere in the graph/diffing code.


I think --watch mode means it's running while developing


Additionally, the language server used to power editors is long lived. TypeScript isn't just the compiler.


If anyone happens to know: what are tsc's bottlenecks?

I presume that if the bottleneck was just number of loop iterations, V8 would already be about as fast as C or any other language?

Is it simply that accessing/mutating javascript data structures like objects, arrays, set, etc. is slow? Are those not fully optimizable by V8 for some reason? Or is it something else like the GC being slow?


Yes, I'm curious about this too.

Given that gopherjs[1] can compile Go to JavaScript and in some cases thereby achieve better performance than the native Go compiler[2], it's not obvious to me that a JavaScript program ported mechanically to Go is likely to be any faster than the original. It seems to me that the effort might be better applied to optimising the performance of tsc itself.

[1] https://github.com/gopherjs/gopherjs [2] https://medium.com/gopherjs/surprises-in-gopherjs-performanc...


This is a problem that can take advantage of good parallelism, Go will give you that and at the same time be more memory efficient.


(In a different direction, Microsoft's pyright type checker for Python -- IMO the fastest and most correct type checker in the Python ecosystem -- is implemented in TypeScript!)


as someone who works on a competing typechecker, i'm definitely very impressed by pyright, especially since it has a single author! we have three people working on pytype and it definitely feels like too little; the way the pyright author manages such a great development velocity is mindblowing, and means they got the initial architecture very right.


The thing with all those projects is playing catch-up with upstream, and the possible variations from what is actually the official semantics depending on the TypeScript version.


Personally I wouldn't fret about keeping pace with upstream. Eventually the syntax will stabilize if it hasn't already. I'd be more worried about fidelity--does this produce the same result as the reference implementation? TFA says he's running against the "conformance" test suite, so that should keep things pretty tight and when bugs are found they'll be fixed in time and presumably the test suite will be updated accordingly.


> Eventually the syntax will stabilize if it hasn't already.

Every language in wide use today is still evolving and, if anything, it seems that the pace of language change is generally increasing.


Some amount of evolution is fine, and I think your “pace of language change is increasing” may be true for some languages and time scales, but most languages don’t change more in their second five years than their first.


The `conformance` suite is the "set of tests people added when a feature was first released", usually without the decade of following regression tests in the `compiler` suite. In TS itself, the `conformance` and `compiler` suite are considered the part of the same unit most of the time. (The compiler suite, however, is not helpfully subdivided by feature, since it oft deals with cross-cutting concerns!)


If OP delivers on the preliminary 62x speed up, how could upstream justify not jumping ship to OP’s implementation? Certainly the user base will demand it.


Being Microsoft, they would rather make use of their Typescript compiler for IoT, yet they haven't.

https://makecode.com/language


62X seems like a lot. It makes me wonder if someone has put serious effort into profiling and optimizing the JavaScript version. If not, that could be a better use of time than a port. I'd guess the TypeScript team may not be interested in maintaining a Go version, so this port will will always be behind the current TypeScript definition.


I'm not so positive about these projects.

They are good willed but TypeScript doesn't have an official spec, it's an implementation-defined language. It's not C, it's not JavaScript, it's whatever each release wants it to be. This makes it very difficult to follow every version and evolution, you're consistently chasing whatever has been merged on release.


And it's the exact reason I'm really reluctant to fully buy into Typescript... I always feel like I'm wrong about the "not an official spec" part of things but you are 100% correct in my understanding.

Also on "it's whatever each release wants it to be" - that is hell on earth if you're looking to target consistent software processes and it only gets worse the more you scale (both solution and team).

I get the value of Typescript - hell I've used it extensively myself... it's just that I can't stop feeling like this is all an anti-pattern: trying to make a loosely-typed language a typed language (if you can call it that?). =/


I share the same concerns. It’s also a little irking that if I want to enjoy good tooling for typescript, I now have to install another language on my system since the language that typescript sits on is a hot mess. I don’t mean to cast shame on JavaScript, but I’d be remiss to not say that modern js development is a bit insane. The only reason we all go along with it is because we have to.


You don't have to install another language to use tools written in Golang (or Rust) unless you have to build binaries yourself.


Maybe tsc already does this, but why not just make extensive use of caching to get the compile times for "incremental builds" under control? Why would it have to re-check all 10,000 files in my project just because I modified a single file? That seems way more manageable (and less error-prone) than rewriting the entire thing.


go means it's going to be easier to integrate to esbuild :)

I'm happy to see this, as honestly I don't really understand why you would compile TS without type checking.


> I'm happy to see this, as honestly I don't really understand why you would compile TS without type checking.

I don't think I'm remotely alone in doing this but I'll share from my perspective. I have two modes while writing TypeScript stuff: 1) prototyping something where I don't want to spend time writing types (or maybe I write some types but I don't care if they are correct or line up yet) but I still want to have JavaScript generated so I can test the thing out manually and 2) getting types correct as part of long-term maintenance and edge-case removal.

There are fast TypeScript -> JavaScript generators today like swc and esbuild. tsc is not fast. So I can speed up "1)" from above if I use swc and esbuild and only use tsc for "2)" from above.

As people write faster implementations of tsc-the-type-checker though I'll be happy to use that instead and just not use tsc anymore.


> 1) prototyping something where I don't want to spend time writing types

It doesn't compile to JS, but this is one of the canonical use cases for Stanza 's (1) "optional" type system

(1) http://lbstanza.org/optional_typing.html


Have you tried doing Type-Driven Development? IME, most of the time constraints are determined through types first with the code being done later tends to replace the game of "what kind of complex and non-obvious types fit my code the most" with a "oh, the implementation is obvious" reaction.

It also makes it easier to determine what behavior is worth testing and which ones are already covered by your types, if you tend to write tests last. If instead you're used to Test-Driven Development you can do types->tests->code.


I've spent a lot of time programming in Standard ML and yeah that's a prerequisite there.

But no I don't prefer it to what I do in TypeScript.


Out of curiosity, how often do you have a project in the first mode that gets large enough that the speed of compiling TS to JS is significant?


If you're writing UI code then even a few seconds is slow. You don't want to be waiting 5 seconds every time you move something a couple of pixels to the left.


There is a huge difference even in small and mid-size projects.

https://datastation.multiprocess.io/blog/2021-11-13-benchmar...


That makes sense, I would generally use JS directly for mode 1), but I understand the use case.


Personally, I let vscode do typechecking on open files & have a pre-commit git hook to typecheck changed files.

When it comes to starting up a development server & building the client, there's a huge cost to repeatedly typechecking the same 1000+ files. By cutting out typechecking & only compiling, I can reduce the webpack client build from 40 seconds to 3 seconds (using sucrase).


I'm answering to you but this is really for most siblings:

I maintain a fairly large TS code base (nodejs backend). I think a full rebuild is 1:30 on my relatively recent laptop (i7-8650U, lots of RAM). But in practice I always use tsc -w and compile is mostly instant after editing a file (I do have to wait whenever I launch the command after I boot, but after that, it's fast enough).

tsc now support incremental compilation too, though I haven't played with it too much as I'm happy with watch mode.


Incremental compilation is _okay_ but it seems to perform full rebuilds if the previous build is _too_ old which is weird- meaning that more builds than expected take the entire time to complete. Nonetheless I use it for all of my builds. On the other hand this may just be because of the spiraling nature of type inference, where TS needs to re-examine a large amount of the program from a change that appears simple.

Personally I only have a small number of projects that take more than 10-20 seconds to compile, but those ones are painful. I should probably do the same with -w for those.


What if not everyone else on your team is paying attention to their editor's warnings? What if the LS's async response doesn't come fast enough? What if, god forbid, you have to change something on a different computer?

If it's not in your CI/CD scripts, it's not actually getting checked.

> I can reduce the webpack client build... to 3 seconds

This is about 10x longer than any interaction with a computer should be.


> If it's not in your CI/CD scripts, it's not actually getting checked.

You can make a similar argument and say you must write in a sound type system language, and TS typesystem is unsound.


Sorry, I don't understand the analogy. Yes, TS allows some bugs through, and that's a problem too. But that's also a far cry from "always rely on humans to run complex processes in a consistent way."


This is similar to what I do (although I use esbuild), however like an idiot I just run tsc manually so obviously I forget sometimes and it takes 10 minutes to realise the build failed.

Not great... I'm ashamed but it's just me on this project atm.

edit: misread parent originally


If it takes one minute to type check a codebase and ten seconds to compile it to JavaScript, then whenever I simply need to produce a JavaScript file without type checking I'll save myself the fifty seconds. If my editor's TS language server does the checking and my bundler is just compiling for my dev environment, I don't want both tools duplicating each other's work.


Answering to myself, a colleague suggestion (from tsc doc actually): use tsc --noEmit (for type checking) and a fast compiler.

tsc --noEmit duration is 2/3 of a normal build on our fairly large code base, so that can definitely speeds things up (for CI for example)


We use SWC with ts-node to strip types in dev for faster iteration loops and run the full tsc typechecker in CI (and optionally as a separate process on your local machine).


You compile ts without type checking when the options to do so are 10x+ faster than with type checking.


I really don't get this, when I save my TS files they are already compiled into JS on the fly, ready to fed into a <script />.


By what? What if they were updated from an upstream repo, and never saved from an editor? What if you change branches without an upstream repo?


By VS and VSCode, the IDE/editor that belongs to the same company that does TypesScript.

If I really need to call tsc explicitly, the few seconds that I have to wait, versus the minutes on some other languages, won't make me lose sleep over it.


Aside from preferences, is there a reason why C# or F# wouldn't be a more natural choice? I'd imagine porting TS to C# would be easier (especially for this specific use-case) and that the performance would be good.


I wonder how it would fare if instead of rewriting from scratch, they applied automated transformation from the Typescript source code to Go. It wouldn't have to capture all the semantics of TS/JS, but be opinionated enough to transform the Typescript source.

Another possibility is to use something like GraalVM native-image to compile Typescript's type-checker into a binary. This might offer a speedup over its execution on V8.


I'm surprised to see no one comment on Zig. It seems like the bun.sh developers are making some bold changes with such a low-level language.


I'm super bullish on Zig, but it's a bit early to bank on for production things as the language is still changing/being developed.

To give another example of where this came up: they were discussing using Zig for the Ruby JIT, but despite really liking Zig the team decided to go with Rust because of its maturity: https://bugs.ruby-lang.org/issues/18481#note-21


I'm more curious about if the author research Ocaml as an alternative.


Well... How many paid work hours spent on tsc? Then does TypeScript have a formal language specification? I'm pessimistic it's possible to rewrite tsc with full compat (bug-to-bug, otherwise it doesn't really make much sense) without MS support... but who knows... I'd like fast compiles too, so good luck!


A formal specification would be basically useless, if nothing guarantees that the reference implementation actually follows it. Sharing & contributing to the same test suite is a much more practical approach.


As a smoke you can swap it in for a lot of the npm ecosystem and run the tests of those projects on the compiled code.


Specification isn't being followed is of course useless. But yeah, tests are definitely useless and in the absence of the spec is the only sane way to approach such rewrite.


Slightly off topic, but do I not get something about Go? It comes up waay to often nowadays here, perhaps even more than LISPs - last time Rust was the hyped one, but it seemed to have decreased a bit. But at least rust is an interesting language with some novel features, adding new value to the low level PL domain. I fail to see what is unique about Go, other than the perhaps least known non-blocking concurrency (based on the recent “invisible” article on this). I find the simplicity of it (with the often claimed reason being basically “code monkeys not hindering each other’s work”, quite offending) really off-puttingz


> I fail to see what is unique about Go

That's because you're looking at it purely from only one point of view - the number of available features in the language itself.

What's unique about Go is not so much about the language itself as the approach of its authors to creating it. Go hasn't meaningfully changed in the 10 years of its existence. That's not an accident - the bar to add anything is extremely high. A feature needs to be punching way above its cost to even be considered.

If you look around the programming language space, you'll see the exact opposite attitudes, features get added into languages willy-nilly. Not much point in having different languages when they just trade features until they become indistinguishable from each other.

There are only three other languages that I know of that at least sort-of follow a similar philosophy to Go: 1. C 2. Zig 3. Elixir

You aren't going to be writing general-purpose code in C, so that's out. Zig is about a decade away from being a contender. Elixir is a valid choice, albeit much more niche and unfamiliar to the average programmer than Go is.

So if you want a language that'll be there for you in the long haul, that will not invalidate what you've learned about it every 3 years or so, you're left with Go.


> If you look around the programming language space, you'll see the exact opposite attitudes, features get added into languages willy-nilly

Java has been going with the last-mover advantage of only ever incorporating a new feature when it proved to be useful by other languages. That has been going on for 25 years, so it is definitely queen of this category. Not sure why it is seldom appreciated, but it is also a very small language (compare the number of keywords it has by c# for example), and it was even named “blue-collar” for a reason. So I still don’t see Go anything new even from this perspective.


Go is about getting things done. I think it's interesting to HN because it is enough engineering to solve a problem, and no more. I think the biggest feature of golang is the community process that considers change slowly, in contrast to the JS ecosystem, that's a distinct feature.


Personally, I think the appeal of Go has much more to do with the compiler and the tooling than the language.

The compiler is blazing fast and you get small fast statically linked executables by default. That's very appealing to me even though the language itself is a bit spartan.


Super skeptical that porting this to a new language is going to have the kind of performance gains they expect ... assuming they want to replicate all of tsc's functionality


It’s already proven (esbuild, SWC, Bun) to drastically improve performance of TypeScript’s other responsibility: compilation (tsc doesn’t offer an option for this, but ts-node does, for comparison). And while the type checker is certainly more complex, I have a hard time imagining it won’t show similar improvements in a native language.


Compiling typescript is basically just stripping the type definitions/annotations and other typescript-specific parts of the syntax out. That's a huge difference from actually type checking it.


And yet, it’s drastically slower in JavaScript than in Go, Rust and Zig. I realize different algorithms will optimize differently between JS and native code, but I sincerely doubt tsc is anywhere near optimal speed.


That's mostly because tsc's parser is used by the JS implementations.

Writing a parser for the purpose would achieve performance closer to the other implementations.


esbuild, SWC, and Bun are not ports of existing programs.


Oof trust me from experience you do _not_ want to write a compiler or parser or anything like that in Go. The string support is so limited -- for example you have to implement basic things like reverse string from scratch instead of them (and much more) being in the standard library. Much better to use something like Crystal or Rust.


ESBuild seems to be doing fine.

I’m a crystal fan though, would love to see more tooling built with it.


The only string handling a compiler needs is usually during tokenization. After that the compiler only manipulates tokens, AST, SSA, IR, etc.


And then you face Go's usual awkward issues with having a lack of generics which becomes important when, say, traversing a tree.


I'm the author of esbuild. I hadn't written Go before I started esbuild and I thought I would miss generics more, but I didn't. There is literally only one situation in the whole esbuild code base where I've found myself wishing for generics: sorting an array. The workaround is easy and involves implementing an interface with three methods, each of which is typically a single line: https://pkg.go.dev/sort#Interface. And I only needed a few of these interfaces for a few different types. That wasn't nearly enough of a reason to not use Go, especially considering the other advantages.


I agree. I haven't missed generics at all since switching to Go. However that being said, do you think that the upcoming Go 1.18 with generic support is going to increase performance in Esbuild? Just from not having to Unbox every interface{}, or by writing less to the heap? Have you experimented with generics and esbuild?


No, I haven't experimented with generics and esbuild. I hadn't considered whether generics could improve performance or not. Just thinking about it quickly now. I'm not convinced it would because esbuild hardly makes use of interface{}. If someone can demonstrate a noticeable performance improvement then I'd be happy to start using generics for that reason.

The main pattern esbuild uses is an interface with a single dummy method to denote a union type like this: https://github.com/evanw/esbuild/blob/34899aaa1d76acd3b4adc5.... It's used several times and is basically esbuild's core data structure. I'd like to be able to optimize this pattern. Specifically I'd like to avoid each reference to a union type taking up two whole pointers in memory (Go represents interfaces as a pointer to the object and a separate pointer to the method table).

I'm only using the method table as a tag for the tagged union, so ideally it'd be a single byte or something even less expensive like part of the pointer. I don't think generics can help with this? But Go doesn't let you do fancy stuff like this so I'm just leaving it be. A low-level language like C++/Rust could do better here, but that comes at the cost of a significant decrease in productivity, so I'm ok with this trade-off.


Thankfully that will end next month.


Truthfully I'm not looking forward to it.


Sounds strange. Where do you reserve strings in the compiler?


Nah. I’ve written compilers in much less friendly languages than go.


Way better than C.


Cannot wait for their webpack replacement.


I'm sorrowful that such a significant percentage of this post was a preemptive apology for not using Rust.


Well this a blow for the "Rewrite it in Rust" movement.


I'll throw this out there (even though I know the response I'll get), why not use Crystal-lang instead of Go-lang? Besides the obvious differences in funding, what does Go have that Crystal doesn't?


It will be interesting to see if they create a language server so we can use it in VSCode.


Is this something that could be automated with something like https://aws.github.io/jsii/?


So this is a "This week vercel is sponsoring" episode...


The way of dealing with cycles is generational arenas. Also where is the codebase? I would like to see the uses of unsafe.


>shared mutability

Sure but this goes against "Don't communicate by sharing memory", no?


> Sure but this goes against "Don't communicate by sharing memory", no?

Yes, but that's advice rather than something Go requires. It's reasonable to say it's outweighed by the desire to avoid significantly restructuring the existing logic. The author is explicitly choosing to match the upstream program's structure to limit the project scope, and whether that structure is good or not is almost besides the point.

btw: especially in the Rust community, shared mutability doesn't always mean shared between threads/tasks. It's also a concern in single-threaded/non-concurrent code due to aliasing. This case does actually seem to be involving multithreading given that the author mentioned "using 8 threads".


Curious, can you elaborate on what are the issues of mutability in single-threaded code?


It comes down to how a language construct allocates freedom to the programmer vs the optimizing compiler.

* Rust references are strict. &mut references guarantee that nothing else can access the backing memory while the &mut is valid. & references guarantee that nothing (including &mut by the rule above, or unsafe code with raw pointers by programmer's pinkie swear) can change the backing memory while the &mut is valid. This allows the compiler to perform more optimizations behind the scene, like assuming a value loaded into a register is still up-to-date, even if there's been a function call between.

* Rust raw pointers make no guarantees (but require the programmer to type "unsafe" to use).

* C pointers are IMHO weirder. iirc, two pointers of the same type may alias (unless you use the rare "restrict" keyword), but two pointers of different type can't, unless one is "char*".

* AFAIK, Go pointers are at least as permissive as C's. They might not make any aliasing guarantees at all.

* Other languages vary...



But why isn't Microsoft doing it? Leaving all the hard work to others?


This is a "rewrite" not a "port".


tsc relies in shared state and is slow, so they port it to a language that allows shared state.

Strange reasoning.


It sounds like you think "tsc is slow" because "it uses shared state". If that is what you're getting at, you've connected two unconnected statements.

It does tend to imply the new tsc isn't going to be able to take much more advantage of concurrency than the current one, which may put a limit on what performance gains it can expect. I don't know if the current tsc is able to do anything concurrently. (Here, "concurrently" doesn't merely mean "able to have multiple events in flight and thus do multiple tasks at the same 'time'" but truly "doing work on multiple CPUs simultaneously".) Go is generally faster than Node, but it'll be hard to guess by how much in this case.

A Rust port would be harder but would probably lead on in a direction to improve the concurrency. However, that may be an infeasible amount of work, not because of Rust per se but because a port is intrinsically easier than a port and a rearchitecting.


As other people have pointed out, it doesn't seem like the author (or anyone else, perhaps other than you) is suggesting that shared state is a significant reason that tsc is slow.

But moreover, my takeaway was that the author is making a distinction between a complete rewrite and a port, where the former seems to be any rewrite that just tries to do the same thing (but may be architecturally very different), while the latter seems to be an approach where you attempt to directly adapt the original source code into a new language on a statement-by-statement basis. The latter would likely require the target language to support the programming paradigms which are used heavily in the original source code.


The reasoning is only strange if you assume that the cause of slowness is the shared state.

CPUs have no problems with mutability. JS is slow because of dynamic typing, dispatch, flexible object layout, and an object model that requires lots and lots of heap allocation and pointer chasing.


(er, some text in the parent seems to have vanished)

Sure, if you absolutely wish it to, in an `unsafe` block, you can define two writeable pointers to the same piece of data. I don't think I've ever seen anybody write such code in Rust (except perhaps when linking to a C library that required it?), nor have I ever felt the need to do so.

Now, it is true that Rust is designed to discourage writing cyclic data structures. I can imagine that trying to port (rather than redesign) an algorithm that deeply requires cyclic data structures, that will make your life miserable.


You actually can't define two `&mut` references to the same data in Rust. It's invalid Rust code; if the compiler notices, it'll refuse to compile, and if it doesn't notice, your code will randomly break.


You can define two `*mut`, though, in unsafe code.


A better option is to use Cell or RefCell; they allow completely safe shared mutability in single-threaded code.

https://doc.rust-lang.org/std/cell/


I'm aware of that :)

I'm just trying to decypher that cryptic sentence from the OP.


You can, just not at the same time.


tsc is a large JavaScript program run at the command line, which is a pathological case for a JIT-compiled language: by the time the hot spots get native code generated, tsc may be close to finished already. Porting it to any ahead-of-time compiled language, without significant changes to the structure of the code, is likely to be a big performance win.


tsc is slow, so you port it to a faster language. tsc relies on shared state, so you use a language that supports shared state so you're just porting the same design to a new language instead of redesigning the whole program.


At first I thought this was for type checking Go code... Ya know, all that interface{} stuff...


One thing that was not addressed in the post is the option of using a GC library in Rust. I'm not a Rust expert, but to me, it seems like this makes it quite feasible to quickly port a compiler. They could wrap most complex data structures in GC and optimize them as time goes by. That being said, I'd assume that the library's GC would be slower than Go, so perhaps that's a deal-breaker.


> The more code you have, the longer it takes to compile. On medium-to-large-sized TypeScript projects, this compilation is extremely slow.

Or you could, you know, use the JS typechecker already built with large scale performance in mind (and in native language) - Flow.

I know Flow hasn’t had the bandwith to invest in supporting open source, and Typescript pretty much “won”, and there is now a large amount of library definitions without Flow support, but: If your codebase is large enough to suffer from TS’s architecture, you might have the resources to deal with the hurdles of adopting Flow. Flow still has some better designs such as nullables (?T).

That said, Flow is now being heavily optimized for large, strictly typed codebase, which means there is now no typechecker geared towards checking JS with minimal type annotations. I think that would still be useful for the prototyping stage of a project.


Last time I heard news about Flow was facebook/jest replacing it with TypeScript.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: