PR that converts the TypeScript repo from namespaces to modules (github.com/microsoft)
338 points by Kyza on Nov 2, 2022 | hide | past | favorite | 196 comments



After this change, the TypeScript compiler will now be compiled with esbuild. I feel like that's probably the best endorsement esbuild could get, hah.

Surprising that they call out the 2-space indent level that esbuild is hardcoded[1] to use as a benefit. Why not save even more bytes and re-format the output to single-tab indentation? I wrote a simple script to replace the indentation with tabs. 2-space size: 29.2MB, tabbed size: 27.3MB. Another ~2MB of indentation saved! Not significant after compression, but parsing time over billions of starts? Definitely worth it.

[1] https://github.com/evanw/esbuild/issues/1126
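A conversion like that can be sketched in a few lines of TypeScript (a rough sketch, not the actual script; it assumes indentation is always a run of leading spaces in fixed-size units and leaves interior alignment spaces alone):

```typescript
// Replace each full indent unit of leading spaces with a tab.
// Any leftover leading spaces (alignment) are preserved as-is.
function spacesToTabs(source: string, indentSize = 2): string {
  return source
    .split("\n")
    .map((line) => {
      const match = line.match(/^( +)/);
      if (!match) return line;
      const leading = match[1].length;
      const tabs = Math.floor(leading / indentSize);
      const rest = leading % indentSize;
      return "\t".repeat(tabs) + " ".repeat(rest) + line.slice(leading);
    })
    .join("\n");
}
```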


I can see this is probably a calm point that will definitely not escalate; programmers don't really care about tabs and spaces that much, right???


Heh, I always wondered what the big deal was. I'll just use whatever the company I'm working for uses and not even think about it.

To be honest, that's why I love auto formatting cause I never need to think about that stuff, I can just write code.


"Tabs vs spaces" is often misunderstood (and falsely reported) as a problem of preference.

The real problem is that using spaces for indentation is an accessibility issue.

The solution is to use tabs for indentation, and spaces for alignment.


> The real problem is that using spaces for indentation is an accessibility issue.

Not this again

https://github.com/prettier/prettier/issues/7475#issuecommen...

> Blind programmers have no problems with it.


Not everyone with vision or vision processing issues is blind. Being able to configure custom tab stops is an easy way to control what level of indentation is useful and clear.


As with all other types of data: the right approach is for model and presentation to be separable concerns.

We're struggling with the wrong problem if we aren't asking why the editor can't treat blocks as entities that are displayed however we want.



  > why the editor can't treat blocks as entities
That might work with C, but won't work with e.g. Python.


Python has syntactic blocks as well, and editors totally can (and do) muck around with their formatting. I don't see any reason why they couldn't handle customizable indentation.


The primary accessibility argument is that spaces waste precious space on braille displays.


There are perfectly good tools for that which don't require text changes.


> The solution is to use tabs for indentation, and spaces for alignment.

It's the second step that drives me nuts. Why can't/doesn't alignment happen at first step? The inconsistencies of alignment using true tabs, in a monospace plain text environment, grosses me out and decrements my faith in the underlying systems. I appreciate editors w/ settings like `insert spaces when pressing tab` and `the number of spaces a tab is equal to`.


Why would you ever mix them?! Even when I started programming 15 years ago it was already accepted wisdom that that was a terrible idea.


This is one of those things that started out as good advice, and then got turned into an oversimplified parody of the original argument.

If you have something like:

  var (
    x     = foo
    other = bar
  )
Then clearly you should use spaces to align those "="s, no matter if you use spaces or tabs for indentation. Similarly, "hanging indents" like:

  coolFun(arg1,
          arg2)
Can only be done correctly with spaces, and it doesn't really matter if you use tabs or spaces for indentation; this will still align correctly no matter the tabstop size:

  ->coolFun(arg1,
  ->        arg2)

  --->coolFun(arg1,
  --->        arg2)

  ------->coolFun(arg1,
  ------->        arg2)
If you had used tabs, changing the tab size would make it align all wrong.

The problem is when you start doing stuff like:

  if (one) {
  ------>if (two)
             bar();
  ------>else
             xxx(); 
  }
And then, depending on the tabstop size, things can start looking very misaligned and confusing; this is why Python 3 forbids mixing tabs and spaces in the same function, because it's so easy to make something appear wrong and there are no braces to clarify things.

That's what people mean with "don't mix tabs and spaces", not "if you use tabs then the space shall never appear in the file under any condition".


> If you had used tabs, changing the tab size would make it align all wrong.

This is clearly false. You said this after giving a perfect example of how it's done with tabs.


So you are saying that using a tabstop of eight to align this:

  -------->coolFun(arg1,
  -------->------->arg2)
Would still look nicely aligned with a tabstop of 4? and 2? Clearly that will not look right.


To understand tabstops properly, you need to throw away the "They always expand to N spaces." idea. If you have access to a mechanical typewriter with tabstops, then go and look at how it works (with tabs being set by protruding pins on the carriage). Tabstops in terminals in the Unix and Linux worlds work this way.

https://superuser.com/questions/710019/why-there-are-11-tabs...
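Put as arithmetic, a tab advances the cursor to the next multiple of the tab width, not by a fixed count (illustrative sketch):

```typescript
// Column reached after a tab, given the current 0-based column.
// With the classic width of 8: a tab at column 0 advances 8 columns,
// but a tab at column 7 advances only 1 -- it's a stop, not a count.
function nextTabStop(column: number, width = 8): number {
  return (Math.floor(column / width) + 1) * width;
}
```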


If you use it as indentation then "tab expands to n spaces" is actually accurate though.


You're misquoting yourself.


I don't know what your point is then, but this is turning into a very trite conversation.


> > If you had used tabs, changing the tab size would make it align all wrong.

> This is clearly false. You said this after giving a perfect example of how it's done with tabs.

I believe the post you're replying to meant "used more tabs". That is, if you used any tabs beyond the code indentation, they would get altered when you change the indentation size and ruin the alignment.


This was indeed the misunderstanding.

When I made my original comment I thought what I was replying to was an example of "tabs to indent and spaces to align" immediately followed by a statement that the only way to use tabs is to both indent and align.


The basic idea is that you use tabs to align "blocks" (i.e., multiple lines at the same indent level), and then spaces from there to align line "elements".

Aligning lines in the same block

    ____var a; // indented with a tab
    ____var b; // indented with a tab
Aligning elements of a line with the previous line

    ____var a, // indented with a tab
    ____----b, // indented with a tab, then aligned using spaces
The idea being that tabs are used where it makes sense that you could change them for preference and not have things look wonky... and then spaces are used in situations where a specific number of characters is necessary.

Another alternative is elastic tabs, where tabs are used for both of those, but are converted to an indentation/alignment number of characters semantically. I like the idea, but I've yet to see a good implementation.

Personally I'm a fan of all spaces, but that's mostly because every company I've worked at has used that.


> The real problem is that using spaces for indentation is an accessibility issue.

The problem of leading spaces not changing their size can be solved through programming.

Those programmers who have such a problem and need leading space to be rubbery, if those programmers are worthy, will solve that problem instead of complaining that everyone should adapt to them.

The use of tabs is detrimental to code bases. Whenever tabs creep in, invariably people use different tab settings and make a dog's breakfast out of the code.

Code exhibits indented sub-structures that are internally indented, but are not themselves aligned to a tab stop; they're aligned with something else. One kind of example is something like:

  function_name_of_random_length(lambda (a, b) {
                                     if (a < b) {
                                         indented();
                                     }
                                 },
                                 arg2);


Sometimes it can make sense to align arguments but I strongly disagree with pushing over a multi-line lambda that way.

So I'm not convinced the problem you're describing should ever happen.


While that may be true, "spaces for alignment" is all but unsupported by every editor I've seen. They insist on replacing 8 (or N) spaces with a tab even in an alignment region.

int foobar(int a,

___________int b) // should only ever have spaces (using _ here because HTML)

But good luck finding an editor that won't insert a tab when you use the tab key here (willing to be wrong).

The other issue I have is that diffs break alignment between unindented code and anything with tabs, because the tabs "eat" the leading space, but unindented lines are offset by one.


I'm curious what editors you've experienced that with.

I've had some editors that made tab-space conversion an option, but it was never mandatory.


When I'm working in Git, Go, or Linux code, Vim (and NeoVim) uses `expandtab` in conjunction with `softtabstop` and converts any `tabstop` spaces into a tab (with `noexpandtab`) regardless of whether or not it is "alignment".

I've not done lots of development with other editors, but Kate didn't do it and the few times I used Visual Studio, it also had similar behaviors (I'm assuming VSCode inherited that stuff).


With a .editorconfig[1] file with the correct configs you can get most code editors to insert spaces instead of tabs.

[1] https://editorconfig.org/
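For example, a minimal .editorconfig along these lines (the glob and values are illustrative):

```ini
# .editorconfig at the repo root; editors stop searching upward here
root = true

# Force 2-space indentation for TS/JS sources
[*.{ts,js}]
indent_style = space
indent_size = 2
```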


What configuration is necessary for editors to insert spaces in code like this:

https://lore.kernel.org/keyrings/20221109025019.1855-2-linux...

If HN mangles the link, it is the addition of the `key_create` function in this thread: https://lore.kernel.org/keyrings/20221109025019.1855-2-linux...


Rustfmt, for one, understands how to mix tabs and spaces on the same line. I think the complication is overstated.


All JetBrains editors will put spaces.


I used to tout "tabs for indentation, spaces for alignment" until I started working primarily with lisps. Idiomatic lisp indentation isn't compatible with regular tab stops, so the only solution is to go with spaces (maybe losing accessibility) or, preferably, to go elastic.

https://nickgravgaard.com/elastic-tabstops/


In soviet python, runtime cares

(if you accidentally use a tab in a file that otherwise uses spaces, you get a runtime exception, or vice versa)


Tabs


> Why not save even more bytes and re-format the output to single tab indentation?

For your answer search "programmers who use spaces make more money"


Correlation != causation

Spaces are simply inferior to tabs since the latter conveys the meaning of "one level of indentation" while the former does not. It's also better for accessibility and file size. There is not one single logical reason to ever use spaces for indentation, not one.

For some very fucking stupid historical reason someone in the 80s made the idiotic decision of spaces being the default in editors and people just went with it. The people earning more are doing so because those are the seniors who have given up on common sense and just go with the flow of the masses who are unable to grasp "tabs for indentation, spaces for alignment" yet insist on keeping alignment so the (terrible) compromise is just using spaces. And I strongly question whether "alignment" is worth anything, in almost all cases it's just useless and in the rest you're drawing ASCII diagrams in the comments which doesn't affect your code at all.

Also see the top answer at https://www.reddit.com/r/programming/comments/8tyg4l/why_do_...


Spaces define the spacing convention unambiguously by themselves, whereas tabs are environment-dependent and need other tools like EditorConfig to guarantee consistency.


That's the point.

I like four character width indentation. Linus prefers eight. With tabs VIM shows me what I like and shows Linus what he likes.


How do you work with the max-chars-per-line limits of coding conventions? Does code you write with tabs as 4 spaces follow the convention but then break on Linus's computer?


I personally use a vertical monitor; in Jetbrains that gets me about 130 characters on a line before scrolling, and in VIM it gets me about 125. I'll break a long line into shorter lines if they begin to approach the end, but I'm not strict about it. From what's open right now I do have a few lines that get close, and one that goes over. So what? All my coworkers use horizontal monitors and even with gutters and panels open they're not complaining about my line length - and I've not had a complaint about line length since I started in this industry in 1999. Back then there were no vertical monitors, it was all 4:3 CRTs.

As an interesting note, I now work with the second hard-of-sight developer that I've worked with in my life. They both use normal Jetbrains IDEs, with font sizes and interfaces made huge. Neither of them has ever said a word about my line length. Nor my function length. Nor my long variable and method names. Nor my explicit comments, nor my detailed git commit messages, nor my frequent git commiting, nor my branching preferences, nor my README files, nor my obsessive security practices such as validating types and character length limits on input fields and sanitizing and filtering input and my rejecting of any PR with concatenated outside-data in SQL (even if that outside-data came from the DB itself, for a recent example), and all the other idiosyncrasies that one dev suffers from another.

Line length? Not a problem. Forcing the hard-of-sight dev to indent however _I_ see fit? I wouldn't dream of doing that.


Is there really a value of a max-char rule these days? Editors can scroll, softwrap and usually do neither as screens are large anyway. A max-indent rule makes sense to highlight when you're nested too deep and should refactor, but a long line of code can either be done based on "this looks unusually long" or a formatter that has a set tabwidth.


Well stated.


No, it's incredibly stupid and is missing the point. You use tabs to not be consistent. So that the crazy js person can have a 2-wide indent and the slightly visually impaired person can have an 8-wide indent without reformatting the entire codebase


Are you visually impaired, or have you received or read criticisms of this nature from visually impaired persons? Genuine curiosity, because if this is a matter of accessibility I would vouch for tabs.

Thing is that with spaces you can also predict consistently the line length and not have code being clipped on someone's screen that set tabs for 8 spaces for some reason.

In any case, I like spaces but I am impartial to either one. You seem like you have very strong opinions about the matter so I won't push the subject further.


  > Thing is that with spaces you can also predict consistently the line length and not
  > have code being clipped on someone's screen that set tabs for 8 spaces for some reason.
Spaces force you to worry how that other dev will see the code. Tabs are semantic - you set the editor to display how _you_ like to see it and that other dev sets the editor to display how he likes to see it.


I've received.

But I'm also the crazy person who uses an 8 wide indent anyway


Testimonial: I experienced a significant increase in earnings about 4 months after abandoning tabs for spaces. Cause not known but that’s what happened.


The decision to use a character that is 8 chars wide for indentation of highly nested code was not the brightest idea either.

I’d love to set up terminal tabstop size, textview tabstop size, github tabstop size, IM tabstop size, HN tabstop size, git gui plugin tabstop size, issue tracker tabstop size, tsv file format tabstop size if I used tabs. It makes you so productive.


Tabs are denoted by arrows and I don't like being told what to do! :)


The arrows are not instructions. They're the LHS attacking the RHS. Code is war.


English is a bad convention too but we use it because humans don’t need the theoretically best systems to be productive, they need something good enough. Spaces are good enough, tabs are good enough, and anybody who gets emotionally invested in one vs. the other is wasting time.


> Tabs are simply inferior to spaces...

I think you wrote this the wrong way around?


Oops, yes. I had it switched originally but then for some reason switched it again. Apparently commenting late at night is not a good idea.


Also for blind programmers, tabs are much nicer for a screen reader.


What's the difference between indentation and alignment?


Alignment is leading whitespace that is expected to match the width of some non-whitespace text above; indentation is leading whitespace that is just expected to match all other indentation at the same level. When indentation is scaled through editor reconfiguration, alignment should stay the same.

What I think gp is underestimating is just how much alignment there was in the old days, and how little (compared to now) indentation. From today's perspective, indentation is the norm and alignment is the rare exception. Your question is a good illustration: sounds like you never met alignment, or at least never noticed it. But back then, alignment was a very regular occurrence and the extra diligence required for getting all the tabs and blanks right to look nice on different tab widths, or the ugliness from failing to get the mix right would have been a considerable cost. Avoiding unpredictably wide tabs was a reasonable call.


I wish they had broken down that survey question further to find out _how many_ spaces the highest paid developers use. Then I could finally have a data-driven answer to put in my prettier config!


Surely, on average, it's 3 spaces of indentation. Feels great to finally be able to derive objective answers to these ages-old conundrums!


Average is actually 3.27 spaces, which is what I now set for my tab spacing so as to conform with best practices.


TLDR "Programmers who use spaces are more likely to respond to StackOverflow surveys soliciting information about their income."


That's not how statistics work, unless you assume there is a correlation between tab users salary and their willingness to respond to surveys. Which would also be an interesting find


I'm curious if you've actually worked with surveys before, because this is how statistics work. All self-reported data has a bias based on a user's willingness to reply, and considering that predisposition as a variable is required.

This is the exact reason that crime statistics are so crummy, even from anonymous surveys. Very few are compelled to admit to a crime that they haven't been convicted of (and some won't even to admit to that). Or why post-sale NPS scores have a negative bent - you're more likely to respond if you have a complaint.


This was actually a significant issue in a large PHP codebase I used to work on. Client hired a new guy who insisted that we convert everything to spaces, and suddenly it took about twice as long to check the thing out from Subversion.


Someone who comes onto a project and actually wants to charge money to sit there and convert tabs to spaces or vice versa. Incredible.


My attitude is generally "Which one, pick one, this one, classic"


Or like, cool, this is the code style. I don't care if it's sublime or stinks to heaven. When we get to a total rewrite we'll address that. Which features are you hiring me to implement? Which bugs need to be fixed? How can I not waste my time or yours?

Seriously, this takes yak shaving to a whole new level.


It doesn't make sense that spaces vs. tabs would result in a 2x longer checkout. Something else is at play if that is the case.


He did mention it was subversion


I see your confusion. They said Subversion with a capital S. So the source control system, not the covert destruction of a dev team from within. Easy mistake to make. (... Stupid git.)


Maybe the codebase was one giant index.php file with an average of 20 levels of nested conditionals and open curly braces on a new line.


Why not suggest this in the PR? Seems like it could be used in the pipeline to decrease the size further.


Why a tab when 1 space will do?


Tab is ASCII 9 while space is 32. Tabs, having the lower number, are therefore obviously cheaper.


Tabs require 2 set bits. Space requires only a single set bit. Spaces therefore require less electricity.


Well, just a bit...


lol


Reminds me of Silicon Valley (HBO) where Richard uses “we are a compression company” to justify using tabs over spaces. Ironically once gzip compressed I doubt it would make any difference.


Why have any white space at all in that case?

Even on an absolutely gigantic codebase using tabs or spaces will make almost no difference to build or type-checking times. Building an AST is much more overhead than white space considerations and once it’s an AST tabs or spaces are not included in the running of the code.


Does that mean they are not using type checking? That's the really, really slow part of writing TS, and esbuild doesn't include it, which is why I've never seen the point of using esbuild as a compiler.


We are still type checking, it's just not needed as a dependency for our JS outputs. Type checking still happens in tests, and I have CI tasks and VS Code watch tasks which will make sure we are still type checking.


Thanks for the reply!

So when your team is compiling locally they don’t type check? It only runs in the IDE and during tests?


It's not perfectly cut and dry, but mostly. We still need to emit d.ts files for our public API, and the only thing that can do that is tsc, which will type check.

But I tried my best to make the build have fast paths to minimize the development loop as much as possible.


> Finally, as a result of both of the previous performance improvements (faster code and less of it), tsc.js is 30% faster to start and typescript.js (our public API) is 10% faster to import. As we improve performance and code size, these numbers are likely to improve. We now include these metrics in our benchmarks to track over time and in relevant PRs.

> [...]

> The TypeScript package now targets ES2018. Prior to 5.0, our package targeted ES5 syntax and the ES2015 library, however, esbuild has a hard minimum syntax target of ES2015 (aka ES6). ES2018 was chosen as a balance between compatibility with older environments and access to more modern syntax and library features

I'd be curious as to what percentage of the improvement comes from modules vs comes from a different target.


I'm curious about that too.

From my superficial knowledge of compilers, "modularization" itself should not make code faster; if anything, slower. There'll always be some overhead of loading modules and communicating between them, no?

I presume, from my own experience building software (not compilers), that modules make a codebase much easier to reason about and much better isolated (cohesion, loose coupling), and therefore much easier to improve inside each module. I would presume that here, too, modules allowed them to improve the inner workings much more easily, enabling the performance increase. Or am I completely misunderstanding this feature?


There are some key things here that maybe weren't clearly stated in my writeup.

Firstly, the old codebase is TS namespaces, which compile down to IIFEs that push properties onto objects. Each file that declares that namespace is its own IIFE, and so every access to other files incurs the overhead of a property access.

With modules, tooling like esbuild, rollup, can now actually see those dependencies (now they are standard ES module imports) and optimize access to them. In this PR's case, the main boost comes from scope hoisting.

For example, in one file, we may declare the helper `isIdentifier`. In namespaces, we would write `isIdentifier` in another file, but this would at emit time turn into `ts.isIdentifier`, which is slower. Now, we import that helper, and then esbuild (or rollup) can see that exact symbol. All of the helpers get pulled to the top of the output bundle, and calls to those helpers are direct.

That's why modules gives us a boost. There's also more (modules means we can use tooling to tree shake the output, and smaller bundles are faster to load), but the hoisting is the big thing.
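The difference can be illustrated with a simplified sketch (not the actual emitted code; the helper and the kind value are stand-ins):

```typescript
// Roughly what a TS namespace compiles to: an IIFE attaching members
// to a shared object, so every cross-file use of a helper becomes a
// property lookup (ts.isIdentifier) at run time.
const ts: { isIdentifier?: (kind: number) => boolean } = {};
(function (ns: { isIdentifier?: (kind: number) => boolean }) {
  ns.isIdentifier = (kind) => kind === 80; // illustrative kind value
})(ts);
const viaNamespace = ts.isIdentifier!(80); // dynamic property access

// After the modules conversion, a bundler can scope-hoist the helper
// to the top of the bundle and emit a direct, statically bound call.
const isIdentifier = (kind: number): boolean => kind === 80;
const viaModule = isIdentifier(80); // direct call, no object lookup
```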


Maybe they could update esbuild to be aware of TS namespaces instead?


I think it's the case that modules are the future; but the main focus of this change was not actually performance at all. We've been wanting to be able to dogfood the modules experience (used by most TS devs) for a long time. The fact that it turns out to be so much faster is a really great side effect.


TS namespaces are a really interesting relic of Typescript pre-1.0 in the bad old "jQuery era" before ECMAScript Modules and even before Node CommonJS modules were that dominant and the two most common "module" formats in the browser were "no module at all" and less beloved AMD [RIP]. Typescript namespaces were based on one of the IIFE approaches to "no module at all" smashing a sometimes large codebase into a single global variable, jQuery style.

Most Typescript projects today wouldn't use TS namespaces if you paid them to. It's a backwards "module format" that the modern web and modern Node (and Deno) are trying to leave behind. Several issues and PRs have been filed on Typescript to drop namespaces as first-class syntax altogether because they unnecessarily confuse newcomers and shouldn't be used in new code in 2022, but there are some major pre-1.0 Typescript projects that still need them for legacy reasons. Typescript bootstrapped itself with itself and was one of those large projects with such a legacy dependency, hah. (From the PR you can see precisely how much tech debt this has left in Typescript's own codebase!)

So, long story short: esbuild doing a bunch of work to support jQuery-era IIFE patterns is maybe not the best use of esbuild developers' time in 2022.


There are two performance implications of "modularization": initialization-time and run-time.

You are correct that initializing many modules is usually slower than initializing one module [1]. However, bundling puts all modules into one file, so this PR doesn't actually change anything here. Both before and after this PR, the TypeScript compiler will be published as a single file.

At run-time, switching to ES modules from another JavaScript module system can be a significant performance improvement because it removes the overhead of communicating between them. Other module systems (e.g. TypeScript namespaces, CommonJS modules) use dynamic property accesses to reference identifiers in other modules while ES modules use static binding to reference the identifiers in other modules directly. Dynamic property access can be a big performance penalty in a large code base. Here's an example of the performance improvement that switching to ES modules alone can bring: https://github.com/microsoft/TypeScript/issues/39247.

[1] This is almost always true. A random exception to this is that some buggy compilers have O(n^2) behavior with respect to the number of certain kinds of symbols in a scope, so having too many of those symbols in a single scope can get really slow (and thus splitting your code into separate modules may actually improve initialization time). This issue is most severe in old versions of JavaScriptCore: https://github.com/evanw/esbuild/issues/478. When bundling, esbuild deliberately modifies the code to avoid the JavaScript features that cause this behavior.


> From my superficial knowledge of compilers, "modularization" itself should not make code faster, if anything slower. There'll always be some overhead of loading modules and communicating between them, not?

I think this is a misunderstanding of what actually happened.

TypeScript has a thing called “namespaces” and a thing called “modules”. Both provide modularization. The TS repo is not being modularized, instead, the namespaces are getting converted to modules.

Namespaces are an old-school approach to writing a module in JavaScript. You pack all of your exports into a JS object, and then access the object from somewhere else. This works, but JS is dynamic, and the runtime has no way to guarantee that you won’t mess with this object (replace functions or whatnot).

Modules don’t have this object. You just call the function, instantiate the class, or do whatever else with the names you imported. They are resolved statically, so certain optimizations become more “obvious”, like inlining.


For ES6 modules, the exports object is frozen (made read-only) so the JIT can make some extra assumptions and optimizations. With bundles, unless the bundler inserts `Object.freeze` around `module.exports`, they have to be treated as dynamic objects.
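A rough sketch of the contract difference (engine-level JIT effects aren't observable from script, so this only shows the mutability side):

```typescript
// CommonJS-style exports: a plain, mutable object, so the engine must
// treat every property read as a dynamic lookup that could change.
const cjsExports: Record<string, () => number> = { helper: () => 1 };
cjsExports.helper = () => 2; // perfectly legal

// ES module namespace objects behave as if frozen; a bundler can get
// the same guarantee for a plain object with Object.freeze.
const esmLike = Object.freeze({ helper: () => 1 });
// In strict mode, `(esmLike as any).helper = ...` would throw.
```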


A little while ago I asked (https://news.ycombinator.com/item?id=33051021):

I’m curious, how many people are using TSC only for type-checking, and a different system (eg esbuild or ts-node) to actually compile/bundle/execute their code?

Looks like my suspicion was correct; not even tsc uses tsc!
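A typical split looks something like this in package.json scripts (names and paths are illustrative; `tsc --noEmit` only type-checks, and esbuild only transpiles/bundles):

```json
{
  "scripts": {
    "typecheck": "tsc --noEmit",
    "build": "esbuild src/index.ts --bundle --outfile=dist/index.js"
  }
}
```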


Probably the majority since popular frameworks like NextJS do it with SWC now.


The default configurations for Create-React-App and others use babel for type stripping today.

This seems to me like a great "win" for Typescript that so many tools just natively handle TS type stripping and that so much Typescript today only needs type stripping and doesn't need other parts of TS emit processes (or tslib).


> a change in the indentation used in our bundle files (4 spaces -> 2 spaces)

I find it interesting that one of the reasons given for the reduction in package size is due to such a simple indentation change from 4 spaces to 2 spaces.

Not that 2 bytes are less than 4 bytes, but TypeScript is a large project; it would be interesting to know how much size was saved from this one specific change. Seems like a trivial change, so why not do it sooner? And assuming readability isn't required in the bundle output, why not bundle with no indentation at all and put everything on a single line? Would this not be even smaller again?


> Seems like a trivial change, so why not do it sooner?

Re: indentation: Literally, no one thought of it, as far as anyone can tell. Linus's law appears to have its limits.


Most minifiers already put things on one line, though.


TS has some unique restrictions due to downstream patching of our package; my PR description briefly talks about this as something we can try to improve in the future. Minification absolutely would save a lot more size out of our package, but I was not willing to change that in this PR.


Kind of funny because one of the benefits of tabs vs spaces that people laugh off is that it saves space.

I think it's probably correct to laugh this off though. Why would you care about the non-minified/gzipped size this much?


TS devs still debating this in 2022? And I thought PHP devs were ridiculous for still debating setters and getters.


They aren't debating it. And I don't see why you'd think setters and getters are not worth debating. There are significant downsides to them (e.g. they make Dart's sound null safety more annoying).


Bravo. This must have been painful. Super excited to use a faster tsc. That will make a huge difference in our products. Thank you.


I absolutely hate how with Typescript and ES Modules, if you have a file

utils/foo.ts

you have to import it as

import Foo from "utils/foo.js"

Even though there is no .js file on disk, and you might be running ts-node or whatever that doesn't build a .js file.

Importing a file that "doesn't exist" is so counterintuitive.

In addition all code breaks because you have to change all your imports, and /index.ts or /index.js won't work either.


Every TypeScript project I have worked on either:

1) enforces no extension, e.g. “utils/foo”, or

2) allows TS extensions, e.g. “utils/foo.ts”

I have never imported a TS file using a JS extension. Maybe your woes could be fixed with a configuration change?


No, you were using non-standard ESM modules (compiled to CommonJS, as defined by Babel). TypeScript recently added ESM support compatible with Node.js; see "module": "node16" [1][2].

The whole ESM saga is a clusterfuck, not much better than the Python 2 -> 3 migration. Large Node.js codebases have no viable migration path, and most tools still cannot support ESM properly [3]. Stuff is already breaking because prolific library authors are switching to ESM.

As someone who maintains a large part of TS/JS tooling in my day job, I absolutely despise the decisions made by the Node.js modules team. My side projects are now in Elixir and Zig because those communities care about DX.

  [1] https://nodejs.org/api/esm.html#differences-between-es-modules-and-commonjs
  [2] https://www.typescriptlang.org/docs/handbook/release-notes/typescript-4-7.html
  [3] https://github.com/facebook/jest/issues/9430
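For anyone wanting the concrete setting: a minimal tsconfig sketch opting into Node's ESM rules, per the TypeScript 4.7 release notes linked above (only the relevant fields shown):

```json
{
  "compilerOptions": {
    // Resolve and emit modules the way Node.js 16+ does, including the
    // requirement for explicit file extensions in ESM import specifiers.
    "module": "node16",
    "moduleResolution": "node16"
  }
}
```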


Yeah, it's pretty ugly. This whole thing is a prime example of maintainers deciding something for mostly arbitrary reasons and then absolutely ignoring all the massive negative feedback they get for it.

They'll cite some nebulous technical reasons why it has to be this way, but if you offer a PR that actually solves the issue the community complains about, they'll reject it.

In this case they decided that tsc won't transpile imports. They just decided, it's "policy", and it can't be changed. It doesn't matter if it is awful for compatibility, developer experience, etc. It's just the policy. Issues will be closed. And no, even an optional flag to transpile imports is off the table, even if you write the PR for it.

There are many, many issues open about this on GitHub, but to give an example:

https://github.com/microsoft/TypeScript/issues/16577


It is complicated, but most user anger should be directed at the Node.js modules group [1]. TS is forced to follow the Node.js standard.

[1] https://github.com/nodejs/modules/issues/323


Hmm, what is it that Node is doing that’s so bad? I don’t understand why that issue is a big problem.

This explanation in the comments makes sense to me:

Transpilers can add the ability to add extensions at compile time, so a specifier like './file' can be rewritten to './file.js' during compilation along with whatever else is getting converted by the transpiler.

It seems sensible for Node to expect fully-qualified imports, just like a browser would. And (to me) it seems sensible that in a language like TypeScript you should be able to import “foo.ts” and have it transpiled to the correct filename.

Now, that does not work in TS because they adamantly refuse to modify any of the emitted JavaScript code at all, with no clear explanation except that it’s long-standing policy. Instead they expect you to import “foo.js” in TypeScript, even though that file doesn’t exist until after compilation. That’s a problem, and it seems like it’s caused by the TS team, not Node.
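For concreteness, the kind of rewrite being asked for can be sketched in a few lines. This is a naive illustration of the idea, not anything tsc implements; `rewriteSpecifier` is hypothetical, and the extension pairing follows the .ts/.js, .mts/.mjs, .cts/.cjs convention:

```typescript
// Naive sketch of import-specifier rewriting during compilation.
function rewriteSpecifier(spec: string): string {
  // Bare package specifiers ("react", "somelib/subpath") are left alone:
  // their resolution depends on package metadata the compiler can't see.
  if (!spec.startsWith("./") && !spec.startsWith("../")) return spec;
  if (spec.endsWith(".mts")) return spec.slice(0, -4) + ".mjs";
  if (spec.endsWith(".cts")) return spec.slice(0, -4) + ".cjs";
  if (spec.endsWith(".ts")) return spec.slice(0, -3) + ".js";
  if (/\.[mc]?js$/.test(spec)) return spec; // already fully qualified
  return spec + ".js"; // extensionless relative import
}

console.log(rewriteSpecifier("./file"));     // "./file.js"
console.log(rewriteSpecifier("./file.mts")); // "./file.mjs"
console.log(rewriteSpecifier("react"));      // "react"
```

Even this toy version shows where the complexity hides: the bare-specifier case is exactly what the rest of this thread argues about.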


Browsers never required extensions; see https://unpkg.com/ for example. You can load scripts in the browser just fine without an extension.

> Now, that does not work in TS because they adamantly refuse to modify any of the emitted JavaScript code at all, with no clear explanation except that it’s long-standing policy. Instead they expect you to import “foo.js” in TypeScript, even though that file doesn’t exist until after compilation. That’s a problem, and it seems like it’s caused by the TS team, not Node.

I think they should provide a better explainer. But Node.js's resolution algorithm is already incredibly complicated, and adding path rewriting to TypeScript is not going to make it better. There are things like dynamic import and third-party libraries. TypeScript would need to either analyze the whole of a project's node_modules or bundle a custom runtime resolver like webpack does, breaking compat with Deno and friends.

Imagine situation:

  import lib from 'somelib/subpath'
How would TS know that some lib has an extension in a subpath and whether it needs to add/remove the .js extension? https://nodejs.org/api/packages.html#extensions-in-subpaths And what if TypeScript is running in Deno/Bun or WASM?
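To make that concrete, here is a hypothetical package.json for "somelib": the specifier "somelib/subpath" carries no hint of the on-disk extension, so a compiler can't know what to rewrite it to without reading the package's metadata:

```json
{
  "name": "somelib",
  "exports": {
    ".": "./dist/index.js",
    "./subpath": "./dist/subpath.mjs"
  }
}
```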

> Hmm, what is it that Node is doing that’s so bad? I don’t understand why that issue is a big problem.

My conclusion is that successful projects without a BDFL are prone to corporate takeovers. You get people working in corporations who don't write code and want to make a political career as a "core" team member of the project.


TS can already perform complicated refactorings, such as renaming all imports in a project when you rename a file in vscode. So they definitely have all the information they need already.

Node requiring the .js is more understandable, as that actually affects runtime performance: if the name isn’t fully qualified in source then node needs to stat() more paths during resolution.
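A rough sketch of why: for an extensionless specifier like './file', CommonJS-style resolution probes a series of candidate paths, each potentially costing a stat(). The `cjsCandidates` helper below is hypothetical and simplified; the real algorithm also consults package.json and walks node_modules:

```typescript
// Simplified list of files a CommonJS-style resolver may have to probe
// (stat) for an extensionless relative specifier, in order.
function cjsCandidates(spec: string): string[] {
  return [
    spec,                   // exact file
    spec + ".js",
    spec + ".json",
    spec + ".node",
    spec + "/package.json", // directory with a "main" field
    spec + "/index.js",
    spec + "/index.json",
    spec + "/index.node",
  ];
}

// A fully qualified "./file.js" needs one probe; "./file" may need many.
console.log(cjsCandidates("./file").length); // 8
```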

> What if typescript is running in deno/bum or wasm?

Every TS project already needs a tsconfig.json to specify what permutation of module system configurations it is using.


> Node requiring the .js is more understandable, as that actually affects runtime performance: if the name isn’t fully qualified in source then node needs to stat() more paths during resolution.

Performance is affected only during startup, by maybe a few milliseconds. This is a major breaking change to module resolution made by Node.js. Why should TS paper over a change that Node.js made?

> TS can already perform complicated refactorings, such as renaming all imports in a project when you rename a file in vscode. So they definitely have all the information they need already.

Some things are supported by tsserver, not the TypeScript compiler. Fixing this in TS only addresses the problem partially and shifts it onto tool developers.


The complication here is that Node in part decided on needing fully-qualified imports not just to better align with the browser, but also to use multiple file extensions in a complex signaling process (ETA: which the browser uses mime-types and metadata like type="module" for rather than file extensions): the file extension could be any of .js, .cjs, and .mjs, and then depending on package.json and some other metadata, Node may load the various files in all sorts of different ways.

Typescript decided that they don't always know what output file extension you may need because Node made that logic way too complex and they don't want to just reimplement their own buggy version of Node's loader mechanics but "backwards" in a terrible "guess the file extension" game, so they went with the easiest option which was "user now has to tell us the file extension".

Even if Typescript didn't have a policy to reduce the number of modifications it emits, it's still Node's fault that the file extensions now have three options with an extremely complex dance between them and getting a "guess the file extension" game right would be an extended PITA.


If I’m in a TypeScript file and I import “foo.ts”, why can’t TSC just rewrite that to whatever filename TSC will emit when it compiles foo.ts?

If you’re just using TSC to typecheck and not emitting code (which is what most people actually do in practice), it’s even easier -- just let me import “foo.ts” if that’s the name of a file that exists. The popular bundlers can all handle that just fine.


Because TSC isn't your bundler. Its job is different from a bundler's. A bundler runs under the assumption that it has to find and handle everything you import; TypeScript only needs to find a definition for an import.

Typescript may have a complete view of a single project, in which case yes, it should know what file type it is emitting in that project, but then it has to track "in project" imports differently from "out of project" imports and needs two different behaviors for those.

All of that gets further complicated by incremental builds and multi-project references and multi-project references with incremental builds.

Which isn't to say that it isn't technically solvable, and maybe "two behaviors" is an alright developer experience even if it would confuse so many new users, it's just that there are a lot of obvious complications in the face of it.

It's also not like they haven't been trying to work on it. Other comments in this thread have pointed to at least one Typescript issue on it. At one point Previews supported a version of this but rather than using the file-type of the current package's emit it relied on the new paired TS file types: .ts => .js, .mts => .mjs, .cts => .cjs. If you imported a ".ts" it always assumed you were importing ".js" and if you imported a ".mts" it always assumed you were importing a ".mjs". There were a lot of complications even with that simple "one experience", but even that experience was terrible, fell down in complications with Node's loader and various bundlers, and had too many bugs. So it was pulled from Previews.


I realise it’s not trivial, but it’s really hard to believe it’s that difficult!

When I look at the multiple(!) issues on TS GitHub asking “please, can we just import .ts files with a .ts extension? That would make life a lot easier”, the comments from the developers pushing back on it aren’t about Node integration issues, they’re about the unshakable principle that TS must never rewrite syntactically valid JavaScript at all.

Make it work within a single project, at least, and leave external projects for later.

Edit to add:

> There were a lot of complications even with that simple "one experience", but even that experience was terrible, fell down in complications with Node's loader and various bundlers, and had too many bugs. So it was pulled from Previews.

Do you have a link handy for that discussion? I’d be interested to read it.


> they’re about the unshakable principle that TS must never rewrite syntactically valid JavaScript at all.

Which is starting to be a very useful principle of TypeScript. TypeScript knows that today it is not your bundler, and it leaves almost all rewriting to your bundler. That leaves bundlers able to strip types without even needing a direct dependency on TypeScript, which is why tools like swc and esbuild, written entirely in other languages, now type-strip as well.

This is also why the Typescript team has been a proponent for a proposal to add type stripping (or something like it) to the entire web platform. (There's a Stage 1 proposal in TC-39's process.)

You may not find that useful, but there's a growing ecosystem around "Typescript is just for types, not also for deeper transpilation". It is not just TS developer being "high and mighty" in the face of things you think they could make the developer experience easier on.


I think we’re violently agreeing?

If TypeScript isn’t the bundler, it shouldn’t complain (as it currently does) when you import a “.ts” file. It shouldn’t care at all!


Elixir + Phoenix is so much better :) we have some apps running in production, some are 6 years old.


And that's how it should work imo. But if you enable esm (which you might need in the future because of packages being esm only) you can't use those, only .js.

That's because the TypeScript developers are dead set against transpiling imports; they just want to copy-paste them into the resulting file when running tsc.


This could change with an ongoing work to allow ts extension [1].

[1] https://github.com/microsoft/TypeScript/issues/37582


You'll only be able to use that if you are using the non-default `--noEmit` configuration, which requires the use of some other tool to strip types. Even after that change, using `tsc` to strip types will still force you to write `.js` in your imports.
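Concretely, the combination would look something like this; the flag name below (`allowImportingTsExtensions`) is the one discussed for TypeScript 5.0 and should be treated as an assumption until it ships:

```json
{
  "compilerOptions": {
    "noEmit": true,                     // some other tool strips the types
    "allowImportingTsExtensions": true  // permit `import "./foo.ts"`
  }
}
```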


That doesn’t look like an issue that’s going to be resolved any time soon. Lots of comments which read like people digging in their heels to preserve the current behavior.


It was in the 4.9 iteration plan [1], and is now in the 5.0 iteration plan [2]. They also talk of this in a design meeting [3] and a member of TS team seems committed to it.

[1] https://github.com/microsoft/TypeScript/issues/50457

[2] https://github.com/microsoft/TypeScript/issues/51362

[3] https://github.com/microsoft/TypeScript/issues/51302


Well, I would love to see it, so I hope I’m wrong and you’re right!


From my understanding, this is a fairly new predicament, for projects that target ESM ("type": "module" in package.json) instead of the default CJS.


Have you worked on TypeScript projects using ES modules? What you've described is the status quo for CommonJS modules, but it doesn't work when you switch to ESM (afaik, at least).


No, it’s new, to comply with new stuff from nodejs.

You can likely change it with a config, but do note that importing using .js will be the new standard way of doing things, and by changing it through configuration you're choosing not to follow the new standard.

It’s a complete mess IMO. Every project uses a different way to handle modules and there’s a lot of rough edges.


Agree completely. It also makes interop between Node and Deno more painful. But there is hope on the horizon. :)

See https://github.com/microsoft/TypeScript/issues/37582 which is referenced in the 4.9 Iteration Plan as "Support .ts as a Module Specifier for Bundler/Loader Scenarios": https://github.com/microsoft/TypeScript/issues/50457


But that doesn't seem like it would fix the typical flow; there still would be no transpiling of imports. So yes, you could use .ts in ts-node, but you would have to use .js in tsc, which is pretty awful (you want your code to work with both).


Actually, it looks to be slated for TypeScript 5.0, with a release date of 'March 14th' next year.


A thousand times this. It's not only the dumbest thing I've seen a programming language do, it's also the dumbest thing I've seen in the JS ecosystem. I ended up having to implement an AST-based post-processor to fix packages before publishing them.


Where are you seeing this? The PR that this thread is about doesn't have this quirk. Maybe your setup has some issue...?


Their complaints are unrelated to this specific PR.

See https://www.typescriptlang.org/docs/handbook/esm-node.html for details about how import paths work in CommonJS vs ESM. In both cases the import path you write in your source code is the same import path that is used in the emitted JavaScript. What's different is that NodeJS's ESM implementation doesn't allow extensionless import paths (but its CommonJS implementation does).


Does anyone have any insight about how to coordinate this kind of change to a large project? This kind of change touches literally every file, so every branch will have merge conflicts. The best idea I can think of is to announce the date ahead of time and make every contributor rebase their branches on the day of the merge. But there has to be a better way.



Surprisingly, at least for this PR, solving merge conflicts turns out to not be too hard. By not squash merging it, we can have a single commit that unindents the codebase all in one go (and the commit is in the tree), which means that every line has a clear path back to the current state of the main branch. (And crucially, we can make git blame not point every line to me...)

Potentially, an approach like this might be applicable to other changes; I have a commit in my stack which moves the old build system config to the new build system config's path (even though it's wrong), as git does a much better job understanding where the code is going if you help it.


Jake Bailey was one of the best TAs I ever had in college. It's great to see his name behind this.


I miss the Piazza days...

Thank you for the kind words!


Thanks for doing this!

I reported typescript install size issue back in 2018 and changing to modules seemed to have the biggest impact here!

https://github.com/microsoft/TypeScript/issues/23339

For anyone curious, TS 1.0 was 7MB and today it’s 65MB.

https://packagephobia.com/result?p=typescript%401.0.1%2Ctype...

Really excited to see this number move in the downward direction for a change :)


These days, it's tracked at https://github.com/microsoft/TypeScript/issues/27891.

I have a gameplan to drop this by another 7 MB (by turning our executables into ESM), probably for 5.0 as well if we decide that Node versions older than 12 are worth dropping.


This opens up the door to going in there and replacing the slow parts with Rust.

There is a lot of interest in making typescript fast but all of them want to do a rewrite. Swapping the slow parts seems a lot more viable


Sucrase is proof that JS is not the problem when it comes to slow performance. JS is not slow. NodeJS is not slow. It's the code that is slow. All these people wanting to write it in Rust or Go or XYZ programming language need to acknowledge this.

Yes, multithreading is awesome and really helpful but it's the cherry on top, not the whole thing. If the same amount of effort was put into optimizing the TSC codebase as it is being spent to rewrite it in Rust, I have no doubt that it can become faster. Perhaps it'll require some big changes but it won't create compatibility concerns and it won't be a cat-and-mouse race between the Rust version and the TS version.

I don't think "Write it in Rust" is always the solution to fast programs. Rust itself can be pretty damn slow if you don't keep performance in mind. That is why you have to optimize and profile and optimize over and over again. Can't the same be done for TSC?

I think the biggest reason devs don't do this is because no one likes profiling and optimizing since it is a slow and boring task. Rewriting is so exciting! It's the thing you do when you are tired of maintaining the old codebase. So just ditch it and rewrite it in Rust.

I have nothing against Rust, mind you. I love what it has done but I don't think rewriting everything is either feasible or even the solution. And waiting for that to happen for every slow tool out there is utter foolishness.


Yes, you technically can write high-perf JavaScript code...

https://github.com/alangpierce/sucrase/blob/153fa5bf7603b9a5...

But freaking SIGH I don't want to. I prefer coding in environments that don't require unrolling loops by hand.


There was a back and forth on that subject a few years back: Mozilla rewrote their sourcemap parser from JS to naïve Rust (/WASM) for a 5.89x gain, mraleph objected to the need and went through an epic bout of algorithmic and micro-optimisations to get the pure JS to a similar level, then the algorithmic optimisations were reimported into the Rust version for 3x further gain (and much lower variance than the pure JS).

https://hacks.mozilla.org/2018/01/oxidizing-source-maps-with...

https://mrale.ph/blog/2018/02/03/maybe-you-dont-need-rust-to...

https://fitzgeraldnick.com/2018/02/26/speed-without-wizardry...


Sucrase consciously and deliberately breaks compatibility. Which, to be clear, isn't necessarily a bad thing for some use cases. But you can't really generalize from that to a tool like tsc where this isn't an option. There might be a performance ceiling here that can only be surpassed with a different language.


I suspect you have a point, given this line from the Sucrase readme:

> Because of this smaller scope, Sucrase can get away with an architecture that is much more performant but less extensible and maintainable. Sucrase's parser is forked from Babel's parser (so Sucrase is indebted to Babel and wouldn't be possible without it) and trims it down to a focused subset of what Babel solves. If it fits your use case, hopefully Sucrase can speed up your development experience!


I rewrote a card game engine, openEtG, from JS to Rust. It was a pretty 1:1 rewrite. 2x improvement even when I was using HashMap everywhere. I've since entirely removed HashMap, which has only further improved perf

As a bonus, the code is now statically typed


I suppose it depends on your use case, but I don't really consider 2x to be a significant difference. Between programming languages we often speak in orders of magnitude.

If JS is only half the speed of a compiled language like Rust, that shows remarkably optimized performance.


Twice as fast is a big deal for damn near any program except like... unimportant, already-slow background tasks or things that are already exceptionally fast. Cutting the frame render time in half could give you twice the framerate. Twice the performance on a server could let you handle twice as many concurrent sessions (and possibly run half as many servers!). People regularly fight tooth and nail to squeeze 10% performance boosts on critical tasks; doubling it would be incredible.


Plenty of game engines are already spending less than a millisecond of CPU time per frame in their own code, so 2x one way or the other makes almost no difference.

Things don't need to be "exceptionally" fast to be in the area where programming language doesn't really matter.

> Twice the performance on a server could let you handle twice as many concurrent sessions (and possible run half as many servers!)

Which might matter, or it might not. Very situational.

> People regularly fight tooth and nail to squeeze 10% performance boosts on critical tasks, doubling it would be incredible

That kind of task is a small fraction of tasks. And often you're best off using a library, which can often make its own language choices independent of yours.


JIT/interpreted languages (Java, JS, Python etc.) cannot compete with optimized code from compiled languages.

The tradeoff is that Rust is lower-level, so it is harder to write. If performance were the only point of comparison for a language, then we'd all be using assembly. We choose to trade performance for productivity when we're able to.


The average Node.js app is slow. It requires godlike understanding/experience to make JS perform well. That's why you see people rewriting JS tools in Rust, Go, and Zig: for example SWC, Turbopack, esbuild, and Rome. For most use cases JS is plenty fast, but average Go code will be faster and easier to maintain.

As I get older, I do not want to spend my weekend learning about new features in Next.js v13 [1] or rewriting tests from Enzyme to RTL [2]. I want to use a programming language that values its users' time and focuses on developer experience.

[1] https://www.youtube.com/watch?v=_w0Ikk4JY7U

[2] https://dev.to/wojtekmaj/enzyme-is-dead-now-what-ekl


When you write Chrome itself in JS, then you can talk to me about performance.


Not sure if serious or not, but Firefox is a long-standing example of this. It has always been a mixed C++/JS codebase. (Since before it was even called Firefox, that is, though nowadays, it's also Rust, too.) I routinely point this out in response to complaints about the slowness attributed to e.g. "Electron". JS programs were plenty fast enough even on sub-GHz machines before JS was ever JITted. It's almost never the case that a program having been written in JS is the problem; it's the crummy code in that program. When people experience Electron's slowness, what they're actually experiencing is the generally low quality of the corpus that's available through NPM.

Arguably, the real problem is that GCC et al are enablers for poorly written programs, because no matter how mediocre a program you compile with them, they tend to do a good job making those programs feel like they're performance-tuned. Today's trendier technology stacks don't let you get away with the same thing nearly as much—squirting hundreds or thousands of mediocre transitive dependencies (that are probably simultaneously over- and under-engineered) through V8 is something that works well only up to a point, and then it eventually catches up with you.

Besides, there's no such thing as a fast or slow language, only fast and slow language implementations.


AFAIK all of the major browsers (and other JS runtimes) have implemented some performance-sensitive APIs in JS, specifically because it performs better than crossing the JS<->native boundary. Granted that’s usually specifically about JS API performance, but that’s a lot of where performance matters in a JS host environment.


How far can you get with JS/other interpreted things in e.g. optimizing for cache access etc? Sounds like you're at the mercy of the JIT compiler (which may go far, but still).


The interesting question to ask is whether these Rust rewrites are really taking advantage of cache optimization or if they are making simplifying assumptions that the canonical implementation cannot. In the latter case, Rust isn't the root of the performance difference and a JS rewrite can make most of those simplifying assumptions


JavaScript does not have any input/output (IO/syscalls); all those functions, like reading a file or a socket, need to be implemented in the runtime language: the browser, Node.js, ASP, etc. So you are at the mercy of the runtime executable.

The JS JIT slows down startup time, as the JavaScript first has to be parsed and compiled before running, but the runtime can then cache the compiled version for faster startups. When JavaScript was new it was slow (for example, string concatenation was slow), but most JS engines now implement string concatenation with a "rope" structure, making it very fast. V8 (Chrome) and SpiderMonkey (Firefox) have gotten a lot of optimizations over the years, so if you are doing something like incrementing a number in a loop, it will be just as optimized as in a systems-level language.
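For example, naive concatenation in a loop, once a classic JS performance trap, is now cheap because engines back intermediate strings with ropes (a small sketch, not a benchmark; `repeatConcat` is a hypothetical helper):

```typescript
// Repeated += on a string used to mean quadratic copying; modern engines
// represent intermediate results as ropes, so this stays fast.
function repeatConcat(piece: string, times: number): string {
  let out = "";
  for (let i = 0; i < times; i++) {
    out += piece; // appends a rope node rather than recopying `out`
  }
  return out;
}

console.log(repeatConcat("ab", 3)); // "ababab"
```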


This can vary a ton as well. I was talking to a friend yesterday who said he was pounding a native browser interface with an iterator and experiencing slow performance. He switched to buffering the entire thing into memory first and experienced huge performance gains.

The aspect of the language you're using, if optimized, is virtually always optimized for the most common use-case. If your use-case isn't the most common use-case, you must account for this.


With stuff like TypedArrays pretty far. Where JS has problems is optimising in the face of the things you can do with the language and the current difficulty in multithreaded implementations.
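A small sketch of what that looks like: a Float64Array gives flat, contiguous, unboxed storage, which is most of what optimizing for cache access means in JS (hypothetical `sumSquares` helper):

```typescript
// Summing squares over a Float64Array: the data is one contiguous block
// of unboxed doubles, so the loop walks memory linearly (cache-friendly)
// and the JIT can keep the accumulator in a register.
function sumSquares(n: number): number {
  const xs = new Float64Array(n); // zero-initialized, contiguous
  for (let i = 0; i < n; i++) xs[i] = i;
  let acc = 0;
  for (let i = 0; i < n; i++) acc += xs[i] * xs[i];
  return acc;
}

console.log(sumSquares(4)); // 0 + 1 + 4 + 9 = 14
```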


As someone who contributed to swc-cli: Sucrase's benchmarks are pretty bad. SWC runs in sync mode, blocking the main thread; in addition, they are not using benchmark.js or isolated tests.

See in my fork, swc is winning :P https://github.com/chyzwar/sucrase

Once swc-cli is rewritten in Rust with better IO, the results could be even more impressive.


Which proof? Their faked benchmark?


> TL;DR: The TypeScript compiler is now implemented internally with modules, not namespaces. The compiler is now 10-25% faster. tsc is now 30% faster to start. Our npm package is now 43% smaller. More improvements are in the works.


The headline is pretty misleading. The TypeScript compiler isn't implemented with modules. This is an open pull request which would make it that way.


It's a PR that will be merged within the next week or so, and has been scheduled in the TS roadmap for quite a while. It's happening.


Hi Ryan, thanks for chiming in ! I just wanted to make it clear that the current HEAD doesn't implement this but maybe I'm just nitpicking.


Oops, I didn't mean to mislead. I updated the title to be more clear.


Thank you


[flagged]


I think the tooling for a programming language should be written in that language. Otherwise the community will hardly be involved.

All these Rust tools may be slightly faster, but TypeScript developers will not learn Rust to improve the TypeScript compiler.


If you think of typescript as "tooling" that makes sense.

But if you think of it as a compiler, or transpiler, not so much. Python, Ruby, etc. "compilers" (runtimes, really) aren't written in Python or Ruby. Or rather, there are alternatives around, but they are neither the fastest nor the best. Such "tools" are built in C, as are the compilers for C++, Rust, and most languages really.


I understand your point. Compilers and interpreters are traditionally written in C/C++, but we don't have to do the same as before.


What's the conversion rate on that though?

For smaller languages that are much less supported that makes much more sense.

For bigger languages we treat it much more as a black box.


> but typescript developers will not learn rust to improve the typescript compiler.

Why not? TS has much more in common with Rust than with JS, in terms of typing. Compiler engineers are not that prevalent in the community, but they are much more prevalent in the systems engineering world which Rust is a part of, so I'd assume it's actually more beneficial to have a compiler in Rust than in TS because it would mean more people interested in contributing.


> TS has much more in common with Rust than with JS, in terms of typing.

TS has much more in common with JS than it does with Rust in every other comparison

Every TS dev will know at least a little bit of JS. Not every TS dev will know Rust at all. They're very different languages.


Compiler engineers will be far more familiar with Rust, C++, C and other similar languages than with JS, so it doesn't really matter whether TS and JS are similar as long as compiler engineers can use their preferred language to build the compiler.


I think a performance improvement in anything existing is great, especially for something as big as TSC. It's so widely used that many people will appreciate this. While it would be great to have it reimplemented in something inherently faster, not improving what we have while it's clearly still the best option wouldn't be helpful.


> I don't care. I care when it's implemented in Rust.

What does this add to the discussion? This comment serves nothing except to be negative to an author who has made a great improvement.


In every single one of these cases, it raises the question of whether "JS" is the problem or "backwards compatibility" is. There are a huge number of inefficiencies that can be fixed with a full rewrite.

There was a recent conversation over ViteJS (a pure JS bundler) vs rust-based tooling, and when you dig into the numbers the real difference is SWC vs Babel. It raises the question whether a new transpiler written in JS can be competitive with Rust, but it's unclear if anyone tried.


Sucrase is written in JS and boasts the highest line throughput of any competing transpiler

https://github.com/alangpierce/sucrase

              Time          Speed
  Sucrase     0.57 seconds  636975 lines per second
  swc         1.19 seconds  304526 lines per second
  esbuild     1.45 seconds  248692 lines per second
  TypeScript  8.98 seconds   40240 lines per second
  Babel       9.18 seconds   39366 lines per second


but the benchmark is stupid: https://github.com/alangpierce/sucrase/blob/main/benchmark/b...

> Like all JavaScript code run in V8, Sucrase runs more slowly at first, then gets faster as the just-in-time compiler applies more optimizations. From a rough measurement, Sucrase is about 2x faster after running for 3 seconds than after running for 1 second. swc (written in Rust) and esbuild (written in Go) don't have this effect because they're pre-compiled, so comparing them with Sucrase gets significantly different results depending on how large of a codebase is being tested and whether each compiler is allowed a "warm up" period before the benchmark is run.

(worse it disables esbuild and swc's multi-threading... https://github.com/alangpierce/sucrase/blob/main/benchmark/b... https://github.com/alangpierce/sucrase/blob/main/benchmark/b...)

fake it till ya make it.

it's like saying "if I disable everything and wait for 5 minutes it's faster"


Hi, Sucrase author here.

To be clear, the benchmark in the README does not allow JIT warm-up. The Sucrase numbers would be better if it did. From testing just now (add `warmUp: true` to `benchmarkJest`), Sucrase is a little over 3x faster than swc if you allow warm-up, but it seemed unfair to disregard warm-up for the comparison in the README.

It's certainly fair to debate whether 360k lines of code is a realistic codebase size for the benchmark; the higher-scale the test case, the better Sucrase looks.

> worse it disables esbuild and swc's multi-threading

At some point I'm hoping to update the README benchmark to run all tools in parallel, which should be more convincing despite the added variability: https://github.com/alangpierce/sucrase/issues/730 . In an ideal environment, the results are pretty much the same as a per-core benchmark, but I do expect that Node's parallelism overhead and the JIT warm-up cost across many cores would make Sucrase less competitive than the current numbers.


I don't see how this is a rebuttal to the claim. 636,975 is more than 2x 304,526, so assuming the quoted paragraph is correct, sucrase is still the highest-throughput transpiler even during its warm-up phase. Probably this isn't true for the first 100 milliseconds of execution during the very first warm-up stages, but if the transpile phase is that short, it's basically irrelevant anyway.


The warm-up phase is the whole reason esbuild and swc exist, btw. And the sample is basically a small hello world; any real project would do a little more, and the JIT would not optimize as much.

Also, 2x 300k is not really how multi-threading/concurrency/parallelism works... especially not for Go, which should not be run with GOMAXPROCS=1.


As far as I can follow your argument, it seems to be that the native tools perform better on the small inputs (that only ever take a second or three to complete), and that the creators ("maintainers"?) of Sucrase have their thumb on the scale or something, since they emphasize in their results the very large inputs that will involve longer running jobs where their tool has had a chance to warm up. In other words, the tools you're defending only do better when it doesn't really matter, and the tool you're downplaying outperforms the competition on the jobs where it actually does matter. If anything is backwards (or "stupid", as you put it) then that seems to be it.

Maybe I'm misunderstanding. In that case, please provide a better benchmark.


"Tools run in single-threaded mode without warm-up."

I thought it was 2022. I have a 12-core machine, and my next machine will probably have 22 cores.

But I'm amazed, transpiling 636975 lines in <1 second is nice.

[Edit] What I do not understand is "Sucrase does not check your code for errors." So it's not a type checker? Or does it check type errors? Why would I use it for TypeScript when the whole reason to use TS is to add types to JS to prevent errors?

Is this more like `cargo check` in Rust: use it for continuous work, then run tsc from time to time to check for errors?


Like any transpiler, Sucrase can be run in parallel by having the build system send different files to different threads/processes. Sucrase itself is more of a primitive: just a plain function from input code to output code.

> What I do not understand is "Sucrase does not check your code for errors." So it's not a type checker?

That's correct, Sucrase, swc, esbuild, and Babel are all just transpilers that transform TypeScript syntax into plain JavaScript (plus other transformations). The usual way you set things up is to use the transpiler to run and build your TS code, and you separately run type checking using the official TypeScript package.
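For example (hypothetical `package.json` scripts; the sucrase CLI flags are sketched from its README, and `--noEmit` is tsc's standard type-check-only switch):

```json
{
  "scripts": {
    "build": "sucrase ./src -d ./dist --transforms typescript,imports",
    "typecheck": "tsc --noEmit",
    "ci": "npm run typecheck && npm run build"
  }
}
```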


SWC is written in rust, and Babel is written in javascript, so you've proven the OP's point. Vite can be configured to use SWC.

The recent conversation was about Vercel benchmarking a tuned Turbopack+SWC vs. a default Vite+Babel, not really an apples to apples comparison. When Vite is configured to use SWC as a compiler, Vite's HMR gets faster; but it's not the default for compatibility reasons.


Sucrase is faster than, or really close to, SWC (see the benchmarks: https://github.com/alangpierce/sucrase). Everyone still uses Babel because of the transforms.

And yes, Babel can also be made faster if enough effort is dedicated to it. It's not an impossible feat.


For single core benchmarks only, correct?


That's pretty harsh. I don't even like TypeScript/JavaScript, but these are some pretty good numbers:

> The compiler is now 10-25% faster. `tsc` is now 30% faster to start. Our npm package is now 43% smaller. More improvements are in the works.


500k lines changed, oof. Imagine the merge conflicts.

Could this not have been done incrementally?


This is as incremental as it really could be; the entire build had to change, all of the code needed to be unindented one level, etc.

I have tested the merge conflict problem, and thanks to the way the PR is constructed (in steps), git actually does a good job figuring things out.


I'm curious how one makes such a change in such a large, living code base.

Did you consider _not_ re-indenting the entire code base and relying on contributors' auto-formatters? Did you consider re-formatting parts of the code step by step, beginning with code that isn't frequently changed?

Didn't know about the feature that ignores commits in git-blame, thank you for that.
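For anyone else who hadn't seen it, here's a self-contained demo of that feature (`git blame --ignore-revs-file`, available since git 2.23; the file name `.git-blame-ignore-revs` is just the common convention GitHub also recognizes):

```shell
# Build a throwaway repo with a real commit followed by a "reformat" commit,
# then blame while ignoring the reformat.
set -e
tmp=$(mktemp -d); cd "$tmp"; git init -q
export GIT_AUTHOR_NAME=a GIT_AUTHOR_EMAIL=a@b GIT_COMMITTER_NAME=a GIT_COMMITTER_EMAIL=a@b
git commit -q --allow-empty -m 'init'
echo 'let x = 1' > a.js
git add a.js && git commit -q -m 'add a.js'
echo '  let x = 1' > a.js            # simulate a mechanical re-indent
git add a.js && git commit -q -m 'reformat'
git rev-parse HEAD > .git-blame-ignore-revs
# The line is now attributed to 'add a.js', not the reformat commit.
# `git config blame.ignoreRevsFile .git-blame-ignore-revs` makes this the default.
git blame --ignore-revs-file .git-blame-ignore-revs a.js
```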


No formatter would support having code indented like that at the top level for no reason, so that's not an option. Though, I have been looking into getting us to use a formatter, period (right now we don't).

I didn't consider doing it step by step, no, but I'm not sure how I would really achieve that effectively. The reality is that the bulk of PRs are submitted by the team, and it's straightforward to get everyone to get their code in before this change, pause merges, rebase, and then continue as normal. I'd rather go for the "rip the band-aid off" method.


> pause merges

That's what I was worried about, but I guess you can't have merge conflicts if you freeze the code.

Suppose you couldn't freeze the code, then what would you have done?



