That’s an interesting discrepancy, I might have had the same gut reaction. But you’re using that assumption to cast slippery-slope doubt on the whole project without knowing anything specific. It’s possible you’re right, but maybe find out which algorithms are being used first? Perhaps there is a reason that TypeScript actually is an order of magnitude slower than JavaScript. The Benchmark Game project was specifically set up to allow people to scrutinize the algorithms used and make fair comparisons, so it might be unwise to start with the assumption that the most basic aspect of the project failed spectacularly and that something really obviously stupid and easy to fix is happening. I am at least giving benefit of the doubt and wondering what might be wrong with TypeScript. It’s not implausible since other languages take just as long.
You can write literally any JavaScript program with TypeScript (falling back to the `any` type if you really need to), so this doesn't really work in this case.
Also, typical TypeScript programs are faster than typical JavaScript programs, because JavaScript JITs like predictable object shapes and monomorphic functions for the same reasons that other language implementations require them. Libraries like lodash and bluebird take advantage of this fact without using TypeScript, but TypeScript steers you towards these patterns.
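To make the "predictable object shapes" point concrete, here is a minimal sketch. The names `makePoint`, `makePointMessy`, and `lengthSquared` are made up for illustration. Whether the messy version is measurably slower depends on the engine and version, so no timings are claimed here.

```javascript
// Hypothetical example: both factories build a point, but only the first
// always adds the same properties in the same order.
function makePoint(x, y) {
  return { x, y }; // every object gets x first, then y
}

function makePointMessy(x, y) {
  const p = {};
  // Property order depends on the data, so objects from this factory
  // do not all share one layout.
  if (x > y) { p.x = x; p.y = y; } else { p.y = y; p.x = x; }
  return p;
}

// This function only ever sees one layout when fed from makePoint
// (monomorphic), but two layouts when fed from makePointMessy.
function lengthSquared(p) {
  return p.x * p.x + p.y * p.y;
}

const tidy = [makePoint(1, 2), makePoint(4, 3)];
const messy = [makePointMessy(1, 2), makePointMessy(4, 3)];

let tidyTotal = 0;
for (let i = 0; i < tidy.length; i++) tidyTotal += lengthSquared(tidy[i]);

let messyTotal = 0;
for (let i = 0; i < messy.length; i++) messyTotal += lengthSquared(messy[i]);

console.log(tidyTotal, messyTotal); // same answers, different object layouts
```

A TypeScript interface would not forbid the second factory, but declaring `{ x: number, y: number }` up front tends to nudge people towards the first.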
>typical TypeScript programs are faster than typical JavaScript programs
Person who works on js engines here :)
You would think this is the case, but in actuality, JS and TS are about on par in performance (assuming similarly written code). This is because engines optimize for idiomatic JS patterns, not idiomatic TypeScript patterns. Often these will align, but in some cases (usually revolving around generics and inheritance) well-written JS will actually fall through the optimization pipeline faster, because it follows more patterns that have specific optimization checks.
Any articles/resources on writing high-performance JS that you would recommend? I don't use JS for work, but just for random fun stuff. So just curious to learn more about it.
Check out gl-matrix.js; it's designed to be fast. It's a simple math library, and the main way it achieves its speed is by not allocating memory.
After having worked on JS for some large web apps that need good graphics performance, the two rules of thumb in my head for making JS fast are: 1- avoid dynamic memory allocation, and 2- avoid using the functional primitives like map.
The first one is more or less true in all languages, memory allocation always costs a lot, and if you can pre-allocate memory and/or re-use memory along the way, the code will run faster. This means paying attention to what things in JavaScript will allocate memory under the hood, use of dicts, use of 3rd party library and framework functions, etc.
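A minimal sketch of the pre-allocate-and-reuse idea, in the out-parameter style that math libraries of this kind tend to use. The function names `addAlloc` and `addInto` are hypothetical, written for this example rather than taken from any library.

```javascript
// Allocating version: returns a fresh array on every call.
function addAlloc(a, b) {
  return [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
}

// Out-parameter version: writes into a caller-supplied array and
// allocates nothing itself.
function addInto(out, a, b) {
  out[0] = a[0] + b[0];
  out[1] = a[1] + b[1];
  out[2] = a[2] + b[2];
  return out;
}

const step = new Float32Array([1, 2, 3]);
const position = new Float32Array(3); // allocated once, outside the loop

for (let i = 0; i < 1000; i++) {
  addInto(position, position, step); // reuses the same buffer every iteration
}

console.log(Array.from(position)); // [1000, 2000, 3000]
```

Calling `addAlloc` in that loop would give the same numbers but create 1000 short-lived arrays for the garbage collector to clean up.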
The second one is a bummer; I love using the functional primitives. But map is slower than a for loop, all else being equal. It has gotten relatively faster over time. I mostly use functional everywhere, and only resort to for loops in performance critical code.
> 2- avoid using the functional primitives like map
I understand that map requires creating a new array, and that is already included in point 1. What overhead are functional primitives subject to apart from memory allocation? e.g. forEach
Using continuations. The function call itself and the local scope wrapped up into it can be varying degrees of expensive. When there is little state, because the function is inline or nearby for example, it might be optimized down to near what a for loop does. But if there's a function call at all once it's executed, that alone is slower than the for loop. Remember it's a function call per element. If the function is further away with more scope, it can be more expensive both memory-wise and time-wise.
BTW, it's easy to test the basic primitives. I use Chrome snippets.
    test = (name, fn) => {
      const timeLimitMs = 1000
      let start = Date.now(), count = 0
      while (Date.now() - start < timeLimitMs) { fn(); count++ }
      console.log(name, count)
    }
    var N = 1000000
    let a = new Array(N)
    test('for loop', _=> { for (var i = 0; i < N; i++) a[i] = i })
    test('map', _=> { a = a.map((x,i) => i) })
    test('forEach', _=> { a.forEach((v,i,a) => a[i] = i) })
for loop 941
map 37
forEach 69
This is on my Mac in Chrome. So forEach is faster than map, but for loop is more than 10x faster than forEach. That's for loops with trivial work, of course. If the inside of the loop is expensive, the loop/map ratio will be lower.
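Here is a rough sketch for checking the "expensive loop body" claim yourself. `countRuns` is the same idea as the `test` helper above but returns the count, and `heavy` is a made-up stand-in for expensive per-element work (the 50 iterations are arbitrary). I'm not claiming any particular numbers, since they vary by machine and browser.

```javascript
// Same idea as the test helper above, but returns the count instead of logging it.
function countRuns(fn, timeLimitMs) {
  const start = Date.now();
  let count = 0;
  while (Date.now() - start < timeLimitMs) { fn(); count++; }
  return count;
}

// Hypothetical stand-in for "expensive" work done on each element.
function heavy(i) {
  let x = i;
  for (let k = 0; k < 50; k++) x = Math.sqrt(x + k);
  return x;
}

const N = 100000;
const a = new Array(N).fill(0);

// Run the same per-element work through a plain loop and through forEach.
const loopRuns = countRuns(() => {
  for (let i = 0; i < N; i++) a[i] = heavy(i);
}, 250);
const forEachRuns = countRuns(() => {
  a.forEach((v, i, arr) => { arr[i] = heavy(i); });
}, 250);

console.log('for loop', loopRuns, 'forEach', forEachRuns,
            'ratio', loopRuns / forEachRuns);
```

If the claim holds on your setup, the printed ratio should be much closer to 1 than in the trivial-body test.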
If I modify that code to pre-initialise an array with the values 0 to N - 1, and then copy the value from that array to a new array (rather than using the loop index), then both map and forEach are faster than the for loop for me.
That sounds fairly surprising, considering map has to allocate, and allocate is very expensive, but please share your code & I’ll try it.
Like so?
    const N = 1000000
    let a = new Array(N), b = new Array(N)
    for (let i = 0; i < N; i++) a[i] = i
    test('copy loop', _=> { for (var i = 0; i < N; i++) b[i] = a[i] })
    test('copy map', _=> { b = a.map((x,i) => a[i]) })
    test('copy forEach', _=> { a.forEach((v,i,a) => b[i] = a[i]) })
I get: copy loop 973, copy map 38, copy forEach 49. Same as before, but this time I tried Chrome in Windows.
Are you using a different browser? I know that sometimes other browsers have very different results.
In any case, it's somewhat irrelevant if there are cases that optimize and cases that don't. When idiomatic functional code is sometimes up to 30x slower than a for loop, it can't be used in perf critical sections. Even if it's only Chrome and only certain cases. The forEach perf needs to be always reliably performant before I can use it without worry.
My best guess: probably written by someone that doesn't know how to write performant TS. I say this with no skin in the game. I write neither JS nor TS. What I have observed in language benchmarks over the years is that the benchmarks are rarely written by an expert, but usually by someone with cursory knowledge of the language, e.g. just enough to be dangerous.
Often times, these sorts of benchmarks are done with prejudice (not necessarily malice). The benchmarks are written by someone with something to prove: my chosen tech stack performs better, and let me show you why. A favorite of mine is Perl vs Python comparisons, where you see an idiomatic Perl implementation vs a non-idiomatic Python implementation (or the other way around). Typically in a head-to-head comparison, the benchmarks are developed by the same individual, who likely has above-average knowledge of their favorite and below-average knowledge of the target they're trying to show as inferior.
You'll see this time and time again in internet benchmarks comparing performance. Unless you can see the code from all benchmarks involved, my suggestion is to avoid them. I mean, for all I know, the author of the benchmark was unaware of the built in sort and instead bubble sorted.
This is unfortunately pure speculation on top of pure speculation, which is the problem I have with the top comment. You’re assuming incompetence when you could just go look it up. Why assume it’s someone who doesn’t know? Why use that to wander off into rant land about prejudices and make broad claims that internet benchmarks are bad, when you admit to having zero idea what the actual specific problem here is?
The test that lowered TypeScript’s score in the paper is called fannkuch-redux, and here are the sources in question:
They are both contributed by the same person, and there is no bubble sort involved. So now you know.
I don’t see an obvious reason one would be slower, but they’re also quite different. Maybe the algorithmic complexity is different. Maybe the cross-compilation is doing something bad with memory allocation. Note the input sizes for this test are very small, it would be easy for a difference in temporary variables the compiler injects to cause a serious problem.
What is not obvious is any prejudice, malice, or incompetence.
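For anyone who hasn't looked at the test: below is a minimal sketch of the core idea behind fannkuch-redux, as I understand it. It is not either of the contributed programs and is not tuned in any way. It only computes the maximum flip count, omits the checksum the real benchmark also prints, and copies an array per permutation. The names `countFlips` and `maxFlips` are made up for this sketch.

```javascript
// Count the flips for one permutation: repeatedly reverse the first k
// elements, where k is the value at the front, until a 1 is at the front.
function countFlips(perm) {
  const p = perm.slice(); // work on a copy so the caller's array is untouched
  let flips = 0;
  while (p[0] !== 1) {
    const k = p[0];
    for (let i = 0, j = k - 1; i < j; i++, j--) {
      const t = p[i]; p[i] = p[j]; p[j] = t;
    }
    flips++;
  }
  return flips;
}

// Visit every permutation of 1..n (iterative Heap's algorithm) and keep
// the largest flip count seen.
function maxFlips(n) {
  const p = Array.from({ length: n }, (_, i) => i + 1);
  const c = new Array(n).fill(0);
  let best = countFlips(p);
  let i = 0;
  while (i < n) {
    if (c[i] < i) {
      const j = i % 2 === 0 ? 0 : c[i];
      const t = p[j]; p[j] = p[i]; p[i] = t;
      best = Math.max(best, countFlips(p));
      c[i]++;
      i = 0;
    } else {
      c[i] = 0;
      i++;
    }
  }
  return best;
}

console.log(maxFlips(7)); // 16
```

The work grows with n!, and the inner loops are all tiny array swaps, which is why small differences in generated code or temporaries could plausibly matter a lot.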
> Why use that to wander off into rant land about prejudices and make broad claims that internet benchmarks are bad
What are you talking about? Where did that happen?
> when you admit to having zero idea what the actual specific problem here is?
This is the comment section for a submission about an article referencing the paper. I brought it up for discussion. It is perfectly valid to bring up a question that you don't know the answer to.
> What is not obvious is any prejudice, malice, or incompetence.
Please stop.
Edit: From another comment, and some deeper digging of my own from that, you might find the archived results of the fannkuch-redux interesting. From 2017-08-01[1] to 2017-09-18[2], the benchmark changed from a running time of 1,204.93 seconds to a running time of 131.39 seconds. The paper was released in October 2017.
> What are you talking about? Where did that happen?
I was responding directly to @hermitdev. Did you get your threads crossed? What I'm talking about happened immediately above in the parent comment, beginning with "Often times, these sorts of benchmarks are done with prejudice" https://news.ycombinator.com/item?id=19527057
"You'll see this time and time again in internet benchmarks comparing performance."
> It is perfectly valid to bring up a question that you don't know the answer to.
I agree. It's a bummer that's not really what happened here.
>> What is not obvious is any prejudice, malice, or incompetence.
> Please stop.
The parent comment explicitly stated an assumption of both incompetence and prejudice and I responded directly to that.
From the HN guidelines: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."
If you'd like me not to call out speculation, then please assume good faith and don't speculate next time.
> From 2017-08-01[1] to 2017-09-18[2], the benchmark changed from a running time of 1,204.93 seconds to a running time of 131.39 seconds.
Yes! Now we are getting somewhere. It appears that would change the outcome of the paper. Perhaps it was a mistake. That might mean it was nothing more than an oversight that already got fixed. It doesn't mean there is any other coloring of the study at all, nor that there was any intention or agenda to make TypeScript look bad, right?
> The parent comment explicitly stated an assumption of both incompetence and prejudice and I responded directly to that.
Perhaps I misinterpreted what you said. You started the paragraph referring to the top level comment, which is me. I took the "you're" in "You’re assuming incompetence when you could just go look it up." to be a general "you", and commentary on my original comment.
> From the HN guidelines: "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."
I actually looked this up before the GP comment, and almost included it myself. I can see now that you were implicating the comment you replied to. I didn't think that was the case, because I apparently didn't interpret that comment remotely in the same way you did.
> Perhaps it was a mistake. That might mean it was nothing more than an oversight that already got fixed. It doesn't mean there is any other coloring of the study at all, nor that there was any intention or agenda to make TypeScript look bad, right?
I never implied it was. For that matter, I didn't really interpret the comment in question as stating that either. The more charitable interpretation is not that they are trying to make another language look bad, but that they are trying to make their favorite language look good. That doesn't require purposefully tanking one benchmark, it just requires them to be much better versed in optimizing one language than another and a lack of awareness about this. As they say, never attribute to malice what can be explained by incompetence. In fact, if you read the comment carefully, they even call out to this with the "not necessarily malice" remark.
> The more charitable interpretation is not that they are trying to make another language look bad, but that they are trying to make their favorite language look good.
The project doesn’t talk about favorites or seem to want to make certain languages look good. Jumping to the conclusion that bias is involved isn’t the good faith interpretation, even if you state with a positive sounding framing. The good faith interpretation is to take the stated project goals at face value, and assume that the participants have done a good job.
> The project doesn’t talk about favorites or seem to want to make certain languages look good. Jumping to the conclusion that bias is involved isn’t the good faith interpretation
I didn't see anywhere that the comment in question called any project bias into question, but instead noted that in a situation where work is crowd sourced, people with their own intentions and motivations will put out bad benchmarks, either in the case of the benchmarks game, or a specific benchmark or comparison put forth in an article or blog. I've personally been witness to the latter multiple times just from HN submissions.
I just want to end with this: as someone that's brought up viewing comments in an uncharitable light, you seem to have done a lot of that in this discussion. You've repeatedly taken your interpretation of a comment, rephrased it in a harsher way, stated that as fact as what the other person was saying, and then responded to that. I would think actually trying to find a charitable interpretation should at least include a question at the beginning to confirm whether what you think is being said is entirely correct. Note that I started with that when I thought you were attributing statements to me that I did not say. My first words were a solicitation, "What are you talking about? Where did that happen?", to confirm what was going on. You've been doing this from your first response to my top-level comment, when you stated "But you’re using that assumption to cast slippery-slope doubt on the whole project without knowing anything specific." That's a very uncharitable rephrasing of what you think I was doing, and it certainly wasn't my intention. I've already outlined specifically what I was trying to do and why, and in doing so I also stated that I felt you were misinterpreting me. There's a clear trend here as I see it, and you repeatedly bringing up good faith assumptions just highlights it.
I think we've covered about all there is to say on these topics. I'll let you have the last word if you wish. I'll read and promise to consider any points you raise, but I don't think me responding would be very fruitful, and this discussion has digressed far enough.
> Do you agree that those very different times were measurements of the same TypeScript fannkuch-redux program?
Yes.
> How should we now assess your "suspiciously like entirely different algorithms were used in each implementation" comment?
The suspicion was incorrect. That's why it was presented as a suspicion, not as fact. I have no reason to defend it if it's incorrect, but I still defend that it was valid to raise questions, given the facts on the ground. We've now shown there was something that changed very drastically at that time, and while it's less likely it's the benchmarks themselves (unless one or both of those are fairly out of date Node versions)[1], it still points towards something to be aware of in the results presented. Namely, they rely on a lot of underlying assumptions which should be looked at if you care about the numbers.
1: Also, I imagine the V8 devs probably considered the performance of TypeScript in that case to be a bug, given how horrible the performance regression from JavaScript is and that it's still JavaScript running. It's possible that TypeScript was doing something really odd, but given the exposure and Microsoft's backing and developer time, I think that's a less likely scenario than some optimization that should have been triggered was missing, which happens quite often.
Please add a correction to your original comment, to prevent readers from being misled. (If it's closed to edits, I'm sure HN staff will open it when you ask).
> I still defend that it was valid to raise questions
Of course, it's valid to question a measurement that looks strange but your comment went further than that -- your comment, without evidence, assumed a cause; and, without evidence, implied that assumed cause led to widespread problems with the analysis.
> Please add a correction to your original comment, to prevent readers from being misled.
Corrections are for facts. I put forth a theory. People being misled by a theory is something I have limited power to affect. People representing theories read on the internet as fact have larger problems than that will solve.
This discussion is the correction, and a better one than someone would be willing to read. Were it within the 2 hour edit window, I would throw in an edit; I've done so numerous times in the past. I will ask HN to amend its rules so I can correct a statement I made about something I suspected.
> Of course, it's valid to question a measurement that looks strange but your comment went further than that -- your comment, without evidence, assumed a cause
This is incorrect. I had evidence, I had numbers that did not line up with my understanding of how things should have been given my knowledge of the subject. I presented that as a theory, by using the word "suspect". All I implied is that if that theory was correct, which I made sure to not assert as fact, then it might affect some other languages. I did not assume a cause, I assumed a possible cause, and presented it as such.
I am very particular with my language. I try not to state things as fact when they are not. I try my absolute hardest (and I believe I succeed) to always speak in good faith, where I'm trying to raise a point I think is worthwhile or ask a question where I think there is benefit. I'm actually rather bothered by how some people interpreted my words and intentions, and that includes you. I'm bothered by how you've interpreted my words. Since you're not the only one (although I do believe you're in the minority), I'll assume there's something I could have done better to represent my point. I don't think all the blame lies with me though. There should be some way for me to posit a question and advance a theory without people assuming bad faith, so my question to you is, what way is that? How could I have expressed concern over the results without triggering that interpretation from you? Because I don't think doing personal research on a problem is an acceptable prerequisite for raising a question. In this case, I could have spent hours looking into something I was unfamiliar with and come away with more answers, but many people may not have the knowledge to do so but have enough to think something is wrong. Should they just keep their mouths shut? Are we in a time where raising a concern that turns out to be unfounded (or in this case, just more complicated and slightly misdirected) is unacceptable under any circumstance? I refuse to accept that.
I thought the benchmark game is set up so every language's advocates can tune their language's programs. The only chance at a fair comparison is if every language gets the best implementation it can find for the challenges. There's nothing else that approaches fair comparison of apples and oranges.
> you’re using that assumption to cast slippery-slope doubt on the whole project without knowing anything specific.
No! I think this is a very useful project and analysis. I just think that some languages might have extremely optimized versions (or possibly more likely, some languages don't quite yet have that extremely optimized version that has propagated throughout the others) and that might be affecting specific languages in the analysis.
I think the first 5-10 entries are probably very accurate, as they are generally very performance-centric languages, often optimized for performance. As the languages and VMs/interpreters do more and are optimized less, it's much easier to miss a performance difference caused by a benchmark submission and attribute it to an inherent aspect of the language.
> I am at least giving benefit of the doubt and wondering what might be wrong with TypeScript.
I did no such thing. Note how I used the phrases "Theoretically, I would assume" and "That looks suspiciously like". I simply raised an issue of concern, in a way where it was obvious that I did not know if my concern was correct, and wondered that if it was, what else it might affect.
> I did not know if my concern was correct, and wondered if it was, what else it might affect.
That’s exactly what I mean by casting doubt. If the concern might not be correct, why lead into speculation about further concerns?
Let’s find out the actual reason that TypeScript is measured slower, rather than guess at what else could be wrong if unverified theoretical assumptions might be correct.
> rather than guess at what else could be wrong if unverified theoretical assumptions might be correct.
You mean, I should not have explored the ramifications of what my suspicions might mean, so that people might think it's worth actually looking into? What's wrong with that?
I can't help but feel that you feel compelled to defend your original position, which I believe was based on misinterpreting my point and intention.
I saw what I believed might be a problem. I noted it. I noted why. I noted what it might mean to the analysis because if my suspicion as to the reason was correct, it might not be isolated and other items might need a closer look. I did so in a way where I was sure not to claim something as factual when I wasn't certain. What part of that do you think is an inaccurate assessment of what I did or was unwarranted?
Edit: Changed misconstruing to misinterpreting, as that's what I was trying to express, and misconstruing might be interpreted as a purposeful action, which is not what I was trying to express.
You seem not to have considered the possibility that the authors may have simply made a mistake, unrelated to the origin of the programs.
The authors presented at an Oct 2017 conference. Archived benchmarks game web-pages from 2017 do not show the 10x fannkuch-redux differences that the authors report --
You saw what you believed might be a problem. You did not investigate whether or not it was a problem. Instead you speculated about what it might mean to the analysis.
The part that was unwarranted was the negative speculation.
Many benchmarks have no TS implementations, and TS/JS results are about the same except for fannkuch-redux, which is about a zillion times slower in the TS implementation. When looking at that kind of massive discrepancy in similar languages with the same runtime, the guess kbenson made was a perfectly sensible one. The authors of the study should have examined that kind of crazy outlier more closely and, as pointed out, it's not that unusual when using the benchmark game as a starting point.
I’ve noticed similar discrepancies between Perl performance in the real world and in the suite they are using.
The issue is that one of the metrics in the suite is lines of code, so people write fantastically obscure and concise functional programs in Perl when the imperative one would be 2x the LOC, but much, much faster.
(This is from a spot check years ago. Maybe they’ve fixed this somehow).