
It's in the works: https://code.google.com/p/v8/issues/detail?id=811

You can tell by the number of CLs that a lot of work has been done already but there are a few more still in progress. It's probably still a few weeks out.

Disclaimer: I don't speak for the V8 team, I'm just an interested onlooker.


I've been going back and forth on this.

On the one hand, I feel that positive discrimination is just as bad as the other kind and that its proponents are not thinking things through as clearly as they should.

On the other hand, board members are humans too. Given the choice, and whether consciously or unconsciously, they will typically elect people much like themselves, because that's the kind of person they can understand and relate to. It's an understandable human trait but it does perpetuate the status quo.

I do wish the focus was less on gender and more on diversity in general. We have sizable ethnic minorities in the Netherlands but they are quite underrepresented in upper management and I'm not sure it can be fully explained by the socioeconomic and education gap.


> Given the choice, and whether consciously or unconsciously, they will typically elect people much like themselves, because that's the kind of person they can understand and relate to.

The issue is at what point does the need for diversity kick in? If I start a startup right now with a friend, a person who is much like myself, very few people will criticize us for diversity issues at 2 people. But if we continue to hire friends, when does it become a diversity issue that needs addressing?


I think a good moment to start thinking about this is when you are legally required to have an Ondernemingsraad (works council) [1], which is at 50 employees.

It's an arbitrary threshold, but it seems like a good point in the life of a company: the transition from a small company to a larger one, where you need to start making various informal policies, customs and values more formal.

[1] http://nl.wikipedia.org/wiki/Ondernemingsraad_%28Nederland%2...


Interesting results, thanks for sharing. I can perhaps shed some light on the performance differences.

> Buffer 4.259 5.006

In v0.10, buffers are sliced off from big chunks of pre-allocated memory. That makes allocating buffers a little cheaper, but because each buffer maintains a back pointer to the backing memory, that memory isn't reclaimed until the last buffer referencing it is garbage collected.

Buffers in node.js v0.11 and io.js v1.x instead own their memory. It reduces peak memory (because memory is no longer allocated in big chunks) and removes a whole class of accidental memory leaks.
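
To make that class of leak concrete, here's a rough sketch (the slab details are paraphrased from the description above, not the actual internals):

    // Allocate many small buffers but keep only a few of them around.
    var kept = [];
    for (var i = 0; i < 1000; i++) {
        var buf = new Buffer(64);             // small allocation
        if (i % 100 === 0) kept.push(buf);    // retain 1 in 100
    }
    // v0.10: each small Buffer is carved out of a shared pre-allocated slab
    // and keeps a back pointer to it, so the handful of surviving buffers
    // can pin entire slabs (and everything else carved from them).
    // v0.11 / io.js v1.x: each Buffer owns its memory, so only the retained
    // 64-byte buffers stay allocated once the loop finishes.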

That said, the fact that it's sometimes slower is definitely something to look into.

> Typed-Array 4.944 11.555

Typed arrays in v0.10 are a homegrown and non-conforming implementation.

Node.js v0.11 and io.js v1.x use V8's native typed arrays, which are indeed slower at this point. I know the V8 people are working on them; it's probably just a matter of time - although more eyeballs certainly won't hurt.

> Regular Array 40.416 7.359

Full credit goes to the V8 team for that one. :-)


I can explain what happened in the Array case. 100000 used to be the threshold at which new Array(N) or arr.length = N started to return a dictionary-backed array. Not anymore: this was changed by https://codereview.chromium.org/397593008 - now new Array(100001) returns a fast-elements array.
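
A quick way to see the old cutoff for yourself (a rough timing sketch, not a rigorous benchmark):

    function fillTime(n) {
        var start = Date.now();
        for (var k = 0; k < 200; k++) {
            var a = new Array(n);                  // preallocated length
            for (var i = 0; i < n; i++) a[i] = i;
        }
        return Date.now() - start;
    }
    console.log('99999:', fillTime(99999) + 'ms');    // below the old cutoff
    console.log('100001:', fillTime(100001) + 'ms');  // above it: dictionary mode on old V8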

I will check out what happened to Buffer/TypedArray. They should not degrade that much unless something really went south here.


OK, reporting back. There are two issues here.

The first, major one is related to the mortality of TypedArrays' maps (aka hidden classes). When the typed array stored in the Data variable is GCed and there are no other Uint8Arrays in the heap, its hidden class is GCed too. This also causes the GC to find and discard all optimized code that is specialized for Uint8Arrays and to clear all type feedback related to Uint8Arrays from inline caches. When we later come back and reoptimize, the optimizing compiler takes the cleared type feedback to mean it should emit a generic access through the IC (there is reasoning behind that: the access is potentially going to be polymorphic anyway). I have filed an issue[1] for the root cause (mortality of the typed array's hidden class).

Now there is a second, much smaller issue (which also explains the performance of the Buffer case): apparently there were some changes in the optimization thresholds and OSR heuristics. After these changes we hit OSR at a different moment: e.g. I can see that we hit the inner loop (the one that loops over `j`) instead of the outer loop, which would lead to better code. In V8, OSR is implemented in a way that tries to produce optimized code that works both as an OSR target and as normal function code - this is done by adding a special OSR entry block that jumps to the preheader of the loop we are targeting with OSR. This allows V8 to reuse the same optimized code for the normal entry without optimizing again - but it also leads to code quality issues if OSR does not hit the outer loop, because the OSR entry block inhibits code motion. This is a known problem and there are plans to fix it. The hit is usually quite small unless you have very tight nested loops (like in this case).

Disabling OSR (--nouse-osr) not only "solves" the second issue but also partially fixes (hides) the first issue: 1) we no longer optimize with partial type feedback, so we never emit a generic keyed access but always specialize it for the typed array; 2) we no longer emit an OSR entry, hence no code quality issues related to it.

[1] https://code.google.com/p/v8/issues/detail?id=3824


Very interesting. After reading your comment, I tried allocating another Uint8Array and keeping it allocated throughout the entire test as a workaround for the issue you mentioned. Time for Node.js was unchanged, but io.js was down to about 5.5s now. Almost the same time as Node. Only about 10% slower.
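
In case it helps anyone reproduce this, the change was essentially just the following (a sketch; the variable name is mine):

    // Allocate one extra Uint8Array up front and never release it, so the
    // Uint8Array hidden class, and the type feedback and optimized code
    // tied to it, survive across the whole run.
    var keepAlive = new Uint8Array(1);

    // ... the benchmark itself runs here, unchanged ...

    // Reference it at the very end so it can't be considered dead earlier.
    if (keepAlive.length !== 1) throw new Error('unreachable');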

The same happens when I use the --nouse-osr parameter that you mentioned.


Is it 10% slower even if you keep the array alive and apply --nouse-osr (to both node.js and io.js)?

On my machine results are fluctuating within the same ballpark (though I am on Linux and benchmarking 64-bit builds).


Ok, I hadn't tested with both before. Keeping the array alive and using --nouse-osr makes io.js only 2.3% slower than my original measurement for Node 0.10.35. Median of 5058ms.

And Node 0.10.35 shows basically the same results as before. I see less than 1% difference. Maybe just random fluctuation. Even if not, 1% is irrelevant.


I just posted a follow-up blogpost, comparing Node 0.11.15 and io.js 1.0.3 which were both released yesterday.

In that post I also benchmarked the various fixes for the typed-array slowdown you mentioned. BTW --nouse-osr makes all three tests run faster.

http://geekregator.com/2015-01-21-node_js_0_11_15_and_io_js_...


Thanks for the update.

I posted this reply on your site, but I will duplicate it here for the sake of HN readers:

> BTW --nouse-osr makes all three tests run faster.

As I tried to explain above: OSR as it is implemented now impacts code quality depending on which loop OSR hits, which in turn depends on heuristics that V8 uses. These heuristics are slightly different in newer V8. As a result of these changes, V8 hits the inner loop instead of the outer loop, and this leads to worse code.

Code that benefits from OSR is code that contains a loop which a) can be well optimized, b) runs long, and c) is run only a few times in total. The Sieve benchmark is the opposite of this, so it doesn't benefit from OSR - you get a bigger penalty from producing worse code and no benefit from optimizing slightly earlier.
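
For illustration, this is roughly the shape of loop nest where the choice of OSR target matters (a sketch, not the exact benchmark from the article):

    function sieveish(limit) {
        var data = new Uint8Array(limit);
        for (var i = 2; i < limit; i++) {              // outer loop
            if (data[i]) continue;
            for (var j = i * i; j < limit; j += i) {   // tight inner loop over `j`
                data[j] = 1;                           // mark composites
            }
        }
        return data;
    }
    // If OSR enters at the inner loop, the OSR entry block sits inside the
    // nest and inhibits code motion around it; entering at the outer loop
    // (or not using OSR at all) gives better code overall.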

Not using OSR for Sieve also hides the other issue with the mortality of the typed array's hidden classes. I say "hides", not "fixes", because one can easily construct a benchmark where the mortality would still be an observable performance issue even if the benchmark itself is run without OSR: https://gist.github.com/mraleph/2942a14ef2a480e2a7a9


Does the dramatic speed difference between the "non-conforming" implementation and V8 mean that current Node typed arrays are not memory-safe and you may get C-style buffer overflow vulnerabilities when using them?


"Non-conforming" only means they didn't completely adhere to the ES specification. There should be no possibility of buffer overflow.


FWIW I also did a test of Node:master and that performance was within 2% of what I measured for io.js.

Interesting background about typed-arrays. I didn't know that. Thanks!


> FWIW I also did a test of Node:master and that performance was within 2% of what I measured for io.js.

It would have been a good thing to include that in the article as well.


I thought about that. But it would have diminished the point I was trying to make: Always test with different versions as performance may differ by a LOT.


Well, if the point is that they are different then I see what you're saying, but in actual fact the point seems to be that they're almost equal.


He was talking about differences between point versions. 0.10 has dramatically different performance to 0.11. io.js is using 0.11, as is node:master, but older node was using 0.10.

I.e. the difference isn't necessarily node vs io, it's one point release of V8 to the next as used by node and io.


Supremely bad title then.


Yes, I think it needs to be noted that V8 in node 0.10 is very, very far behind when you take into account how fast the pace of development is. I would be interested to see these comparisons with bleeding-edge node vs stable node.


Thanks for the details! It makes a lot of sense.

Now I wonder how node 0.11.x compares to iojs :)


Libuv author here. Libuv doesn't try to be all things to all people - its main users are Node.js and Rust - but if you have suggestions on how to improve the API or the implementation, please file issues[1] or join us in #libuv on irc.freenode.org. We welcome outside input.

As a bit of history, the reason why uv_loop_new() mallocs memory for the struct (and it's something of an anomaly in that respect, most other API functions don't) is that the thing that came before libuv, libev, worked like that. It's something we can change if there is demand for it.

[1] https://github.com/joyent/libuv/issues


And Julia! :)


thanks. great reply. :)

tbh, i was angling for 'my criticism isn't great because i am a specialist in a specialist field'.

(also, it does look like a genuinely useful library for most use cases - im just lazy and want everyone else to do my job for me :P)


On-and-off kernel hacker here. Compile and test the release candidates Linus puts out every fortnight. When the inevitable breakage happens, use `git bisect` to track down the offending commit.

It's tedious and time-consuming, but you will make people very happy. The only way to test something as big and critical as the kernel is to have users actually run it.


That's a regression we'll probably have fixed in 0.6.1.

Note that I/O read performance is up 35%, from 19.7 mb/s in 0.4 to 26.6 mb/s in 0.6.


Any numbers for node on ARM? A JS dev at a meetup told me that recent V8 releases have had significant performance regressions on ARM devices.


Can you give more information about this?

If you know that JS dev personally, please ask him to file a bug (http://code.google.com/p/v8/issues/entry) or send any information he has about the regressions to the v8-users mailing list.

I don't think there were any bugs opened recently about perf regressions on ARM devices.

Thanks.


Is there a ticket to follow to track this regression?


Is there a description of your benchmark machines / benchmark process anywhere?

Keep up the great work, btw! I'm pretty excited to move up to Node 0.6 (probably-statistically-insignificant regression on IO-write regardless)!


I suspect that it's a jab at ALSA which, for reasons unknown, ships with a full LISP interpreter[1] that it doesn't actually use...

Lennart Poettering is the author of PulseAudio. He probably has a bone or two to pick with ALSA.

[1] http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/precise/...


Whoa. That file contains absolutely the most naive string hash function I've ever seen:

    static int get_string_hash(const char *s)
    {
        int val = 0;

        if (s == NULL)
            return val;

        while (*s)
            val += *s++;

        return val & ALISP_OBJ_PAIR_HASH_MASK;
    }

It seems to cleverly avoid everything that's supposed to go into a string hash function, and makes me quite wary of the rest of the file's code quality. I stopped reading after checking that this function is indeed used (it is).

Also, the file itself was last modified on 2011-10-18, which seems pretty recent for something that's not used. Perhaps they're gearing up to start using it, and if that's going to involve anything even remotely performance-sensitive, I hope they replace the hash function first. :)


I'm curious: what is (under normal circumstances) supposed to go into a string hash function?

The biggest problem I see with this hash function is that it will produce very many collisions (because most common strings will probably produce only small numbers).

I have seen other very simple string hash functions which just take the first 4 bytes (or the last 4, or some other pattern) of the string and use them (interpreted as a 32-bit integer) as the hash.


I'm no expert in hash function design, but one thing that seems bad is that it will generate colliding hashes for all permutations of the same characters. Also, it will not "spread" the bits very well, since it just uses addition. Most "real" string hash functions multiply the running hash by a constant before mixing in each new character, which makes the bits "churn" more.
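
For comparison, a classic multiply-and-add style string hash looks something like this (a sketch of the general pattern, not taken from any particular library):

    function stringHash(s) {
        var h = 5381;                          // djb2-style seed
        for (var i = 0; i < s.length; i++) {
            // Multiply the running hash by a constant before adding the next
            // character, so "ab" and "ba" no longer collide and the high bits
            // actually get used.
            h = (h * 33 + s.charCodeAt(i)) | 0;
        }
        return h >>> 0;                        // as an unsigned 32-bit value
    }

    console.log(stringHash("ab"), stringHash("ba"));   // different, unlike the ALSA one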


That's probably still better than the string hash in really early versions of Java, which only looked at a relatively small number of characters at the start of the string.


> Isn't Node.js single threaded ?

Yes (for now, we might take up V8 isolates).

> Would it not under-perform , say compared to Erlang or Netty, in a multi-core CPU.

No. You can spin up multiple processes and handle the load with (for example) cluster[1].

[1] http://learnboost.github.com/cluster/
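
As a sketch of what that looks like with the built-in cluster module (the linked learnboost module has a similar shape; this isn't its exact API):

    var cluster = require('cluster');
    var http = require('http');
    var numCPUs = require('os').cpus().length;

    if (cluster.isMaster) {
        // One worker process per core; each worker has its own event loop.
        for (var i = 0; i < numCPUs; i++) cluster.fork();
    } else {
        http.createServer(function(req, res) {
            res.writeHead(200);
            res.end('handled by pid ' + process.pid + '\n');
        }).listen(8000);
    }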


> No. You can spin up multiple processes and handle the load with (for example) cluster[1].

Which is what I speculate they're doing, as they say they're running just four instances. Probably, they have servers capable of running 4 threads at the same time.


It's been fixed in master. 0.5.2 comes out tomorrow (if all goes well) and will also contain the fix.

https://github.com/joyent/libuv/issues/112 for details.


Thank you - that's great news. I'll keep an eye out for it.


Linux doesn't have kqueue, but kqueue is only an API; you could emulate it with io_submit() and io_getevents().

A bigger obstacle is that not all file systems support asynchronous I/O; the io_*() syscalls won't help you there.

