Napa.js: A multi-threaded JavaScript runtime (github.com/microsoft)
419 points by rimher on Oct 18, 2017 | 204 comments



This looks great, but it's important to bear in mind the architecture you intend to run on. I recently made an application blazingly fast by - among other things - parallelizing it using the Node cluster module. On my 4-core laptop it flies. Imagine my surprise when I deployed to the cloud environment and found the typical virtual server in our cluster has only a single CPU core. The worker threads just sit around waiting for their turn to run, one at a time. On the other hand, the platform for serverless functions has 8 cores. At a minimum, before you jump into multi-threading, know what `require('os').cpus()` tells you on your target platform.
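For illustration, a minimal sketch using only standard Node APIs (the port and the "one worker per core" policy are arbitrary) that only forks when the box actually has more than one core:

    const cluster = require('cluster');
    const os = require('os');
    const http = require('http');

    const cores = os.cpus().length; // can be 1 on a small cloud VM

    if (cluster.isMaster && cores > 1) {
      for (let i = 0; i < cores; i++) cluster.fork(); // one worker per core
    } else {
      // workers (or the lone master on a 1-core box) serve requests directly
      http.createServer((req, res) => res.end('ok')).listen(3000);
    }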


Great reminder of just how expensive cloud compute cycles are.

Imagine buying a new desktop computer, not the most expensive but a good performing one, and setting it up at home to serve some kind of cloud services. With its cpu fully utilized 24x7, I bet buying equivalent compute at AWS would be crazy expensive per month.

Of course there are many reasons most people don't use desktop systems as home servers, but I bet there are a few scenarios where it could pay off.

One example might be a bootstrapped startup tight on cash flow, with a CPU-biased workload, so ISP bandwidth and local disk throughput didn't bottleneck before the compute did. And it would have to be a non-mission-critical service, something where the occasional maintenance window wouldn't kill your reputation.

Finally you'd need a way to really minimize admin/devops costs. That kind of work doesn't take many hours to kill your savings, not to mention the opportunity costs.


Hetzner basically uses cheap desktop computers as dedicated cloud computing servers. Just keep it backed up. If anything catches on fire, they replace it. Cheap hardware mitigated by excellent customer service. If you can deal with that, the prices are great. I've used them for years for my personal dedicated server, and not had any problems. The current EX41 line starts at 39 EUR / month. You have to buy the "flexi pack" if you want luxury services like extra ip addresses.

https://news.ycombinator.com/item?id=4063929

https://www.hetzner.com/dedicated-rootserver/matrix-ex

https://www.hetzner.com/flexipack/


Extra IP addresses don't require Flexipack, maybe subnets do.


I use a laptop for my CI server. 8 cores and 16gb ram, hides in my basement. Would easily be $400 a month for those specs.


An m4.2xlarge on AWS EC2 provides 8 cores and 32 gig of RAM for 40 cents/hour. That'd be $288/month. You could also get an m4.xlarge with 4 cores and 16 gig for $144/month.

Of course, that doesn't include storage and bandwidth.

I mean, sure. I've got a 5 year old laptop that will outperform the t2.micro I'm paying $8.35/month for. But I don't trust my home internet to be stable or fast enough. Not to mention that my primary usage is an IRC bouncer, so I need it to not be on my home internet connection so some script kiddie doesn't DDoS me after I ban them from a channel because they were spamming racial slurs. Yes, that has actually happened.


In the US you can't really get a reliable network connection to your residence. The entire shift towards the cloud is in no small part due to crappy internet. The large ISPs really missed the boat on this.


Depends how reliable you need.

I average probably a single 5 minute hiccup each month. That's 99.988% uptime. For someone wanting to run their WoW guild's voice chat server, or just a toy server, or a development/staging environment, that's plenty.

But I mean, my home internet is only 35 mbps anyways via Frontier FIOS. I can get 150 mbps through Comcast, but I refuse to give that company a penny of my money. In either case, I'm not going to be running any major production servers at home anyways.


I have Time Warner, and experience about 10-15 minutes of downtime a week on my home wifi and once a year it'll go down for an entire evening. That's annoying, but fine for home wifi and maybe a hobby website, but you couldn't run a company like that.


This is not at all true for me at least. I get very reliable internet, both at home and at my office. At my office I have static IPs and am allowed to host stuff..

Now I would not host my main customer site there. But dev servers? Beta servers? QA servers? Hey why not.. save some massive bills.


Exactly... you won't host a main customer site at home. If ISPs were smarter, they would make this easier for you and provide the tools to make it extremely easy. ISPs could have been AWS or Azure, instead they preferred to be a flaky bundle of wires.


>An m4.2xlarge on AWS EC2 provides 8 cores and 32 gig of RAM for 40 cents/hour. That'd be $288/month

A used ThinkPad W540 with the same config goes for ~1.5 months of your AWS rent.


And it even comes with a "free" UPS :)

Laptops are surprisingly good as little dev servers. In fact, you can find ones with broken screens for even cheaper, which is fantastic!

A few hundred bucks can get you a nice i5 or i7 processor.


Which platform for serverless functions has 8 cores? I have a CPU intensive data deployment script (Node.js) that takes 12 hours on a single thread but can be chopped up to take advantage of more cores. Our build server on ec2 has 2 cores so it's about 6 hours. It would be great to know if we could push the job into serverless and get it done a lot faster.


Time limits on serverless functions are 9 minutes on GCE and even less on AWS, so for long-running stateful tasks they're probably not suitable, unless you can divide the work into serial as well as parallel subsets.


Thanks for the reply. The task is not stateful. It's just pulling data from a db, performing some transforms, and then pushing it into elasticsearch. I suppose that I could slice the tasks up to be arbitrarily small though having a longer window would be helpful. I guess I will take a look at GCE. Thank you!


FWIW I think Azure's Functions are 10 minutes and I hear they're talking about upping it again.


Not serverless, but we host our build servers on Scaleway. 8 cores and more importantly 32GB for 25$/m. Can't complain. Clean up is done by the ci agents from VSTS.

Project is a React client and express backend. It's built and tested dockerized. We use testcafe and Chrome headless, so more memory is always useful for parallel builds.


Different ec2 server types have different numbers of cores. You might be able to just change the server type to get some easy performance boost.


Just use EC2's r4.16xlarge instance.


How can a "serverless function" have cores if it doesn't have servers and "it is just code" :/


Because "server" started taking on the meaning of "configurable box" to people who were frustrated with configuration, so "serverless" means "unprovisioned/unconfigurable" machines.

Now if we started talking about "computeless" architecture I'll be confused. (Though maybe that'll be the trendy name for serverless data sources/sinks in a few years...)


> so "serverless" means "unprovisioned/unconfigurable" machines

I am pretty sure in English serverless means no server, and "unprovisioned/unconfigurable" machines means you didn't provision them and you cannot configure them. Even in an analogical sense this makes no sense. Something i could relate to would be something like "Pay as you use" or "configurationless servers".

But that is just me, and if you think it is ok to randomly change the meaning of words, then i, just as randomly, don't need to accept your new meaning (not giving out, just trying to explain my rationale).

Downvote all you want, but please do point out where i am wrong.


> please do point out where i am wrong

Because speech (and writing) can be figurative, not just literal. And because the term has reached wide adoption (at least, in the subset interested in discussing such things) and so not using the term makes conversation difficult and litigating the issue every time it's discussed adds no value to the discussion whatsoever.


In a very literal sense, you're not wrong. But the concept of a serverless architecture (as of common parlance today in 2017) has a lot of nuances which are hard to convey in any single word.

So eventually people picked one. Today, the most common are "serverless architecture", "FaaS", or simply "Lambda" (borrowing from AWS).

You don't have to do anything. But it's simply a fact that many people know what you're talking about if you say the word "serverless". And that's what language is, a (kinda) agreed upon set of words which let you communicate with other people. If everyone but you understands a word, and you are crusading that they change it to something else, what is the point?

If you're interested, the concept of "prescriptivism" may be enlightening.


My point is that i was not aware what "serverless" meant when i first came upon it, and the word itself did not convey any meaning without a lot of context, as it does not lend itself to analogies such as "server", "container" or even "cloud" for that matter.


True. I regularly encounter words whose meanings in whatever context I don't understand. Usually, if a word sounds confusing or used in the wrong context, a quick Google search will clarify what I was missing. Sure it's a bit irritating when words are recycled to have a different meaning, but it's rarely an issue in practice, in my experience and opinion.


I get you. Neither was I. But at a certain level, words are arbitrary and imperfect anyway. In your own example, "cloud" is often regarded as a bad term, because it tends to abstract away the fact that it's just "somebody else's computer", as it's often defined.

Lots of discussions have been had on how many people think "the cloud" is a sorta-magical thing, which "is just there". Just more recently some interesting aspects of "using the cloud" have been more thoroughly discussed (e.g. the jurisdiction it's hosted in, data breaches, etc). If the concept was described in a less abstract way, would these discussions have happened sooner? Later? Would it have become less of a "buzzword" amongst executives?

So, is "cloud" really a better term than "serverless"?


The ship sailed a while ago for common nomenclature for “function in the cloud” services. You can’t really fault someone for using the term the way the entire industry uses it.

For what it’s worth, I don’t think calling them “configuration-less” or “pay as you go” servers is any more accurate. You’re really just buying processing time.


I'm also pretty sure most cloud systems contain very few condensed water droplets. The containers I work with are not 20 or 40 foot long, and while I ship stuff with their help, no actual ships are involved.

Could there have been a better word than serverless? Probably, but that is the one that is currently used for that general kind of architecture. I would have called that PaaS before, and sometimes still do.


I explicitly stated that by analogy it does not make sense, while containers contain something and so work as an analogy. But you chose to disregard that part of the answer.


The "serverless" in this context means that they may as well be invisible to you, since you don't need to care about them. In the case of Lambda, you upload your code somehow Amazon runs it on a spare server somewhere.

Language is all about context. The meaning of words changes depending on it, even in plain English settings.


You're not wrong. "Pay as you use" doesn't quite capture it, and "configurationless servers" isn't quite buzz-wordy enough (i.e. short and memorable) to be marketable.

Widespread terminology suffers from the evolutionary pressures of marketing. Only the catchiest, most marketable terms propagate.


> please do point out where i am wrong.

In reality, words are often used in ways that don't necessarily meet the dictionary definition in the strictest sense.

For example, I complained to my local advertising authority that mobile providers are using the word "unlimited" to mean "limited by our fair usage policy" and I was told that this is fine as long as 95% (I don't remember the exact percentage, maybe it was 99) of customers will never reach the limit, so it's effectively unlimited. That's not really what the English word means, but hey... that's life. Same thing applies here: words are recycled to have different meanings.


It's a bad example because saying something is unlimited when it's not is kind of bullshit, not just semantics.

There are situations when I think it's best to just go with the flow of how people are commonly using words, but simply accepting what others say in a blanket sense is not right either. In the case of "serverless" it doesn't really matter much to me, but if you think of the word "gyp" for example, that's something many people have had to make a concerted effort to stop using. So in some cases, with effort, we can improve the language we use and not always be swimming against the tide.


"Serverless" means "I can run code without having to maintain my own server or execution environment". Using the term "serverless" for that might be confusing and non-literal, but that's how we use it, and the term has stuck. Arguing about that several years ago may have made sense, but that ship sailed long ago.


You're not wrong. I usually call them "application-less" architectures. That's really what's going on here.


I get it, I do. But that ship sailed loooong ago


While you're at it, Don Quixote, please convince X-Windows to stop abusing the words client and server, too.


Though the functions are serverless in the sense that you needn't allocate virtual infrastructure, the platform obviously utilizes CPU resources to execute your function. That platform underlying the serverless function is, as it happens, multi-core in GCE.


> Though the functions are serverless in the sense that you needn't allocate virtual infrastructure, the platform obviously utilizes CPU resources to execute your function.

So it does allocate virtual infrastructure?


Someone does, but not you. Obviously, there's a server somewhere which runs your code. But since you don't deal with it directly, it's mostly as if there was no server - since it was abstracted away from you so you could focus on what you really want to do.

If you were to go down deeper, a server is just an electrical machine which shuffles electrons around. So, you could say "there's no point in talking about 'servers', if we're just using transistors when you really think about it".

But it would convey no useful information if someone asked you "what are you using to run your service?" and you replied "well, I just move electrons around", would it?


No, i am not advocating reductionism. "Server" has an analogical meaning, as it serves something; "processing" has an analogical meaning, as we do process something; "serverless" implies there are no servers involved.

As far as managing goes, most of the servers are not managed by you anyway. But yeah i get it, everyone likes it so...


On the management side, I'm curious what you mean by "most of the servers are not managed by you anyway". In my experience, unless you're in a very big company, you are either managing or at least minding the server, or someone within earshot is.

In that scenario, not having to even consider if the error you're getting is because your server is restarting/out-of-memory/needing-update/broken-by-a-coworker like 99% of the time is as close to "serverless" as it gets, in terms of day-to-day work activities and worries.


The distinguishing factor between multi-processing on one machine and multi-"machining" is latency of communication.

Likewise, the distinguishing factor between multi-threading and multi-processing is shared memory, i.e. again, the speed of communication.

Multi-threading does well for some problems, but often, multi-machining or multi-processing is sufficient, which is why so many runtimes don't really do multi-threading: Node.JS, Python, Ruby.


Very true, and we've seen better performance from multi-process than multi-threading in some high-traffic (10Gbps) systems (those were C++ however, not node). It was puzzling, but I put it down to the OS scheduler; I imagine that a single multi-threaded process would be pre-empted in the typical fashion, but that multiple single-threaded processes would, in aggregate, spend less time in a pre-empted state.


Yes, multi-threading has high overhead to provide the illusion of parallel operation, but when all the cores are saturated you are in the same boat, whether you have 1 or 100.

The benefit to programs that don't use threading and use event loop and shared nothing multi-process is that they don't have the overhead when things are maxed out.

This is why virtually every high performance server (nginx, redis, memcached, etc) is written this way and things like varnish (thread per request) are multiples or orders of magnitude slower.

It's funny to see people criticizing nodejs for using the same architecture that all the best-in-class products use.


Context switching is expensive and very tempting to forget about in multi-threaded development. You eliminate that overhead by constraining execution to a single thread context.


Note that Python does do multithreading - just the actual Python interpreter will not interpret Python code from two threads at once. If threads are busy running code that doesn't require the interpreter, whether it is CPU bound work or not, then as many threads can run as you have cores. In most CPU bound applications, the performance critical parts are not pure Python - they're calling out to some extension code such as numpy or cython or something. Such code runs in parallel in CPython if you start multiple threads.

This might have been what you meant when you said the runtime isn't really multi-threaded - but since the CPython ecosystem rests so heavily on C code, in practice multithreading is a good solution a lot of the time.


Right. And Ruby also has a GIL.

Node.JS, Python, and Ruby are implemented in C/C++, so you can do multithreading in all three if you are willing to write native modules.


Yes. Python gets all the heat for the GIL, but most similar languages have something with more or less the same effect.


If you're careful you can release the GIL when writing Cython code.

Better than C/C++ sometimes.


This is also a pain point for me, I have an app that doesn't use much memory, but can use multiple cores/threads.. but most providers sell you on RAM, not CPU.


Also many providers charge by CPU usage, so it behooves customers to spec machines as lightly as possible and fall back on auto-scaling to spin up additional, equally lightly-spec'd VMs, on demand. Rarely are there many idle cores just sitting around.


good point. I’ve used pm2 to manage node apps and like how clear it makes what’s running on what.


An alternative approach to concurrency is the one that Erlang (and your operating system) take: memory-isolated processes, preemptively multi-tasked, where neither IO nor computation in one process can adversely affect the others because the scheduler prevents it.

I wrote this up: https://news.ycombinator.com/item?id=15499629


Ryan Dahl wanted Node apps to use a multi-process model. Hence the name. The vision wasn’t as evolved as Erlang, but conceptually closer than what Node looks like today.

“I believe this to be a basis for designing very large distributed programs. The “nodes” need to be organized: given a communication protocol, told how to connect to each other.” https://www.americaninno.com/boston/node-js-interview-4-questions-with-creator-ryan-dahl/


You can get rid of the need for multi-threading by deploying more containers on the same machine or via orchestration.

I don't understand why people keep insisting that the lack of multi-threading support is a JavaScript problem when there are better and more scalable ways of using your machine's resources.


> You can get rid of the need for multi-threading by deploying more containers on the same machine or via orchestration.

What if you have a large shared in-memory data structure that you want to update with lots of irregular translations in parallel? Like many graph problems? How are you going to do that with multiple containers? As an industry, we just don't understand how to distribute that kind of problem effectively.


Shared data structure among multiple threads... this sounds utterly familiar and evil! Redis is single-threaded, probably one of the fastest, has different data structures, can handle high loads, the code is easy to reason about, something that just works.

One of the reasons Node is successful is the simplicity of single-threaded code. It's way easier to reason about. I would question the usage of Node if you are doing something CPU-bound with it. You can use golang or C# with tasks for that.


Think outside of web workloads! Not everyone is writing a web app.

Think about something like Delaunay triangulation or mesh refinement. These are critical path bottlenecks for a great many applications and in practice very parallel, but they're irregular so we cannot easily distribute the data structure. The best results we have are for shared memory thread models. We don't know how to do it any other way!


That's why the post you just replied to suggested that Node may not be the right tool for those kind of tasks.


The problem is that I see a lot of projects written in languages with single-threaded runtimes (python more often than node) that become difficult/expensive to scale and extend down the road. I loathe the idea of a rewrite, but sometimes the initial language choice lacked forethought to the point where it makes sense to rewrite in something that actually can make use of all of a machine's processing resources.

Things like greenlet and gevent (and likely napa.js) are band-aids over the underlying problem.


For these workloads, I would consider a compiled language with great parallelism/concurrency. e.g Rust or GoLang.


So just because shared memory is hard you are ok with sacrificing performance and replacing memory access with io hops? That sounds like an overkill and not suitable for every task.


I like node.js and use it very often (had my first package reach over 100 stars, woohoo :) ) but I don't understand why it needs to be suitable for every task.

If you really want to do something creative with the shared memory, I guess you could do that in a "native module" written in c++ or even Rust[1].

I'm not saying that it's not doable with JS, it's just that it's already been done (as in, has a solution that works).

[1]: https://github.com/neon-bindings/neon


Why should i learn a new language for that? It's good to have as many options as possible in js and you take the one that fits you best.


Because JS isn't good at everything just like C++ and Rust aren't good at everything.

Right tool for the job.


But if you take that to an extreme, you end up with a hundred tools. I think it's good to have the option to do parallel computing in JS, for those times when it's worth the tradeoff versus having to adopt a completely new language/platform.


Sure, anything taken to an extreme is bad. I wasn't suggesting to do that.

I think if there is a sensible use-case for parallel computing in JS, it would be good to have. However, trying to make a solution before we have a (clear) problem is foolish.

I'm not saying there isn't already a use-case, but I haven't seen one that isn't already covered by languages better suited to solving those problems (e.g. Rust).

Edit to give a different example: parallel computing in JS is like trying to write a web framework in Rust. Sure, you can do it, but Node is already better suited to doing that. At best, you're making a worse version of something that already exists.


> parallel computing in JS is like trying to write a web framework in Rust. Sure, you can do it, but Node is already better suited to doing that. At best, you're making a worse version of something that already exists.

I agree with you, but my point is it's not black and white. For a sufficiently small or simple project, it might make sense to write a web backend in Rust, or do parallel computing in JS, if the cost of learning a new platform outweighs the cost of using the "wrong tool".

In most circumstances, yes, you probably shouldn't use Node.js for parallel computing tasks, just like you shouldn't use C++ for web development, but for some use cases it might be useful. And maybe those use cases don't exist (I don't have much experience in this area, so I don't know), but I just don't like when blanket statements like "use the right tool for the job" dismiss the work other people have done. Surely if Microsoft created this, they have a use case in mind for it?


We have been down this road before. When you have a lot of options "for those times when" you get a lot of abuse. All good frameworks remove choice to prevent a spiraling string of fuckups by people who don't understand what is going on behind their code. For the few of us who do know what is going on it is not a problem, but you have to consider all of the code monkeys who are going to be using a given framework without supervision. What would happen if we made everyone program enterprise CRUD applications in C++ from scratch? Unmitigated chaos that would lead the business to disavow technology and go back to paper filing cabinets.


You call it "abuse", I call it resourcefulness. I get where you're coming from, but as someone who is not very comfortable with lower-level languages like C++ and Rust, it's great to have the option of whipping together a parallel program in a language/platform I know, rather than having to invest in learning a new one. For longer-term projects, that investment is generally worth it, but I don't think it's useful to be dogmatic about "the right tool for the job".

> What would happen if we made everyone program enterprise CRUD applications in C++ from scratch

I don't think that's a good analogy. A better analogy would be if we started writing everything in JS/Ruby/Python instead of using lower-level languages where performance matters. Except, this is regularly done with great success by many, many companies, so I don't think that helps your point. Sure, you may have to eventually port it to a more performant platform when you hit massive scale, but that point may also never come.


Dealing with threads and shared memory properly is way harder than getting acquainted with modern C++ or Java.

In most cases, when you need to parallelize and multiprocessing doesn't cut it, you'll likely need to go the C++ module route anyway.


When all you have is a hammer...


... everything looks like a nail!


>Why should i learn a new language for that?

You make it sound like it was difficult to learn. Underneath, C++, Java, Pascal, C#, JavaScript and Python have many similarities, and jumping from one of those languages to another in the list is very easy; compared, for example, to something like jumping from any of those languages to Forth, PROLOG, SQL, ML, Haskell, or Lisp.

Some of them are also really similar syntactically, for example this group: [C, C++, Java, C#]; or this other group: [Pascal, Algol, Go], so even the syntax doesn't get in the way when jumping from one to the other.

Thus, usually, software engineers do know more than one language, and they apply whichever better suits the program.


Because languages are tools, and you should learn to use more than one tool. If you know JavaScript, you basically already know most C based languages syntactically, it's very little effort to at least learn one of them for tasks JS isn't suited for.


But what about dealing with FFI, build dependencies and toolchains? Sounds like it is just shifting complexity into a different place, not actually solving it.


Have you read the docs? Just go read this https://github.com/Microsoft/napajs/blob/master/docs/api/mem... and tell me you don't feel uncomfortable. This is the kind of baggage you are bound to get with such solutions and once you end up writing a steaming pile of code, then you need a thread safe logging and debugging story.


> Shared data structure among multiple threads... this sounds utterly familiar and evil!

This seems like a bit of a FUD.

With multiple threads and shared data, you don't necessarily have to share all the data structures with all the threads. You can set things up such that a minimum, or nothing at all, is shared. That's (also) what access control and immutability are for in programming languages, apart from other features.

Of course, different languages support these features in different ways, I don't want to get into the specifics, but in pretty much all mainstream languages you can create a similar share-nothing or share-almost-nothing design and it's not even hard, it might even be easier.

I really don't understand modern web/JS developers. They seem to ignore traditional solutions and/or proclaim them as evil, and then they go on to employ a 'new' solution that is 3× as complex, performs 5× worse and requires 10× as many dependencies/tools/frameworks/etc. Why? I suspect there's a LOT of largely irrational fear of concepts and languages that are unfamiliar. "Fear driven development" in fashionable lingo.

TL;DR you don't need to be scared of threads, you just need to be scared of threading architectures that share too much.


>I really don't understand modern web/JS developers. They seem to ignore traditional solutions and/or proclaim them as evil, and then they go on to employ a 'new' solution that is 3× as complex, performs 5× worse and requires 10× as many dependencies/tools/frameworks/etc.

It is, perhaps, because a significant number of Node.js developers came from front-end-only development and are thus unfamiliar with the traditional approaches (in this case, using threads). An example is the many cases in which a document store such as MongoDB is (wrongly) used for data that is mostly relational.

Simply put, they never were taught the traditional approaches first.


Basically your argument boils down to "it's easier to write single-threaded code than multi-threaded". Well no shit, but the benefit is in many cases colossal, so I'd say that's not a good argument to dismiss this complaint.


> Redis is single-threaded, probably one of the fastest, has different data structures, can handle high loads, the code is easy to reason about, something that just works.

Redis probably isn't a great example here. I've worked on projects where a single Redis instance was not enough (would easily peg its single CPU to 100% and have query latency in the multi-second range). In the end, sharding the data among several Redis instances was successful, but also brought its own problems. The ideal is that we just have languages, runtimes, data stores, etc. that abstract these details away from us so we can focus on our application logic, not on how to make it faster.


To be fair though, one of Redis's biggest weaknesses is its single threaded nature in instances where you, e.g., have huge sets and need to compute expensive set intersections/etc...

Redis also might not be the best choice if that's your primary use case... but still.


Once upon a time nobody seriously thought JavaScript would ever play any role outside of the browser, and even the role in the browser was small enough many people preferred to disable it.

Then we got a very fast JIT, and suddenly you could do reasonably compute-heavy stuff very fast, and then it became viable to also write the server side in JS, because of programmer efficiency and library reuse and other reasons.

The "right tool for the job" can seriously change when tools improve and develop, and just because there already are other tools for the same job should not stop anybody from trying.

I cannot think of a better example of that than JavaScript.


Do it in C++ because CPU bound performance is terrible in JS anyways


That's reasonable, but I'm refuting the claim that 'you can get rid of the need for multi-threading by deploying more containers on the same machine or via orchestration', not asserting that JS is the right language in the first place.


Yeah, I agree that isn't always the right choice. If all you need is horizontal scalability, then more processes is fine, but it won't work when you actually need multiple cores working on the same task


> What if you have a large shared in-memory data structure that you want to update with lots of irregular translations in parallel

Use C++.

If you ever decide to scale and distribute it for real save the state in a database and orchestrate containers. For this solution i would recommend using Node.js or Go.


My point is not that JS is the correct language for all tasks.

My point was that this isn't true:

> You can get rid of the need for multi-threading by deploying more containers on the same machine or via orchestration.

If you have a large shared memory data structure and irregular updates then this approach won't work, no matter what language you are using. If you say use a database instead, well then the database just has to solve exactly the same problem, and they'll use shared memory parallelism as well.

You can punt the problem further down the stack, but someone, somewhere, at some point is going to need to solve the problem, and they're going to use shared memory parallelism to do it.


A container per thread sounds like the least efficient solution possible.


Assuming everything you do is non-blocking, this effectively means a container per core.

Not ideal, but not too bad either.


And if it isn't non-blocking? What if I'm writing a game and want to run parts of the rendering, AI, physics, etc. in parallel? Do I create a bunch of containers for each subsystem of my game and have them all communicate via IPC? This doesn't sound like a recipe for good memory/performance characteristics.


But we’re talking about javascript here, a language that is async about almost everything.

I completely agree with what you’re saying for those other problems you describe. But for the typical node.js webapp, doing the one-container-per-core is perfectly fine.


It's not up to me to know how to manage processes or threads. That's the OS's task, not mine.

My app should be stateless and scale by replication. This is the most efficient solution in every possible way.


Sounds like your app should be.... functional.


A thread is just a process that shares memory; a container is just a process that has a different network/filesystem/etc than the rest of the processes.

Containers don't cost meaningfully more than threads unless you create expensive unique resources for each one.


Network/Filesystem/Memory are not expensive resources? It's a lot of overhead. If you're going to claim that you can share memory between containers then arguably you don't have multiple containers but rather a single one.

This is way more overhead than threads which can share all of those resources.


Network namespaces, virtual ethernet interfaces, iptables rules, & union filesystems are all very cheap and have little to no overhead for normal use cases. N processes in 1 container isn't a perf win over N processes in N containers.

Shared process memory isn't the easy memory-consumption win it sounds like, locking is hard to get right, potentially very destructive to the parallel performance that was the point of the whole exercise, and marries you to a single physical box.

Even if you want to take advantage of shared-address-space shared memory you probably want to do it in a more principled way than fork()

One-copy-per-thread and share-by-communicating both give you braindead simple scaling without dealing with that.


This is actually very similar to your idea. Each worker is executed in a separate V8 instance and napa provides a way to communicate between workers. A bit more efficient since you don't carry a container runtime around.


So, Perl ithreads then?

It works for specific types of work, but not necessarily as a low cost abstraction unless you pre-thread. In other words, it works well for some cases where threads are used, and horribly for others.


Why would you need containers to run multiple instances?


I think this has two major downsides:

1. Setting up containers and the communication between them is really complex - it also only makes sense for a deployed, long-running server process.
2. Communication between containers is very expensive. You're never going to beat an in-process pointer to shared memory.

Web workers would make a lot more sense for most javascript programs.


1 - How is REST complex?

2 - That is only true if your bottleneck is communication somehow. Usually it is not.

And a huge advantage: You will write stateless and easy to scale apps that do not care how the machine resources are being handled.


You need to run multiple applications side-by-side and they need to be able to talk to each other. With containers that means figuring out the network layer and service discovery.

It also means accounting for each system failing in your own app, retries with exponential backoff, timeouts, logging errors, circuit breakers, plus all the exotic ways a network layer can fail - if your RPC protocol has arbitrary limits (message size, timeouts, etc)

You will probably want to use kubernetes with istio, not raw docker. All very do-able, but definitely not simple.

I agree that services make sense, but there's a level between single-threaded and micro-services where having concurrency within your application is useful.


I tested this with docker and could not observe a big performance penalty on the CPU usage. Of course you need more memory, and the biggest thing is that you have to build your application in a way that you can do this later.


I will circle back in a year or so to see if this 'sticks' -- very few Microsoft open source projects provide enough value / demonstrate enough inertia this early in their lifecycle to justify any investment on my part.

https://github.com/MicrosoftArchive/redis/issues/556 (Jun-Sep 2017)

>Why do MS always do this??? Start something, announce it aloud "we are now open source, we are now this and that blah blah" then quietly do a 360 and moonwalk away


No doubt, but this API is something to behold. Either it's going to make it, or Facebook is going to ape it and that version will make it. I mean they probably use at least some variant of this for their multi-threaded flow-bin. I'll be happy either way, tbh.



Perhaps a 180? :-)


To be clear for non-Michael-Jackson-fans: 360 + moonwalk works out the same (by sliding away backwards).

https://en.wikipedia.org/wiki/Moonwalk_%28dance%29

>moves backwards while seemingly walking forwards

It's actually a near-perfect analogy here, where actions speak louder than words re: Microsoft's commitment to maintaining adaptations of existing open source projects. I think the example provided is enough to trigger a careful analysis before jumping in, and I would love to see counter-examples to help balance the evaluation.

(Please note: very specifically requesting counter-examples of Microsoft-official, intended for production, open source repositories demonstrating long-term maintenance of tweaks of/dependencies on established open source projects that for whatever reason were never up-streamed.)


In the case of Microsoft specifically it's a silly meme that started due to the Xbox 360.

'Why do they call it the Xbox 360? Because when you see it you do a 360 and walk away.'


ah, cool, thanks! I didn't get it!


My reaction exactly, but it's actually a joke.

> do a 360 and moonwalk away


It made me think of this Jason Kidd quote: "We're going to turn this team around 360 degrees"


Came here for this :)



They did a 360 with xbox


How do you know if they are still using it internally? It could have wide adoption inside Microsoft.

They built this to scratch an itch, then they opened the source, but that's still not enough?


People in the Microsoft camp still have to adapt to the Open Source mentality.

If the maintainers of a project don't manage the project properly and other people think they can do a better job, Open Source gives you the right to fork. And this social contract is an incentive for the maintainers to do a good job in order to not lose control.

On the other hand the maintainers have no contractual obligation to keep supporting the project.

When maintainers stop supporting the project, the code is still there to be picked up if there's enough interest. That's powerful, but also distributes the responsibility to all interested parties.

If no new maintainers show up to fork the project, then maybe it's OK for the project to die.


sounds like any other piece of software except that with it being open, someone can fork it. Software dies sometimes. That's kind of life. But it is better for them to go this new track than to push out closed items that people legitimately stake their livelihoods on and then kill it off when something else comes along.

When Windows Phone 7 was the new thing, I went to one of the Microsoft dev camps for it. I vividly remember someone in the audience, standing in the walkway berating the guy on stage talking about Windows Phone 7 development because he had been focused on some Microsoft technology (Silverlight? WPF? I can't remember) and they had basically relegated it to the past by switching over to this new framework. At the time, I was kind of just mind blown, but in retrospect I can see his disappointment. If it were open source, either another entity could champion it or it would still go in the heap of bygone frameworks, but at least then it stood a chance.

And I'm fine with Microsoft playing around with something, open sourcing it, and then they decide there is something else out there they want to work with. Mostly in cases like this where it isn't a "product" they're attempting to market. Seems to be an okay process and perhaps someone can look at what they left in their wake to gain some kind of insight from it.


> How do you know if they are still using it internally? It could have wide adoption inside Microsoft.

> They built this to scratch an itch, then they opened the source, but that's still not enough?

I'm honestly not sure where this question is coming from. The issue I linked is on a port of Redis to Windows done back when Microsoft Open Technologies was temporarily spun out as a subsidiary, and is one of many there reflecting frustration at Microsoft's mismanagement of the project, especially lack of communication regarding long-term support. It serves as a recent example of how Microsoft drops a low-priority open source project.

Microsoft is building a strong track record with open source projects that they completely control, but less so when they don't. I believe this is relevant here because of Napa.js's tight coupling with Node.js.


The entire goal for the fork was to try to convince upstream (antirez) to eventually merge the needed changes for a better experience for Windows users. Just as any other distro doesn't want to long term maintain a fork and prefers to merge changes upstream as much as it possibly can. In the case of redis, the fork didn't entirely meet upstream needs/considerations and the Windows distro didn't have the resources to continue a long term fork.

Microsoft Open Technologies had some successes here too: off the top of my head NodeJS and Git both have gotten much better on Windows as a result of work that Microsoft Open Technologies started in forks and eventually got merged upstream. The NAPI work to support both V8 and ChakraCore continues at a reasonable pace precisely because there is upstream engagement, upstream merging, and NodeJS-ChakraCore is less of a full fork more like a distro-specific build flag.

Plus, the priority on things like Redis for Windows shifted with the Windows Subsystem for Linux; there was less need to get open source projects to treat Windows as a supported distro directly when Windows can piggy back off of Ubuntu (or SUSE or Fedora) distro work.

None of that solves lack of communication about the forks left on the vine, and it is a shame there isn't strong redis support on Windows, but everyone has priorities, including antirez, and even if those priorities aren't communicated fully, they seem to at least be somewhat transparent to this humble developer from the outside of either work.


I'd like to see something come out of their OpenSSH fork to support AD and PSRemoting/WinRM.


So what is your prediction on the long-term viability of Napa.js? Also, any feedback on the timeline would be appreciated, specifically how long to wait before evaluating the probability of long-term support from Microsoft.

In my mind, it will have to attract a strong enough community to survive without Microsoft's financial support which will probably shut down in a year or two. It's interesting because if it does hit critical mass that increases the likelihood that it will continue to be funded.


That's a good question. Ignoring past headaches you've had with Microsoft's support of open source, how do you evaluate any open source tool for use in production and/or for long-term viability?

In the NodeJS world, sometimes "long-term" means 9 months it feels like. In open source, sometimes "support" means "file a PR if you care so much about that bug" or "fork it yourself". Are you perhaps expecting a longer term or more support because it is Microsoft behind it?

It looks actively maintained right now, but its published semver is 0.x. There are a lot of 0.x libraries in active use in NodeJS, but it's still a semver-obvious grain of salt from the maintainers to keep in mind when considering it for support.

The README tells us that it was directly built to support a production need in Bing today. That seems like as big a vote of confidence as you might get from an open source project that a team has a vested interest in it.

On the other hand, Bing isn't inside Microsoft's developer division, so they have fewer vested interests in supporting outside developers long term. It's also possible that at some point in the future they get internally sold on a developer division or Windows division or Azure alternative that Microsoft has financial or marketing reasons to commercialize.

If I had a production need for something like Napa.js I don't see any particular red flags to avoid it, it looks like it should be easy enough to migrate to something else down the road if necessary, and would probably consider it.


Looks almost the same as something using fork and message passing, which also scales over several machines. The examples should show use cases that use shared memory, or examples where it wouldn't be possible to use worker processes.


V8 doesn't support (real) forking, unfortunately



This is far from the first attempt to bring multithreading/parallelism into JS. A (failed) example:

https://en.wikipedia.org/wiki/River_Trail_(JavaScript_engine...

Another attempt (although a different approach) is here : https://github.com/tc39/ecmascript_sharedmem

Hopefully, Microsoft will learn from the errors of others. Interestingly, napa is built on V8, not on MS's own JavaScript engine.



Also, there was JXCore.


Nice, but:

1. It is exposed as a nodejs module, but some other modules may conflict with it because they may wrongly assume that they are the only thread running. E.g., some modules may be using global variables in C++.

2. Still, it doesn't support fast immutable communication (structural sharing) to other threads through shared memory.


What would it take to get number two to become a reality? Some sort of special module? This is the one main drawback I've found doing "production stuff" with Node.

Does Java offer cross-thread shared memory? Go, Haskell, Python, etc.?


Java does shared memory by default and even has concurrency related concepts as fundamentals of the language (Object.wait(), etc.) .

Python supports threading but is not very useful due to the infamous GIL. However, multiprocessing is usually a decent alternative and it supports shared memory on forking (copy on write, though). Also you can use mmap easily for IPC.


One of the great practical examples of this repository is to serve as a counter-argument against people who complain that NodeJS is not multi-threaded.

For anything else, IMHO the use-case will be so narrow, that I wonder how MS will justify the development resources for maintaining it.


I know that the history of projects at a giant company isn't always straightforward, but it's interesting that they chose V8, even though Chakra is maintained by the same company (and supports Node APIs).


I think this could be useful but I really hope they get it right (especially the broadcast and other cross-worker communication features). Right now it looks like the broadcast feature cannot target specific workers - That's already a red flag. If they push too hard on thread locking and data consistency across workers, then it's not going to scale and it may lose all the advantages of being able to run on multiple CPU cores.


A key design goal is to make all workers within a zone symmetric; broadcasting/executing to specific workers is an anti-pattern. Please see: https://github.com/Microsoft/napajs/wiki/introduction#zone
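For reference, basic zone usage looks roughly like the README example (worker count and payloads here are arbitrary):

    const napa = require('napajs');
    const zone = napa.zone.create('zone1', { workers: 4 }); // 4 symmetric workers

    // broadcast: run the same code on every worker in the zone
    zone.broadcast('console.log("worker ready");');

    // execute: run a function on whichever worker is free, get a promise back
    zone.execute((text) => text, ['hello napa'])
        .then((result) => console.log(result.value));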


I'm excited. My friend is working on extending rpgmaker, a game engine built on node. Say what you will about building a game engine on top of Node but this could be a very useful way to offload CPU heavy tasks from the main render thread


Any informed commentary on the pros/ cons of this vs cluster?

As far as I can see, cluster workers are all uniform (no task-specific pools) and cluster has no broadcast (just `worker.send(message)`.)


Cluster runs multiple processes; this is multi-threaded. So with cluster everything is isolated, communication can happen only through IPC mechanisms, and there is high memory overhead. In the case of threads it's typical synchronization, with, of course, shared-memory side effects. It's the typical thread vs process debate.


This looks very similar to the concurrency and isolation patterns of the Dart VM.

"Use Isolates for secure, concurrent apps. Spawn an isolate to run Dart functions and libraries in an isolated heap, and take advantage of multiple CPU cores." -- https://dart-lang.github.io/server/server.html


This is interesting. I see some negativity in the comments, but I think this was really a missing piece of the Node ecosystem, and I hope this solution ends up getting good adoption. On a slightly related note, I wonder if ClojureScript + Node would be the best way to take advantage of this. Has anyone here used ClojureScript with Node?


If you were wanting multithreaded Clojure, you're probably already using the JVM. It's much more mature and performant and tunable for that use case.


True! One thing I have noticed, though, is that a lot of Clojure code is blocking (database drivers, and such), so some subset of problems might be easier to write on Node. In addition, we are currently running Node in production, but not the JVM, so targeting Node with ClojureScript is an easier sell than creating a whole new deployment target.


One of the reasons npm modules are largely compatible with each other is the fact that they share the same threading model, which is what the event loop offers.

So this has a lot of compatibility issues, and release-wise it will be hell. I don't think this is a good idea.


This looks a lot like web workers, which Node.js does not have. Hopefully they will add it. https://github.com/nodejs/worker/issues/2


fork works fine in node.js


Why the downvote? fork is fine as long as the communication between the two processes is low-bandwidth.


It works fine for many use cases but has serialization overhead.
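Concretely: with fork, every message crosses a process boundary and is serialized on the way out and back (a minimal sketch; './worker.js' is a placeholder path):

    // parent.js
    const { fork } = require('child_process');
    const child = fork('./worker.js');               // separate process, separate V8 heap

    child.send({ nums: [1, 2, 3] });                 // serialized over IPC...
    child.on('message', (msg) => console.log(msg));  // ...and deserialized coming back

    // worker.js
    process.on('message', ({ nums }) => {
      process.send({ sum: nums.reduce((a, b) => a + b, 0) });
    });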


Shared memory & low overhead sharing of expensive objects is probably the primary reason to use Napa


I don't think Napa features shared memory though, does it?


Since everything is in a single process, they can be shared via native structures. Each JS thread has its own heap, so there will be a cost to transfer the shared native structures into JS objects.

For complex objects, usually JSON is used, thus marshalling/unmarshalling is needed. But for objects like UTF-8 strings or ArrayBuffer, the same layout is used across JS and C++, so the cost is almost zero.

Another thing is that between addon modules, they can pass pointers to native structures (like Buffer) using 2 uint32s through JS. In this case, JS works as a binding language.


It makes you wonder why they chose V8 instead of their own JavaScript engine

https://github.com/Microsoft/ChakraCore


My personal theory is that they're trying to improve the JS engine in Electron so they can improve VSCode. The VSCode people have posted some performance complaints in the past.

I would assume this would benefit desktop JS most, so it'd undoubtedly help Electron apps.

EDIT : There's a better explanation at https://github.com/Microsoft/napajs/wiki/Why-Napa.js


What does this give you over using fork in node? The zone concept seems nice but I'm not understanding why I would use napa when node is designed to easily spawn nodes of parallel execution?


If I understand correctly, it looks like it allows some basic sharing of data without duplicating it in a bunch of heaps. It's not 100% shared memory, in that if several threads are accessing an object, that object will be duplicated in each thread, but it has a store mechanism which essentially can serve as an in-process cache that is shared across multiple threads.
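Roughly like this, going by the store docs (treat the exact API names as my reading of them; values are marshalled on set/get, which is why it behaves like a shared cache rather than raw shared memory):

    const napa = require('napajs');
    const zone = napa.zone.create('zone1', { workers: 4 });
    const store = napa.store.create('shared');             // one copy, visible to all workers

    store.set('config', { featureFlags: ['a', 'b'] });      // marshalled into the store once

    zone.execute(() => {
        // inside a worker: look the store up by id instead of loading another copy
        const shared = require('napajs').store.get('shared');
        return shared.get('config').featureFlags.length;
    }, []).then((result) => console.log(result.value));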


That's correct. Think about a scenario where each worker needs to access a 1GB hash map. With node cluster, we have to load this 1GB map in each process, but within the same process, all JS threads can access the map via an addon.


I was expecting something like this for a long time. Thanks!


I don't think JavaScript needs threads.


We should be careful not to be too dismissive. In my opinion, dismissive attitudes have hurt Python considerably in the last 10 years.

As people pushed the limits of Python's performance and were only met with answers like "use C for fast code, the GIL is probably never going to be removed", Go and Swift emerged, clearly intending to provide a Python-like development experience with performance characteristics more similar to Java, including the ability to run real threads.

Other runtime developers should pay attention and be careful not to fall into the same trap.


I on the other hand think JS does need threads, but they should, to a certain extent, be automatically managed by the event loop and transparent to the programmer. That way you get more bandwidth (workers in fact) to process your internal JS event queue.

Threads are great also when you need to do CPU intensive calculations such as processing an audio file or number crunching. That's what NapaJS seems to be geared towards in fact.


> I don't think JavaScript needs threads.

Seriously... when publishing a project like this, the authors ought to give at least a bit of insight into their motivation. Their examples include calculating Fibonacci in parallel, not exactly a breakthrough.


I know it's addicting to be dismissive, but:

https://github.com/Microsoft/napajs/wiki/Why-Napa.js


Is this multi threaded Node workers that can only share read access to their parent process' pid?


Nope, pure threads inside a process.


Is hiding the mutable shared memory and introducing mandatory serialization for message passing a design decision or technical limitation of v8?


Can we please stop calling things ".js" if they are not actually JavaScript files or libraries written in JavaScript meant for direct inclusion in JavaScript source? Call it NapaJS, not Napa.js.


How does it compare to the (discontinued) JXCore?


JXCore is based on the node codebase, while Napa.js is written from scratch with multi-threading considerations at its core. We evaluated JXCore before starting Napa.js, but found it easier to re-architect to get performance and memory sharing right.


Sounds great! Now we can make a memory-efficient Electron with this kind of runtime. It will make using JavaScript to create desktop applications more reasonable.


Not really, multithreaded applications are not more memory-efficient.


Yes, in the general case.

But in node cluster's case, if we need to load a data set for a worker to serve requests, it has to be loaded in each process, while the multi-threaded model will have only one copy.


There are many, many tools for creating great desktop applications. Why would using JavaScript for this ever be reasonable?


To share the same codebase across Windows, macOS, linux, browser


In-browser is probably a bit of a stretch, but add mobile (also possible with JavaScript) while keeping performance for cross-platform GUIs:

Why I use Object Pascal | https://news.ycombinator.com/item?id=15490345 (Oct 2017, 250+ points, 215+ comments)

http://www.lazarus-ide.org/


So, instead of writing a great multiplatform desktop app in, say, Qt, and writing a great Web frontend, you write a half-baked desktop-web hybrid that doesn't feel great anywhere. Awesome.


QT doesn't feel great anywhere either (except Linux, and there only because the UI standards are "whatever goes").

At least it's fast.


What's wrong with QT on other platforms? If it is about the looks, it looked pretty good on Windows as far as i remember; as for speed, it is mostly down to the developer, not the language.


The problem is the "pretty good" uncanny valley.


I wouldn't write something in QT nowadays, mostly for shallow reasons.. it just feels silly to have that bubbly interface running on anything past Win XP.


What "bubbly interface"? The Qt apps I'm aware of on my Win10 system look just as flat as everything native.

EDIT: found one that doesn't, but it clearly loads some weird custom stylesheets, since it doesn't look like anything else.


Atom and VS Code don't "feel great"? Yes, there are lots of shitty Electron apps out there, but don't the good ones prove that it's possible to write good ones too?


I haven't tried VS Code, but Atom definitely doesn't feel that great. Looks like a lightweight editor, feels like a big, heavy moloch.

Atom's selling point to me was always its easy hackability, and that's where Electron tremendously wins: doing hacks, software meant to be set up quick'n'dirty, personal experiments, stuff done because "why not". I'm glad that Electron (and Atom too) exists because of that. However, that's definitely not a reason to write any serious tools in it. Using any Electron app with my 8GB RAM is a nightmare, and most of them look so out of place that (from the UX perspective) I don't know why they were put outside of the browser in the first place.

[edit] Oh, and the main thing - VS Code and Atom definitely don't share their code with any Web version running in the browser. Doesn't that make the original argument moot?


vscode does share code with the browser-based editor used in MSFT sites like VS Team Services and onedrive. It was actually first released there as a browser-based editor before being adapted for use in a standalone desktop application.


I'm with you here. Electron has gotten a bad name thanks to resource hogs like Slack, but it can also be used to create amazing multi-platform desktop apps. VS Code is a great example of Electron done right - compared to the full VS it absolutely flies, while still being packed with features and being very extensible.


Do you run Slack on Windows by chance? I've only seen it being CPU intensive twice on macOS.


I would throw GitKraken in the pot, with apps done well using Electron.


Cos by using JS you don't need low level knowledge of how computers work and you don't have to plan too far in advance. People are not big fans of learning, especially the things they deem arcane.


That's a great reason for Electron to exist and I'm all for it. If you want to mess around, do silly stuff just because you can, it's amazing that such technologies exist and you can do magic with pretty low learning curve. Heck, I've been doing stuff in the past like complete smartphone UI with Node and Chromium, just because I could and wanted to experiment.

However, that's a really bad argument when it comes to software development of proper tools that need to be developed and maintained and actually used by people.


Yeah, it was more of a tongue-in-cheek argument, but sadly i feel like it is the truth.


This is good, however IMO if you need parallelism you should not use node, period.


Shared memory!


Took long enough for threads to reach JS as well


Wait, what? Why are they getting rid of the simplicity and the model that just works? Once again Microsoft managed to put some C# programmers on JS. The first thing they thought was: why doesn't it support threads! Let's build thread support and get a promotion!

Edit: I would question the usage of Node if everything you are doing is CPU-bound. I think people started down-voting me without going through the docs; I mean, just read this documentation page and tell me this is right: https://github.com/Microsoft/napajs/blob/master/docs/api/mem...


The README explained it clearly

> As it evolves, we find it useful to complement Node.js in CPU-bound tasks, with the capability of executing JavaScript in multiple V8 isolates and communicating between them.

Not everything MS did was evil.


It's not evil it's just dumb. If your workload is CPU bound you probably shouldn't be running it in Node.js.


There are some cases on the threshold that this can be useful for.

For example, let's say your Node application's requirements changed, and now you have to transform a set of five JSON objects to one JSON object, with some big arrays, indexing, and sorting required. The operation takes 10-100ms of cpu time, which isn't nuts, but with thousands of requests per second would normally be a catastrophe. If you can spawn a thread in this case you can save the day.
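As a rough sketch of what that could look like with a napa-style zone (the merge body below is just a stand-in for the real transform):

    const napa = require('napajs');
    const zone = napa.zone.create('transform', { workers: 2 });

    async function mergeDocs(docs) {
        // the heavy merge runs on a worker thread, so the event loop keeps serving requests
        const result = await zone.execute(
            (input) => input.reduce((acc, doc) => Object.assign(acc, doc), {}),
            [docs]);
        return result.value;
    }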


You say that 10-100ms of CPU time on the hot path of your requests isn't nuts, but we'll just have to agree to disagree about that.


All I meant was "transforming and merging complex documents can really take this much time", not "stopping your whole server application for this long is okay" (that would be nuts).


That memory allocation comment does seem like the wrong way to be doing things in JavaScript.


Not like they did it first. There have been a few multi-threaded JS runtimes over the years, and shared memory types are part of the standard, so /shrugs



