> you would have the same problem as in Erlang

Ah but that's not how fork-join works - you fork multiple jobs, and then you must join them all at the same time - you can't join just one.

You have to do something like

    (a, b) = join
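
In Python terms it would look something like this (just a sketch, with concurrent.futures standing in for the pseudocode above; this isn't any particular framework's API):

    # Illustration only: fork two jobs, then join both of them together.
    from concurrent.futures import ThreadPoolExecutor

    def job_a():
        return "A"

    def job_b():
        return "B"

    with ThreadPoolExecutor() as pool:
        jobs = [pool.submit(job_a), pool.submit(job_b)]   # fork both
        a, b = (j.result() for j in jobs)                 # join both together
    # In this style there is no way to hand one result back to the caller
    # before the other job has also finished.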



If you have to join all the jobs at the same time (which is pretty inflexible), how is the ordering of the results determined? My exposure to this model was in the perl threads module (and the forks module, which offers the same API with OS forking instead), where you join on a specific thread id, so you can easily enforce ordering by joining a first and then joining b. I assumed a join with no parameters would join a thread that's already ready and/or wait for the first to become ready, because that seems like the most useful/basic interface, and anything more specific (join all the threads, join the threads in some order) could be added as needed in the context where it's needed.
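
To make that concrete, here are roughly the two styles I mean, sketched in Python with concurrent.futures standing in for perl threads (some_job is a made-up placeholder):

    from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

    def some_job(name):  # placeholder work
        return "result of " + name

    with ThreadPoolExecutor() as pool:
        fa = pool.submit(some_job, "a")
        fb = pool.submit(some_job, "b")
        # Option 1 -- join specific jobs, enforcing the order yourself:
        #   a = fa.result(); b = fb.result()
        # Option 2 -- join whichever job finishes first:
        done, _ = wait([fa, fb], return_when=FIRST_COMPLETED)
        first = next(iter(done)).result()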


> If you have to join all the jobs at the same time (which is pretty inflexible), how is the ordering of the results determined?

The model is that you can start a sequence of jobs to run in parallel, and then you have to wait for them all to finish. You get the results in the same order as the jobs you created. The order can't vary.

Think about a diamond shape - one job creates two more jobs, and then they both send their results back to the original job, which cannot continue until both child jobs are finished.
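
A rough sketch of that shape in Python (illustration only; child is a made-up job):

    # The parent forks two children and blocks until both are done;
    # results come back in the order the jobs were forked.
    from concurrent.futures import ThreadPoolExecutor

    def child(n):
        return n * n

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(child, n) for n in (2, 3)]   # fork
        results = [f.result() for f in futures]             # join all
    # results == [4, 9], regardless of which child finished first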

> because that seems like the most useful/basic interface

Yes, useful and basic, but the problem is it makes it easy to cause race conditions, which is where this thread started! You think some particular thread will be ready first, so you write code assuming that without even thinking about it, and then once in a trillion runs you actually get the other result first. Yes, it's a programmer bug, but the point is that because it's non-deterministic they may not notice until the one time it actually matters and someone dies.


Fork-join is not really a concurrency model; I don't understand why you are trying to push it in every message. But if you insist on treating it like a concurrency model...

> The model is that you can start a sequence of jobs to run in parallel, and then you have to wait for them all to finish.

Yes, that's the model.

> You get the results in the same order as the jobs you created.

No, you don't get the results in the same order. Jobs still finish in arbitrary order and store their results before any synchronization happens; synchronization happens on the join after that. And instead of relying on order, you specify exactly where you get the result of each individual job from. So if you have to specify that, why do you need an order at all? Oh, you don't need it, and you don't get to have it. It's not Erlang, where you can actually have a deterministic order and can wait for messages in any order you want, and it will reorder them for you.
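
In other words, something like this (Python sketch with made-up job names; the point is that results are read by name, never by arrival order):

    from concurrent.futures import ThreadPoolExecutor

    def work(x):
        return x + 1

    with ThreadPoolExecutor() as pool:
        futures = {"a": pool.submit(work, 1), "b": pool.submit(work, 2)}
        results = {name: f.result() for name, f in futures.items()}
    # You read results["a"] and results["b"] by name; the order in which
    # the jobs happened to finish never shows up in the program.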


> No, you don't get the results in the same order.

You and I just don't seem to be on the same page about what fork-join is, so we probably aren't going to agree on this.

> It's not Erlang, where you can actually have a deterministic order

You can, but my point is that you can also not have a deterministic order, which is how Erlang programs can end up being racy, and that's the problem with them if you are trying to solve the original problems of threads.


So, if you want this inflexible model, you could easily build it as a library in Erlang, just as you're clearly using someone's inflexible fork-join library (or an inflexible language implementation). In the meantime you're missing out on things like: fork A, B, C, with A expected to run much longer than the other two; when B and C return, fork D using the results from B and C; finally wait for A and D to return. Or fork A, B, C and merge the first two that return, then merge that result with the third. Or fork these ten jobs, but only run four of them at any given time (resource constraints). My first example you could probably structure into your model with an extra fork; the second won't fit in your model; the third fits if you fork four workers and add a shared queuing mechanism, but that feels more complex.
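
For instance, the second and third examples sketched in Python (job and merge are made-up placeholders, and concurrent.futures is just standing in for whatever runtime you prefer):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def job(name):
        return name

    def merge(x, y):
        return x + y

    # Example 2: merge the first two jobs to return, then merge in the third.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(job, n) for n in ("a", "b", "c")]
        finished = as_completed(futures)
        partial = merge(next(finished).result(), next(finished).result())
        final = merge(partial, next(finished).result())

    # Example 3: ten jobs, but at most four running at any given time.
    with ThreadPoolExecutor(max_workers=4) as pool:
        tens = [pool.submit(job, i) for i in range(10)]
        results = [f.result() for f in tens]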

These kinds of techniques are key to using parallelism to reduce latency. Always having to wait for everyone to finish at each step makes for a lot of waiting.


Fork-join is not what you think it is, then. It's just a behavior, uncommon outside of shared-memory programming. And there is no order of results. There can be, if it is implemented on top of message passing, but then writing code that relies on order instead of specific names actually becomes error-prone.

And the lack of message ordering is still not a race condition.



