
> The Go function `XgeoToH3()` allocates a TLS then calls the same functions

And when the author 'batches' the thread-local storage, he changes the code from thread-safe (the C version) to not thread-safe. The transpiled code is littered with TLS alloc and free calls that emulate stack variables. Those calls must be matched in order, and they use the heap rather than the stack for temporary storage, with different cache access patterns that can very much affect tiny benchmarks. Contrary to what the author supposed about the performance, the fast Go version can't even be run over multiple threads.

This is analogous to converting locks to NOPs and then reporting better performance on a single-threaded microbenchmark: it isn't an appropriate comparison, even though the code may look identical except for one or two lines.




> Contrary to what the author supposed about the performance, the fast Go version can't even be run over multiple threads.

Since Go deliberately hides the OS threads entirely, this TLS emulation would be per green thread (per goroutine).

I don't see why the Go code could not spin up, say, N goroutines, each making one NewBatch() call to initialize the simulated TLS for that green thread and then repeatedly calling XgeoToH3() (which should clean up its TLS usage so that it is back to 0 bytes used upon return).
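A rough sketch of that pattern, to make it concrete. Everything here is a hypothetical stand-in: `scratch` and `newScratch` play the role of the emulated TLS and NewBatch(), and `convert` plays the role of XgeoToH3(); the transpiled library's real types and signatures may well differ. The only point is that each goroutine owns its scratch area and that every call returns it to 0 bytes used.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"sync"
)

// scratch is a hypothetical stand-in for the emulated TLS:
// a bump allocator over a private byte slice.
type scratch struct {
	buf  []byte
	used int
}

// newScratch stands in for NewBatch(); the real signature is an assumption.
func newScratch(size int) *scratch { return &scratch{buf: make([]byte, size)} }

// alloc reserves n bytes; free releases them. Calls must be matched in LIFO order.
func (s *scratch) alloc(n int) []byte {
	p := s.buf[s.used : s.used+n]
	s.used += n
	return p
}

func (s *scratch) free(n int) { s.used -= n }

// convert stands in for XgeoToH3(): it keeps its temporaries in the scratch
// area and releases them before returning. The arithmetic is a placeholder.
func convert(s *scratch, lat, lng float64) uint64 {
	tmp := s.alloc(16)
	defer s.free(16)
	binary.LittleEndian.PutUint64(tmp[0:8], math.Float64bits(lat))
	binary.LittleEndian.PutUint64(tmp[8:16], math.Float64bits(lng))
	return binary.LittleEndian.Uint64(tmp[0:8]) ^ binary.LittleEndian.Uint64(tmp[8:16])
}

// runWorkers starts one goroutine per worker, each with its own scratch area,
// and reports the total number of calls and any scratch bytes left in use.
func runWorkers(workers, perWorker int) (calls int, leaked int) {
	counts := make([]int, workers)
	left := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			s := newScratch(1024) // owned by this goroutine, never shared
			for i := 0; i < perWorker; i++ {
				convert(s, float64(w), float64(i))
				counts[w]++
			}
			left[w] = s.used
		}(w)
	}
	wg.Wait()
	for w := range counts {
		calls += counts[w]
		leaked += left[w]
	}
	return calls, leaked
}

func main() {
	calls, leaked := runWorkers(4, 1000)
	fmt.Println(calls, leaked) // prints "4000 0"
}
```

Nothing is shared between goroutines, so no locking is needed, but the caller now has to create and carry the scratch object, which is the API change being discussed.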

Needing to call NewBatch once per thread changes the API a little from the cgo API, but otherwise I don't see anything that makes it not thread safe.


You'd allocate one TLS per thread, or have a reusable pool, if you wanted to run this multithreaded. Allocating and freeing a TLS per coord is not how the C code would run. If you created a new pthread per point in C that would be pretty slow too.


Yes, you can do this, but why didn't the code do this in the first place? Because you also want an API that's not annoying and difficult to use.

It's a real drawback of this transpiler that it takes thread-safe, purely functional code and turns it into code with side effects that have to be carefully managed. This is probably due to Go not guaranteeing a fixed-in-place stack.


Well, it's not purely functional code. It calls malloc and free. And if you want a multithreaded malloc, you're going to need per-thread state.


There are many parts of the code that are functional in C and have side effects in the transpiled version.

Look, ultimately what we have is a C version and a cgo version that are roughly the same speed, and a transpiled version that is 1/6th the speed, and the caller can be on any thread in any of those. Then there's a different API, where the caller has to manage storage, that's on par with the first two, but that's not the same thing. "If you jump through these hoops you can be on par" is a different claim from what the blog author made.

Now, it's possible that the wrapper functions could be made fast by storing the TLS object in some Go thread-local storage so the API stays the same, but the author didn't do this.
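For what that might look like: Go has no public goroutine-local storage, so the sketch below swaps in a `sync.Pool` of scratch objects instead. This is an assumption about how one could do it, not something the blog author or the transpiler does, and `scratch`, `convertWith` and `Convert` are all hypothetical names with placeholder arithmetic. The wrapper borrows a scratch object per call, so the caller sees the same one-call API from any goroutine.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"sync"
)

// scratch is a hypothetical stand-in for the emulated TLS object.
type scratch struct {
	buf  []byte
	used int
}

// scratchPool hands out reusable scratch objects; sync.Pool is safe for
// concurrent use and allocates a new object only when it has none to reuse.
var scratchPool = sync.Pool{
	New: func() interface{} { return &scratch{buf: make([]byte, 1024)} },
}

// convertWith stands in for the transpiled function that needs explicit
// scratch space. It returns the space it used before returning.
func convertWith(s *scratch, lat, lng float64) uint64 {
	tmp := s.buf[s.used : s.used+16]
	s.used += 16
	defer func() { s.used -= 16 }()
	binary.LittleEndian.PutUint64(tmp[0:8], math.Float64bits(lat))
	binary.LittleEndian.PutUint64(tmp[8:16], math.Float64bits(lng))
	return binary.LittleEndian.Uint64(tmp[0:8]) ^ binary.LittleEndian.Uint64(tmp[8:16])
}

// Convert keeps the simple API: no scratch argument, callable from any goroutine.
func Convert(lat, lng float64) uint64 {
	s := scratchPool.Get().(*scratch)
	defer scratchPool.Put(s)
	return convertWith(s, lat, lng)
}

// allAgree calls Convert from several goroutines at once and checks that
// every call returns the same value as a single-goroutine call.
func allAgree(workers int) bool {
	want := Convert(1.5, 2.5)
	ok := make([]bool, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			ok[w] = true
			for i := 0; i < 1000; i++ {
				if Convert(1.5, 2.5) != want {
					ok[w] = false
				}
			}
		}(w)
	}
	wg.Wait()
	for _, v := range ok {
		if !v {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allAgree(8)) // prints "true"
}
```

Whether this recovers the batched speed is an open question; the pool's Get and Put have their own per-call cost, so it would need to be measured rather than assumed.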



