That was a great read. All that linker wrangling is sure to break on the next ve...

FiloSottile · 2024-07-31T11:40:33.000000Z

That exist(ed)! c2goasm would compile C and then decompile it into Go asm.

neonsunset · 2024-07-31T09:26:41.000000Z

Imagine picking a language with terrible FFI overhead and a weak compiler only to fight these two worst aspects of it in an attempt to fix them.

C# with zero-cost FFI, none of the performance penalty and ability to statically link everything together is a strictly better choice. Saner type system and syntax too.

kgeist · 2024-07-31T09:50:13.000000Z

C#'s FFI is kind of zero-cost with blittable types (int, float). You still need to do marshalling for anything more complex (strings were a common issue on Linux, UTF8<=>UTF16), plus memory pinning. Last time I checked, it also notifies the GC that the thread entered native code (can't be preempted, the stack needs to be scanned conservatively, etc.). After exiting a native function, IIRC, there's a safepoint check. I remember years ago it was common knowledge that P/Invoke is not suitable for calling hot functions in a loop; you had to create native helper functions which make all the calls in one go. Maybe it has changed?

neonsunset · 2024-07-31T09:55:20.000000Z

Memory pinning is practically free and the GC does quite a bit of work to further minimize its impact on throughput; common practice is to use stack buffers or natively allocated memory for marshalled or other data anyway (remember - free FFI, so you can always do malloc and free, which is what marshallers do). In practice UTF-16<->UTF-8 conversion rarely shows up on a flamegraph, as it turns out the built-in transcoding is very fast and does not leave dangling data that the GC needs to clean up. It is also not as frequently needed - you can just get a byte* for free out of "hello, world"u8 and pass it without any extra operations (the binding generator will do that for you).

On top of that, complex blittable structures are first and foremost expressible in C# - you can easily have C binary tree, or an array of arrays with byte**. Performance-sensitive code that does interop heavily exploits that to give actual C-like experience.

Also, short-lived FFI calls do not need to notify the GC (which is cheap anyway: it toggles a boolean). Libraries that care about the last nanosecond annotate them with `[SuppressGCTransition]`, which further streamlines the assembly around the FFI call.

For performing FFI in a loop - the cost comes from the same reason non-inlineable cross-compilation-unit calls are expensive within C++, because we're talking about a plain call level of overhead, which is by definition as cheap as FFI gets. Of course that can still be a bottleneck if you have a small function beyond the FFI boundary within a hot loop. In that case you might as well port it to C# to make it inlineable, which is going to be faster, or have an FFI call for a batched operation.
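The batching point is not specific to C#. A rough sketch of the same idea in Python with `ctypes`, filling a buffer either with one `memset` call per byte or with a single call for the whole buffer. The helper names `fill_per_byte` and `fill_batched` are made up for illustration, and `CDLL(None)` assumes a POSIX system where the C library is already loaded into the process.

```python
import ctypes

# Assumption: POSIX. CDLL(None) exposes symbols of the already-loaded libc.
libc = ctypes.CDLL(None)
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p


def fill_per_byte(buf, value):
    """One FFI call per byte: the call overhead is paid len(buf) times."""
    base = ctypes.addressof(buf)
    for i in range(len(buf)):
        libc.memset(base + i, value, 1)


def fill_batched(buf, value):
    """One FFI call for the whole buffer: the call overhead is paid once."""
    libc.memset(ctypes.addressof(buf), value, len(buf))


if __name__ == "__main__":
    a = ctypes.create_string_buffer(16)
    b = ctypes.create_string_buffer(16)
    fill_per_byte(a, ord("A"))
    fill_batched(b, ord("A"))
    print(a.raw == b.raw)  # both buffers end up with identical contents
```

Both functions do the same work on the C side; only the number of boundary crossings differs, which is what shows up when you time them on larger buffers.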

Try `dotnet publish -o . -p:StripSymbols=false`ing this: https://github.com/U8String/U8String/tree/main/Examples/Inte... and then disassembling it with Ghidra. You will be positively impressed with how codegen looks - there will be simple direct calls into Rust with single bool checks for a potential GC poll after them (and compiler will merge those too after multiple consecutive calls).

aatd86 · 2024-07-31T10:01:38.000000Z

So it's actually about the same situation in Go from what I can understand.

kgeist · 2024-07-31T10:05:30.000000Z

The only significant conceptual difference between Go's and C#'s FFI mechanisms when it comes to call overhead that I can think of is the fact that Go has to switch the current stack to a special C stack. In C#, it runs on the same stack.

neonsunset · 2024-07-31T10:04:08.000000Z

No, Go FFI is so slow it makes Python look fast. Python has great FFI performance; it's just being an interpreted language that hurts. Go needs to perform stack switching and worker thread pinning, and for some reason even that is slow. I don't know why.

Note on tinygo as I've been put in the jail heh:

Realistically no one runs TinyGo in production, as in back-end workloads or larger user-facing applications. And when you do run it, at most you match the pre-existing FFI performance of .NET, which ranges from 0.5 to 2ns at throughput (which I assume is how you are testing it) depending on flags and codegen around specific arguments. Up to a 50x difference with standard Go, that is a lot, isn't it? All custom runtime flavours in Go usually come with significant performance issues or other tradeoffs. That's something I never need to deal with when I solve the same problems with .NET, apart from occasionally addressing AOT compatibility for libraries that have not been updated, if that's the desired deployment mode.

randomdata · 2024-07-31T10:43:50.000000Z

> No, Go FFI is so slow it makes Python look fast

Which Python and which Go? There are so many different implementations and different versions of those implementations that this broad statement is meaningless.

From what I have installed on my machine, CPython 3.11.6 seems to take around 80ns to call a C function, gc 1.22.0 takes around 50ns to call the same C function, and tinygo 0.32.0 only takes around 1ns!

Such benchmarking is always fraught with problems, so your mileage may vary, but on my machine under this particular test Go wins in both cases, and tinygo is doing so at about the same speed as C itself.
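For anyone who wants to get a number of their own on the CPython side, here is a minimal sketch that times a call into libc through `ctypes` using `timeit`. It is not the test behind the figures above. `ns_per_call` is a made-up helper, `CDLL(None)` assumes a POSIX system, and the lambda wrapper adds its own Python call overhead to both measurements, so treat the result as an upper bound.

```python
import ctypes
import timeit

# Assumption: POSIX. CDLL(None) exposes symbols of the already-loaded libc.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes does not have to guess argument types.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int
c_abs = libc.abs  # bind once so the timed loop skips the attribute lookup


def ns_per_call(fn, arg, calls=100_000, repeats=3):
    """Best observed wall-clock cost of fn(arg), in nanoseconds per call."""
    timer = timeit.Timer(lambda: fn(arg))
    best = min(timer.repeat(repeat=repeats, number=calls))
    return best / calls * 1e9


if __name__ == "__main__":
    print(f"ctypes abs():  {ns_per_call(c_abs, -7):.0f} ns/call")
    print(f"builtin abs(): {ns_per_call(abs, -7):.0f} ns/call")
```

Taking the minimum over several repeats is a common way to cut down on scheduler noise; it does not make the result transferable to another machine or another binding mechanism.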

darby_nine · 2024-07-31T15:34:24.000000Z

> CPython 3.11.6 seems to take around 80ns to call a C function, gc 1.22.0 takes around 50ns to call the same C function

How does this make any sense? C shares a runtime with CPython. What is it doing where it manages to be slower than go?

randomdata · 2024-08-01T17:32:17.000000Z

I'm not familiar enough with the internals to say for sure, but I do know Python has a penchant for dicts. Perhaps it looks up the function each time, whereas Go can be made aware of the function at compile time? That wouldn't be without cost.

Part and parcel with its love for dicts, Python data structures tend to not be shaped like C typically expects. Whereas Go data structures are, at most, lightweight wrappers around C-style structures, and are often even directly equivalent, making passing data about as simple as passing a pointer. Python may be getting bogged down in some kind of marshalling operation. That isn't cost-free either.

gen2brain · 2024-07-31T11:17:36.000000Z

Maybe it just looks slow to you, for example, check the raylib bindings benchmarks, Go is usually at the top, together with Rust, C#, etc. and Python is at the very bottom. Perhaps not the best way to benchmark FFI but it shows that Python is not even usable besides playing.

jerf · 2024-07-31T12:41:12.000000Z

In Go, once you've FFI'd, you're likely to be able to use the Go data directly in C. In Python, you generally have to crawl over the Python-based data at Python speeds, converting it into data that your C library can understand, and then when C is done, convert it back into a Python data type at, again, effectively Python speeds. Unless you're willing for it to just be opaque data that Python effectively can route, but not manipulate. (This works, but is distinctly less useful. Of course sometimes, like for image data, it's the best option.) Combined with the sibling comment that actually times calls and finds Go is faster anyhow, even if it is a microbenchmark, I don't think your argument carries water.
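A small sketch of the two options on the Python side, using `ctypes` and the standard `array` module. `marshal_list` walks a Python list and copies every element into a fresh C array, while `view_array` hands C a view of memory that is already laid out the way C expects. Both helper names are made up for illustration.

```python
import ctypes
from array import array


def marshal_list(values):
    """Copy a Python list of floats into a new C double[] (O(n) Python-level work)."""
    return (ctypes.c_double * len(values))(*values)


def view_array(packed):
    """Expose an array('d') as a C double[] without copying its contents."""
    return (ctypes.c_double * len(packed)).from_buffer(packed)


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0]

    copied = marshal_list(values)
    copied[0] = 9.0
    print(values[0])  # the list is untouched: the C side got a copy

    packed = array("d", values)
    shared = view_array(packed)
    shared[0] = 9.0
    print(packed[0])  # the array sees the write: both share one buffer
```

The copy has to be converted back if Python needs the results as a list; the shared view avoids both conversions, at the price of keeping the data in a packed form that ordinary Python code is slower to manipulate.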

It sounds to me like you've latched on to some propaganda that you like and are happy to share but don't have any personal experience with. Go's FFI is relatively slow for a compiled language, but it isn't even remotely uniquely slow. Many languages have a C FFI with some sort of relatively expensive C conversion operation, and as I point out here, that actually includes Python in many, if not most, uses (it's pretty rare to want to write code to bind to C that just ships over two integers and returns another integer or something, usually we're doing something interesting). It is the languages like Rust or Zig that can do it for effectively free that are the exception, not the rule. Go's FFI costs are not all that expensive compared to programming languages in general, and if you're worried about the FFI performance, Go also generally needs FFI less in the first place than Python because it's a fast language (not the fastest by any means, but fast), and the higher proportion of Go code will generally smoke Python anyhow. Unless you foolishly write a very tight loop of C FFI code in Go, which is a bad idea in most languages anyhow (again excepting the exceptions like Rust and Zig), Go's going to outpace a Python program that is mostly Python but uses a bit of FFI here and there by a lot anyhow.

The idea that Go is brought to its knees by a single C call that takes 100ms or something reminds me of the people who think that as soon as you use a garbage collected language you've signed up for a guaranteed 250ms stop-the-world pause every three seconds or something. Is it free? No. Is it expensive? In relative terms maybe. In absolute terms, not really. Most programs, most of the time you won't notice, and if you are in a situation where Python is even a performance option virtually by definition you're in a situation where it won't be a problem for Go.

This is not rah-rah for Go or slagging on Python. This is just stuff engineers need to know. Python is a very capable language, but you definitely pay for it. It is not free. Every greenfield project, you need to sit down and calculate the costs and pick a good tool, but you're going to make dumb, project-killing decisions if you're using costs that are multiple orders of magnitude off of reality. I've seen it kill projects, it's not just theory.

darby_nine · 2024-07-31T15:36:14.000000Z

> but it isn't even remotely uniquely slow.

Most languages share a runtime and stack model with C. Go is "unique" among popular languages in that it decided to do its own thing, which results in much slower FFI calls. Like sure, so did GHC, but most people don't expect GHC to behave like C.

TBH it's enough to put me off of the language outside of writing servers. There's just too much to draw from in terms of libc-based libraries and the drawbacks in Go are too severe to make its interesting concurrency ideas generally worth it.

EDIT: Not to mention if you have WASI as a target go is a terrible choice for the exact same reasons—it has its unique memory and stack model that don't work well with web assembly.

jerf · 2024-07-31T17:11:19.000000Z

Again, people throw around "much slower" and it comes off like it's 100ms or something. It's not that much slower, it's not even close.

It's only an issue if you're planning on making tens or hundreds of thousands of FFI calls per second, routinely. That describes a non-zero set of software people may want to write. If you are writing one of them, you need to know that. But it doesn't describe anything like a majority of software cases in general.
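To put rough numbers on that, here is a back-of-the-envelope sketch that uses the per-call costs quoted earlier in the thread (80ns, 50ns, 1ns). Those figures came from one machine and one microbenchmark, so they are an assumption here rather than a measurement; `ffi_core_share` is a made-up helper.

```python
def ffi_core_share(calls_per_second, ns_per_call):
    """Fraction of one CPU core spent on FFI call overhead alone."""
    return calls_per_second * ns_per_call * 1e-9


# Per-call costs quoted earlier in the thread (one machine, one microbenchmark).
QUOTED_NS = {"CPython 3.11.6": 80, "gc 1.22.0": 50, "tinygo 0.32.0": 1}

if __name__ == "__main__":
    for rate in (100_000, 10_000_000):
        for name, ns in QUOTED_NS.items():
            share = ffi_core_share(rate, ns)
            print(f"{rate:>10,} calls/s  {name:<15} {share:6.1%} of a core")
```

At the quoted 50ns, 100,000 calls per second costs about 5ms out of every second, or 0.5% of one core. At 10 million calls per second the same overhead is half a core, which is where it starts to drive design decisions.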

It is one of the things you need to know, but you need to have a correct view of the costs to make correct decisions, or, at least, not one that's off by orders of magnitude and leads to people running around claiming Go is "uniquely slow" at FFI and bragging about how much faster Python is when it turns out "uniquely slow" Go is actually faster than Python. Costs aren't a matter of feelings or what reinforces your decisions about what language to use or how much you hate that Go doesn't have sum types. Costs are what they are.

My personal favorite, and bear with me because this is going to be generally a negative for Go, is the number of databases that for some reason are getting written in Go. My rule of thumb is that Go is 2-3x slower than C/C++ (and, increasingly in that set, Rust). On the grand landscape of programming languages, this puts it distinctly towards the faster end in a general sense; there are very popular languages clocking in at "generally 40x-50x slower and also can't use more than one CPU at a time". But if you're writing a database, you're going into a market where it's virtually guaranteed that's going to be a problem. Maybe don't do that. But then deciding that you aren't going to use Go to write your command-line app to hit an HTTP API because it's not the fastest language for databases is not a correct engineering conclusion to draw.

darby_nine · 2024-08-01T12:04:37.000000Z

The idea that a feature isn't that slow if you don't use it much isn't very persuasive.

You could also flip this argument on its head. What is go good at? Writing evented servers. If you're writing one of these you'll know it.

Also, I admit I was wrong; go doesn't have uniquely bad ffi. It's just worse than C++, rust, D, jvm, .net, and any compiled lisp I've ever used. Perhaps go is uniquely well suited to writing stuff other than evented servers in a way I just can't see yet.

(And FWIW I am still baffled how python managed to be this slow without needing to switch stacks.)

neonsunset · 2024-07-31T14:26:08.000000Z

Thank you for responding. Indeed, it's not quite as bad as Python in overall performance. But I'm happy it brought attention to the fact that Go is still pretty inadequate at this.

My main point is engineers keep trying to shoehorn it in domains, where, should they not want to use Rust or Zig as you mentioned (which are great), they should have chosen C# (which is also great for FFI, look at Stride3D, Ryujinx or even its Sqlite driver speed, all of which are FFI heavy), but instead they keep attempting to use Go where they have to work hard to counteract its inadequacy, instead of using a platform where their solution would perform great not despite the tool but because of it.

randomdata · 2024-07-31T15:11:52.000000Z

The biggest problem with FFI in Go (gc, at least) isn't in the FFI operation itself, rather FFI functions that block for a long time mess with the scheduler.

It seems C# suffers the same problem as real-world benchmarks often show it to be slower than Go when performing FFI, even if it should be theoretically faster.

As usual, performance can be hard to predict. Benchmarking the actual code you intend to use is the only way to ensure that what you think is true actually is. If you aren't measuring, you aren't engineering.

neonsunset · 2024-07-31T15:28:24.000000Z

Do you have any example of code that can demonstrate how it is possible to meaningfully slow down .NET and its threadpool and GC when performing FFI where Go does not suffer to a significantly greater extent?

(if you fashion a Go example - that'll be enough and I'll make a C# one)

.NET's threadpool is specifically made with the consideration of worker threads being potentially blocked in mind, and has two mechanisms to counteract it - hill-climbing algorithm that grows and shrinks the active number of threads to minimize task wait time in queues, and another mechanism to actively detect blocked threads (like system sleep or blocked by synchronous network read) and inject additional workers without waiting for hill-climbing to kick in. It is a very resilient design. Go's and Tokio threadpool are comparatively lower effort - both are work-stealing designs but neither has the active scaling mechanism .NET has already had since .NET Framework days.

GC implementation at the same time is pinning-aware and can shuffle around memory in such a way to allow other objects to participate in collection or promotion to older generations while keeping the pinned memory where it is. There have been years of work towards improving this, and there is also an additional pinned memory heap for long-lived pinned allocations on the rare occasion where just performing malloc is not appropriate.

I doubt there is any other high-level language or platform that can compete on FFI with .NET, something that has been considered as a part of its design since the very first version. If you want better experience your main upgrade options are literally C, C++, Zig, Rust, and honorable mention Swift (it is mostly a side-grade, with the heavy lifting done by LLVM).

randomdata · 2024-07-31T15:50:36.000000Z

> Do you have any example of code

No better than your own code. Why not put it to the test? In the end you will either know that you made the right choice, or have the better solution ready to swap in. You can't lose.

The key takeaway from the previous comment isn't some pointless C# vs Go comparison, it is that performance can be hard to predict. Someone else's code isn't yours. It won't tell you anything about yours. Measure and find out!

neonsunset · 2024-07-31T16:00:33.000000Z

You did mention there exists an FFI scenario where Go supposedly performs better. It would be interesting to look at it, given the claim.

randomdata · 2024-07-31T16:05:45.000000Z

What, exactly, is interesting about arbitrary benchmarks? It might just be the C# code is slower because the developer accidentally introduced different, less performant, logic. It doesn't tell you anything. Only your own code can tell you something. I am not sure how to state this more clearly.

What would actually be interesting is to see you gain those important nanoseconds of performance that are so critical to your business. We want to see you succeed (even if you don't seem to want the same for yourself?).

Capricorn2481 · 2024-07-31T16:49:46.000000Z

> It might just be the C# code is slower because the developer accidentally introduced different, less performant, logic

That is why the user is asking for specific code. So they can audit whether this is a case when someone claims Go FFI is fine.

As an outsider, all I see are two people saying "no it's not slow," just about different languages. But until I have a production app in either I'll never know.

randomdata · 2024-07-31T17:08:17.000000Z

> So they can audit whether this is a case when someone claims Go FFI is fine.

Of course nobody would claim such a thing. The cost is real. Whether or not Go is fine will depend entirely on what kind of problem environment you are dealing with and what your own code actually looks like. Someone else's code will never tell you this. There is no shortcut here other than to measure your own code.

> As an outsider, all I see are two people saying "no it's not slow," just about different languages.

It is slow, relatively speaking. But does that matter in your particular situation? Random internet benchmarks show that Python is always slower, way slower, yet people find all kinds of productive uses for Python – even in domains where computational performance is very important! And if it does matter for what you're doing, are you sure you actually picked the fastest option? Measure and find out.

It is good to have rough estimates, but all of these languages are operating within the same approximate timescale here. It's not like Go, C#, or any other language is taking minutes to perform FFI. When you really do need to shave those nanoseconds off, guessing isn't going to get you there. Measure!

Capricorn2481 · 2024-07-31T18:02:42.000000Z

I'm in agreement, but it's even simpler than you're making it.

If Go is slow in a certain context, I would want to know what that context is. If there's a certain task that takes a few more ms in goroutines due to some implementation detail, I would know not to use Go if that task needed to run 100,000 times. Perhaps I need to rethink the task itself, or maybe that's not possible for an organizational reason.

It wouldn't be a "random internet benchmark" unless I didn't understand the context. What's random is saying this

> It seems C# suffers the same problem as real-world benchmarks often show it to be slower than Go when performing FFI, even if it should be theoretically faster

How is this better than asking for code examples?

randomdata · 2024-07-31T18:39:35.000000Z

> If Go is slow in a certain context, I would want to know what that context is.

You'll know as soon as you measure it. Not exactly rocket science, just plain old engineering. Measuring is what engineers do. You wouldn't build a bridge without first measuring the properties of the materials, and you wouldn't build a program without measuring the properties of its 'materials'.

You make a good point that it is strange we don't get better datasheets from 'material manufacturers' about the base measurements. That wouldn't fly in any other engineering discipline, but I guess that's the nature of software still being young. As unfortunate as that may be, you can't fight the state of affairs, you're just going to have to roll up your sleeves. Such is life.

> How is this better than asking for code examples?

Cunningham's law explains why it is better.

Capricorn2481 · 2024-08-01T07:56:12.000000Z

> Cunningham's law explains why it is better.

That's better for YOU if you are trying to get answers, but for me the reader, you made up something about C# in the hopes of being corrected, and then lectured people asking for receipts.

randomdata · 2024-08-01T16:32:26.000000Z

> That's better for YOU if you are trying to get answers

Indeed. No sense in breaking the law.

> but for me the reader

I bet they wrote a song about you – or at least, as the song goes, so you think.

> you made up something about C# in the hopes of being corrected

It wasn't made up. The FFI benchmarks I looked at truly did show that. I did not verify exactly what was the cause for the slowness, though – and I clearly maintained the doubt in the original comment in recognition of that. Speculation isn't quite the same as what you are postulating.

Nice execution of Cunningham's law, by the way. Now you're getting it. Welcome to the internet! You're going to like it here.

superb_dev · 2024-07-31T17:35:24.000000Z

Could it be that Go has other benefits that outweigh ffi being a little slower?

aatd86 · 2024-07-31T10:44:44.000000Z

Are there some recent measurements? All I could find is somewhat old and I think there has been some work done since.

fingerlocks · 2024-07-31T09:56:47.000000Z

This is _hacker_ news, doing the thing it wasn’t designed to do is the point. The harder the challenge, the better.

Yes there are tons of better choices for rust interop. Any LLVM language will work. That’s not interesting.