Maybe they are asking the wrong questions? Does Rust need to change to make it e...

lambda · 2024-07-15T14:53:53.000000Z

Calling C from Rust can be quite simple. You just declare the external function and call it. For example, straight out of the Rust book https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#usin... :

  extern "C" {
      fn abs(input: i32) -> i32;
  }

  fn main() {
      unsafe {
          println!("Absolute value of -3 according to C: {}", abs(-3));
      }
  }

Now, if you have a complex library and don't want to write all of the declarations by hand, you can use a tool like bindgen to automatically generate those extern declarations from a C header file: https://github.com/rust-lang/rust-bindgen

There's an argument to be made that something like bindgen could be included in Rust, not requiring a third party dependency and setting up build.rs to invoke it, but that's not really the issue at hand in this article.

The issue is not the low-level bindings, but higher level wrappers that are more idiomatic in Rust. There's no way you're going to be able to have a general tool that can automatically do that from arbitrary C code.
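To make the split between a low-level binding and an idiomatic wrapper concrete, here is a minimal sketch using the real C function `strlen`. The wrapper name `c_strlen` is made up for illustration. The binding is what a generator can produce mechanically; the wrapper is the hand-written part, where a Rust type (`&CStr`) carries the invariant the C prototype cannot express.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Low-level binding: mirrors the C prototype `size_t strlen(const char *s)`.
// Written as `unsafe extern`, which newer toolchains (Rust 1.82+) accept and the
// 2024 edition requires; on older toolchains use a plain `extern "C"` block.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hand-written safe wrapper (hypothetical name). `&CStr` guarantees a valid,
/// NUL-terminated buffer, so callers cannot pass a dangling or unterminated pointer.
pub fn c_strlen(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` points to a NUL-terminated string that lives for the call.
    unsafe { strlen(s.as_ptr()) }
}
```

Callers use `c_strlen` like any safe Rust function; only the wrapper's author has to reason about the pointer.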

jiripospisil · 2024-07-15T15:00:57.000000Z

There's also cbindgen for going the other way around. https://github.com/mozilla/cbindgen

jacobgorm · 2024-07-15T19:54:14.000000Z

Passing integers around is easy; sharing structs, strings, and context pointers for use in callbacks that cross the language barrier is typically much harder.
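A rough sketch of the context-pointer pattern, for anyone who hasn't seen it. `for_each_item` is a hypothetical stand-in for a C library function; it is written in Rust here only so the example is self-contained. The Rust state travels through the API as an opaque `void *` and is recovered inside the callback.

```rust
use std::os::raw::{c_int, c_void};

/// Callback type that a C header would declare as `void (*)(void *ctx, int value)`.
type ItemCallback = extern "C" fn(ctx: *mut c_void, value: c_int);

/// Hypothetical stand-in for a C library function that invokes the callback once
/// per item and passes the caller's context pointer through untouched.
extern "C" fn for_each_item(cb: ItemCallback, ctx: *mut c_void) {
    for value in [1, 2, 3] {
        cb(ctx, value);
    }
}

/// Trampoline: recovers the Rust state from the opaque context pointer.
extern "C" fn collect(ctx: *mut c_void, value: c_int) {
    // SAFETY: `ctx` was created from `&mut Vec<c_int>` in `collect_items`
    // and that vector outlives the `for_each_item` call.
    let items = unsafe { &mut *(ctx as *mut Vec<c_int>) };
    items.push(value);
}

/// Safe entry point: the pointer casts stay inside this function.
pub fn collect_items() -> Vec<c_int> {
    let mut items: Vec<c_int> = Vec::new();
    for_each_item(collect, &mut items as *mut Vec<c_int> as *mut c_void);
    items
}
```

The hard part in real code is the lifetime of `ctx`: if the C side stores the pointer and calls back later, the Rust value has to be kept alive (for example, boxed and leaked) until the callback is unregistered.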

Someone · 2024-07-15T22:38:13.000000Z

For rust code calling C, sharing structs is doable with #[repr(C)]. See https://doc.rust-lang.org/reference/type-layout.html#reprc-s...

(Nitpick: I don’t think it is technically correct to call this “The C representation”, as struct layout in C depends on the C compiler/ABI. I wouldn’t trust this to be good enough for serializing data between 32-bit and 64-bit systems, for example. For calling code on the same system, it’s good enough, though.)
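A small sketch of what `#[repr(C)]` pins down. The `Header` struct is invented for illustration and corresponds to a C `struct { uint8_t tag; uint32_t len; uint16_t id; }`. Fields stay in declaration order and padding follows the platform's C rules; the numbers in the comments assume a typical target where `u32` is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical struct shared with C. Without `#[repr(C)]`, Rust is free to
/// reorder the fields.
#[repr(C)]
pub struct Header {
    pub tag: u8,  // offset 0, followed by 3 bytes of padding
    pub len: u32, // offset 4
    pub id: u16,  // offset 8, followed by 2 bytes of tail padding
}

/// Byte offsets of `tag`, `len` and `id` from the start of the struct.
pub fn field_offsets() -> (usize, usize, usize) {
    let h = Header { tag: 0, len: 0, id: 0 };
    let base = &h as *const Header as usize;
    (
        &h.tag as *const u8 as usize - base,
        &h.len as *const u32 as usize - base,
        &h.id as *const u16 as usize - base,
    )
}

/// Total size and alignment of `Header` (12 and 4 on typical targets).
pub fn header_layout() -> (usize, usize) {
    (size_of::<Header>(), align_of::<Header>())
}
```

This covers layout only. It says nothing about endianness or about integer types whose width differs between platforms, which is the serialization caveat above.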

varjag · 2024-07-15T15:16:58.000000Z

That's not really "simple"; it's on par with C FFI in just about any other language (except C++), with the same drawbacks.

commodoreboxer · 2024-07-15T16:21:18.000000Z

It's on par with C++, too. In C++ you need an `extern "C"`, because C++ linkage isn't guaranteed to be the same as C linkage. You can get away with wrapping that around it in a preprocessor conditional, but that's not all that much easier than Rust's bindgen.

A lot of C to C++ interop is actually done wrong without knowing it. Throwing a C++ static function as a callback into a C function usually works, but it's not technically correct because the linkage isn't guaranteed to be the same without an extern "C". In practice, it usually is the same, but this is implementation-defined, and C++ could use a different calling convention from C (e.g. cdecl vs fastcall vs stdcall. The Borland C++ compiler uses fastcall by default for C++ functions, which will make them illegal callbacks for C functions).

The major difference between C interop in Objective-C and C++ and C interop in other languages is the preprocessor. Macros just work in Objective-C and C++ because they share the C preprocessor. That's really not easy to paper over in languages that can't speak the C preprocessor.

spacechild1 · 2024-07-15T22:38:02.000000Z

I think you're confusing some terms here.

> In C++ you need an `extern "C"`, because C++ linkage isn't guaranteed to be the same as C linkage.

`extern "C"` has nothing to do with linkage; all it does is disable name mangling, so you get the same symbol name as with a C compiler.

> Throwing a C++ static function as a callback into a C function usually works, but it's not technically correct because the linkage isn't guaranteed to be the same without an extern "C".

Again, linkage is not relevant here. Your C++ callbacks don't have to be declared as extern "C" either, because the symbol name doesn't matter. As you noted correctly, the calling conventions must match, but in practice this only matters on x86 Windows. (One notable example is passing callbacks to Win32 API functions, which use `stdcall` by default.) Fortunately, x86_64 and ARM did away with this madness and only have a single calling convention (per platform).

LegibleCrimson2 · 2024-07-16T02:00:12.000000Z

> `extern "C"` has nothing to do with linkage; all it does is disable name mangling, so you get the same symbol name as with a C compiler.

extern "C" also ensures that the C calling convention is used, which is relevant for callbacks. It's not just name mangling. This is the reason that extern "C" static functions exist. You can actually overload a C++ function by extern "C" vs extern "C++", and it will dispatch it appropriately based on whether the passed in function is declared with C or C++ linkage.

And I'm not sure the terms are confused, because that's how most documentation refers to it: https://learn.microsoft.com/en-us/cpp/cpp/extern-cpp?view=ms...

> In C++, when used with a string, extern specifies that the linkage conventions of another language are being used for the declarator(s). C functions and data can be accessed only if they're previously declared as having C linkage. However, they must be defined in a separately compiled translation unit.

And https://en.cppreference.com/w/cpp/language/language_linkage

The post you're replying to had it completely right. extern "C" is entirely about linkage, which includes calling convention and name mangling.

> As you noted correctly, the calling conventions must match, but in practice this only matters on x86 Windows.

Or if you want your program to actually be correct, instead of just incidentally working for most common cases, including on future systems.

If you're passing a callback to a C function from C++, it's wrong unless the callback is declared extern "C".

spacechild1 · 2024-07-16T11:38:29.000000Z

> extern "C" also ensures that the C calling convention is used, which is relevant for callbacks. It's not just name mangling.

I stand corrected. I didn't know that `extern "C"` enforces the C calling convention.

However, on modern platforms this doesn't really matter because, as I said, there is only a single calling convention (per platform). And I'm pretty sure that future platforms will keep it that way. Fortunately, if you try to pass a C++ callback of the wrong calling convention, you get a compiler error.

> If you're passing a callback to a C function from C++, it's wrong unless the callback is declared extern "C".

That's certainly not true because `extern "C"` is not the only way to specify the calling convention. In fact, you might need a different calling convention! As I mentioned, on x86 the Windows API uses stdcall for all API functions and callbacks, so `extern "C"` would be wrong. If you look at the Microsoft examples, you will see that they declare the callbacks as WINAPI (without `extern "C"`): https://learn.microsoft.com/en-us/windows/win32/procthread/c...

So I stand by my point that in practice you don't need `extern "C"` for passing C++ callbacks to C functions. You can pass a lambda function just fine, and when it doesn't work the compiler will tell you.

LegibleCrimson2 · 2024-07-16T14:03:58.000000Z

A couple big caveats here:

* cdecl is a platform specific calling convention. There is no standard C ABI. cdecl is a wintel thing, not the standard C calling convention. On Linux, this is the System V ABI for instance. On Windows ARM, it's also not cdecl.

* Specifying calling convention at all is a compiler specific extension. There is no standard way of specifying a C calling convention without `extern`.

So specifying cdecl gets you the right calling convention on some platforms and ties your code to some specific compilers. The only portable way to specify C linkage in a C++ program is extern "C". You will always get the right ABI for your platform and it will work on every compiler.

> So I stand by my point that in practice you don't need `extern "C"` for passing C++ callbacks to C functions. You can pass a lambda function just fine, and when it doesn't work the compiler will tell you.

The compiler will very often not tell you. It will complain if the lambda can't be coerced to a function pointer (because it's a closure) or if the argument or return types are wrong. An incorrect ABI will usually be accepted and will just do the wrong thing or crash at runtime. The C++ standard says that language linkage is part of a function's type, but very few compilers actually support this.

Your position works sometimes for some compilers and some platforms. I assert that it's better to use standard C++ features and just work everywhere.

spacechild1 · 2024-07-16T15:29:47.000000Z

> * Specifying calling convention at all is a compiler specific extension.

Yes, because the calling conventions themselves are platform/compiler specific.

> There is no standard way of specifying a C calling convention without `extern`.

Well, on modern platforms you don't need to because there is only a single calling convention that is shared between C and C++. For legacy platforms with multiple calling conventions, you need compiler specific extensions by definition.

> The only portable way to specify C linkage in a C++ program is extern "C". You will always get the right ABI for your platform and it will work on every compiler.

Again, on platforms with several calling conventions `extern "C"` absolutely won't give you the appropriate calling convention all the time. See again my Win32 API example.

> The compiler will very often not tell you

> An incorrect ABI will usually be accepted and will just do the wrong thing or crash at runtime.

That's absolutely not my experience! Functions with different calling conventions have different types, so a C++ compiler must reject such code. See https://godbolt.org/z/6EnncE5v5. (Note that for the lambda case MSVC is smart enough to automatically add __stdcall whereas MinGW refuses to compile. The free function is rejected by both compilers.)

Can you show me an actual example where a C++ compiler silently accepts a function with the wrong calling convention?

> Your position works sometimes for some compilers and some platforms.

It has always worked for me so far and I write software for many different platforms.

LegibleCrimson2 · 2024-07-16T15:59:27.000000Z

Ah, yeah, you're right. I was spacing on the fact that C as well as C++ can have multiple calling conventions. I blame early morning brain.

As far as the wrong calling convention goes, I'm basing it on the fact that an extern "C++" function can be passed as a callback where an extern "C" is demanded. Even if they're the same calling convention, that should fail, but it doesn't. Looks like it doesn't fail at runtime, which is a small comfort, but given the different permissiveness of different compilers, it still makes me very nervous to pass a C++ function as a C callback and just hope that it works, given that it isn't guaranteed in the standard.

spacechild1 · 2024-07-16T19:55:55.000000Z

> Even if they're the same calling convention, that should fail, but it doesn't.

It's an interesting question. According to the standard, functions with different language linkage are indeed considered different types. As a consequence, <cstdlib> should declare two overloads for qsort() that only differ in the type of the sort function. However, modern compilers don't seem to care:

"The only modern compiler that differentiates function types with "C" and "C++" language linkages is Oracle Studio, others do not permit overloads that are only different in language linkage, including the overload sets required by the C++ standard"

https://en.cppreference.com/w/cpp/language/language_linkage

In practice, extern "C" does two things (as you correctly pointed out):

1. disable name mangling - This only affects the symbol name and is not relevant for callback functions

2. enforce the (default) C calling convention - On all (modern) platforms I know, C and C++ have the same default calling convention for free functions.

This means that from the view of a C++ compiler, pointers to `foo()` and `extern "C" foo()` have the exact same type.

Anyway, no need to be nervous. Even if the compiler treated these as different types, you would get a compiler error because C++ disallows implicit casts between different pointer types.

LegibleCrimson2 · 2024-07-16T20:21:53.000000Z

As long as I can't silently get wrong behavior or runtime crashes, I'm happy enough. Is it guaranteed that an incorrect calling convention will always cause a compiler error? I wasn't aware the calling convention was considered part of the pointer type.

Anyway, thanks for engaging with me so earnestly. I guess I had some assumptions about calling conventions that needed to be straightened out, which is important, as I'm doing work in this territory right now.

spacechild1 · 2024-07-16T21:54:25.000000Z

> Is it guaranteed that an incorrect calling convention will always cause a compiler error?

A standard-conforming C++ compiler must not allow implicit pointer casts, so yes!

> I wasn't aware the calling convention was considered part of the pointer type.

Some well-designed C APIs define a macro for the calling convention that they add to all API functions and function pointer declarations. The user can then use the same macro when supplying their callbacks, which guarantees that the calling conventions match. (On modern platforms, the macro would be typically empty.)

Here's an example: https://github.com/Celemony/ARA_API/blob/1f68fba7a374b14df19.... As you can see, it is part of the function pointer type: https://github.com/Celemony/ARA_API/blob/1f68fba7a374b14df19...

Another famous example is, of course, the WINAPI macro in the Win32 API.

That's also what I tend to do with my own C APIs.
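For comparison with the thread's main topic: Rust has the same idea built in. The ABI string is part of the function pointer type, and `extern "system"` plays roughly the role of a WINAPI-style macro (stdcall on 32-bit x86 Windows, the C ABI elsewhere). A minimal sketch, where `call_twice` is a hypothetical stand-in for an API function written in Rust to keep the example self-contained:

```rust
use std::os::raw::c_int;

/// Callback type using the platform's "system" ABI.
pub type SystemCallback = extern "system" fn(c_int) -> c_int;

/// Hypothetical stand-in for an API function that takes such a callback.
pub extern "system" fn call_twice(cb: SystemCallback, x: c_int) -> c_int {
    cb(cb(x))
}

/// Callback declared with the matching ABI.
extern "system" fn add_one(x: c_int) -> c_int {
    x + 1
}

/// Same body, but with the default (unspecified) Rust ABI.
#[allow(dead_code)]
fn add_one_rust_abi(x: c_int) -> c_int {
    x + 1
}

pub fn demo() -> c_int {
    // `call_twice(add_one_rust_abi, 40)` is rejected at compile time:
    // the ABI is part of the function pointer type, so the types do not match.
    call_twice(add_one, 40)
}
```

Here the mismatch is always a type error, on every platform, even when the two ABIs happen to be identical in practice.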

> I guess I had some assumptions about calling conventions that needed to be straightened out

I also learned a few things in this discussion, so thanks for that!

kelnos · 2024-07-15T19:43:21.000000Z

How is that not simple? You just declare the function and then call it. I find it hard to imagine how it could be any more simple than that.

varjag · 2024-07-15T21:09:41.000000Z

Now imagine a hundred or two functions, structures and callbacks, some of them exposed only as CPP macros over internal implementation. PJSIP low level API is one example.

lambda · 2024-07-16T03:30:11.000000Z

But... that's what bindgen is for. Which I mentioned.

I said it "can be quite simple"; for simple use cases, just using extern and translating the declarations by hand is perfectly viable.

For more complex cases, you use bindgen.

varjag · 2024-07-17T08:16:01.000000Z

Bindings generators exist in most other languages, with the same limitations.

I would love to see how bindgen would handle a function call defined as a preprocessor macro that I mentioned. Because most likely it won't.

googh · 2024-07-16T07:05:38.000000Z

Can someone shed some light on why the parent comment (by varjag) is downvoted?

gizmo686 · 2024-07-15T15:33:26.000000Z

... And? Most languages make C interop simple.

varjag · 2024-07-15T15:41:43.000000Z

They quickly become unwieldy on non-trivial APIs, with hundreds of definitions across dozens of files and with macros to boot. Naturally people still get the job done, but it's no longer simple.

mcronce · 2024-07-15T16:30:33.000000Z

That's what bindgen is for, as was mentioned in the original comment you replied to.

varjag · 2024-07-15T19:16:41.000000Z

How well does it handle preprocessor macros in APIs?

marshray · 2024-07-15T22:33:02.000000Z

I have used it successfully against header files for Win32 COM interfaces generated from IDL which include major parts of the infamous "windows.h". Almost every type is a macro.

This is an extremely well-understood space.

Just open the docs and do it.

varjag · 2024-07-17T08:20:37.000000Z

Not types, functions. Where the macro is essentially a forward declaration but the implementation is deep inside the code and is not exposed via headers.

tupshin · 2024-07-15T13:49:21.000000Z

This is not a notable challenge in rust, nor relevant to the article.

The article is about finding ways of using rust to actually implement kernel fs drivers/etc. Note that any rust code in the kernel is necessarily consuming C interfaces.

Bindgen works quite well for the use case that you are thinking of.

https://github.com/rust-lang/rust-bindgen

moomin · 2024-07-15T14:48:57.000000Z

Yeah, the Rust proponents are being significantly more ambitious. Not just the ability to code a file system in Rust, but do it in a way that catches a lot of the correctness issues relating to the complex (and changing) semantics of FS development.

duped · 2024-07-15T14:32:44.000000Z

It's actually pretty easy. All you need to do is declare `extern "C" fn foo() -> T` to be able to call it from Rust, and pass the link flags either by adding a #[link] attribute or by emitting them from a build.rs.
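A minimal sketch of the `#[link]` route, using `cos` from the C math library. The `cfg_attr` is an assumption to keep it portable: on Unix-like systems `cos` lives in `libm`, while on Windows it comes from the C runtime that is linked anyway. The wrapper name `c_cos` is made up.

```rust
use std::os::raw::c_double;

// Ask the linker for libm on Unix-like targets; elsewhere rely on the default C runtime.
// `unsafe extern` is accepted from Rust 1.82 and required in the 2024 edition;
// older toolchains take a plain `extern "C"` block instead.
#[cfg_attr(unix, link(name = "m"))]
unsafe extern "C" {
    fn cos(x: c_double) -> c_double;
}

/// Thin safe wrapper (hypothetical name); `cos` has no preconditions on its argument.
pub fn c_cos(x: f64) -> f64 {
    // SAFETY: `cos` takes a plain double by value and touches no memory we own.
    unsafe { cos(x) }
}
```

For a library that is not on the default search path, the usual alternative is to print `cargo:rustc-link-lib=...` and `cargo:rustc-link-search=...` lines from build.rs.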

You can use the bindgen crate to generate bindings ahead of time, or in a build.rs and include!() the generated bindings.

Normally what people do is create a `-sys` crate that contains only bindings, usually generated. Then their code can `use` the bindings from the sys crate as normal.

> in contrast, in C++ and Objective C, all you need to do is include the right header

and link against the library.

Smaug123 · 2024-07-15T17:37:19.000000Z

The point is that Rust can model invariants that C can't. You can call both ways, but if C is incapable of expressing what Rust can, that has important implications for the design of APIs which must be common to both.

gwbas1c · 2024-07-15T21:29:19.000000Z

That's not how I interpreted it: There is a clear need to be able to write filesystems in Rust, and the kernel developer(s) who write the filesystem API don't want to have to maintain the bindings to Rust.

Smaug123 · 2024-07-16T07:30:30.000000Z

They say this in almost every paragraph! For example, five of the first seven paragraphs:

> The first is to express more of the requirements using Rust's type system in order to catch more mistakes at compile time.

> Almeida showed an example of how the Rust type system can eliminate certain kinds of errors.

> … it was exactly that kind of discussion/argument that could be avoided by encapsulating the rules into the Rust types and abstractions; the compiler will know the right thing to do.

> … All of that is enforced through the type system.

> the whole idea is to determine what the constraints are from Viro and other filesystem developers, then to create types and abstractions that can enforce them.

More explicitly:

> The object lifecycles are being encoded into the Rust API, but there is no equivalent of that in C; if someone changes the lifecycle of the object on one side, the other will have bugs.

> As those changes occur, "we will find out whether or not this concept of encoding huge amounts of semantics into the type system is a good thing or a bad thing".
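To illustrate what "encoding a lifecycle into the type system" can look like, here is a hypothetical typestate sketch. None of these names come from the kernel or from the talk; `Inode`, `Locked`, and `Unlocked` are invented for the example. The rule "the size may only be changed while the lock is held" becomes a compile-time property instead of a comment.

```rust
use std::marker::PhantomData;

/// Marker types for the two lifecycle states (hypothetical).
pub struct Unlocked;
pub struct Locked;

/// Hypothetical object whose state is tracked in its type parameter.
pub struct Inode<State> {
    size: u64,
    _state: PhantomData<State>,
}

impl<State> Inode<State> {
    /// Reading the size is allowed in any state.
    pub fn size(&self) -> u64 {
        self.size
    }
}

impl Inode<Unlocked> {
    pub fn new(size: u64) -> Self {
        Inode { size, _state: PhantomData }
    }

    /// Consumes the unlocked handle, so it cannot be used after locking.
    pub fn lock(self) -> Inode<Locked> {
        Inode { size: self.size, _state: PhantomData }
    }
}

impl Inode<Locked> {
    /// Only exists on the locked type.
    pub fn set_size(&mut self, size: u64) {
        self.size = size;
    }

    pub fn unlock(self) -> Inode<Unlocked> {
        Inode { size: self.size, _state: PhantomData }
    }
}

pub fn resize_demo() -> u64 {
    let mut locked = Inode::new(0).lock();
    locked.set_size(4096);
    // `Inode::new(0).set_size(1)` would not compile: no such method on `Inode<Unlocked>`.
    locked.unlock().size()
}
```

This also shows the maintenance worry from the other side: if the C code changed the rule, these types would have to change with it.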

gwbas1c · 2024-07-16T10:59:39.000000Z

> In addition, when the C code changes, the Rust code needs to follow along, but who is going to do that work? Almeida agreed that it was something that needs to be discussed.

FWIW: I shipped a Windows file system driver in 2020. The API hadn't changed in years. Does Linux's API for kernel-space filesystems really change so rapidly that keeping the Rust bindings up-to-date would be a considerable amount of work in the long run?

kelnos · 2024-07-15T19:42:21.000000Z

> Does Rust need to change to make it easier to call C?

No, because it's already dirt-simple to do. You just declare the C function as 'extern "C"', and then call it. (You will often need to use 'unsafe' and convert or cast references to raw pointers, but that's simple syntax as well.)
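A sketch of what that looks like once pointers and a callback are involved, using the real C function `qsort`. The wrapper name `sort_with_c` and the comparator `cmp_i32` are made up for the example; the declaration assumes `size_t` maps to `usize`, which holds on the usual targets.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_void};

// Declaration of C's `qsort`. `unsafe extern` is accepted from Rust 1.82 and
// required in the 2024 edition; older toolchains take a plain `extern "C"` block.
unsafe extern "C" {
    fn qsort(
        base: *mut c_void,
        nmemb: usize,
        size: usize,
        compar: extern "C" fn(*const c_void, *const c_void) -> c_int,
    );
}

/// Comparator with the C ABI; a plain Rust `fn` would be rejected by the type checker.
extern "C" fn cmp_i32(a: *const c_void, b: *const c_void) -> c_int {
    // SAFETY: `qsort` passes pointers to elements of the `i32` slice given below.
    let (a, b) = unsafe { (*(a as *const i32), *(b as *const i32)) };
    // `Ordering` casts to -1, 0 or 1, which is what `qsort` expects.
    a.cmp(&b) as c_int
}

/// Safe wrapper (hypothetical name): the reference-to-raw-pointer cast and the
/// `unsafe` block are confined to this function.
pub fn sort_with_c(values: &mut [i32]) {
    // SAFETY: the pointer, element count and element size all describe `values`.
    unsafe {
        qsort(
            values.as_mut_ptr() as *mut c_void,
            values.len(),
            size_of::<i32>(),
            cmp_i32,
        );
    }
}
```

Calling it is ordinary safe Rust: `let mut v = [3, -1, 2]; sort_with_c(&mut v);`.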

There are tools (bindgen being the most used) that can scan C header files and produce the declarations for you, so you don't have to manually copy/paste and type them yourself.

> Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?

I think you maybe misunderstood the article? There's nothing wrong with the language here. The argument is around how Rust should be used. The Rust-for-Linux developers want to encode semantics into their API calls, using Rust's features and type system, to make these calls safer and less error-prone to use. The people on the C side are afraid that doing so will make it harder for them to evolve the behavior and semantics of their C APIs, because then the Rust APIs will need to be updated as well, and they don't want to sign up for that work.

An alternative that might be more palatable is to not make use of Rust features and the type system to encode semantics into the Rust API. That way, it will be easier for C developers, as updating the Rust API when the C API changes will be mechanical and simple to do. But then we might wonder what the point of all this Rust work is if the Rust-for-Linux developers can't use some Rust features to make better, safer APIs.

> I've done a bit of Rust, and (as a hobbyist,) it's still not clear (to me) how to interoperate with C.

Kinda weird that you currently have the top-voted comment when you admit you don't understand the language well enough to have an informed opinion on the topic at hand.

codetrotter · 2024-07-15T14:29:35.000000Z

I’ve written Rust code that called C++.

It wasn’t completely straightforward, but on the whole I figured out everything I needed to within a few days in order to be able to do it.

Calling C would surely be very similar.

emporas · 2024-07-15T20:07:58.000000Z

If you'd like to see some examples of C bindings:

https://github.com/tree-sitter/tree-sitter/blob/25c718918084...