Here is sample code that illustrates how it works.
The approach to using inline functions in C is a bit counterintuitive. Here is how you might define a normal function in C:
// file.h
int add_one(int x);

// file.c
#include "file.h"
int add_one(int x) {
    return x + 1;
}
Here is how you might define it inline:
// file.h
inline int add_one(int x) {
    return x + 1;
}

// file.c
#include "file.h"
int add_one(int x);
Seems backwards to have the definition in the header and the prototype in the implementation file. But the prototype is "extern" (the default, which can be omitted), and the definition and declaration merge in certain ways (extern + inline = extern inline), so you end up with an "extern inline" definition, which means "emit the code in this translation unit, please." If LLVM's behavior is counterintuitive, it's probably because it's there to support counterintuitive parts of the C language.
This frees you from making decisions in a sense, because the compiler can decide whether to use the inline or extern definition at any given call site. Again, the syntax for it is counterintuitive.
The other way is:
// file.h
static inline int add_one(int x) {
    return x + 1;
}
This may give you multiple copies of the code (emitted in potentially any translation unit it appears in), but it's less typing.
Any guesses as to why C chose such crazy semantics for inline functions? "The compiler can decide to use the inline or extern definition" is begging for trouble.
"inline functions cannot have their address taken and cannot have static variables" seems so natural and obvious.
> "The compiler can decide to use the inline or extern definition" is begging for trouble.
IMO this behavior seems like the obvious choice, but I've been writing C long enough that I'm sure my perspective is distorted and my sense of what is obvious is completely out of whack.
C++ made the choice that linkers should be able to coalesce duplicates, C made the choice that linkers do not need this feature. If you want to inline, you want definitions in every translation unit. If you don't want to have to inline, you want to pick a translation unit which gets the code.
The compiler is what chooses whether a function is inline, and programmers should not think of the "inline" keyword as affecting that choice.
> "inline functions cannot have their address taken and cannot have static variables" seems so natural and obvious.
Yes, obvious that they should not have static variables. There's nothing wrong with taking the address of an inline function, though. It behaves just like a normal function. Pointers to a function will compare equal if and only if they point to the same function--a function with external linkage is the same function in every translation unit, a function with internal linkage is a different function in every translation unit. Whether or not the function is inline does not matter (why would it?)
This means that you can take functions in an existing C code base, and you can make them inline, without any change in functionality. This should always work, barring... you know... those weird unexpected consequences. Isn't that nice though? Whether a function is inline is a detail that you don't have to care about, unless you're defining the function.
Remember that "inline" does not mean, "this function will be inlined". Instead it means, "the compiler may choose to inline this, or not, at its discretion." Which really just means, "the definition for this function is visible in the translation unit where it is called", because the compiler can generally choose to inline any function if the definition is there.
So, the keyword could be renamed to reflect what it means. Instead of:
// file.h
inline int add_one(int x) {
    return x + 1;
}

// file.c
int add_one(int x);
You could choose a different name for the keyword:
// file.h
do_not_emit_code_in_this_translation_unit_by_default
int add_one(int x) {
    return x + 1;
}

// file.c
override_default_and_emit_code_in_this_translation_unit
int add_one(int x);
> C++ made the choice that linkers should be able to coalesce duplicates
Yeah and shared libraries are where it gets ugly. For example, libstdc++ has an empty string singleton `_S_empty_rep_storage`. Its headers compare a std::string's storage against this, by address, so the dynamic linker is on the hook to coalesce the empty string symbols. It doesn't always work, and then things break mysteriously when you have two copies of the same variable.
> There's nothing wrong with taking the address of an inline function, though. It behaves just like a normal function.
I think you're proposing that inline functions should have a single definition? This breaks header-only libraries, and also gets sticky with shared libs; whose symbol wins?
If everything is one big static link then it's probably fine. If your compiler is deciding to emit code now, or link to future code, dependent on optimization level...it's painful!
> I think you're proposing that inline functions should have a single definition? This breaks header-only libraries, and also gets sticky with shared libs; whose symbol wins?
You can have the definition in as many translation units as you like, it's just that only one of the definitions can have extern linkage. This works right now, today, both with header-only libraries and across shared library boundaries. If you take the address of an inline function defined this way, you just get the address of the extern linkage definition. The C compiler will do the right thing, as long as you keep in mind the limitations:
- Anything static will get duplicated in each translation unit. (Regardless of whether something is inline--the "inline" keyword isn't relevant here.)
- No two translation units have an extern definition for the same symbol. (Again, regardless of whether the symbol is an inline function.)
All "inline" lets you do is two things:
- You can define a function without creating an extern linkage symbol.
- Static inline functions do not cause warnings if not used.
So, with your shared library, you have to pick one translation unit in one library which gets the extern definition. This is the same restriction you have with non-inline functions--if you are linking a static library into two different shared libraries, and then combining these shared libraries, you will run into problems regardless of "inline".
Or put another way... none of these issues are related to the "inline" keyword in C. The "inline" keyword does not create any new problems.
I'm confused. You would presumably have header guards for file.h. Why do you care about inline then? The function is included verbatim in every compilation unit that #includes file.h.
This subverts the notion of what "static" means here. "Static" means "private to this translation unit".
Usage of the linker in modern C has been moving in the opposite direction, IMO, towards keeping the interface between the compiler and linker simple. For example, it looks like the trend is towards eliminating the use of "common" variables--GCC now defaults to -fno-common.
You can still get all sorts of fancy stuff with LTO turned on. But if you want no duplicates, you can express that intent by choosing a specific translation unit to contain the duplicates.
Yes, that's true... but often, visible symbols don't have enough information to get deduplicated anyway. Often, the symbol is just an address within a section in the object file. The section contains other code, and you can't remove things from it... by default, on most systems.
E.g. if you have file.o, the linker will see something like this:
section .text: [...16kb of data follows...]
section .data: [...2kb of data follows...]
my_function = .text + 0x1f3a
This is simplified, but it just illustrates the core of what an object file looks like during linking.
It's just not enough information to go on, if you want to deduplicate a function. C runs on weird embedded systems. You might think, "Just use LTO" and well, those weird systems don't always have LTO. You might think, "If you care about code size, don't use inline functions!" and well, sometimes, inlining a function results in smaller code!
The most common use of inline in C++ is to define a function in a header file that is potentially #included in multiple source files. Without the inline specifier, this would result in multiple definition errors at link time.
It's an example (there are several in C++) of reusing syntax/semantics to replace something no-one does (trying to get the compiler to inline code) with something that is useful.
Right, the inline keyword in C++ is mainly used to dodge ODR. When you really care about inlining, you want an attribute like always_inline, which emits an error if the function cannot be inlined (e.g. is recursive).
> There can be more than one definition in a program of [...an] inline function [...] as long as all of the following is true:
> - each definition consists of the same sequence of tokens (typically, appears in the same header file)
> [...]
> If all these requirements are satisfied, the program behaves as if there is only one definition in the entire program. Otherwise, the program is ill-formed, no diagnostic required.
Doing a function call in C/C++ is not expensive. I wish people would stop saying this, it's just not true, it's one of the most annoying myths in programming. A C function call is on the order of a tiny number of nanoseconds. Here's a benchmark to compare [1]. It's an infinitesimal amount, and it is absolutely swamped by something like a cache miss. If you eliminate one cache miss but add 50 function calls, that's probably a net benefit.
The only sense in which function calls in C are "expensive" is during compilation: if a compiler can inline a function call, it can potentially do tons of new optimizations that it couldn't do before, and that can yield a huge improvement in performance. But the function call ITSELF is almost never the problem.
But your point still stands. In my experience the greatest benefit of inlining is when range or null checks can be removed in the inlined function, not the actual call.
Ah, good point, I missed that my version became a tail call, good catch. Still, a C function call is pushing a couple of values onto the stack then performing an unconditional jump, so it's not much worse.
That site is indeed very neat! It just uses Google Benchmark in the background, but it's excellent for these kinds of discussions (and lovely to have it link directly to godbolt if you want). It's a shame that it doesn't give you the actual latency numbers like regular Google Benchmark, but I suppose that is to be expected when you're running in server VMs, those numbers aren't necessarily meaningful.
The only legitimate way to influence the compiler's inlining decisions is by profile guidance. If you aren't providing a profile at compile time, you demonstrably do not care about performance.
But it's mostly true. Trust the compiler: compilers are very, very good, and they have a better understanding of the performance characteristics of the current platform and the current code than the programmer does. Another benefit is that you do not need to maintain and update manual optimisations once an assumption has changed. __force_inline does more harm than good.
(Some compilers these days have whole-program optimisation and outlining, things that make the decision of when to inline and when not to inline even harder for humans.)
In some ways they are, and in some ways they're not.
I recently made a loop 5x faster by writing it slightly differently. Reason? MSVC decided to emit code that messes up store forwarding (very much a microarchitectural detail). Spelling out the pointer derefs produced much better code.
More specifically, the loop was loading ARGB values and storing them as BGR (yes, blitting on the CPU, don't ask). MSVC tried to be clever by storing the lower 16 bits of the ARGB value to the stack and then reading back the individual bytes for writing. CPUs of course don't (usually) go to main memory when you write to memory and then read it back, thanks to store forwarding. But that only works if your stores and loads are the same size, which 16 vs 8 bits are not. So the compiler somehow managed to make a 3-byte twiddle memory bound.
Profile, and don't be afraid of reading some assembly.
I agree that in an ideal world function boundaries would be primarily for readability and at most a minor hint to compilers. But I also think you give compilers way too much credit. Compilers are still generally applying one relatively simple rule after another. Often that will get good results, but it sometimes fails in surprising ways because the compiler will not be able to predict the effect of one optimization on later ones. This is where hints such as "always inline" are useful, and that doesn't even have anything to do with the target platform.
And as for PGO, yes, that's useful, but it's not a silver bullet: a) as mentioned above, thresholds for local optimizations (which is what PGO affects) are not always enough, and b) getting a representative profile is not trivial, and the profile also needs to be kept up to date.
As pointed out in the reply, this is incomplete. There are actually three versions of inline in common use in C-like languages: gnu89, c99, and C++, which are all different.
My (probably unpopular) opinion is that 'inline' should never have been added to the C standard, because it breaks the strict separation of interface declaration (in headers) and implementation code (in source files). Inline in C++ is the source of all sorts of problems, the biggest one being slow compilation because the same inline code in headers needs to be parsed over and over again.
Within the same compilation unit, the compiler will inline any suitable function anyway, and 'static' alone is enough to hint the compiler that there's no separate copy of the function needed if it can be inlined at all call sites.
And for inlining across compilation units, there's LTO these days (which admittedly wasn't a realistic option in the late 90's).
If you're putting an inline function into your header file, the size of the code for that function is part of the information provided by the interface (i.e. the inline code gets compiled into each user). Every user has to know they're duplicating that code locally.
I agree, though, that for the most part there are probably better ways to do it (including trusting in "the compiler is a better optimizer than my wetware").
If I look at a list of Rust projects (awesome-rust, etc), quite a lot of them either replicate something done in C, or create something that probably would have been done in C.
To kill C++, Rust needs to grow an ecosystem for game engines, GPGPU shading languages (Metal/HLSL/CUDA), OS GUI stacks (Qt/WinUI) and composition engines,...
Also, nothing is even close to matching the ability of C++ to build against other C++.
Even if Rust had Rust alternatives for enough of the interesting niches, it will take a loooong time for all the relevant software to either get rewritten or reach end of life.
These are the sort of reasons I’ll never be able to use C or C++ in any real capacity.
Sure, I can read it and, to a much lesser extent, write it, but there are way too many crazy things dependent on the compiler/platform that I never run into and only ever see in passing contexts like this. I think I’d trust myself to learn Haskell before I ever trust myself to write production C.
>I think I’d trust myself to learn Haskell before I ever trust myself to write production C.
Hyperbole much? This is just one of "those" things which you only learn when you need it.
In actual usage, most people I know (myself included) have never used "inline" in C. Macros are the time-honoured way of doing such things. There is no mystery to it, just a choice and convention (compared to C++).
Your argument really does not have a leg to stand on.
The concept of "inlining" a function is Programming 101. The mechanism of how to achieve it is what is specific to a language/runtime. You check the syntax/semantics/documentation and figure it out aka problem-solving. Not a big deal and no need to make unwarranted claims/charges against the language as a whole.
C, unlike many modern programming languages, requires that you understand how a computer works. It's just not an appropriate language for most programmers, including many of the more vocal who comment here on HN.
If you're writing an app, or scripting together some apps, you don't need C. If you're programming a computer, you need a tool that depends on the computer to do its job, and that means you're going to have to tolerate crazy things dependent on the platform.
This is what they call a domain problem. Very few app developers are involved in the domain of programming computers. Of course, that does not leave them feeling unqualified to comment on tools and techniques of those who are, because they use computers every day.
Modern computers aren't much at all like the imaginary machine the C programming language is defined against.
The compiler's job is to wrestle what you wrote, for that imaginary machine, into machine code that will run on the real computer somebody actually owns which is quite different. This is a difficult task and it gets harder all the time.
Programmers who believe they're writing "bare metal" programs in C are delusional and are particularly likely to get a very rude awakening when they try to write concurrent software. What the machine really does is too hard for you to sensibly reason about at scale, so C provides SC for DRF (sequential consistency for data-race-free programs) and says that if you do anything else (and you will), too bad: your program's meaning is undefined and you lose.
I have a feeling there’s a world of software outside of your lazy reduction to “apps” vs “computers” which is a pretty nonsensical reduction in the first place.
It’s funny how much hate the Rust community gets when C programmers are just as bad at coming out of the woodwork to try and insult people when someone takes even the most minor shot at the language they’re so attached to (which was, frankly, more or less a shot aimed at myself).