Personally I don't like when people hide a pointer behind a typedef. If you want to use a typedef, typedef the function and then declare a pointer to that:
typedef int func(void);
func *func_ptr;
Avoids the mess of the function pointer syntax, but still makes the fact that it is a pointer clear.
Last time I checked, Clang doesn't support nested functions although it supports most of GNU extensions (or similar features with slightly altered syntax).
I'm not 100% sure what you mean by "proper closure" but it does capture variables from the outer scope. It has limitations with scopes and lifetimes, of course.
Proper closures require "fat pointers", basically you're storing two pointers, one to the function and one to its context data. (In the case of a nested function that's its stackframe.) They also require that stackframes be generally allocated on the heap.
C doesn't have a type for that, it only has function pointers, which only have space for a single pointer. So what GCC does is actually a horrible hack - it dynamically creates a function (called a trampoline), which calls the actual function with a pointer to its data. But because GCC doesn't have true closures, and only refers to the surrounding function's existing stackframe, which is on the stack, not the heap, this only works until the surrounding function has returned. And since the trampoline is also allocated on the stack, this requires the stack to be executable, which is Not Great for security.
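For anyone who hasn't seen the "fat pointer" idea spelled out, here is a minimal sketch in standard C where the context is passed explicitly instead of being captured. The names (closure, add_ctx, add_n, call_closure) are made up for illustration, and this is not what GCC generates; it only shows the two-pointer shape described above.

```c
#include <assert.h>

/* A "fat pointer": one pointer to the code, one to its captured data.
 * All names here are hypothetical, for illustration only. */
struct closure {
    int (*fn)(void *ctx, int arg);  /* the function */
    void *ctx;                      /* its context data */
};

/* The "captured" state for add_n: the amount to add. */
struct add_ctx {
    int n;
};

int add_n(void *ctx, int arg)
{
    const struct add_ctx *c = ctx;
    return arg + c->n;
}

/* Calling a closure means handing the context back to the function. */
int call_closure(struct closure c, int arg)
{
    return c.fn(c.ctx, arg);
}

/* Usage: the context lives in the caller's frame, so the closure is
 * only valid while this function is still running. */
int demo_closure(void)
{
    struct add_ctx ctx = { 5 };
    struct closure add5 = { add_n, &ctx };
    assert(call_closure(add5, 10) == 15);
    return call_closure(add5, 1);
}
```

The lifetime problem is still there: the struct can be copied around freely, but ctx dies with the frame that owns it unless you allocate it on the heap.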
Yes, it's a bit ugly but it still doesn't make it useless. It could be useful for e.g. passing a comparator to qsort using a captured variable to pass the comparison criteria (e.g. compare arrays according to the n'th element).
> So what GCC does is actually a horrible hack - it dynamically creates a function (called a trampoline), which calls the actual function with a pointer to its data.
Well you can call it a horrible hack but it's pretty clever. There's no other way to do this without having a rich runtime system and a language with a built-in concept of a heap (and perhaps a garbage collector).
> And since the trampoline is also allocated on the stack, this requires the stack to be executable, which is Not Great for security.
Yeah - this is pretty nasty. An executable stack is worse than useless for security, although a non-executable stack on its own is not enough to protect against stack/buffer overflow exploits that use ROP or other advanced attack methods.
However - in my practical experiments, I have noticed that the optimizer will get rid of most trivial trampolines if the resulting function pointer isn't stored or passed to a function in a foreign translation unit. LLVM in particular is really good at eliminating trampolines.
I wish there was a way to have compile time certainty that no trampolines ever get emitted on the stack. You could still use capturing nested functions with certain limitations.
But yeah - it's not the most useful feature, primarily because it's GCC only and secondarily because, at worst, you'll end up executing a few bytes of machine code from the stack.
Another advantage of typedefing the function is that you can let the compiler help ensure your function definition signatures are correct.
For example, if you typedef a callback type like so:
typedef int callback_t(int foo);
And then declare a callback like so:
callback_t my_callback;
And then later define the callback like so:
int my_callback(int foo) {
...
}
The compiler will produce an error if you screw up the function signature of my_callback() when defining it because it won't match the prototype you defined via the typedef. This only works in C. C++ allows multiple function signatures for the same function name, so you probably won't get a compiler error--though you'll likely wind up with a linker error.
Edit: The downside is that function declarations done this way will look a little odd--possibly mistaken for a variable definition. And, upon further thought, I'm not sure that this pattern really is a huge benefit since function pointer assignment will also produce an error if the signatures don't match. But interesting nonetheless, I guess.
It's never even occurred to me to typedef a function like this, and now that I think about it I'm not sure why. Your way is a lot clearer. Thanks for the tip.
Do you remember where you picked this up? Any particular book or codebase?
I picked it up just reading various C stuff on the internet - I think this one actually came from a Reddit user. It's not a use-case I think I've ever seen used anywhere, except for my personal code. Even then though, I admittedly almost always just type out the regular function-pointer syntax, since I find that function-pointers for me almost always just get declared in one location anyway.
One advantage is that you can declare functions with it, which is useful when you have many different operations of the same type. For example, take a toy calculator:
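The snippet below is only a guess at the shape of such a calculator, not the original example: a minimal sketch with made-up names (binop, op_add, op_entry, calc) showing one function typedef used to declare several operations at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy calculator: every operation has the same function type. */
typedef int binop(int lhs, int rhs);

/* One line declares (prototypes) all the operations. */
binop op_add, op_sub, op_mul;

/* The dispatch table stores pointers; the '*' stays visible. */
struct op_entry {
    char symbol;
    binop *fn;
};

static const struct op_entry ops[] = {
    { '+', op_add },
    { '-', op_sub },
    { '*', op_mul },
};

/* Definitions are still written out in full; the compiler checks them
 * against the prototypes declared through the typedef. */
int op_add(int lhs, int rhs) { return lhs + rhs; }
int op_sub(int lhs, int rhs) { return lhs - rhs; }
int op_mul(int lhs, int rhs) { return lhs * rhs; }

/* Look up an operator by symbol and apply it; returns 0 on success. */
int calc(char symbol, int lhs, int rhs, int *result)
{
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        if (ops[i].symbol == symbol) {
            *result = ops[i].fn(lhs, rhs);
            return 0;
        }
    }
    return -1;  /* unknown operator */
}

void demo_calc(void)
{
    int r = 0;
    assert(calc('*', 6, 7, &r) == 0 && r == 42);
    assert(calc('/', 1, 1, &r) == -1);  /* not in the table */
}
```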
This is very similar to the 'array-type', which I've written about before (and could talk about if you're interested). Functions (and arrays) decay into a pointer when used in most situations (for functions, I can't think of a real case where it doesn't decay, besides declarations). But the 'func' in this case is the type of a 'function' ('value function', as you're calling it). Interestingly, declaring something of type 'func' is the same as forward declaring a function of that type:
typedef int func(void);
/* These two lines are equivalent */
func foo;
int foo(void);
Obviously though, the above is of limited usefulness. It is kinda handy to ensure functions are compatible with a certain typedef, but if it isn't you'll generally see warnings or errors in other locations anyway.
The advantage of my technique here is that it doesn't 'hide' the pointer inside of the typedef, which I consider poor form. Consider these:
typedef int type1;
typedef int type2(void);
typedef int (*type3)(void);
type1 *var1;
type2 *var2;
type3 var3;
type3 *var4;
All of the above are actually pointers, but from the declaration alone you can't tell that `type3 var3` actually declares a pointer. In fact, `type3 var3` and `type2 * var2` declare the exact same thing (minus the name), but `type2 * var2` makes it clear that `var2` is a pointer and not a value type. I find this to be a fairly nice aid in reading, and if you keep this consistent for all types then you don't ever have to worry about a pointer being hidden, or the usage not matching the declaration (i.e. you declare it as `type3 var3` but then do something like `* var3`, which looks incorrect unless you know `type3` is actually a pointer).
Moreover, lots of people that are newer to C (and even those that aren't but just aren't clear or didn't fully check what `type3` is) will attempt to use `type3 * var`, a double-pointer to a function, when what they really want is `type3 var`, a pointer to a function. There's no confusion over what it is if you don't hide the pointer in the first place, and there's really no great reason to hide it besides not knowing you can avoid hiding it in the first place. Even when you're not new to C, keeping track of things when people do `typedef struct foo * bar` and then `bar * foo2` can get to be a headache really fast.
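If you want the compiler to confirm that `type2 *` and `type3` really are the same type, a small C11 sketch like the following should do it. The helper names (forty_two, IS_FUNC_PTR, demo_hidden_pointer) are made up for the example, and it assumes a compiler with `_Generic` support.

```c
#include <assert.h>

typedef int type2(void);      /* function type */
typedef int (*type3)(void);   /* pointer-to-function type: the '*' is hidden */

int forty_two(void) { return 42; }

/* C11 _Generic selects on the type of its operand, so this yields 1 only
 * when the expression has type "pointer to function (void) returning int". */
#define IS_FUNC_PTR(x) _Generic((x), type2 *: 1, default: 0)

int demo_hidden_pointer(void)
{
    type2 *var2 = forty_two;  /* visibly a pointer */
    type3  var3 = forty_two;  /* also a pointer, but you have to look it up */
    type3 *var4 = &var3;      /* pointer to a pointer to a function */

    assert(IS_FUNC_PTR(var2) == 1);
    assert(IS_FUNC_PTR(var3) == 1);  /* same type as var2 */
    assert(IS_FUNC_PTR(var4) == 0);  /* one level of indirection too many */

    return var2() + var3() + (*var4)();
}
```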
Actually, I used to think this, but I believe it's the other way around. IIRC, the standard says that the `()` operator takes a function pointer before it, and then arguments between the `()`. So when directly calling a function, the name of the function decays into a function pointer which is then used to call the function. Obviously, when you call a function directly a function pointer is not involved internally (The address is just used directly), but the standard still expresses it in that way to make the usage/syntax consistent.
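That said, it is possible to see that the function name itself decays into a pointer and isn't a pointer itself. `sizeof(function)` does not return the size of a function pointer (strictly speaking, applying `sizeof` to a function isn't valid standard C at all; GCC accepts it as an extension), but `sizeof(fptr)` does.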
Interestingly, function pointers have the property that they dereference to themselves. So `fptr` and `* fptr` are the same thing, as is `* * * * * * fptr` (for a function name `func`, `&func` is the same thing too, but `&fptr` is not: that is a pointer to the pointer). And since they are the same thing, the `(* fptr) (arg)` syntax works as expected. Personally, I actually prefer to use the `(* fptr) (arg)`, simply because it makes usage of a function pointer clear, but the * really is unnecessary so the benefits are debatable.
The `(* fptr) (arg)` syntax also keeps the "declaration follows usage" pattern intact, since function pointers have to be declared using the * .
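A quick way to convince yourself of the call-syntax equivalences is a sketch like this (twice and demo_call_forms are made-up names for the example):

```c
#include <assert.h>

int twice(int x) { return 2 * x; }

int demo_call_forms(void)
{
    int (*fptr)(int) = twice;   /* function name decays to a pointer */
    int (*also)(int) = &twice;  /* explicit '&' gives the same pointer */

    assert(fptr == also);
    assert(fptr(3) == 6);        /* call through the pointer directly */
    assert((*fptr)(3) == 6);     /* "declaration follows use" form */
    assert((****fptr)(3) == 6);  /* each '*' result just decays back again */

    /* &fptr is different: it is the address of the pointer variable. */
    int (**pp)(int) = &fptr;
    assert((*pp)(3) == 6);
    return (**pp)(3);
}
```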
I would say that a "function pointer" isn't really a pointer: you can't dereference it or do arithmetic on it. Really, the things we currently call "functions" should be "function literals", and then what's now a "function pointer" could be just a "function".
No, but C will automatically take the address. The same goes for arrays. You can explicitly take the address of a function, and this is different from taking the address of a function pointer.
Pros: one less character to type; sometimes able to get more out of a macro
Cons: beginner confusion; sometimes bugs.
Yes. Once declared, function pointers are similar to regular functions. They both point to a value that cannot be changed. That value is a block of code.
I agree that typedefs are clearer for this but usually you don't have a choice and neither does the reviewer. E.g. coding style for Linux essentially says no typedefs. On the upside I have worked on projects where it's required.
No don't use typedefs there's no real reason[1] and it may cause problems. Also if your peer can't read a function pointer I don't know what help you plan on getting from them anyway, chances are you are helping/teaching them, not the other way around.
That post does not say "don't use typedefs there's no real reason." It says "don't use typedefs without a real reason." Linus even provides a few examples of good uses of typedefs.
Specifically he's talking about typedef'ing structs to a named type to hide that it's a struct. That's different from typedef'ing a function pointer so that you don't need to be a C parsing expert to read the code.
> That's different from typedef'ing a function pointer so that you don't need to be a C parsing expert to read the code.
If you can't read C code you should work on your C programming ability, there is nothing difficult about reading function pointers.
Edit: If people are getting tripped up with basic syntax, good luck with the actual hard parts of C, like lack of memory safety, concurrent memory management, undefined behavior, etc.
It's not just about readability, it's about making a contract with the user. If you typedef a complex struct in your library you're telling your users to treat this type like it's a black-box and not look inside because the next version of the library might contain something different and your code will break if you depend on it.
> It's not just about readability, it's about making a contract with the user. If you typedef a complex struct in your library you're telling your users to treat this type like it's a black-box and not look inside because the next version of the library might contain something different and your code will break if you depend on it.
No need to use typedef, just use an opaque struct if you want something to behave like a blackbox.
Just because Linus Torvalds is a successful programmer doesn't mean he is at all correct about issues of programming. The vast majority of any of the opinions I've seen him express on programming fly in the face of good engineering practice under some guise of "real, macho programmers don't need tools".
I'm not afraid to admit: I need tools. I need lots and lots of tools to do my job. The more the computer can be used to make my job easier, the better. There is no virtue in hard work for hard work's sake.
> Just because Linus Torvalds is a successful programmer doesn't mean he is at all correct about issues of programming. The vast majority of any of the opinions I've seen him express on programming fly in the face of good engineering practice under some guise of "real, macho programmers don't need tools".
Is he wrong in this instance or are you going to ignore his point because he's not always right?
> I'm not afraid to admit: I need tools. I need lots and lots of tools to do my job. The more the computer can be used to make my job easier, the better. There is no virtue in hard work for hard work's sake.
So, as a C programmer I use lots and lots of tools. But that doesn't change the point that many view typedefs as bad practice[1].
Most of those are against typedefs for structs, and in the last message, he says that using a typedef for a complex function pointer is "just common sense".
Linus directly contradicts you in that linked thread, for this specific situation:
And as mentioned, there _are_ exceptions. Some types just get _sooo_
complex that it's inconvenient to type them out, even if they are
perfectly regular types, and don't depend on any config option. The
"filldir_t" typedef in fs.h is such an example - it's not really opaque,
_nor_ is it a config option, but it sure as hell would be inconvenient for
all low-level filesystems to do
int my_readdir(struct file *filp, void *dirent,
int (*filldir)(void *, const char *, int, loff_t,
u64, unsigned))
{
...
}
because let's face it, having to write out that "filldir" type just made
me use two lines (and potential for totally unnecessary tupos) because the
thing was so complex. So at that point, using a typedef is just common
sense, and we can do
int my_readdir(struct file *filp, void *dirent, filldir_t filldir)
{
...
}
instead.
But it's really quite hard to make that kind of complex type in C. It's
almost always a function pointer that takes complex arguments.
Thing is, in practice it gets used as a type. If you repeat that function definition more than once you are creating a bit of a nightmare to maintain.
For instance, if you need to add an extra parameter, you can easily find all usages. If something else uses the same definition but for a different purpose, it gets much harder to distinguish the different usages, especially for a good ol' `void (*callback)(void *context)`, which a number of events may use. Later, some of those may call back with more information; with a typedef that can be changed in one place and the compiler will quickly show you what breaks.
It's more readable, but using std::function here introduces a second layer of indirection vs using a plain function pointer.
More specifically, std::function's operator() is virtual, and calls into a subclass that's specialized to function pointers of type void(int). The subclass then performs the actual function pointer call.
Technically function::operator() is not virtual (which wouldn't be very useful as std::function has value semantics), but it does runtime dispatching internally using an unspecified mechanism.
This can be virtual functions, or, more commonly, a hand-rolled vtable. In the latter case, if std::function is constructed with a function pointer exactly matching its signature it could in principle avoid the thunk and directly point to the function itself. I don't think most implementations bother.
Also this comment by Linus Torvalds (copy pasted because I don't know how to link to Google+ comments):
> I don't think that works. It breaks trivially for consecutive [] or * cases, something that he carefully didn't have in his examples.
> So the examples were made up to make it look like it's a spiral, but type parsing is about precedence, not about spirals. It so happens that the higher-precedence operators ([] and ()) are on the right-hand side, which is why it "works" to start on the right.
All these years I assumed that the 'spiral rule' and 'right-left rule' (linked from the stephencanon comment above, and also described in my second link) are two ways to describe the same algorithm, but, reading closely, they aren't! I guess 'spiral rule' is a stickier name, which is why it's something that people remember even though it's janky.
No, the trick is to remember that declaration follows use. Declare a symbol using (nearly) the same exact syntax you would use to extract a value of the base type from that symbol.
I know this is the subject of holy wars, but once I'd seen the second one my eyes were opened and I had way less trouble. I think that declaration follows use is another example of the amazing design powers of the patriarchs.
Not the OP, but given that English is historically gender biased, I find it pretty easy to unconsciously fall into the trap of using a gender biased turn of phrase. Even if one is consciously trying to avoid such things, our messy neural nets being what they are, mistakes happen.
Given the almost certainty (in my mind anyway), that the OP meant no disrespect to people who identify with genders that are not male, it would make me very happy indeed if requests for correction could be made in a respectful way. Simply asking something like "Would you mind using the alternative phrasing 'blah blah blah'" would go a long way towards helping everyone to maintain a civil tone.
Thanks for caring about gendered language. I used this term because it's used (humorously) in the Unix Koans[0]. But maybe you're reading too far into it. I certainly don't mean to imply their gender had anything to do with C's design, but for better or worse they did happen to be men and were seen by some as a "father figure" for the language.
As a variable:
returnType (*variableName)(parameterTypes) = function_name;
variableName is a pointer to a function that accepts parameterTypes and returns returnType
As a static const variable:
static returnType (* const variableName)(parameterTypes) = function_name;
variableName is a static const pointer to a function that accepts parameterTypes and returns returnType (static is a storage class, so it applies to variableName, not to the return type)
As an array:
returnType (*arrayName[])(parameterTypes) = {function_name0, ...};
arrayName is an array of pointers to functions that accept parameterTypes and return returnType
As an argument to a function:
int my_function(returnType (*argumentName)(parameterTypes));
my_function is a function that accepts an argumentName, which is a pointer to a function accepting parameterTypes and returning returnType, and returns an int
As a return value from a function:
returnType (*my_function(int, ...))(parameterTypes);
my_function is a function that accepts int and other parameters and returns a pointer to a function that accepts parameterTypes and returns returnType
As a typedef:
typedef returnType (*typeName)(parameterTypes);
a typeName is now a pointer to a function that accepts parameterTypes and returns returnType
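Here are the same six forms with concrete types filled in, as a sketch you can paste into a compiler. Everything named here (add_one, negate, fp_var, fp_const, fp_table, apply, pick, unary_fn) is made up for the example.

```c
#include <assert.h>

int add_one(int x) { return x + 1; }
int negate(int x) { return -x; }

/* As a variable */
int (*fp_var)(int) = add_one;

/* As a static const variable: 'static' and 'const' both apply to the pointer */
static int (* const fp_const)(int) = negate;

/* As an array of function pointers */
int (*fp_table[])(int) = { add_one, negate };

/* As an argument to a function */
int apply(int (*fn)(int), int value) { return fn(value); }

/* As a return value from a function (no bounds check in this sketch) */
int (*pick(int index))(int)
{
    return fp_table[index];
}

/* As a typedef (note that this form hides the pointer) */
typedef int (*unary_fn)(int);

int demo_forms(void)
{
    unary_fn f = pick(1);  /* negate */
    assert(fp_var(1) == 2);
    assert(fp_const(2) == -2);
    assert(apply(fp_table[0], 3) == 4);
    return f(5);
}
```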
For anyone unable or unwilling to access that domain name for work purposes or filtering purposes, the linked page lists this alternative:
http://goshdarnfunctionpointers.com/
Thanks! Unfortunately, the page currently uses Hover's "stealth redirect" which embeds the profane URL in an iframe. So, if the profane URL is actually blocked by content filtering, you probably still won't be able to access it.
I'm actively working on the page. Once it stabilizes, I'll consider mirroring a sanitized version instead of using the "stealth redirect".
Not to infringe on your free speech, but you could save yourself some time and do your potential readers a favor if you just fucking didn't swear in your domain name. :)
The easiest and best way to learn the syntax is to not memorise specific cases but the grammar itself, which IMHO is no more difficult than the existing concept of operator precedence. Everyone using C should hopefully already know that multiplication has higher precedence than addition, so likewise function call (and array subscripting) has higher precedence than pointer dereference. Thus this table should make it clear that combining the two operators creates pointer-to-function:
T x;          T
T *y;         pointer to T
T f();        function returning T
T (*g)();     pointer to function returning T
and the alternative, T *h();, is parsed as T *(h()); and thus becomes "function returning pointer to T".
The apparent struggle I see with this syntax has always somewhat puzzled me, because I don't see the same level of complaints about e.g. arithmetic expressions (like 6+3*4/(2+1)) which are parsed with precedence in much the same way. K&R even has a section on writing a parser that recognises this syntax, so I suspect it's really not that hard, but the perception spread by those who didn't learn the syntax but only memorised the "easy cases" is making it appear more difficult than it really is.
What we need to realize is that simple grammar does not always lead to simple comprehension. Nesting grammar elements more than a few levels deep is always difficult for our current biological equipment.
You define a variable the same way you would use it.
int *a -> expression *a has type int.
int *a[10] -> *a[_] has type int.
int (*a)[10] -> (*a)[_] has type int.
int (*a)(int, int) -> (*a)(_, _) has type int.
No need for complicated things like "spiral rules", etc.
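The four declarations above, each next to the expression it mirrors, in a small sketch (sum2, the extra variable names and the demo function are made up for the example):

```c
#include <assert.h>

int sum2(int x, int y) { return x + y; }

int demo_declaration_follows_use(void)
{
    int n = 7;
    int row[10] = { 0 };
    row[3] = 9;

    int *a = &n;                /* *a has type int */
    int *b[10] = { 0 };         /* *b[i] has type int: array of pointers */
    int (*c)[10] = &row;        /* (*c)[i] has type int: pointer to array */
    int (*d)(int, int) = sum2;  /* (*d)(x, y) has type int: pointer to function */

    b[2] = &n;

    assert(*a == 7);
    assert(*b[2] == 7);
    assert((*c)[3] == 9);
    return (*d)(1, 2);
}
```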
Nice explanation, thanks; your formatting got a little messy right after the code block, though, because things between asterisks are rendered cursive.
Every time I have to deal with the declarator syntax in C or C++, I can't help but ponder what K&R were thinking when they designed this. It's not like there weren't other languages back then with a saner approach.
It looks like what they did was take the syntax from B:
auto x[10];
and generalize it such that the type name ended up before the variable name, as in Algol. But in B this worked much better, because it didn't have array types (or pointer types, or function types) - everything was a machine word. So [] in a variable declaration was just to allocate memory to which the variable would refer; the variable itself would still be a word. When they made [] part of the type, and added pointers and function types, the result was a mess.
What they were thinking can be seen in K&R C - there is no "typedef" in early C. Without typedef, the syntax of C is context-independent and LALR(1). You don't have to know if a name is a type to parse the syntax. Then came "typedef", which broke parsing.
C parsing became context-dependent. To parse C with "typedef", and especially C++, you must read all the header files first.
With name-first declaration syntax (Pascal, Modula, Go, Rust), parsing is context-independent again. Readability improves. Error messages improve. Syntax-coloring editors with a single-file view don't get lost.
That's interesting -- I was wondering in which cases typedef changes the parse tree, and came across a few [1]:
a (b); /* function call or declaration */
a * b; /* multiplication or declaration */
f((a) * b); /* multiplication or deref and cast */
> With one further change, namely deleting the production typedef-name: identifier and making typedef-name a terminal symbol, this grammar is acceptable to the YACC parser-generator.
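To make the `a * b` case concrete, here is a small sketch where the same three tokens parse both ways in one file, depending on what `a` means in the current scope. The function names are made up, and the declaration carries an initializer so it has something to return.

```c
#include <assert.h>

typedef int a;  /* at file scope, 'a' names a type */

int as_declaration(void)
{
    int target = 5;
    a * b = &target;  /* 'a' is a type here: declares b as pointer to int */
    return *b;
}

int as_multiplication(void)
{
    int a = 6, b = 7;  /* this inner 'a' is a variable that hides the typedef */
    return a * b;      /* same tokens, now parsed as a multiplication */
}

void demo_parse(void)
{
    assert(as_declaration() == 5);
    assert(as_multiplication() == 42);
}
```

The parser can only pick the right tree by tracking which identifiers are currently typedef names, which is exactly the context dependence being described.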
C++ takes it all the way to 11 with templates. Here's a program that is parsed differently depending on whether pointers are 32-bit or 64-bit:
#include <cstddef>
template<std::size_t N = sizeof(void*)> struct a;
template<> struct a<4> {
enum { b };
};
template<> struct a<8> {
template<int> struct b {};
};
enum { c, d };
int main() {
a<>::b<c>d;
d;
}
Depending on which instantiation is used, the first line of main is either a variable declaration, or two operators < and > applied in sequence.
This is especially fun to deal with for C++ IDEs that support semantic highlighting (i.e. typenames are in a different color etc). If I remember correctly, the first one that could handle this right was VS 2012 - it only took 14 years after ISO C++ standard was released...
That explains how they ended up with context-dependent syntax - you're right, in the absence of typedefs, it's context-free to parse because of tags. But it can still be a pain to read, if you run into things like arrays of pointers to functions.
"Unambiguous" is not the same as "trivial". C code generally reads left to right, so a trivial syntax for declaring, say, an array of const pointers to functions that take a const pointer to a char and return a const pointer to a char, would have tokens for those things in that order.
I know the rules, and how to apply them. But they are not trivial. The fact that such documents need to be written in the first place, and the fact that utilities like cdecl exist, is a strong testimony to that.
cdecl is nice and all, but it will fail if there is any type that isn't a simple built-in C type. It's rare that I can just copy-paste a declaration into cdecl.
In C, the easiest thing to do is to just use a typedef. You get to use the same syntax as for normal function calls with no trying to remember where that blasted extra * goes, and typically end up with cleaner code anyway.
I never got used to having variables sandwiched inside a type. I know I am not supposed to suggest out-of-the-box, but why can't we add a new syntax, e.g.:
You'd have to call it something like `_Fn` instead, or you're going to conflict with existing code (the C standard reserves identifiers starting with an underscore followed by a capital letter; it also reserves identifiers starting with two underscores, but it's fairly common for code to ignore that and use identifiers like that anyway).
Another thorn in the C syntax is allowing a single statement to follow if, for, while, etc. How many pitfalls and workarounds have we struggled with because of it? Just put the braces into the statement syntax please.