Basics Of Function Pointers In C (denniskubes.com)
160 points by giis on Aug 1, 2013 | 57 comments



The article is missing at least one useful thing: how to declare a typedef for a function pointer. This can be used both to avoid error-prone duplication of declarations and to simplify excessively complex declarations.

Here's a simple example:

http://en.wikipedia.org/wiki/Typedef#Using_typedef_with_func...

Here's a more complex example:

http://www.devx.com/tips/Tip/13829

Here's a tutorial on how to interpret the complex declarations like the one at the top of the previous example:

http://www.codeproject.com/Articles/7042/How-to-interpret-co...

Here's a discussion about using typedefs for function pointers on StackOverflow:

http://stackoverflow.com/questions/1591361/understanding-typ...

See also: The Linux command "cdecl":

http://linux.die.net/man/1/cdecl

Plus, there's an on-line version:

http://www.lemoda.net/c/cdecl
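
For anyone who doesn't want to click through, here is a minimal sketch of the typedef idea. The names `binary_op`, `add`, `mul`, and `apply` are made up for illustration; they are not from the article or the links above.

```c
#include <assert.h>
#include <stddef.h>

/* Without a typedef, every declaration repeats the full pointer syntax:
 *     int (*op)(int, int);
 * With one, the type gets a name that reads like any other type. */
typedef int (*binary_op)(int, int);

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* Takes a function pointer parameter declared through the typedef. */
int apply(binary_op op, int a, int b)
{
    assert(op != NULL); /* calling through a null pointer is undefined */
    return op(a, b);
}
```

With this in place, `apply(add, 2, 3)` and `apply(mul, 2, 3)` both go through the same call site, and the parameter list stays readable.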


The most useful education I got on reading function pointers is the spiral rule:

http://c-faq.com/decl/spiral.anderson.html

(I accidentally deleted my reply)


It's also good to note that these links for further reading are applicable for blocks in Objective-C, which use the same syntax for declaration, only replacing the asterisk '*' with a caret '^'.


There's another interesting discussion here:

http://stackoverflow.com/questions/840501/how-do-function-po...


Good examples and great 'blow by blow' breakdown. This is a really tough concept to learn, and the C syntax gets a little complicated. This was well laid out, readable, and had a good progression of concepts. Well done!


This was one of the toughest things for me to learn in C. I think I had to learn it 3 times before it stuck. As more schools depart from teaching C, it's worthwhile to have simple explanations.


But why that whole mess with function pointers? Where do they have the key advantage compared to directly calling the function?


Let's look at the signature of qsort() from libc:

   void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));
qsort is a function inside of libc.so. It's already been compiled. It doesn't know about the "compar" pointer you're going to pass it. That function might not even exist yet.

qsort doesn't know if you're sorting integers, sorting strings, sorting struct foo which is still being ironed out. It knows how to sort arrays. You give it the size of each element, the number of elements, and a function pointer that it will call to compare elements. It will call you back and you can stick your data-structure-specific knowledge into that function.

(This turns out to be a lousy way to sort arrays, by the way. If you used a C++ template instead, the comparison function could be inlined directly in the sort algorithm, which gives you better object code. But you lose the possibility that the caller and callee exist in different modules, for example.)
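
As a rough sketch of what the caller's side looks like, assuming we are sorting plain ints (the helper names `compare_ints` and `sort_ints` are invented for this example):

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback with the exact signature qsort() expects.
 * Returns <0, 0, or >0; (x > y) - (x < y) avoids the overflow that
 * plain subtraction can cause. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts an int array in ascending order by handing qsort() our callback. */
void sort_ints(int *values, size_t count)
{
    assert(values != NULL);
    qsort(values, count, sizeof values[0], compare_ints);
}
```

qsort() never learns that the elements are ints; all the type knowledge lives in `compare_ints`.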


> qsort is a function inside of libc.so. It's already been compiled.

Unfortunately, this is also why it is so damn slow. Try comparing C++ std::sort with C qsort; the performance gap is HUGE. The reason is that the comparison function passed to std::sort can be inlined, whereas qsort has to make an actual call through the function pointer for every comparison.

If you moved qsort to a header file as an inline function, the performance problem would go away.


This is one thing I don't understand about HN sometimes. I always get replies that seem to angrily "disagree". But I've already made your point in my parenthetical remark.

There are of course other times where the function pointer overhead doesn't matter as much and the modularity of not having things as templates is desirable. Sorting is just a bad example.


thanks, that helped me out <3


Function pointers are useful anywhere you don't know what function you're going to call until runtime (and yet more useful still when the number of functions you could be calling is unlimited or extensible.)

Sure, you can get away with, say, implementing a parser with a big switch statement that calls various parse_this() or parse_that() functions, depending on what kind of token you hit. On the other hand, you could instead keep a hash table with token-type keys and function-pointer values. Then instead of O(n) cmp/jnz operations (as a switch statement boils down to), you get O(1) for the hash function, and no processor branch-prediction penalty.

Where it really gains advantage is callbacks: where you write a function for a library (low-level code; mechanism) which can call a user-provided function (high-level code; policy) to do something. The low-level isn't supposed to be aware of the high-level; you shouldn't have to recompile the library every time you have a new callback to pass to it. So function pointers are ideal for this. You call the function, pass it the address of your function, and then it jumps to it to get some answer whenever it needs it.

Think on this for a little bit, and you might be able to invent the traditional higher-order functions: things like map, filter, and fold, which let you do things (adding one to each element; making a new list with just the elements that are even; summing the elements together) by just specifying the policy of what you want to do with each item, without worrying about the mechanism of how it's iterating through the list, keeping temporary values, etc.
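
A minimal sketch of map and fold over an int array, to make the policy/mechanism split concrete. All names here (`map_ints`, `fold_ints`, `add_one`, `sum`) are hypothetical, and a real library would probably also pass a `void *` context to each callback.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unary_fn)(int);
typedef int (*fold_fn)(int acc, int item);

/* map: replace each element with fn(element), in place. */
void map_ints(int *items, size_t n, unary_fn fn)
{
    assert(fn != NULL);
    for (size_t i = 0; i < n; i++)
        items[i] = fn(items[i]);
}

/* fold: combine all elements into one value, starting from init. */
int fold_ints(const int *items, size_t n, int init, fold_fn fn)
{
    assert(fn != NULL);
    int acc = init;
    for (size_t i = 0; i < n; i++)
        acc = fn(acc, items[i]);
    return acc;
}

/* Policies supplied by the caller. */
int add_one(int x) { return x + 1; }
int sum(int acc, int item) { return acc + item; }
```

The loops (mechanism) are written once; `add_one` and `sum` (policy) can be swapped for anything with a matching signature.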

Thinking further, you might realize that if you can read a file into executable memory at runtime using some OS function, then you could put native machine code in that file, along with a table specifying function names and the offsets where those functions start in the file. You could read this table in, add the pointer representing where the file starts in memory, to the offset from the table representing where the machine-code is in the file... and bam, you've got a function pointer! Now you can call it. That's a plugin system! (These files containing symbol tables and machine code are called "Shared Object-code" or ".so" files--or Dynamically-Linked Libraries (".dll"s) in Windows--and the function to load them is dlopen(). You don't have to do the math yourself; it's abstracted away by the mechanism--but that's what's really going on there. It's just a plain-old file, mapped into memory, and then you get an address into that memory and call it.)

I would say that the ultimate use of function pointers, though, is a tracing JIT (the JIT being short for "Just In Time".)

In some languages (themselves implemented on top of C), the language's compiler is built into the language's runtime; you can take code specified in source form, or as bytecode, and pass it as a regular old string to a function, which will turn it into native machine code, load it into (an executable range of) memory, and then return you a function pointer to it.

Given this, the language can be based around an interpreter that normally just reads source- or byte-code directly. Interpreters are great, for the most part--they're easy to program; shorter and cleaner than compilers in source form; they frequently fit entirely in processor-cache in compiled form, whereas globs of native code don't; and they don't tend to branch much. If you've never implemented one, they tend to just be a short little "read instruction, parse it into an op and arguments, look up what function that op translates to (via the hash-table-of-function-pointers method above), pass that function the arguments, repeat" loop.
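
Here is roughly what that loop can look like, as a toy stack machine. This is only a sketch: the opcodes and names are invented, and it uses a plain array indexed by opcode instead of a hash table, which is a simplification of what is described above.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_COUNT };

struct vm {
    int stack[16];
    size_t depth;
};

struct instr { int op; int arg; };

typedef void (*op_handler)(struct vm *vm, int arg);

static void op_push(struct vm *vm, int arg)
{
    assert(vm->depth < 16);
    vm->stack[vm->depth++] = arg;
}

static void op_add(struct vm *vm, int arg)
{
    (void)arg;
    assert(vm->depth >= 2);
    vm->depth--;
    vm->stack[vm->depth - 1] += vm->stack[vm->depth];
}

static void op_mul(struct vm *vm, int arg)
{
    (void)arg;
    assert(vm->depth >= 2);
    vm->depth--;
    vm->stack[vm->depth - 1] *= vm->stack[vm->depth];
}

/* Opcode -> handler lookup table. */
static const op_handler handlers[OP_COUNT] = {
    [OP_PUSH] = op_push,
    [OP_ADD]  = op_add,
    [OP_MUL]  = op_mul,
};

/* Runs the program and returns the value left on top of the stack. */
int run(const struct instr *code, size_t len)
{
    struct vm vm = { .depth = 0 };
    for (size_t pc = 0; pc < len; pc++) {
        assert(code[pc].op >= 0 && code[pc].op < OP_COUNT);
        handlers[code[pc].op](&vm, code[pc].arg); /* the indirect call */
    }
    assert(vm.depth > 0);
    return vm.stack[vm.depth - 1];
}
```

Adding an instruction means writing one handler and adding one table entry; the loop in `run` does not change.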

The only thing is, if you have to do that over and over for the same 1000 instructions, the "read, interpret, jump, return" loop does add overhead--especially since each of these operations has to execute separately and serially, where native instructions might get optimized together into a single clever instruction like x86's "lea" (a shift-and-add address calculation done in one instruction, without touching memory!)

So, if you want that performance boost for your tight inner-loop code, you could write a separately-compiled module in a lower-level language, and hook it into your code through the C ABI (as above--plugin system = dlopen = more function pointers!) But that's a big jump in complexity. Instead, if you just hook a JIT to your interpreter, the result is only slightly more complex than a plain interpreter.

A JITing interpreter does the same loop, but it also keeps statistics on how often it executes each piece of your code. When it notices that some piece of code is getting interpreted over and over a lot, it'll pass it into the JIT; get back a function pointer to real, native, optimized code; and then patch its in-memory representation of the code you fed it, so instead of saying "A, B, [C, D, E], F", where [C, D, E] is the block that's getting executed all the time, it'll say "A, B, jump to [this native address] and run what's there, F".

That single instruction will transfer control from the interpreter to the code compiled by the JIT, through the function pointer. When the compiled function is done, control comes back, and the interpreter steps to the next instruction like normal. Still easy, still simple, but way faster where and when you need it. Neat, huh?


With the help of half a litre of Monster Energy, I even managed to understand most of what you explained! (I am an unskilled script kiddie who's trying to improve...)

Good explanation and thanks for the work done, that shall help me a lot later on!


This is the kind of stuff I come to HN for. Thank you for helping me finally realize what .dll's really are.


This reminds me of how ECL (Embedded Common Lisp) loads lisp modules; it translates them to C, compiles them to PIC with a C compiler, and dynamically loads the resulting .o file.


Wow, really helpful to give some motivation for importance.


thanks!


One great example of function pointers is in C++ when implementing inheritance. Say you have an array of Polygons, and each one has an Area() method. It turns out that some of them are Triangles, some are Squares, and some are Hexagons. polygon[i]->Area() calls a different function for each of these, and this is typically implemented by storing function pointers in per-class "vtables" that each Polygon object points to. In C++ this happens behind the scenes (you don't program this directly), but in C (or C++ for that matter if you needed to), you can get a very similar effect by storing function pointers in structs. The Linux kernel makes extensive use of this programming style in driver code.
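
A hand-rolled version of that idea in C might look like the sketch below. The struct layout and names (`shape`, `shape_vtable`, `make_rect`, and so on) are assumptions for illustration; this is not how any particular C++ compiler or the kernel lays things out.

```c
#include <assert.h>
#include <stddef.h>

struct shape;

/* One table per "class", shared by all of its instances. */
struct shape_vtable {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_vtable *vtable;
    double a, b; /* meaning depends on the concrete shape */
};

static double rect_area(const struct shape *s) { return s->a * s->b; }
static double tri_area(const struct shape *s)  { return 0.5 * s->a * s->b; }

static const struct shape_vtable rect_vtable = { rect_area };
static const struct shape_vtable tri_vtable  = { tri_area };

struct shape make_rect(double width, double height)
{
    struct shape s = { &rect_vtable, width, height };
    return s;
}

struct shape make_triangle(double base, double height)
{
    struct shape s = { &tri_vtable, base, height };
    return s;
}

/* The caller never needs to know the concrete type. */
double shape_area(const struct shape *s)
{
    assert(s != NULL && s->vtable != NULL);
    return s->vtable->area(s);
}
```

Looping over an array of `struct shape` and calling `shape_area` on each element gives the same effect as `polygon[i]->Area()`.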


You use a function pointer when you don't know at compile time what function you are going to call at runtime. An example is plugins: you load a function from a dll (or shared object) and you get a function pointer.

When dealing with native code, functions are just piles of instructions in memory (with executable bit set in memory protection) that comes from the "text" section of your binary file. The code is stored in memory and therefore has an address, through which they can be reached.


If you've ever seen some of the C code from when the Sony PSP slims were first being hacked, particularly Davee & Bubbletune's "ChickHEN" stuff, you'll know that understanding how function pointers work is required[1].

1. http://forums.exophase.com/threads/tiff-exploit-hen-informat...

Disclaimer(?): The TRFyuki here, is me. :)


If you need to determine which function to call dynamically, in an extensible way. Take a signal handler or other callback. There are more sophisticated uses, but that seems a clear and straightforward example.
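
A small sketch of the signal handler case, using only standard C's signal() and raise(). The function name `demo_signal_callback` and the choice of SIGINT are just for illustration.

```c
#include <assert.h>
#include <signal.h>

/* Set from the handler; volatile sig_atomic_t is the type the C standard
 * guarantees can be safely written from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Registers on_signal for SIGINT, raises the signal ourselves, and
 * reports whether the callback ran. */
int demo_signal_callback(void)
{
    /* signal() takes a function pointer and returns the previous one. */
    void (*previous)(int) = signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);
    raise(SIGINT);
    signal(SIGINT, previous); /* restore the earlier disposition */
    return got_signal;
}
```

The runtime has no idea what `on_signal` does; it only holds the address and jumps to it when the signal arrives.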


Psha, try pointer-to-member-functions in C++.

I wrote a fluent reflection/serialization engine a while back which made heavy use of them.

https://github.com/sparecycles/reflect/blob/master/include/r...


This is C, not C++.


A good basic article.

An advanced article would talk about arrays of function pointers, pointers to function pointers, returning function pointers and so on. It can get pretty crazy looking.
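
For a taste of those three, here is a sketch with invented names (`int_op`, `op_table`, `pick_op`, `swap_ops`). The typedef hides most of the craziness; the raw declaration is shown in a comment for comparison.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*int_op)(int, int);

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

/* An array of function pointers. */
static const int_op op_table[] = { add, sub };

/* A function returning a function pointer. Without the typedef this
 * declaration would read: int (*pick_op(char symbol))(int, int); */
int_op pick_op(char symbol)
{
    switch (symbol) {
    case '+': return op_table[0];
    case '-': return op_table[1];
    default:  return NULL;
    }
}

/* A pointer to a function pointer lets a callee change which function
 * the caller's variable points at. */
void swap_ops(int_op *x, int_op *y)
{
    assert(x != NULL && y != NULL);
    int_op tmp = *x;
    *x = *y;
    *y = tmp;
}
```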


Nice summary. However, the comment about the first example is slightly inaccurate. Saying the `sayHello` function does not take arguments is wrong. It does, it just does not care about them. You could still call it with arguments like sayHello("world", 15) and there would be no compiler warning or error.

If you want to declare a function with no arguments, just do it as `sayHello(void)`. Then, the previous example would result in compile-time error.


Updated with that info from the comments already


Function pointers are a great tool for untangling balls of code and level-izing different components based upon dependencies in a large project. Rather than having to wait until all code is logically refactored, you can poke function pointers to higher-level functionality down into lower-level libraries to remove the physical link-time dependency and/or circular dependency loops.

There are other tricks you can use if your higher-level code is written in C++. You can take advantage of static initializers to do your function pointer poking for you instead of having to manually do it all in main() or some other point of initialization, potentially creating other physical dependencies. (As long as no other static initializers depend on the function pointers being in place, because order of execution is not guaranteed.) e.g.:

    extern "C" {
    static void higher_level_function_pointer(void) { ... }
    }

    namespace {
      class CallbackLoader {
      public:
        CallbackLoader() {
          lower_level_library_set_callback(&higher_level_function_pointer);
        }
      };
      static const CallbackLoader loader;
    }


Understanding function pointers in C unlocks the ability to write clean, object-oriented code with inheritance (kinda, sorta, shhhh).

With great power, etc. etc.


I committed what I now realize were true crimes against software engineering when I first learned of function pointers.


They can't be any worse than abusing the C preprocessor to implement templates for C code (a sin I committed in college).


Sometimes it is a useful sin, many data structure libraries for C use this for:

1) genericity 2) (extreme) speed

It's (unfortunately) impossible for C to equal C++ templates in this: write once, use for multiple datatypes (without macros).

For people who like copy pasting, generic macros are on the horizon with C11, which is rapidly being implemented (and mostly works) in Clang and GCC, the most important compilers. Read more about it here: http://www.robertgamble.net/2012/01/c11-generic-selections.h...

Simply put, it lets you create a small "shim" macro that detects the type of the input parameters and redirects to an actual function based on that type.
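
A minimal sketch of such a shim, assuming a C11 compiler. The `twice` macro and the two functions behind it are made up for this example.

```c
#include <assert.h>

int    twice_int(int x)       { return 2 * x; }
double twice_double(double x) { return 2.0 * x; }

/* _Generic picks one of the listed expressions based on the type of (x);
 * the chosen function is then called with x. */
#define twice(x) _Generic((x), \
    int:    twice_int,         \
    double: twice_double       \
)(x)

/* The selection happens at compile time, so it can be checked there too. */
static_assert(_Generic(1.0, double: 1, default: 0),
              "a double literal selects the double branch");
```

So `twice(21)` calls `twice_int` and `twice(1.5)` calls `twice_double`, with no runtime type check involved.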


What's wrong with that? It allows you to write type-safe sort-of-generic data structures. It's certainly better than 'generic' data structures that work on void*'s.


> Understanding function pointers in C unlocks the ability to write clean, object-oriented code with inheritance (kinda, sorta, shhhh).

People often say this in regards to pointers, or something similar like in the article too, "When understood, function pointers become a powerful tool in the C toolbox.", but often don't explain how/why. In the article the author says that at some indefinite point of time in future they may write about that.

Do you, or anyone, have a link to somewhere not simply explaining the technical side of pointers, but their usage in idealistic and primarily real world examples? Bonus points if it includes indirection, function pointers and other things.

Update: User derefr gave a seemingly great answer below to a similar question like mine.


>> Do you, or anyone, have a link to somewhere not simply explaining the technical side of pointers, but their usage in idealistic and primarily real world examples?

I often use them for parsing JSON on the fly in an embedded environment with limited RAM. Suppose you have a runtime with 4K RAM of which 2K is available, but in comes a string that requires much more space. You can't store the string in RAM, validate it and then use the variables. What I do is parse on the fly and store the variables along with function pointers that need to process them. Once the string has ended and you're sure it's valid and checksummed you process the variables with their function pointers. This technique stretches the length of the message you're able to process, especially for stuff like 32-bit floats that have a larger string than binary representation.


>> "When understood, function pointers become a powerful tool in the C toolbox."

The callback pattern.

I have no idea if I just made that up or if it's a term in use, but it's where I most often find them useful. For instance the interface to libpcap. You can pass libpcap a function to call when it sees network traffic. Without this you would need some sort of polling in your own code. I've seen this in use in a variety of event-driven frameworks written in C.

The other major one I've implemented in the past is thread-pooling with job queues. You can make a queue of jobs that way. Each job is a struct containing a function pointer and a pointer to a struct of arguments. When a thread becomes idle it pulls the job off the queue and calls the function with the given args.
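
The job struct and queue can be sketched like this. To keep it short this version is single-threaded with no locking, so it only shows the data structure; a real pool would guard the queue with a mutex and condition variable. All names are invented.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*job_fn)(void *args);

struct job {
    job_fn fn;   /* what to run */
    void *args;  /* what to run it on */
};

#define QUEUE_CAPACITY 8

struct job_queue {
    struct job jobs[QUEUE_CAPACITY];
    size_t head, count;
};

/* Returns 0 on success, -1 if the queue is full. */
int queue_push(struct job_queue *q, job_fn fn, void *args)
{
    assert(q != NULL && fn != NULL);
    if (q->count == QUEUE_CAPACITY)
        return -1;
    size_t tail = (q->head + q->count) % QUEUE_CAPACITY;
    q->jobs[tail].fn = fn;
    q->jobs[tail].args = args;
    q->count++;
    return 0;
}

/* What an idle worker would do: take the oldest job and run it.
 * Returns 1 if a job ran, 0 if the queue was empty. */
int queue_run_next(struct job_queue *q)
{
    assert(q != NULL);
    if (q->count == 0)
        return 0;
    struct job next = q->jobs[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    next.fn(next.args);
    return 1;
}

/* Example job: adds `amount` into `*total`. */
struct add_args { int amount; int *total; };

void add_job(void *args)
{
    struct add_args *a = args;
    *a->total += a->amount;
}
```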

There are probably more. Yes you can kind-of mangle OO out of them, but I prefer the other uses.


Elaborating on the OO bit, as derefr did an excellent job of talking about the usefulness of function pointers in a VM.

So, let's say I have a structure which I will use to represent objects in my game:

  typedef void (*thinkfunc_t)(void* self, unsigned int dt);

  typedef struct BaseFoo {
    float x,y,z;
    thinkfunc_t doThink;
  } BaseFoo;

  void null_think(void* self, unsigned int dt) { (void)self; (void)dt; } /* empty think function */

  BaseFoo* Foo_new() {
    BaseFoo* ret;
    ret = calloc(1,sizeof(BaseFoo));
    ret->doThink = null_think;
    return ret;
  }

  BaseFoo* Foo_clone( BaseFoo* foo) {
    BaseFoo* ret;
    ret = Foo_new();
    memmove( ret, foo, sizeof(BaseFoo) );
    return ret;
  }
So, if I want to create a new BaseFoo, I just call Foo_new() and I'm off to the races; I get back a Foo object that, during my update loop, I can fiddle with:

  /* update loop snipped for brevity */
  BaseFoo* currentFoo;
  currentFoo = get_my_foo_from_my_big_foo_list();
  currentFoo->doThink(currentFoo, getFrameTimeInMilliseconds());
So, that's kind of cool, but what if I want to create a new, better foo, that actually moves? I do this:

  void move_think(void* self, unsigned int dt) {
    BaseFoo* foo = (BaseFoo*) self;
    foo->x += dt;
    foo->y += dt;
    foo->z += dt;
  }

  typedef BaseFoo MovingFoo;

  MovingFoo* MovingFoo_new() {
    MovingFoo* ret;
    ret = Foo_new();
    ret->doThink = move_think;
    return ret;
  }
So, this is very bland, but it gets across the idea that function pointers let me easily override member methods on class instances. You can imagine that, with a little cleverness, you can store arbitrary data into the struct, along with a table of function pointers to instance methods (a virtual function table).

This is a very fun, very deep rabbit hole.

In a lot of ways, you basically fake the type of prototypal inheritance you'd expect from, say, JavaScript.

EDIT:

Elaborating on another point--it's very common to use function pointers in a struct that then gets filled out by a dll or shared library.

For example, let's say that I've got a renderer that draws 3D stuff and conforms to an API. At runtime, I check what driver the user wants to use (software renderer, Direct3D, or OpenGL), and then I fill out the function pointers of that structure to point at the architecture and driver-specific methods I'd like to use.

The rest of my code doesn't change, because it only ever calls the interface exposed by the structure--the actual addresses pointed to by the function pointers are irrelevant.
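
A stripped-down sketch of that structure, with stand-in backends that just return numbers so the difference is visible. The `renderer` struct, the backend names, and the selection-by-string are all assumptions for the example, not any real graphics API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct renderer {
    const char *name;
    int (*draw_triangles)(int count);
};

/* Stand-in backends: each returns a distinct value so the difference
 * between them is observable. */
static int software_draw(int count) { return count; }
static int opengl_draw(int count)   { return count * 100; }

static const struct renderer backends[] = {
    { "software", software_draw },
    { "opengl",   opengl_draw },
};

/* Picks a backend by name at runtime; returns NULL if it is unknown. */
const struct renderer *renderer_select(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    }
    return NULL;
}

/* Calling code only ever goes through the struct. */
int render_scene(const struct renderer *r, int triangle_count)
{
    assert(r != NULL && r->draw_triangles != NULL);
    return r->draw_triangles(triangle_count);
}
```

`render_scene` is the "rest of my code" here: it compiles once and works with whichever backend `renderer_select` handed back.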


Inheritance can be implemented as well:

    typedef struct AdvancedFoo {
      BaseFoo super;
      int a, b, c;
      thinkfunc_t thinkHarder;
    } AdvancedFoo;
Since `super` is the first member in the struct, a pointer to an AdvancedFoo can be used as a pointer to BaseFoo as well:

    AdvancedFoo *advFoo = AdvancedFoo_new();
    advFoo->thinkHarder(advFoo, advFoo->a);

    ((BaseFoo*) advFoo)->doThink(advFoo, some_int_method());
Of course, this inheritance pattern can continue in multiple steps.


I don't see the need to do the "ret->doThink = null_think;" since you always pass ret to the function. Just call null_think( ret... )


Ah, but what if the object is handed over to some other deep end of the code? We want to bake the null behavior into it so that any Foo knows how to think itself--calling code just needs to be aware that any Foo object will always have a valid think callback defined.

This is basically the null-object pattern.


You mean as in

    func(foot);
    /* Should check func != gun.shoot */


Very well written article, clearly explaining what is usually a rather complex topic for new programmers to understand, bookmarked for future use.

The other resource I use for similar situations is http://c-faq.com/decl/spiral.anderson.html [EDIT]: someone beat me to it

FWIW: The explanation by Dave G in the comments isn't quite right (I couldn't reply there, comments appear closed):

> void something(); // prototype of function that takes undetermined number of arguments

> void something(void); // prototype of function that takes no arguments

> void something() { return; } // function that takes no arguments

isn't quite right as `void something(){ return; }` is still a function that takes an undetermined number of arguments, it just cannot access them; calling `something(1,2,3)` is still valid.

> void something(){ return; }

> int main(void){ something(1,2,3);}

is valid.


The C syntax really makes this simple feature look terribly complicated.

It's a while since I watched them, but in the SICP video lectures, as I recall, the notion of storing functions in variables - as introduced in, like, lecture 1 or 2 - took about 0.1% of the time - if indeed that. And that's because while the principles are actually more complicated than the C example (due to the lexically-scoped variable capture), the syntax is simplicity itself.

Consider the issue of making a variable, which goes something like this:

    (let ((x 1))
      (set! x (1+ x)))
You have your "let", then the list of names and values, and then the stuff that makes use of them.

Now consider the issue of making a function:

    (let ((x (lambda ()
               (message "stuff"))))
      (x))
(Well... maybe they called it define. I don't remember.)

The parallels are obvious, and indeed I suppose they deliberately chose this notation to emphasise the similarity between one type of value (e.g., an int) and any other (e.g., a closure). And so once you've got the hang of making a function that takes one type of value (e.g., an int), it's a small step from there to making a function that takes another type of value (e.g., a closure).

This is a dig at C and its ilk rather than the article. It's ridiculous that such a fundamental, basic primitive is given such a baroque syntax. But then C strikes me as designed for people who already know how to program so maybe the assumption is that most users will already know what function pointers are when they come to it, probably from being familiar with indirect jumps in assembly language.

This post is apropos of very little, I guess. I was just moved to comment after seeing six pages of stuff that still never got to the meat of the matter. So I suppose I should finish it like this: "</rant>"


I think the only complicated-looking thing in C is the syntax for declaring function pointer types. If you use typedefs you end up with

  typedef int (*math_op)(int, int);

  math_op ops[] = {
    add,
    subtract
  };

  int a = ops[0](3, 5);
  int b = ops[1](9, 2);
... maybe I've just used them too much, but I find them quite simple (except, as mentioned, setting up the initial typedefs which is a bit "odd").

One of my favourite uses of function pointers is for key-mappings, like in: https://github.com/muennich/sxiv/blob/master/config.def.h


Now it just needs the matching tutorial on initializing pointers in the data section :-)


Could you please elaborate?


(fitting user name, sir or madam!)

I'm guessing he's referring to declaring a global structure or collection of function pointers, and then using #ifdef's and compiler flags to actually assign those function pointers to things that make optimal sense for the architecture.

Or, at runtime (but hopefully before usage!) setting those global function pointers to point at routines optimal for the situation the program is running under--say, setting them to point at an optimized SSE3 vector routine if the CPU supports it.

Maybe that's it?


Just the other day I was looking at some embedded SoC which had some data structures for setting up various GPIOs and pins and such, it was something like:

   struct gpio_config {
       uint8_t func;
       uint8_t options;
       uint32_t *loc;
   };

   struct gpio_alloc {
      struct gpio_config pins[4];
      struct gpio_alloc *next;
   };

   struct gpio_alloc fsmc_1 = {
              ...
   };
All the pre-initialization stuff (which was being stored in flash so it was really read-only data) was code I realized not a lot of people wrote in C these days.


Thanks for the link. I will pass this onto other developers I know who have trouble understanding this topic.

I recently read this book and it's the best book I've ever read about pointers in C:

Understanding and Using C Pointers: Core Techniques for Memory Management, by Richard Reese. Publisher: O'Reilly Media. Released: May 2013. Pages: 226.

http://shop.oreilly.com/product/0636920028000.do


I remember fondly the time when I learned to use C function pointers. They really stepped up my code design :-)


For decoding types, particularly with function pointers, I recommend learning the right-left rule. http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html


I've always liked the presentation of the "spiral rule" at this page: http://www.unixwiz.net/techtips/reading-cdecl.html


On a related note, a teaching aid I wrote out for my classmates in high school: http://artagnon.github.io/authored/cs-pointers.pdf


If the name of a function is already the memory address of the function, what extra information does the pointer store? Couldn't you just assign the name of the function to, say, an int?


You can coerce it to an int with a cast, with some caveats: It is implementation defined whether or not the size of an int is sufficient to store a pointer, and the type lets the compiler know what you want to happen when you operate on it.

If "foo" is an integer variable, then "foo + 1" returns the integer value of foo + 1, but if "foo" is a pointer to some object type, "foo + 1" returns the value of foo + the size of the pointed-to type, for example.

So the pointer does not store any extra information, but the compiler stores and makes use of extra information about the pointer variable (not about the value itself) during compilation.
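
A quick sketch of that difference using ordinary data pointers (the function names are invented). The same "+ 1" moves a different number of bytes depending on the type the compiler has on record.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes between p and p + 1 for an int pointer. */
size_t int_pointer_step(void)
{
    int values[2] = { 10, 20 };
    int *p = values;
    assert(*(p + 1) == 20); /* p + 1 lands on the next element */
    /* Viewed as char pointers, subtraction counts bytes. */
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Same measurement for a double pointer. */
size_t double_pointer_step(void)
{
    double values[2] = { 1.0, 2.0 };
    double *p = values;
    assert(*(p + 1) == 2.0);
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Plain integers: adding 1 adds exactly 1. */
int int_step(void)
{
    int foo = 41;
    return (foo + 1) - foo;
}
```

The first two return `sizeof(int)` and `sizeof(double)` respectively; the third returns 1.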


That makes a lot of sense, thanks!


You can, but it makes more sense to assign it to a void *. After all, it's a pointer but you don't know the size of the pointed object.

Note: this is not true everywhere since (data) pointers and function pointers can be represented differently, but on a PC it works :)
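
If portability matters, one alternative is to store the address in another function pointer type instead of a void *. Standard C guarantees that a function pointer converted to a different function pointer type and back compares equal to the original, whereas conversion to void * is not guaranteed by ISO C. A sketch, with invented names:

```c
#include <assert.h>
#include <stddef.h>

/* A "generic" function pointer type used only for storage. */
typedef void (*generic_fn)(void);
typedef int (*int_op)(int, int);

int add(int a, int b) { return a + b; }

/* Store any function pointer by converting it to generic_fn... */
generic_fn stash(int_op op)
{
    return (generic_fn)op;
}

/* ...and convert it back to the original type before calling. Calling
 * through the wrong type would be undefined behavior. */
int call_stashed(generic_fn fn, int a, int b)
{
    assert(fn != NULL);
    int_op op = (int_op)fn;
    return op(a, b);
}
```

The catch is that the code calling `call_stashed` has to know the real type, since nothing in `generic_fn` records it.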


This is a simple explanation of the basics of pointers to functions.



