since the variable type info is already statically available. That saves some typing, and (more importantly) guards against bugs from changing one but not the other. I've been meaning to look it up in H&S to make sure it's always safe, but the guy who showed it to me is so strict about safe/standard C that it's likely.
Most of my favorite tricks actually involve the preprocessor, though. I know it's significantly less expressive than the macro systems in Lisp, Scheme, or OCaml, but C would be a very different language without it, and tasteful CPP usage can ease many of C's pain points.
(My other other favorite C programming trick is knowing Lua, which is excellent for scripting C. :) )
I especially love LuaJIT, whose FFI makes it even easier to interface with C than standard Lua does (and it's awesomely fast, too). Never mind that Lua is just nice to work in anyway :)
The best book for Lua by a long shot is Roberto Ierusalimschy's _Programming in Lua_, second ed. (http://www.inf.puc-rio.br/~roberto/pil2/) It covers the core language and the C API with the same clear, erudite treatment as K&R. (He is one of the core Lua authors.)
Lua is transitioning from 5.1 to 5.2 right now, which introduces some changes (improvements to the GC, adding to the standard libraries, and improving the package/module system). The main language and C API haven't changed significantly from 5.1; you should be fine if you learn 5.1 now and update later. Lua is small enough that you could add 5.1 to your projects as a library dependency and maintain it yourself, though - it's only about 16,000 lines of code.
Variables are in scope in their own initializers. This is fun when you inadvertently write something like:
int length = ...;
...more code, lose your concentration...
if (x) {
    int length = length / 2 + 1;
    ...
}
This neither produces an error nor does what you'd expect, but just ends up being a creative way to initialize the inner length variable with garbage. I've done this more than I care to admit.
Definitely. Know your warning options! Also, try using multiple compilers. tcc compiles very quickly, clang often has better error messages, etc.
Speaking of variable shadowing: It's usually worth wrapping any preprocessor macros in a "do { ... } while (0)" block unless you deliberately want variable definitions to escape (in which case, token pasting a suffix is usually a good idea).
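A minimal sketch of both halves of that advice (SWAP_INT and DECLARE_RETRIES are hypothetical names): the do/while(0) wrapper makes a multi-statement macro behave as a single statement after an if, and the pasted suffix marks a definition that is deliberately meant to escape into the caller's scope.

#define SWAP_INT(a, b)      \
    do {                    \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } while (0)

/* definition intentionally escapes; the suffix keeps the name from colliding */
#define DECLARE_RETRIES(name) int name##_retries = 0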
That is a gcc extension. The construct as a whole has the value of the last statement executed within. Usually an inline function is preferable since it achieves the same thing using only standard syntax.
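For comparison, a hedged sketch of the two forms being contrasted (MAX_GCC and max_int are illustrative names): the first relies on the GCC statement-expression extension and takes the value of its last expression, the second is the portable static inline equivalent.

/* GCC extension: the ({ ... }) block is an expression whose value is the
   last expression inside it. Not standard C. */
#define MAX_GCC(a, b) ({ int a_ = (a); int b_ = (b); a_ > b_ ? a_ : b_; })

/* Portable alternative using only standard syntax. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}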
Would be nice if there was one which would warn for this specific case but not shadowing in general, which I occasionally like. (This probably makes me a bad person.)
It certainly doesn't make you a bad person. If lexical scoping weren't the intent of the language authors, it wouldn't exist. I tend to agree that the warning is more of a hassle than it's worth. Though I guess it can be argued that if your scoping is so deep that you actually need to reuse names, perhaps some refactoring is in order.
Seeing a variable shadow something in an outer scope makes me cringe, simply because the decrease in readability far outweighs any benefit it could give. You might remember how things are, but the poor sap who has to keep extra scoping depths in his mind just to maintain your code will curse your name every day.
If a type name is used, it always needs to be enclosed in parentheses, whereas variable names and expressions can be specified with or without parentheses.
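Assuming this is about sizeof (as in the allocation idiom upthread), the rule looks like this in practice; sizeof_examples is just an illustrative wrapper:

#include <stdlib.h>

void sizeof_examples(void)
{
    double d = 0.0;
    size_t a = sizeof(int);      /* type name: parentheses are required        */
    size_t b = sizeof d;         /* object or expression: parentheses optional */
    size_t c = sizeof(d + 1.0);  /* ...and harmless when present               */

    /* This is also what makes the "don't repeat the type" allocation idiom work: */
    int *p = malloc(sizeof *p);

    (void)a; (void)b; (void)c;
    free(p);
}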
Similarly with NEW0() and ecalloc(). Yes, the macro uses the parameter more than once, and yes it's another macro for the reader to grasp, but it's a simple one and NEW() is used so widely it's soon learnt.
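One plausible shape for those macros, sketched here with an assumed error-checking calloc wrapper (the real NEW/NEW0/ecalloc in any given codebase may differ):

#include <stdlib.h>

/* hypothetical error-checking calloc: never returns NULL */
static void *ecalloc(size_t n, size_t size)
{
    void *p = calloc(n, size);
    if (p == NULL)
        abort();                 /* or whatever the project's error policy is */
    return p;
}

#define NEW(p)  ((p) = malloc(sizeof *(p)))      /* uninitialized storage */
#define NEW0(p) ((p) = ecalloc(1, sizeof *(p)))  /* zero-filled storage   */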
The "extern inline" idiom for forcing inlining, which I picked up from Mike Stolarchuk.
Passing and assigning trivial structures by value instead of fiddly pointers.
Arena allocators.
Not so much a trick, but: you can safely free() NULL, which saves a conditional. In the same vein: not only is there no point to casting malloc()'s return value, but there are (admittedly rare) circumstances where doing so can be harmful. So save yourself the typing.
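The last two points in code form (a minimal sketch; `example` is just an illustrative name):

#include <stdlib.h>

void example(size_t n)
{
    double *p = malloc(n * sizeof *p);  /* no cast needed (or wanted) in C */
    /* ... use p, after checking it if that's your policy ... */
    free(p);                            /* fine even if p is NULL          */
}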
In your first example you must take care not to run afoul of alignment or aliasing rules. This is not always easy.
Using "extern inline" is a bad idea since no two compilers implement it the same way and none according to spec (that I know of). There is no standard way to force inlining of a function.
You're right, if you do the 4-byte lookup trick on a SPARC you'll get a SIGBUS if you do it at arbitrary offsets. It's not always easy to ensure your comparison target is e.g. at the start of your read buffer, or in its own array on the stack, or in the return value from strdup(). It's definitely a "trick", not a proper form.
I also wouldn't do "extern inline" on a random C compiler; it works on clang, gcc, and Sun's compiler, though.
You probably know this already, but if you pass -std=c99 to gcc, you have to have a separate definition for your inlined functions. Here's how I do it; I welcome suggestions for improvement:
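(The original snippet isn't reproduced here; a typical C99 arrangement looks roughly like the following, with `foo` as a placeholder. The header carries the inline definition, and exactly one .c file re-declares it `extern inline` so an external definition is emitted for any call sites where inlining didn't happen.)

/* foo.h */
inline int foo(int x)
{
    return x + 1;
}

/* foo.c -- exactly one translation unit provides the external definition */
#include "foo.h"
extern inline int foo(int x);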
A compiler will generate the memory-fetching code once, fold the right-hand sides of the comparisons into constants, and make everything flow fast and safe. You can even use a switch statement if you like.
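A hedged sketch of that switch form (TAG4 and classify are illustrative names; it assumes a little-endian target and at least four readable bytes at buf, and uses memcpy to sidestep the alignment/aliasing concerns raised elsewhere in the thread):

#include <stdint.h>
#include <string.h>

/* pack four characters into the 32-bit value they'd occupy in little-endian memory */
#define TAG4(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

int classify(const char *buf)
{
    uint32_t word;
    memcpy(&word, buf, sizeof word);   /* one 4-byte fetch, no alignment trap */

    switch (word) {
    case TAG4('G', 'E', 'T', ' '): return 1;
    case TAG4('P', 'U', 'T', ' '): return 2;
    default:                       return 0;
    }
}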
It works fine on SPARC too; you just have to know where the data is coming from.
You're right to point out that people should be cautious about this code. I'm just listing my "favorite tricks". I'm not recommending that people use integer casts as their go-to string comparison.
The string literal is (all but -- honestly I'm not sure what the standard says here) guaranteed to be aligned naturally for the platform. The thing on the left hand side is not. If it's a pointer to a heap block, you're fine. If it's a pointer to a token parsed out of string input, it can be anything.
Even architectures that support misaligned accesses can be configured to trap on them and generate unexpected fatal signals.
Sorry, I wasn't talking about the string literal; I was talking about the buffer you're comparing it to. I'm saying: yes, you're right, you have to be careful that it's aligned.
With possibly-inlined, SSE-optimized strcmp variants that compare 8 or 16 bytes of a string at a time (at least according to Valgrind), how much speed is gained these days by casting and using the standard integer comparison instructions?
Although if you want your C to compile with a C++ compiler, you will need to cast the result of malloc(). C++ is more strongly-typed than C, and the compiler will complain.
You're right, that was an oversight. This is one of the more frustrating decisions in C++; it defies the semantics of a "void pointer", breaks existing code, and provides no real safety (the thing we're enforcing "type safety" over being a simple register-width integer used to express any type in the whole system).
But you do sometimes want to cut-paste code from .c files into .cpp files, and this idiom will make your compiler yell.
There are much more subtle differences that will wreak total havoc without the compiler emitting a single squeak. For example, compare this compiled as C or C++:
#include <stdio.h>

int foo;

int main(void)
{
    struct foo {
        int a, b;
    } x;
    /* C: foo names the global int (struct tags live in their own namespace);
       C++: the local struct foo hides it, so the printed size differs. */
    printf("%zu\n", sizeof(foo));
    return 0;
}
One should always write code in the best possible way for the language actually in use, even if this is invalid in some other language with superficially similar syntax. In converting code from C to C++, adding a few pointer casts will be the least of your worries.
One-way implicit casts for void * are actually a pretty big safety harness. You can still interpret any random pointer as "a register-width thingamajig". Interpreting any random void * as a valid pointer to a particular type is obviously dangerous.
I like putting the constants on the left side of the comparison... cuts down on the 'missing one equals sign' errors. This can be a readability issue though.
GCC will warn about using an assignment as a truth value in if(), while(), etc., so it's probably safe enough to stick with whatever order you're familiar with and let GCC alert you if you miss an equals sign.
That 4 byte lookup's useful. I needed to do it recently and had assumed it wasn't possible.
If memory serves, CodeWarrior for Mac (and possibly other classic Mac compilers) had syntactic sugar mapping single-quoted four-character strings to the ubiquitous OSType. Something like:
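Presumably something along these lines (OSType is typedef'd here just for illustration; multi-character constants like 'APPL' have implementation-defined values, which is why this stayed a platform idiom):

typedef unsigned long OSType;   /* as on classic Mac OS */

OSType creator  = 'APPL';       /* four-character constant */
OSType fileType = 'TEXT';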
In addition to breaking strict aliasing and alignment rules as others have pointed out, your first example also assumes little endian and won't work on big-endian architectures.
Just don't do it unless it's a quick hack that you absolutely know you'll rewrite correctly within the day, before committing. And even then, write it correctly the first time around.
Using the ternary operator to select a function pointer that is then immediately called, where the functions in question are stubs to system calls and have identical declarations.
As in, specifically:
if ((Lflag ? chown : lchown)(p->fts_accpath, s->st_uid, -1))
        (void)printf(" not modified: %s\n",
            strerror(errno));
Philosophically, it's due to a famous system programmer's dictum: "Always check the return value of system calls".
The corollary of this is that one should clearly document when a return value is being deliberately ignored. A simple way to do this in C is to cast the return value to void. Since printf does I/O, it qualifies.
Specifically, this code is from a patch to the FreeBSD source which is ruled by style(9); you will find this form throughout BSD source.
That page says it dates from the late seventies; I was still using it in the mid-nineties. I don't remember the last time I linted, but today Ubuntu is lint-unaware. These days the compiler will pick up most of the things that lint used to.
C is full of surprises, and generally my favourite trick is trick du jour. While it's not strictly for C, Hacker's Delight[1] is my favourite collection of bit twiddling tricks. A highly recommended read if only for the intellectual value.
I've done something similar to your vsnprintf()-based code by using snprintf() into an undersized buffer, realloc()ing the buffer to the returned value, then calling snprintf() again. This works well for reused buffers that have a length associated with them and may need to grow over time, but clearly vsnprintf() could do the same thing.
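A minimal sketch of that size-then-grow pattern, written around vsnprintf (grow_printf, buf, and cap are illustrative names; error handling is deliberately thin):

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into *buf (capacity *cap), growing the buffer with realloc() and
   formatting a second time if the first pass didn't fit. Returns the string
   length, or -1 on error. */
int grow_printf(char **buf, size_t *cap, const char *fmt, ...)
{
    va_list ap;
    int needed;

    va_start(ap, fmt);
    needed = vsnprintf(*buf, *cap, fmt, ap);   /* *buf may be NULL if *cap == 0 */
    va_end(ap);
    if (needed < 0)
        return -1;

    if ((size_t)needed >= *cap) {              /* didn't fit: grow and retry */
        char *p = realloc(*buf, (size_t)needed + 1);
        if (p == NULL)
            return -1;
        *buf = p;
        *cap = (size_t)needed + 1;

        va_start(ap, fmt);
        vsnprintf(*buf, *cap, fmt, ap);
        va_end(ap);
    }
    return needed;
}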
Oh, I wouldn't use asprintf in any performance-critical code. But I wouldn't be using character strings in any performance-critical code either, so that issue doesn't arise for me.
It's a shame they didn't offer msprintf(), which would use malloc(), and asprintf(), which would use alloca(), for cases where you only want a temporary string without heap usage/fragmentation overhead.
You can't use alloca from inside a library function, since it would allocate within the library function's stack frame and the allocation would no longer be valid when the function returned. (Or rather, you can use alloca within library functions, but you can't return a pointer to that allocation, so it wouldn't be useful here.)
Theoretically you could define an alloca()ed-pointer-returning Xsprintf as a macro, though... (but as tptacek notes, it's probably a bad idea).
Something like this (C99; given p, stack-allocates p_sasprintf_buf of size at most 16 if the string would fit, and uses asprintf otherwise):
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#define SASPRINTF_MAXLEN 16
#define SASPRINTF_MERGE(a, b) a ## b
#define SASPRINTF_LEN(p) SASPRINTF_MERGE(p, _sasprintf_len)
#define SASPRINTF_BUF(p) SASPRINTF_MERGE(p, _sasprintf_buf)
#define SASPRINTF(p, fmt, ...) \
    size_t SASPRINTF_LEN(p) = snprintf(NULL, 0, (fmt), __VA_ARGS__); \
    char SASPRINTF_BUF(p)[SASPRINTF_LEN(p) <= SASPRINTF_MAXLEN ? SASPRINTF_LEN(p) + 1 : 0]; \
    if (SASPRINTF_LEN(p) <= SASPRINTF_MAXLEN) { \
        snprintf(SASPRINTF_BUF(p), SASPRINTF_LEN(p) + 1, (fmt), __VA_ARGS__); \
        p = SASPRINTF_BUF(p); \
    } else { \
        if (asprintf(&p, (fmt), __VA_ARGS__) == -1) \
            err(1, "SASPRINTF_L at %s, %d", __FILE__, __LINE__); \
    }
#define SASPRINTF_FREE(p) do { \
        if (p != SASPRINTF_BUF(p)) \
            free(p); \
    } while (0)
/* Test harness */
int main(void);
int main(void) {
    char *p, *p2;

    SASPRINTF(p, "%s", "foo");
    SASPRINTF(p2, "%s", "Really long string, really.");
    printf("%s\n%s\n", p, p2);
    SASPRINTF_FREE(p);
    SASPRINTF_FREE(p2);
    exit(EXIT_SUCCESS);
}
I was going to say "...but you have to be pretty insane to do this", but I haven't managed to get incorrect-but-compiling code out of the above macros. Of course, I'm not at all convinced that it's faster than asprintf... (even after the obvious optimizations.)
This is not valid C99. In C99, arrays must have at least one element. Also, what is err.h? Whatever it is, it is not C99.
As for speed, if asprintf() does something clever to avoid rendering the string twice, this is actually slower unless snprintf() is faster than a malloc() call (unlikely). Furthermore, some compilers implement variable-length arrays with malloc() so for these, this is definitely not an improvement.
Beyond the speed of this particular call, using variable-length arrays can have a performance hit in general since gcc is unable to inline functions using them.
Any time I have ever, ever used a gcc extension in my own code, I have always been sorry for having done so later --- usually more than a year later, at a moment when I don't really have time budgeted for being sorry about binding my code directly to gcc.
(I see a difference between literally using GCC-dialect C and relying on dubious C constructs that happen to work well in practice; for instance, I've never been bitten by "extern inline").
I agree your code should not depend on GCC, but stuff like __attribute__((nonnull(1), format(printf, 3, 4))) - "the first argument may not be NULL, the third argument is a printf()-style format string with its arguments starting at the fourth" - can produce some useful warnings and can be trivially disabled on non-GCC compilers (see the sketch below). Some of the most useful extensions don't lock you into GCC.
(Now, if only Microsoft would get off their asses and make their compiler C99-compliant, we could all write much nicer code.)
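A hedged sketch of that pattern (PRINTF_LIKE and log_msg are illustrative names): the attribute helps GCC/Clang check call sites, and compiles away to nothing elsewhere.

#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((nonnull(1), format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)   /* expands to nothing on other compilers */
#endif

/* tag must be non-NULL; fmt is the 3rd argument, its varargs start at the 4th */
void log_msg(const char *tag, int level, const char *fmt, ...) PRINTF_LIKE(3, 4);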
It's a very useful gcc extension. Use it if you're building with gcc. Standards adherence is all well and good, but there are many problem areas where you can rely on a single compiler (surely others support it too: Intel and clang seem like likely suspects).
There are standards issues and there are standard issues.
In reality, you're probably never going to see a 1's complement machine.
Your likelihood of needing to use a piece of code under a compiler other than GCC is deceptively high.
Meanwhile, the extensions we tend to think about when we think of GCC aren't subtle things like "can you use // comments in C code". They're constructs that require many additional lines of code to replace. It is a giant pain when you find them, later on, when you need to compile something under Visual C or SunWorks. Your fix to comment syntax isn't going to break code at runtime, but your fix for the missing ellipsis operator definitely can!
From bitter experience, I think 'mansr is right on this, and it's worth making an effort not to let GCC extensions creep into your code.
But like I said, it depends on what the code is. If it's a kernel module, then you'd be silly not to use the gcc extension. Likewise if it's a platform-locked linux or mac thing (Android middleware, maybe). You work in security, where you're expected to port stuff between platforms regularly. Not everyone does. And those extensions are there because someone likes them.
> If it's a kernel module, then you'd be silly not to use the gcc extension.
Sure, if you're not interested in anyone porting that code to another OS. This happens all the time with drivers -- just because the interface is OS-specific doesn't mean all the code is.
> And those extensions are there because someone likes them.
My favourite C trick is -Dcontinue=break in the Makefile.
Sarcasm aside, I like the ((x > 0) && ((x & (x - 1)) == 0)) trick to test whether x is a power of two. But all arithmetic tricks like this need to be used sparingly and commented/documented in detail.
Edit: I've also used "-Dfor=if(0);else for" to make some ancient C++ compiler obey C++0x scoping rules for variables declared in initializer list of 'for' statement.
A macro for calculating the size of a buffer for a C-style string containing a (signed/unsigned) integer:
/* 0.30103 ≈ log10(2): bits * log10(2) gives the decimal digit count;
   +2 covers truncation and a possible sign, +1 is for the terminating NUL. */
#define CS_INT_SIZE(int_type) ((size_t)(0.30103 * sizeof(int_type) * 8) + 2 + 1)
int x = -1000;
char buf[CS_INT_SIZE(x)]; // instead of char buf[100];
snprintf(buf, sizeof(buf), "%d", x);
Another macro for calculating the length of a string literal in compile time (just like strlen would do). Note the extra check for string literal.
#define CSLLEN(s) (sizeof(s "") - 1)
int len = CSLLEN("hello"); // len == 5 here
Another one for logging variable arguments or no arguments at all.
#define Log_Trace(...) Log(__FILE__, __LINE__, __func__, LOG_TYPE_TRACE, "" __VA_ARGS__)
void Log(const char* file, int line, const char* func, int type, const char* fmt, ...);
Log_Trace(); // prints just the name of the file, line and func
Log_Trace("Error %d", error); // prints the same as above and the error number
Your logging example is invalid. A variadic macro must be invoked with at least one argument matching the ellipsis. GCC is forgiving here, but some other compilers are not.
There are slight variants, like sharing the cleanup code (e.g. a "finally" block), but that's the essence of it. I'm sure I've typoed something above, nit-pickers beware :-)
CHECK_MALLOC is an anti-pattern (do you have a CHECK_FOO for every FOO that might allocate behind the scenes?).
I have a litany of reasons why you shouldn't bother checking malloc returns, and instead invest a little effort in making sure your platform malloc is configured to abort instead of returning NULL. The simplest and most compelling of those reasons is that it's easier and cleaner.
I'm also not a fan of a macro that introduces an implicit dependency on a goto label.
do { } while(0)
is a pretty convenient way of expressing single-return; you just use "break" instead of "return".
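A minimal sketch of that single-exit shape (illustrative names; error reporting elided): the work goes inside the do/while(0), any failure breaks out, and the cleanup is shared after it.

#include <stdio.h>
#include <stdlib.h>

int process(const char *path)
{
    FILE *f = NULL;
    char *buf = NULL;
    int rc = -1;

    do {
        f = fopen(path, "rb");
        if (f == NULL)
            break;
        buf = malloc(4096);
        if (buf == NULL)
            break;
        /* ... the actual work ... */
        rc = 0;
    } while (0);

    /* shared cleanup; free(NULL) is harmless */
    free(buf);
    if (f != NULL)
        fclose(f);
    return rc;
}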
I can accept (most of) your issue(s) with the MALLOC macro; MALLOC is a bad name here from previous habit. The point was that there's code one has no control over which indicates failure by returning a NULL value, and that's where it's generally useful.
I strongly disagree about the goto label. "break" is not equivalent (consider a nested for/while/switch). Plus, it's not "implicit" if it's well-known and oft-used in the code in question.
You strongly disagree about something I don't strongly disagree about. I wouldn't use goto to get single-return, nor would I wrap drastic control flow in a macro, but whatever floats your boat.
I have much stronger opinions about checking malloc. It's something people spend a lot of effort to do that actually makes their code worse.
Would your opinions on checking malloc still hold if the system in question were an embedded system that should continue operating with partial functionality even when out of memory? Also, when you say "makes their code worse," are you saying harder to read, harder to debug, less secure, slower, buggier, or all of the above?
My habit of checking malloc() also comes from my distaste for audio software that randomly displays erratic behavior when memory starts getting tight, rather than displaying an alert that an allocation failed.
I think you can always find exceptions -- such is engineering. :-)
In my example above, one of the things I also failed to clarify about that test was that it was in the context of a system that could back off and restart or the call was returning NULL for reasons other than OOM. But like I said, we rigged malloc() to blow, because malloc() is the generic purpose allocator, and you're screwed if that goes.
I think tp gives good advice here: for your typical malloc() user, you're usually screwed if malloc returns NULL, because that's your heap allocator, and you have nothing else :-)
An embedded system, generally, will have a great deal more knowledge of how to back off -- in other words, the memory allocator is something that is under much more control -- it probably isn't malloc()...
Your boat will sink if you use do while break. I know, I've tried. :-) One aspect I failed to highlight is the logging the macro does, which can come in handy.
I wish I could erase the MALLOC thing, it was an after-thought ... and now I feel like I'm leading people astray (oh well...). Even in the code that used it, it was for existing calls that returned NULL instead of an rc. malloc() was rigged to blow in that code base. Sigh...
I mean this in a good way: wouldn't it be better to post all of this on the Stack Overflow thread in the parent link? That would enrich their community and be better for readers here.
P.S. To the guy who down-voted my previous (now deleted) comment: you were right. It helped me think more precisely about why I was trying to crack a joke.
Set the character, toggle various bits, then retrieve the character. It saves a bunch of left and right shift and or/and of values. [My C is quite rusty, so I apologize if the syntax is a bit off.]
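A guess at the construct being described - a union of a char and a struct of bit-fields (names illustrative). As the replies note, bit-field layout and byte-size assumptions make this implementation-defined, so treat it only as a sketch:

#include <stdio.h>

union charbits {
    unsigned char c;
    struct {
        unsigned b0 : 1;
        unsigned b1 : 1;
        unsigned b2 : 1;
        unsigned b3 : 1;
        unsigned b4 : 1;
        unsigned b5 : 1;
        unsigned b6 : 1;
        unsigned b7 : 1;
    } bits;
};

int main(void)
{
    union charbits u;
    u.c = 'a';            /* set the character                 */
    u.bits.b5 = 0;        /* poke a bit directly, no shifts    */
    printf("%c\n", u.c);  /* retrieve it; often 'A' on common
                             little-endian ASCII ABIs, but not portable */
    return 0;
}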
Since other people have been nitpicky in the comments here, I thought it might be worth pointing out that this assumes an 8-bit byte, so it's not general. ;) Remember, 8 bits is just a convention.
Even with 8-bit bytes, the layout of the bit-fields is unspecified. Using char as the declared type of a bit-field is also not defined by the standard (although may be supported by an implementation).
The Texas Instruments C55x DSP has 16-bit bytes. See the compiler documentation at http://www.ti.com/litv/pdf/spru281f section 5.3 if you do not believe me. This DSP has more unusual properties:
type                size (bits)
------------------  -----------
char                16
short               16
int                 16
long                32
long long           40
float               32
double              32
pointer (data)      16 or 23
pointer (function)  24
>What's the proper scenario to use "long" instead of "int"? I've never bothered to use it.
It was necessary with 16-bit processors, because ints were 16 bits (the same as shorts) and longs were 32 bits.
With modern processors and OSes, there isn't really a reason to use it. In fact, it's potentially dangerous if you're writing *nix code that's supposed to run on both 64- and 32-bit systems. In that case, you don't want to use longs, because they're 32 bits in a 32-bit compile but 64 bits in a 64-bit compile. On Windows, int and long are interchangeable 32-bit values, which is another reason to avoid longs as much as possible when writing portable code.
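When a specific width actually matters, the portable C99 route is the fixed-width types rather than guessing what int or long happen to be; a small sketch:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t a = 123456789;        /* exactly 32 bits on every conforming platform */
    int64_t b = 1234567890123LL;  /* exactly 64 bits, regardless of long's width  */
    printf("%" PRId32 " %" PRId64 "\n", a, b);
    return 0;
}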
The immediate reason is that the ALU is 40 bits wide. The reason it has this particular size is probably a tradeoff between computational power vs silicon area and power consumption.
I had an OCD incident a few years back and did a lot of thoughtful reading of all things readable on the subject. In distilled form the sacred knowledge is this - as far as the C standard is concerned, a byte is always exactly 8 bits, and a char is AT LEAST 8 bits. In fact, there is a compiler that operates in terms of 60-bit chars and that's the one on older Cray machines.
No, a byte is not always 8 bits. byte and char are basically synonymous.
byte -- addressable unit of data storage large enough to hold any member of the
basic character set of the execution environment.
...
Note 2: A byte is composed of a contiguous sequence of bits,
the number of which is implementation-defined.
That's all very well, except that it's wrong. The last line needs to do "(buffer + mask) & ~mask" or you'll end up pointing outside the array.
You also violated the C standard in a less severe way. According to the standard, any identifier name starting with a double underscore (or underscore followed by upper-case letter) is reserved by the implementation for any use. These reserved names are frequently used in system header files, and encroaching on that namespace can easily lead to weird errors if the code is ever compiled on some other system.
Whoops, that's what I get for prettying things up after copy and pasting. Thanks for pointing that out.
As for the double underscore being reserved for implementation use, this was in fact part of the compiler (well, runtime in this case) implementation, again a side-effect of copy and paste.
The ring buffer is a nice trick, but it only works on array sizes that are a power of 2. The generalized version would be "array[i++ % ARRAY_SIZE] = data". If ARRAY_SIZE happens to be a power of two, I'm sure most compilers would optimize it to a bitmask anyway.
And if it is not a power of 2, the % operation will be anything but fast. If a size not a power of 2 is strictly required, a simple increment and compare is almost certainly faster.
Signed integer overflow is undefined, so make sure you're using an unsigned type in your ringbuffer example. (And an array with some-power-of-2 elements; you should probably use sizeof(array)/sizeof(*array) instead of ARRAY_SIZE too.)
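Putting the thread's corrections together, a minimal sketch of the masked ring-buffer index (ring and ring_put are illustrative names; the size must be a power of two and the index unsigned so that wraparound is well defined):

#include <stddef.h>

#define RING_SIZE 256u                      /* must be a power of two */

struct ring {
    unsigned char data[RING_SIZE];
    unsigned int  head;                     /* unsigned: overflow just wraps */
};

static void ring_put(struct ring *r, unsigned char byte)
{
    r->data[r->head++ & (RING_SIZE - 1)] = byte;   /* mask instead of % */
}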
No. ~0 has type signed int, which on a one's complement machine is a negative zero. Converting this to an unsigned type yields zero. On a sign-magnitude machine, ~0 is -0x7fffffff (assuming 32-bit for simplicity). Conversion to unsigned 32-bit is done by adding 0x100000000, yielding 0x80000001, again not what was desired. If the variable being assigned to has a width different from that of int, things get even more interesting.
As far as I know, Analog Devices still makes some ones'-complement analog-to-digital converters. But I don't know of any modern processor that uses them.
That will only set as many bits as are in an int, which is wrong if the type of the variable you are setting is wider than int. For a long, you could of course use ~0ul, and so on for other types, but then you'd have to find and update every such assignment if the variable were ever changed to a wider type.
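For what it's worth, one idiom that sidesteps both the width and the representation issues being discussed is assigning -1 to the unsigned variable, since conversion to an unsigned type is defined in terms of values; a small sketch:

#include <stdint.h>

uint16_t      a = -1;   /* 0xFFFF                                  */
uint64_t      b = -1;   /* 0xFFFFFFFFFFFFFFFF                      */
unsigned long c = -1;   /* all bits set, whatever long's width is  */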
Since some people are posting macros (including some intentionally malicious ones), it is handy to know the -E compiler flag for debugging purposes. This will run only the preprocessor on the given files and print the result to stdout. (N.b. -E works at least on gcc and clang; the flag may be different elsewhere.)
Not really a C language trick, but useful nonetheless.
I think of a trick as something only you or only few people know about regardless of how much it gets used. A technique is something that is more widely in use by people.
Even though it's one of my favourites, modern high performance hardware with branch prediction, out-of-order execution, and multiple issue generally makes this trick rather unnecessary. Combine that with the cleverer compilers we have these days and you get pretty much the same performance out of the simple copy.
Even worse is that modern x86 (and probably other) CPUs also have instructions for 16-byte vector registers that can be used to copy or compare data much faster than 1 byte per cycle. Recent versions of glibc use some linker magic to pick the optimal code to use for strcmp, memcpy and friends based on the instruction set available to the CPU at runtime. Of course gcc and glibcxx's developers must not trust glibc and will sometimes replace calls to these functions with their "optimized" builtin versions that use the lowest common denominator ISA. An easy way we got a 5% boost in throughput in mongodb was to force them to call into glibc. https://github.com/mongodb/mongo/blob/master/pch.h#L47-56