Banned C standard library functions in Git source code

alister · on Aug 25, 2019

I'm glad to see that setjmp() and longjmp() are still allowed.

I'm just kidding by the way. For those C programmers who haven't encountered these before, it is a powerful way to do a "goto" in C. Powerful in the sense that you can jump anywhere, not limited to the same function. If it's used at all these days, it's used for exception handling.

More info: https://en.wikipedia.org/wiki/Setjmp.h

LukeShu · on Aug 25, 2019

The way I see the list is that it isn't meant to be exhaustive, it's meant to be functions that contributors are likely to try to use. A new contributor might not realize that they should be using Git's internal strbuf.h instead of the libc string functions. It's a way to give them feedback on that from their compiler, before they spend more time on the patch and send it to the mailing list.

masklinn · on Aug 26, 2019

Also seems mostly for functions which look innocent and are likely to be used but are very easy to misuse.

setjmp/longjmp is not in this category, it might be argued to be easy to misuse but doesn't look innocent and is not likely to be used in normal codebases.

rkagerer · on Aug 25, 2019

Might be nice if the error messages suggested such alternatives instead of just saying "banned"?

grogenaut · on Aug 25, 2019

I believe the git project welcomes suggestions and discussion via the mailing list, and also accepts patches that like you suggested above.

Lerc · on Aug 25, 2019

I had quite some fun playing around with it to obfuscate code. I did find that jumping to a different position in the same expression broke on a lot of compilers as soon as optimization was turned on.

Here's a simple program to output a string with the characters in an unusual order.

    #include <stdio.h>
    #include <setjmp.h>

    #define J(x,y) (longjmp(x,y),0)

    int main(int argc, char **argv)
    {
        jmp_buf j[011];
        int x,X=0;
        signed char* i = " eehcef lo u ef'oYurls " +021;

        (x=setjmp(j[X++]))?J(j[x],*++i):((x=setjmp(j[X++]))?J(x[j],*--i):((x=
        setjmp(j[X++]))?J(x[j],(putchar(*i),*i+=128)):(x=setjmp(j[++X]))?((x<
        0)?J(1[j],5):J(2[j],3)):(x=setjmp(j[++X]))?((x>>X)?J(1[j],6):J(1[j],3
        )):(x=setjmp(j[++X]))?((x&0x80)?J(1[j],7):J(1[j],5)):(x=setjmp(j[++X]
        ))?(((X<<5)&x)?J(1[j],4):J(2[j],X)):(x=setjmp(j[++X]))?((x&128)?J(0[j
        ],X):J(2[j],4)):(setjmp(j[3])>=0)?J(j[2],6):0));

        return 0;
    }

begriffs · on Aug 25, 2019

> the same expression broke on a lot of compilers as soon as optimization was turned on

C is not a "try it and see" language. If you stray from the spec then things will act differently and break between compilers. Section 7.13.2 specifies that only local variables declared as volatile will have a determinate value after a longjmp.

kazinator · on Aug 25, 2019

ISO C says that any local variables that are not declared volatile whose values are mutated after the context is saved with setjmp then have indeterminate values when the context is restored with longjmp. So yes, that interacts with optimization. Any variable that is cached in a register may revert to its old value when that register is restored blindly from jmp_buf.

mcherm · on Aug 25, 2019

You, sir, have a crafty and evil mind. Well done.

valleyer · on Aug 26, 2019

This was super fun to decipher!

1. The first time through this giant ternary defines 9 different small subroutines. Each subroutine can consider the value of "x" to be their argument. X is used as an index into the jmp_buf (subroutine) array during setup, but after that it's always just equal to 8

2. Subroutines 0, 1, and 2 are special. Their argument is the index of another subroutine to jump to. They mutate the character buffer state and then jump to the specified subroutine

3. Before jumping, subroutine 0 moves back a character, subroutine 1 moves forward a character, and subroutine 2 prints the current character and marks it by setting its high bit. Once a character is marked, it will never be printed again

4. All other subroutines get the current character value as their argument

5. Subroutines 3, 4, 5, 6, 7, and 8 all check, in different ways, whether the current character has been marked: x < 0, x >> 8, x & 0x80, (8 << 5) & x, x & 128, x > 0

6. Execution starts at subroutine 3

So you can annotate the code as:

    // Subroutine 0:
    //   move to next char;
    //   goto x;
    (x=setjmp(j[X++])) ? 
     J(j[x], *++i) :
    
    // Subroutine 1:
    //   move to prev char;
    //   goto x;
    ((x=setjmp(j[X++])) ? 
     J(x[j], *--i) :
    
    // Subroutine 2:
    //   print and mark (set high bit on) curr char;
    //   goto x;
    ((x=setjmp(j[X++])) ?
        J(x[j], (putchar(*i), *i+=128)) :
    
     // Subroutine 4:
     //   if (curr char is marked) {
     //     // goto 1(5);
     //     move to prev char;
     //     goto 5;
     //   } else {
     //     print and mark curr char;
     //     goto 3;
     //   }
     (x=setjmp(j[++X])) ?
        ((x<0) ? J(1[j],5) : J(2[j],3)) :
     
     // Subroutine 5:
     //   if (curr char is marked) {
     //     // goto 1(6);
     //     move to prev char;
     //     goto 6;
     //   } else {
     //     // goto 1(3);
     //     move to prev char;
     //     goto 3;
     //   }
     (x=setjmp(j[++X])) ?
        ((x >> X) ? J(1[j], 6) : J(1[j], 3)) :
     
     // Subroutine 6:
     //   if (curr char is marked) {
     //     // goto 1(7);
     //     move to prev char;
     //     goto 7;
     //   } else {
     //     // goto 1(5);
     //     move to prev char;
     //     goto 5;
     //   }
     (x=setjmp(j[++X])) ?
        ((x & 0x80) ? J(1[j], 7) : J(1[j], 5)) :
     
     // Subroutine 7:
     //   if (curr char is marked) {
     //     // goto 1(4);
     //     move to prev char;
     //     goto 4;
     //   } else {
     //     print and mark curr char;
     //     goto 8;
     //   }
     (x=setjmp(j[++X])) ?
        (((X << 5) & x) ? J(1[j], 4) : J(2[j], X)) :
     
     // Subroutine 8:
     //   if (curr char is marked) {
     //     // goto 0(8);
     //     move to next char;
     //     goto 8;
     //   } else {
     //     print and mark curr char;
     //     goto 4;
     //   }
     (x=setjmp(j[++X])) ?
        ((x & 128) ? J(0[j], X) : J(2[j], 4)) :
     
     // Subroutine 3:
     //   if (curr char is not marked) {
     //     print and mark curr char;
     //     goto 6;
     //   } else {
     //     exit;
     //   }
     (setjmp(j[3])>=0) ?
        J(j[2],6) : 0));
    
    // X = 8
    // goto 3(0);

A more straightforward way of writing this would be

    #define IS_CURR_CHAR_MARKED() (*i < 0)
    #define PRINT_AND_MARK_CURR_CHAR() do { putchar(*i); *i += 128; } while (0)
    int curr_sub = 3;
    
    for (;;) {
        switch (curr_sub) {
            case 3:
                if (!IS_CURR_CHAR_MARKED()) {
                    PRINT_AND_MARK_CURR_CHAR();
                    curr_sub = 6;
                } else {
                    return 0;
                }
                break;
            case 4:
                if (IS_CURR_CHAR_MARKED()) {
                    i--;
                    curr_sub = 5;
                } else {
                    PRINT_AND_MARK_CURR_CHAR();
                    curr_sub = 3;
                }
                break;
            case 5:
                if (IS_CURR_CHAR_MARKED()) {
                    i--;
                    curr_sub = 6;
                } else {
                    i--;
                    curr_sub = 3;
                }
                break;
            case 6:
                if (IS_CURR_CHAR_MARKED()) {
                    i--;
                    curr_sub = 7;
                } else {
                    i--;
                    curr_sub = 5;
                }
                break;
            case 7:
                if (IS_CURR_CHAR_MARKED()) {
                    i--;
                    curr_sub = 4;
                } else {
                    PRINT_AND_MARK_CURR_CHAR();
                    curr_sub = 8;
                }
                break;
            case 8:
                if (IS_CURR_CHAR_MARKED()) {
                    i++;
                    curr_sub = 8;
                } else {
                    PRINT_AND_MARK_CURR_CHAR();
                    curr_sub = 4;
                }
                break;
        }
    }

Lerc · on Aug 27, 2019

That's quite impressive work. It becomes a bit clearer if you think of the string as a tape. Then a big case statement with Left,Right,Mark seems rather Turing complete :-)

begriffs · on Aug 25, 2019

> you can jump anywhere, not limited to the same function

Outside of the current function, yes. Anywhere, no.

Longjmp allows you to return to places you’ve already been and have saved with setjmp. Also attempting to longjmp to a function that has since returned is undefined behavior. Thus longjmp’s usual use case is like exceptions in other languages, going back up the call chain, skipping intermediate functions.

ryandrake · on Aug 25, 2019

I think reasonable people can debate this one. setjmp()/longjmp() are very useful when working with libpng [1] and jpeglib [2], as you mentioned, as a crude exception handling mechanism. I have seen these functions used in production safely for that purpose. I can imagine other horrible uses for them though. Unlike things like strcpy() which are security-holes-by-design, setjmp()/longjmp() should be in a "carefully code review the exact way these are used" list rather than a banned list.

1: http://www.libpng.org/pub/png/libpng.html 2: https://www.ijg.org

slovenlyrobot · on Aug 25, 2019

The trouble is how they compose with the expectations of code you don't write. It's just not normal in C to expect execution of your function to abort halfway through, and so memory leaks and broken state are almost guaranteed when they're mixed with the wrong third party library.

It might work great when you wrote it, but it might not even survive the next change to the codebase, and 10 years out, who is to know what programmers will come along and plug new stuff in without noticing the stack magic, etc

edit: another aspect of the same issue is that it can be difficult or impossible to write generic longjmp-safe code for many kinds of tasks. In C++ you have stack unwinding to e.g. delete large temporary heap allocations during a complex operation, but no such facility exists in C.

megous · on Aug 25, 2019

longjmp will not interrupt third party code (well maybe if it uses callbacks and you use it from a callback).

setjmp/longjmp will almost certainly require tracking any resource use carefully so that resources can be released at the setjmp point, when execution gets back there.

I mostly use it for terminating the recursive descent parsing on error. Otherwise it gets tedious to check return values everywhere in hand written code.

smcameron · on Aug 26, 2019

> very useful when working with libpng

There is no reason libpng needed to use setjmp as part of its API, and that it does so is (I hope) widely regarded as a bad decision. It's "useful" only in so much as that is the API it defines, but there is no reason that this had to be the case, and simply returning an error code would have been very very much preferable.

caspper69 · on Aug 26, 2019

Didn't glibc use setjmp()/longjmp() for task switching eons ago (maybe even pre-Linux)?

slovenlyrobot · on Aug 25, 2019

These are used /everywhere/ in PostgreSQL, exactly for exception handling. The result isn't bad at all, but indeed, too powerful a tool for 99% of developers

CamperBob2 · on Aug 25, 2019

Fortunately, 99% of developers, like 99% of drivers, are better than average.

cmroanirgo · on Aug 25, 2019

> 99% ... are better than average.

I'm thinking there's a flaw in your mathematics...

elgenie · on Aug 25, 2019

99% of the people on Earth have a higher than average number of legs.

saalweachter · on Aug 26, 2019

And a below-average number of heads.

mrbungie · on Aug 25, 2019

No flaw with math/stats here, that can actually happen. I think you're confusing average/mean with median (50-percentile).

PS: I'm not sure where he's getting his numbers though

sooheon · on Aug 25, 2019

I'm 99% sure that was tongue in cheek.

jraph · on Aug 26, 2019

I think that it would have been even funnier to say that 99% of developers and 99% of drivers are better than the median, but I only figured this thanks to your parent's comment.

aiCeivi9 · on Aug 26, 2019

https://insights.stackoverflow.com/survey/2019#evaluating-co...

> 70% of respondents say they are above average

dragonwriter · on Aug 26, 2019

Depends which average it refers to any of the means, or the media, or the mode can be “average” (most typically, though, it means the arithmetic mean), and it's only impossible in terms of the median.

LorenPechtel · on Aug 26, 2019

Exactly. Anything where you have a main clump with a long tail will show this pattern--most samples will be better than the mean.

Stratoscope · on Aug 26, 2019

You've never been on a project that was 80% done and 80% to go?

ConceptJunkie · on Aug 26, 2019

And after the second 80% is finished, then you only have 80% left.

Merg · on Aug 26, 2019

Not native speaker here, but I think, he just meant that, most programmers and drivers thinks they are better in respective things then, they really are.

It is just like letting programmer build a house. It is just a really bad idea.

fanf2 · on Aug 25, 2019

And Lua’s exceptions are somewhat odd because they are based on setjmp/longjmp. https://www.lua.org/manual/5.3/manual.html#4.6

masklinn · on Aug 26, 2019

Apparently they're also used for error handling in libpng. That's one of the reasons the libspng author started on their project.

crankylinuxuser · on Aug 25, 2019

And that's the perfect usage for goto. Error catching and jump to fail block...

johannes1234321 · on Aug 25, 2019

Goto requires the fail block to be in the same frame. It can be a working error strategy for failing critical errors in such a way from a failing module. You have to be aware that lots of stuff is prone to leak (which can be avoided by i.e. using some memory arena) and locks/mutexes/... might be in unclear state, so code has to be aware of that.

userbinator · on Aug 25, 2019

The destination of the goto is usually the clean-up code.

johannes1234321 · on Aug 25, 2019

That's nice for the single frame.

Let's take the example of PostgreSQL, which was mentioned in this thread: A potential implementation (I have no knowledge of pgsql internals) might be that for a single user session all memory is from a single arena, all locks and other handles are tied to the session handle, then it can be an efficient strategy to longjmp out on a critical error (i.e. IO error) and clean the arena as well as things tied to the session. instead of bubbling this up through all frames.

Mikhail_Edoshin · on Aug 26, 2019

It's a different use case. There was a nice article[1] about exceptions in plain C with setjmp and longjmp.

[1] http://www.on-time.com/ddj0011.htm

IshKebab · on Aug 25, 2019

You're in the 1% though right?

Funny way of saying "too error-prone for anyone sane to consider using".

kijiki · on Aug 26, 2019

It is also the only way for an xlib program to survive losing its connection to the X server. xlib will let you register a callback to handle loss of connection with XSetIOErrorHandler, but unconditionally calls exit() after your callback returns. Which is extremely anti-social behavior for a library, but that is what it does.

So the solution is to setjmp before each call to an X function that might notice the connection is closed, and longjmp out of the error callback. Ugly, but it works.

These days, you should really just use XCB, or even Wayland.

LorenPechtel · on Aug 26, 2019

That's nuke-from-orbit level stupidity!

mlyle · on Aug 25, 2019

You can build coroutines from them, too, which is really cool and useful.

thechao · on Aug 25, 2019

I did! And the ergonomics are dreadful!

https://github.com/jaroslov/coro

yongjik · on Aug 25, 2019

I don't know why I still remember these, but if you need to build coroutines in C, ucontext.h might be a better tool:

https://pubs.opengroup.org/onlinepubs/7908799/xsh/ucontext.h...

anfilt · on Aug 25, 2019

Those are deprecated.

thomasjames · on Aug 25, 2019

I had only seen them in the context of userspace threading. I wonder what mainstream libs use them. Do you know of any?

fucking_tragedy · on Aug 25, 2019

Generators are a subset of coroutines, and coroutines are useful in lexing and parsing.

thomasjames · on Aug 26, 2019

Whoops, I meant that use set/longjmp() to implement coroutines. My wording in retrospect was super unclear. Sorry!

vasama · on Aug 25, 2019

I think you mean fibers, and no you can't.

tlb · on Aug 25, 2019

Here is an implementation of coroutines on top of setjmp/longjmp: https://fanf.livejournal.com/105413.html

superpermutat0r · on Aug 25, 2019

How does it work for arbitrary arguments? What about function pointers. What about calling a coroutine inside a coroutine?

It also has a stack managed in runtime. I'm not sure that would work for any other application than the fun example here.

vasama · on Aug 25, 2019

Looks more like an implementation of undefined behaviour to me.

dang · on Aug 25, 2019

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

monocasa · on Aug 25, 2019

The underlying implementations define the behavior here very well.

fanf2 · on Aug 25, 2019

Well, I would not go that far. My coroutine hack relies on “unwarranted chumminess with the implementation” (http://c-faq.com/struct/structhack.html) and there are lots of incidental things that can break it. It is best treated as an example of the principles of thread switching and stack allocation.

zzo38computer · on Aug 25, 2019

I don't use setjmp() and longjmp() often, but sometimes I do; I use goto more often than those (but still not all the time). (I want goto in JavaScript too. I figured out a algorithm to do so, but have not implemented it.)

One use of setjmp/longjmp I have used is in ZORKMID, to deal with the debugger. At the beginning of the execute() function I have:

  while(setjmp(exception_buffer));

(The semicolon is correct; the loop body is supposed to be empty.) Normally, if you enter the debugger and then you exit the debugger to continue the execution, then it will continue from where it left off, which is likely in the middle of the execution of some instruction, and may result the same error again. But, if you change the program counter before continuing execution, then the debugger will keep track of that and will use longjmp instead when continuing execution, therefore skipping the rest of the instruction that stopped.

pault · on Aug 25, 2019

What can you accomplish with goto that you can't accomplish with try/catch and custom errors?

hibbelig · on Aug 26, 2019

> What can you accomplish with goto that you can't accomplish with try/catch and custom errors?

You can program in C. C doesn't have try/catch...

dtech · on Aug 25, 2019

Goto sounds like a generally bad idea in a more expressive language like Javascript. Can you give an example of some Javascript-with-goto code that would be better than vanilla Javascript?

zzo38computer · on Aug 25, 2019

I have sometimes encountered such thing (but I do not remember right now), but what I do know is that goto is more helpful in C than in JavaScript. That doesn't mean that goto isn't worth anything in JavaScript, but only that there are less circumstances where it is useful compared to the other stuff. (The same is true of macros.)

lcuff · on Aug 25, 2019

I remember setjmp() and longjmp() being the only way to do a STREAMS device driver I wrote 30 years ago. Don't know if things have changed since then. Ah, what memories.

CorvusCrypto · on Aug 26, 2019

Could be wrong but I believe s lot of coroutine implementations use this as well since the call preserves the frame info in the result.

gnulinux · on Aug 26, 2019

I had really fun implementing coroutines, (almost-zero cost) exceptions, iterators etc with setjmp and longjmp. I think they're too useful to be banned outright (especially since C has no view on stack) since there are cases where you need to either use them or inline assembly, but I agree that they're the evilest of evilest std C functions.

chris_wot · on Aug 25, 2019

I've always wondered why goto is actually considered harmful. I really should read the original paper by Djikstra...

GrumpyYoungMan · on Aug 25, 2019

Read David Tribble's "Go To Statement Considered Harmful: A Retrospective" instead: http://david.tribble.com/text/goto.html . It contains the entire text of Dijkstra's paper and goes over its meaning in a modern context on a paragraph-by-paragraph basis. It's very well written and makes clear the "encourages spaghetti code" argument, while true at the time, applied in a era where basic control flow constructs that we now take for granted did not exist in commonly used languages.

nybble41 · on Aug 26, 2019

Thank you for the recommendation; that was a very interesting read. I find that I can't quite agree with the last "nontrivial goto" example, though. This version without explicit gotos is actually shorter, and IMHO not really any more difficult to follow:

    int parse(void)
    {   
      Token tok;
      while ((tok = gettoken()) != END)
      {   
        while (!shift(tok))
          if (!reduce(tok))
            return ERROR;
      }
      return ACCEPT;
    }

I will say that any language without goto, or where the use of goto is discouraged, should at least provide guaranteed tail-call elimination as an alternative. In this case the gotos could easily be converted into loops, but not every algorithm is so accommodating.

gbacon · on Aug 26, 2019

See also Structured programming with go to statements[0] by Donald E. Knuth.

[0]: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103...

saagarjha · on Aug 25, 2019

It encourages spaghetti code.

Someone1234 · on Aug 25, 2019

Not using it can also create spaghetti code. For example the "alternatives" often suggesting for exiting a multi-tiered loop (which often add additional boilerplate, additional variables, etc that all need to be maintained/bug free).

FartyMcFarter · on Aug 25, 2019

Indeed. Despite not having actual goto, Java went out of their way to have labeled loops and the "break label_name" syntax in order to avoid this awkwardness.

jjuhl · on Aug 25, 2019

[RAII](https://en.cppreference.com/w/cpp/language/raii) to the rescue..

smcameron · on Aug 25, 2019

libpng uses setjmp and longjump as part of its freakin' API.

fortran77 · on Aug 25, 2019

There really are use-cases for it, especially when implementing a Forth-style interpreter in C.

jlg23 · on Aug 25, 2019

> If it's used at all these days, it's used for exception handling.

And in all of the CPS code.

Throw_Away_4384 · on Aug 25, 2019

> If it's used at all these days, it's used for exception handling.

What about co-routines?

f00zz · on Aug 25, 2019

Yeah, I remember some green threads libraries based on longjmp/setjmp.

saagarjha · on Aug 25, 2019

I see zero use of those in the codebase, so perhaps they should be banned too. Or maybe they’re not on this list because nobody was using them already?

susam · on Aug 25, 2019

I have worked in large code bases (RSA NetWitness, Oracle Database, etc.) where setjmp/longjmp were used to create macros for exception handling using TRY-CATCH-ENDTRY blocks in C.

saagarjha · on Aug 25, 2019

Personally, I feel like wrapping exception handlers in macros makes it easy for people to do unsafe things with it like dynamically allocating memory or calling system functions that can’t handle it correctly.

peff · on Aug 25, 2019

I see a lot of comments to the effect of "shouldn't XYZ also be banned". The answer is that we're not necessarily trying to be exhaustive. The point is to flag common errors before we even hit review, so we add new functions mostly when somebody tries to misuse them. I don't recall anybody trying to abuse longjmp() in Git's codebase yet (and no, that's not a challenge).

Cogito · on Aug 26, 2019

Peff, for the people asking in the thread, is there a place where correct alternatives are suggested or demonstrated?

I know there are a few different places that talk about how to use git's internal machinery, but not sure if any are specific to these banned functions.

masklinn · on Aug 26, 2019

The original commits mentions git's strbuf API[0] and its xsnprintf, a variant of snprintf which asserts that the destination buffer was big enough[1] (rather than just return truncation information).

For other codebases, snprintf is the usual recommendation, and careful straight buffer manipulation (mem*) iff performances are a concern.

[0] https://schacon.github.io/git/technical/api-strbuf.html

[1] https://code.forksand.com/linux/git_git/commit/7b03c89ebd103...

peff · on Aug 26, 2019

The commit messages that add them to banned.h discuss alternatives, though most of the explanations are Git-specific and assume you'll look elsewhere to figure out how to actually use those alternatives.

stiff · on Aug 25, 2019

Looks like Git has its own string type:

https://github.com/git/git/blob/master/strbuf.h

https://github.com/git/git/blob/master/strbuf.c

See this for the story of why strncpy/strncat are insecure:

https://en.wikipedia.org/wiki/C_string_handling#Replacements

adamnemecek · on Aug 25, 2019

Like essentially all C programs.

segfaultbuserr · on Aug 25, 2019

Every C programmer has his/her own standard library...

r3trohack3r · on Aug 25, 2019

I wonder about this every time I read C.

There is talk about npm/rust having massive dependency trees (and they do) because it makes taking on dependencies too easy. But I feel like C is on the opposite side of the spectrum, where managing dependencies is difficult so every C code base is rolling it’s own version of everything.

C also has an expansive standard library but it hasn’t offset re-invention of that stdlib in the ecosystem. I just see those implementations trapped inside code bases that don’t export them in a consumable way for other projects...

Personally I love writing C but rarely do because dependency management makes innovation in those code bases so time consuming for me.

jancsika · on Aug 25, 2019

It's bad that C devs have rolled their own awful, buggy custom solutions.

However, here's the upshot-- that awful, buggy, custom solution is guaranteed to remain compatible with the codebase that contains it. You won't find a single instance where such a dev "improved", "refactored", "optimized", or "modernized" that awful, buggy code in a way that stopped the rest of the code from working and then shipped that broken blob of junk. And if you did find that, there's no way they'd convince the rest of the devs that this is a useful thing to do in the interest of "moving forward" or whatever.

On the other hand you'll find zillions of dollars spent on systems to keep dependency management tools from causing exactly that problem.

Just to give a real-world example-- a user reported that an abandoned C++ plugin wasn't working. We did a race:

1. Three devs try to get a single C++ dependency of that plugin to work cross platform (OSX, Windows, Linux).

2. I tried to port the C++ plugin to a C plugin with no deps.

By the time I ported, tested, and shipped, those three devs were tracking down a bug with the build script of the C++ dependency on Windows.

Edit: granted the plugin itself is only about 2000 lines of code.

ShinTakuya · on Aug 26, 2019

Regarding remaining compatible, I would argue that this is why modern languages implement lock files and version pinning. If you don't like a particular change but need some extra functionality or a security fix you can fork the relevant libraries or extend the functionality with an extra self written library.

riquito · on Aug 26, 2019

It's not a guarantee that it will work, it's not that rare to encounter a minor/patch version in a library that introduced some subtle incompatible change (usually unbeknownst to the author)

ShinTakuya · on Aug 26, 2019

That's literally the purpose of the lock file. The lock file locks the entire dependency tree. So unless you're bumping versions or you fail to save the lock file, the entire dependency tree's versions will remain the same.

>some subtle incompatible change

In statically typed languages this normally isn't an issue. Of course I'm aware that logic can also be changed, but in that case it's up to you to write appropriate tests (or just don't bump the versions of your libraries without a good reason).

Volt · on Aug 26, 2019

It still sucks. I have a few Haskell projects from a few years ago that I wanted to compile on another system. So I froze the dependencies, moved the project, and tried to build. Solver failed. I gave up.

ShinTakuya · on Aug 26, 2019

As far as I'm aware, Haskell's dependency system doesn't have a lock file. That's the key part to keeping things stable. Version pinning alone isn't enough.

rubber_duck · on Aug 25, 2019

You had the unfair advantage of working on your own :)

adamnemecek · on Aug 25, 2019

This is a C++ dependency manager problem though.

Volt · on Aug 26, 2019

That's the point

oblio · on Aug 26, 2019

From a modern package manager perspective, C++ is almost as archaic as C.

needs · on Aug 25, 2019

That was very true for a long time. But now with cmake and a package manager like hunter, it became quite easy and pleasant to work with dependencies. ExternalProject_Add() may also work.

segfaultbuserr · on Aug 25, 2019

Creating a lightweight library would make C programming much more productive, and I'm sure many have tried, but I wonder why none of them see widespread applications, instead of writing some half-baked, ad-hoc helper functions.

One reason, I guess, is the diverse range of applications of C, another reason is the lack of advanced features like generics and templates.

But it would still be useful to create a simple library, intended for Unix-like platforms (beyond the BSD extension of stdlib). So it's more of a cultural problem?

needs · on Aug 25, 2019

If I had to guess why there are no such lightweight library, it is because when somehow the library API doesn't provide what you want, you cannot hack the library itself easily. If you import the source code you may have trouble building it, and maintening such "patch" is not a lot of fun.

segfaultbuserr · on Aug 25, 2019

Yes. I guess the fact C doesn't have advanced features like generics and templates probably have made the problem worse, making it difficult to create a "drop-in" API.

goatinaboat · on Aug 25, 2019

There is a powerful C library that covers nearly all use cases, you merely need to write a configuration file for it. Perhaps you’ve already heard of it? Tcl.

StreamBright · on Aug 25, 2019

All of them are null pointer exception & segfault free I bet.

sfkdjf9j3j · on Aug 25, 2019

Due to the language's design, C programs are in fact guaranteed to be free of exceptions.

teddyh · on Aug 25, 2019

Except SIGFPE.

segfaultbuserr · on Aug 25, 2019

I nominate your comment for the HN comment of the day.

StreamBright · on Aug 25, 2019

Nitpicking but you are right. Null pointer bugs (null pointer dereference to be precise) are all over the place in C though.

https://stackoverflow.com/questions/4007268/what-exactly-is-...

zsrxx · on Aug 25, 2019

What's a good library for this kind of boilerplate? A lightweight one if possible, i.e. not fucking glib

vmsp · on Aug 25, 2019

Salvatore Sanfilippo, from Redis fame, has a nice one https://github.com/antirez/sds.

chucksmash · on Aug 25, 2019

Thanks for linking this, it's always good to see how very strong engineers solve problems. Never in a million years would this approach have occurred to me. For the interested, the README in the linked repo is done well and clearly explains the approach itself as well as pros and cons. This also reminded me that I've been meaning to watch antirez' "Writing System Software"[0] videos on YouTube for quite a while now, so thanks for that too.

[0]: https://www.youtube.com/playlist?list=PLrEMgOSrS_3fghr8ez63x...

megous · on Aug 25, 2019

I'd rather make it explicit that returned/passed value is something other than a regular zero terminated string. Otherwise you can easilly make a mess, if you pass zero terminated string to a function that expects some special bytes in the memory before the pointer.

doboyy · on Aug 26, 2019

I do this, but I like to just return the actual struct and also avoid typedefing pointer semantics. Can double as a dynamic array too. It's true. We all have string libraries...

zsrxx · on Aug 25, 2019

I haven't looked much so I can't comment on the quality of the library but that hairdo is glorious. Props to Salvatore!

tlb · on Aug 25, 2019

There are far too many. APR is a solid choice -- very portable, well-tested, and fast. It uses memory pools a lot, which work well for many styles but not everything.

See https://apr.apache.org/docs/apr/1.7/modules.html

emmelaich · on Aug 26, 2019

This is a good place to start:

https://dwheeler.com/secure-programs/3.71/Secure-Programs-HO...

seamyb88 · on Aug 25, 2019

What's wrong with glib, other than not being lightweight? I love glib.

NewJazz · on Aug 25, 2019

libbsd has strlcat and strlcpy.

or just use snprintf everywhere.

adamnemecek · on Aug 25, 2019

Rust. Half-joking.

commandlinefan · on Aug 26, 2019

That was my first thought. No strncpy? Ok,

  while (*s1 && n--)  {
    *s2++ = *s1++;
  }
  *s2 = '\0';

it is, then.

fwsgonzo · on Aug 26, 2019

I don't think strncpy is the function you are looking for, either. Its for zeroing fixed-size character buffers. And it doesn't indicate problems.

https://blog.liw.fi/posts/strncpy/

jasonhansel · on Aug 25, 2019

Nice. Is there a similar standalone C library for safe/sane string handling? (Please don't tell me to use C++.)

ayosec · on Aug 25, 2019

https://github.com/antirez/sds

trasz · on Aug 26, 2019

https://github.com/alexreg/libsbuf

dblock · on Aug 25, 2019

Windows programmers are familiar with StrSafe.h, https://en.wikipedia.org/wiki/Strsafe.h and https://github.com/dotnet/coreclr/blob/master/src/pal/inc/st... which go a lot further.

Also found https://github.com/mubix/netview/blob/master/banned.h on Github with a better list.

WalterGR · on Aug 25, 2019

IIRC, StrSafe.h - and much more - came from the development of the Microsoft Security Development Lifecycle (SDL).

Code Red was the wake-up call for Microsoft and in February 2002, based on a memo from Bill Gates that first coined the phrase "Trustworthy Computing," Microsoft shutdown Windows development for the first time ever to get a handle on the security issues the products were facing.

https://www.itprotoday.com/strategy/story-behind-microsoft-s...

pjmlp · on Aug 25, 2019

Additionally we get to enjoy bounds checked arrays(std::...), and iterators on debug builds, with possibility to selectively enable them in release mode.

While Windows by all means still has its security issues, the toolchain is much more security oriented than most FOSS alternatives thanks to the Windows XP wake up call.

Android and ChromeOS are probably the mostly locked down alternatives on the FOSS space.

uncletaco · on Aug 25, 2019

Tangential but the fact that I had to open up the machine and remove a screw to completely replace ChromeOS with linux bothers the fuck out of me.

floatboth · on Aug 25, 2019

That's how secure boot should work. Replacing the root of trust should require serious physical access that can be tamper-evident. And yeah, out of the box, the trust is with the vendor — who else would be trusted in a device that doesn't have an owner yet?

uncletaco · on Aug 26, 2019

I never thought of that. And it makes a lot more sense in that context.

pjmlp · on Aug 25, 2019

Well, you can see how they are locking it down with containers sandboxing from an IO talk, by making use of gVisor and a Rust based containers.

"Linux for Chromebooks: Secure Development"

https://www.youtube.com/watch?v=pRlh8LX4kQI

saagarjha · on Aug 25, 2019

My Chromebook just made me enable developer mode.

toast0 · on Aug 26, 2019

Developer mode lets you boot another OS, but leaves you open to accidentally wiping your drive if you hit the wrong button on boot. Replacing the firmware is a good idea, you do need to remove the write protect screw, but that really makes sense for some note of physical security.

saagarjha · on Aug 26, 2019

Multiple wrong buttons in a specific sequence, at least on my Chromebook.

temac · on Aug 25, 2019

I don't know, you can use valgrind and fsanitize and ibstdc++ __gnu_debug:: containers or _LIBCPP_DEBUG under libc++.

Just know your tools...

pjmlp · on Aug 26, 2019

The big difference is that on MSVC++ those debugging features are enabled by default on debug builds.

File->New C++ Project, already there.

Available to everyone, regardless of their compiler switch mastery level.

Defaults matter.

yesmar · on Aug 26, 2019

You can find an up-to-date, authoritative version of banned.h here: https://github.com/x509cert/banned

slovenlyrobot · on Aug 25, 2019

That's a surprisingly small list, missing e.g. sscanf / gets / strtok / all the other "usual suspects" at least

asveikau · on Aug 25, 2019

I don't see a huge problem with strtok as long as it isn't a multi threaded program. It is not in gets territory which is literally impossible to use correctly.

bhk · on Aug 25, 2019

Synchronous reentry is a problem. Say some loop using strtok calls a function... how do you know that that function doesn't use strtok (or call another function that does...).

Any use of non-const static variables in general has this problem, and strtok is just one example.

derefr · on Aug 25, 2019

> Say some loop using strtok calls a function... how do you know that that function doesn't use strtok (or call another function that does...).

I’m constantly surprised that C isn’t specified in such a way that lack-of-reentrancy can be determined at compile time. You’d “just” need a symbol table, sort of like the debug symbol table, in each compiled object, holding for each visible symbol the set of function calls marked `__no_reenter` that are predominated by that symbol in control flow. (Yes, some functions do computed jumps, like with longjmp. Just implicitly label those functions __no_reenter unless the programmer explicitly labels them __reenter!)

asveikau · on Aug 25, 2019

The problem isn't exactly reentry. It's that the function, having already exited, has modified global state, and it needs the same state to be there on the next call. If another call is interleaved (another callee with its own strtok loop) then the state is overwritten with the inner loop, and your own next call won't have its expected contents.

Whereas with a term like reentry I think of two strtok frames on the stack at the same time, which is not possible.

I think it's a bit easier to conceptualize with a function that produces a simple return value in a static buffer. Take getpwnam(). It fills a global structure with info about a given user. So you may keep a pointer to that on your stack. Then you call some other function foo(). foo looks up another user. Suddenly you can't rely on that other call to getpwnam() not having overwritten the data in the result you got back the first time.

bhk · on Aug 25, 2019

Ah, yes. What I described wasn't reentry. A reentrant function using strtok could exhibit this problem, but it applies more widely than that.

userbinator · on Aug 25, 2019

how do you know that that function doesn't use strtok (or call another function that does...).

The same way you know a lot of other things about the codebase: By reading and understanding it. Given what strtok() is used for, it's almost always going to be working on a single context at any one time.

bhk · on Aug 26, 2019

First, many C projects are libraries that are written for reuse in other programs. So "the codebase" is an arbitrary, unknowable thing. We don't have the ability to look at all the other code that will eventually coexist in the same process.

Second, code changes over time. Assuming you inspect 100 lines of code to ensure that they don't call strtok() ... do you sprinkle comments all over as a warning to future programmers not to add any calls to strtok()?

Finally, assuming we could evaluate all of the code in a program at once, it is impractical to build non-trivial programs that way. Instead, we rely on contracts. The idea is that if a caller conforms to its end of the contract, the callee will conform to its end. This allows us to reason about correctness of some code without reading all the other code in the universe and thinking through all the potential code paths. Any function using strtok() would be presenting a rather ugly contract clause "this function destroys strtok's static state". Even worse, such a clause would be contagious, infecting any client of that function, and so on.

missblit · on Aug 25, 2019

Might as well just use strtok_r everywhere and not worry about if you're using strok safely or not.

lholden · on Aug 25, 2019

"I don't see a huge problem with insert thing here as long as it isn't a multi threaded program."

Usually a bad omen. :)

slovenlyrobot · on Aug 25, 2019

I can't remember what's wrong with it, but it's responsible for about eleventy billion CVEs over the years

tropo · on Aug 26, 2019

You can safely use gets on trusted input. For example, your program might create a pipe or socketpair to communicate between the processes that result from a call to fork. The stdout of one process feeds into the stdout of the other process.

Unless you have an insane gets implementation that writes the bytes backwards, you can also safely use it on untrusted input. Simply place a guard page at the end of the buffer. You can do the call from a forked child that you let die, taking notice of the problem in the parent. On systems without threading (because of locks in libc) you can do a longjmp to recover.

tlb · on Aug 25, 2019

git does use multiple threads for some longer operations, like gc and pack, so using strtok would be a disaster.

_kxbd · on Aug 25, 2019

Maybe gets doesn't need to be banned because it got removed in C11 and at least gcc issues a warning on use.

ianai · on Aug 25, 2019

I hope it grows through community effort. This is a good sanity check sort of thing.

jpjuni0r · on Aug 25, 2019

I'm curious how sscanf is considered bad?

saagarjha · on Aug 25, 2019

scanf(“%s”, …) can cause a buffer overflow.

mort96 · on Aug 25, 2019

It sucks that there's no scanf("%.*s"), where you give the buffer length as a separate argument. The fact that it's in a string literal also makes using a compile-time constant really ugly.

claystu · on Aug 25, 2019

You can overflow buffers all sorts of ways in C.

jcl · on Aug 25, 2019

Looks like the second vsprintf definition should be "BANNED(vsprintf)", not "BANNED(sprintf)". Not that it matters much, as it only affects the error message.

peff · on Aug 25, 2019

Yes, you're right. Patches welcome. We do our development on a mailing list; see https://git-scm.com/docs/SubmittingPatches for details.

However, if you're more comfortable using GitHub PRs, there's a gateway interface at https://gitgitgadget.github.io/.

ttaylorr · on Aug 25, 2019

I submitted a patch to fix this typo from my fork (which you can see at https://public-inbox.org/git/cab687db8315dd4245e1703402a8c76...).

gus_massa · on Aug 25, 2019

Nice catch! You can send a "PR", but note that they have their own custom "PR" procedure, not the standard "PR" procedure in GitHub.

> Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow https://github.com/git/git/blob/master/Documentation/Submitt... procedure for any of your improvements.

LukeShu · on Aug 25, 2019

Their "custom" procedure is the `git send-email` command, which has a lot more of a claim to be "standard" than GitHub's custom procedure.

saagarjha · on Aug 25, 2019

Aside: GitHub’s refusal to provide a feature to close pull requests for mirrors is utterly ridiculous since it requires that people (or bots) go around and disable pull requests and point people to the patch submission process.

OJFord · on Aug 25, 2019

What's weirdest to me is that it differs from the policy on Issues, which can be turned off in settings.

I sort of get not allowing it, except that it's inconsistent.

yjftsjthsd-h · on Aug 25, 2019

> not the standard "PR" procedure in GitHub

If it is specific to one implementation, then it is not standard, pretty much by definition.

notatoad · on Aug 25, 2019

the word "standard" has more than one meaning. it's not standard by the definition of standard that you chose, but you can use your context cues to realize that it's not the definition of standard that OP meant.

zeckalpha · on Aug 25, 2019

I’m sure they’d welcome a patch.

bigtrakzapzap · on Aug 26, 2019

These shouldn't be used either (_s are available in C11, _l tend to be locale-parameter thread-safety, and _r have sizes/state for bounds/thread-safety):

- strtok -> strtok_r / strtok_s

- asctime -> strtok_r / strtok_s

- gmtime -> gmtime_r / gmtime_s

- localtime -> localtime_r / localtime_s

- ctime -> ctime_r / ctime_s

- dirname -> dirname_r

- basename -> basename_r

- devname -> devname_r

- readdir -> readdir_r

- ttyname -> ttyname_r

- gamma -> gamma_r

- lgamma -> lgamma_r

- lgammaf -> lgammaf_r

- lgammal -> lgammal_r

- atoi -> atoi_l

- atof -> atof_l

- atol -> atol_l

- atoll -> atoll_l

- gets -> gets_s

- scanf -> scanf_l / scanf_s

- fscanf -> fscanf_l / fscanf_s

- sscanf -> sscanf_l / sscanf_s

- tmpfile -> tmpfile_s

- fopen -> fopen_s

- getenv -> getenv_s

- strdup -> strndup

- strcmp -> strncmp

- strlen -> strnlen

- (Multibyte/wide conversion functions without mbstate_t parameter)

- wcslen / wscnlen -> wcsnlen_s

- wcsncasecmp / wcscasecmp_l / wcsncasecmp -> wcsncasecmp_l

- strcasecmp / strcasecmp_l / strncasecmp -> strncasecmp_l

- bzero (use explicit_bzero, in some cases)

- calloc, realloc -> reallocarray (for arrays of non-byte items)

- memmove -> memmove_s

- strncat -> strncat_s

- strncpy -> strncpy_s

- srand / rand -> rand_r

There are others that are platform-specific. Thread-safety, internal mutable state (not thread-safe), and buffer-overflows are the primary concerns that aren't necessarily applicable in all situations.

masklinn · on Aug 26, 2019

Annex K was never very popular, is still very easy to misuse[0] and has pretty much been deprecated and slated for removal.

[0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm#mi...

damien_brooklyn · on Aug 26, 2019

Back in my individual contributor days I got into serious trouble because of this. Where can I find it has officially been deprecated? I tried and I can still use it without getting any warnings?

imrehg · on Aug 25, 2019

Interesting to see the starting commit for that file, there's quite a bit of discussion and rationale too: https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e...

codedokode · on Aug 25, 2019

What surprises me in C developers is that C exists for probably 40 years but they still don't have proper strings (not just pointers). In many cases there is no large performance penalty for storing string length, and checking it, but they still use pointers or a separate pair of variables for pointer and buffer size instead of single object.

pjmlp · on Aug 26, 2019

Because they have a security culture that you only get errors due to "holding it wrong" or "every good programmer does it right", despite evidence on the contrary.

Plus added the fact that early C compilers generated quite lousy code on 8 and 16 bit computers, there is this idea to micro-optimize each line of code as it is being written, without any profiling feedback of it actually matters, rather cargo cult how writting code like X is faster than Y.

For example, outside 3D rendering and audio software processing, I never saw a visible impact (to the end user) of bounds checking.

jstimpfle · on Aug 26, 2019

> For example, outside 3D rendering and audio software processing, I never saw a visible impact (to the end user) of bounds checking.

Indeed the computation overhead on bounds checking is irrelevant for "cold" code, but consider this

1) Pretty sure you can get bounds checking when compilers detect that you're accessing a static array.

2) Otherwise, it's unclear how to devise a system that integrates bounds checking with C semantics. (Yes, that's unfortunate!)

3) Bounds checking does at least increase code size.

pjmlp · on Aug 26, 2019

1) Totally optional from ISO C point of view, not all C compilers do do it, usually requires static analysis or specific compiler warnings to be enabled. Which not everyone does.

2) Solaris does it perfectly fine on SPARC thanks to tagged memory (ADI). Which Google in collaboration with ARM will make mandatory on future Android releases as well. [0]

3) It hardly mattered in MS-DOS and Amiga LOB applications developed in across Turbo Basic, Quick Basic, GFA Basic, Turbo Pascal, Clipper, so it matters even less nowadays unless we are speaking about PIC like hardware.

Regarding bounds checking I usually refer to Hoare's turing award speech, back in 1981:

"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

Or for that matter, the DoD Multics's B2 security evaluation[1], with several remarks how PL/I made the system safer with its string handling, pointer integrity validation and bounds checking.

As noted, yes there are niche cases where bounds checking does have an impact, but for a large spectrum of code that gets daily written it isn't the case.

[0] - https://security.googleblog.com/2019/08/adopting-arm-memory-...

[1] - https://multicians.org/multics-fer.pdf

jstimpfle · on Aug 26, 2019

1) It's not standardized but it's very simple to implement nevertheless.

It is my understanding that 2) doesn't have anything to do with C (so given hardware support, you can have it for free whether working in C or not. I think this kind of invalidates your point).

And also, 2) is not perfect, only probabilistic (a source I found says 94% likelyhood to detect OOB).

And it works only for the most basic situations where you use the system allocator and never subpartition these allocations. So, beyond these simple cases that we can get for free without any involvement from C semantics, I still maintain that it is unclear how to devise a reasonable a useful system to do bounds checking that can be added to the C memory model. We can make up an annotation syntax to cover many of the simpler cases, but these are hardly better than plain assertions (which I regularly use).

I doubt Hoare had a good idea to add general bounds checking on a low-level language like C, otherwise that would be standardized by now.

pjmlp · on Aug 26, 2019

Pointer validation via hardware memory tagging has everything to do with C, because it is only due to C's shortcommings that millions of research dollars keep being spent to try to make it work.

In what concerns Solaris, and the requirements for future Android with ARM memory tagging, the system allocator is all there is, at least from official support point of view.

Hoare hardly needed to pursue such endevour, because all systems programming languages derived from Algol, like ESPOL, NEWP, PL/I, PL/S, PL/8, BLISS,.... were sane regarding bounds checking.

Hardware validation of memory is the only way to tame C, the alternative is to just dump the language, because as proven by ISO C11 dropping Annex K, very few actually care about making the language safe.

However given UNIX's dependency on C, it is also quite clear to me that in the next couple of decades C will be around, long after I am gone, and business opportunities to create companies on top of CVE exploits due to memory corruption bugs.

jstimpfle · on Aug 26, 2019

> Pointer validation via hardware memory tagging has everything to do with C, because it is only due to C's shortcommings that millions of research dollars keep being spent to try to make it work.

So, to restate, Solaris does not do it "perfectly fine". Thanks for making my point.

> In what concerns Solaris, and the requirements for future Android with ARM memory tagging, the system allocator is all there is, at least from official support point of view.

That's a pity, because if you're not doing your own allocators then you'll have to accept lock contention, extreme memory overhead (for smaller allocations, say <= 64 bytes), and you'll need to match every little allocation with a deallocation, instead of making e.g. custom pool allocators.

You're just not going to write a large infrastructure (i.e. performance-oriented) system in this way.

zxcvbn4038 · on Aug 25, 2019

There are countless libraries that add higher level string functions and no end to higher level languages. C fills the niche where you want something higher level then assembly but lower level then Perl, ruby, python, etc. Sometimes you want or need to manage your own memory. Arduino is a good contemporary example.

stjohnswarts · on Aug 25, 2019

Exactly. C philosophy is to use libraries and not put things like a better string library in the core functions of the language. It keeps the language relatively clean and easy to understand, unlike c++

unscaled · on Aug 26, 2019

So the C philosophy is to provide a bad standard string library that is not thread-safe and makes it impossible to write programs without remote code execution vulnerabilities, rather than have a better library with exactly the same functionality but with more security and safety "bloat"?

/snark

Yes, C used have a cavalier approach towards security in the past. But if you're asking why the standard library is not fixed yet, I think that instead of pinning it to some lofty philosophy, it's safer to say that good C developers realized long ago that the original C strings are a mistake. Most big C projects define their own string functions and often their own length-prefixed string types. The C standard committee just gave up on fixing this issue in the standard library, but this is not due to philosophy, but because of the impracticality to force a standard solution on this stage.

pjmlp · on Aug 26, 2019

Bonus points when the libraries are so incompatible among themselves that require extra conversation steps, and then just get dropped 'cause "mind the performance".

jstimpfle · on Aug 26, 2019

When compatibility is needed, APIs should consume pointer+length pairs. It doesn't get better than that, in ANY language, in terms of simplicity and modularity.

Libraries like Qt with extreme lock-in are at the other end of the spectrum. If it works for them, that's nice. Doesn't work for me. I don't think using a string library is in the spirit of C programming.

pjmlp · on Aug 26, 2019

Which means either pointer+length get packed into their own structure to avoid mix up errors, thus requiring conversion function calls across libraries, or they are given separately manually, thus opening the door to the copy-paste mistakes from pointer+bad length that they are supposed to protect against.

Qt is as locked-in as any LPGL 3 FOSS project is.

jstimpfle · on Aug 26, 2019

> Which means either pointer+length get packed into their own structure to avoid mix up errors, thus requiring conversion function calls across libraries, or they are given separately manually, thus opening the door to the copy-paste mistakes from pointer+bad length that they are supposed to protect against.

No, we were talking about compatibility/modularity. You're shifting the topic.

> Qt is as locked-in as any LPGL 3 FOSS project is.

I'm not speaking about licenses lock-in, but about lock-in from an engineering point of view. Are you aware of any significant Qt projects that don't have "Q" all over their codebase?

(And yes, by contrast to GPL, I believe you can use LGPL libraries without suffering a terrible amount of (license) lock-in)

pjmlp · on Aug 26, 2019

How am I shifting the topic?

Compatibility/modularity doesn't happen in the air, rather in written code.

So either one passes structs around, and somehow they need to be compatible.

Or one passes pointer + lenght as two separated variables, with the consequences to keep in sync two unrelated variables, from the compiler point of view.

pjmlp · on Aug 26, 2019

Arduino uses C++.

jbboehr · on Aug 25, 2019

I've used talloc [1] where appropriate to greatly simplify memory allocation and string handling.

[1] https://talloc.samba.org/talloc/doc/html/index.html

emmelaich · on Aug 26, 2019

C has _only_ pointers for variable size things, not just strings.

Roughly speaking C vars are either known fixed length, or accessed via pointer. There is nothing else.

(except arrays -- which are mostly pointers)

doboyy · on Aug 26, 2019

The thing you are pointing to could have its length prefixed. I've always assumed that this isn't the case because nobody could commit to the size of the length prefix. Is it string8, string16, string32, or string64? Using a sentinel value to denote length is less opinionated and more portable.

jstimpfle · on Aug 26, 2019

And also unbeatable in memory efficiency.

doboyy · on Aug 26, 2019

Unfortunately, it's often algorithmically less efficient and more error prone, so everybody ends up replicating what git is doing with 'strbuf' but in slightly incompatible ways. The effect makes dealing with strings in C unnecessarily unpleasant.

MuffinFlavored · on Aug 25, 2019

For anybody else:

Why is strncpy insecure?

https://stackoverflow.com/questions/869883/why-is-strncpy-in...

> strncpy() doesn't require NUL termination, and is therefore susceptible to a variety of exploits.

wruza · on Aug 25, 2019

It also fills all remaining bytes (if any) with nuls. ‘N’ in its name is not the same ‘n’ as in snprintf etc. It could be named fldcpy, as it works with 0-padded fixed-width fields rather than 0-terminated strings.

vagab0nd · on Aug 25, 2019

Yes. Also as mentioned in the post, strlcpy (when it's available) is safer.

jandrese · on Aug 25, 2019

But still not safe, thanks to the return value. Imagine the case where you are using strlcpy because you can't be sure if the source buffer is properly null terminated.

dezgeg · on Aug 25, 2019

If your buffer isn't a NUL-terminated, then don't call a function that is only defined for NUL-terminated buffers. It's as simple as that.

I'm baffled by how some people claim strlcpy() is 'broken' or 'not safe' because it doesn't handle non-NUL-terminated inputs; the exact same thing applies to just about any function in the C standard library that takes strings as input. Are functions like strchr(), fopen(), printf(), strstr(), setenv() 'broken' as well?

begriffs · on Aug 25, 2019

> If your buffer isn't a NUL-terminated, then don't call a function that is only defined for NUL-terminated buffers.

It's not that people want to pass strncpy source buffers that lack NUL termination, it's that strncpy in certain situations will not NUL terminate its results.

https://begriffs.com/posts/2019-01-19-inside-c-standard-lib....

> some people claim strlcpy() is 'broken'

Speaking of strlcpy, it thankfully doesn't have the problem that strncpy does. However strlcpy is not in the C standard or in POSIX, so can't be used portably. In C99 snprintf is a better choice.

dezgeg · on Aug 26, 2019

Yes, strncpy() not NUL-terminating it's output is nasty and well-known. But the comment I was responding to was claiming strlcpy() being unsafe.

jandrese · on Aug 25, 2019

Yes, because cutting off the trailing NULL is a threat vector every C programmer needs to be conscious of whenever they're dealing with strings.

But it's also a problem because maybe the programmer is using strlcpy to grab the first five lines of a 10TB memory mapped file. If you're not thinking about the implications of that return value it can be a real surprise.

fserb · on Aug 26, 2019

This file could be made much better by having comments pointing to reasonable alternatives to the banned functions.

noname120 · on Aug 25, 2019

Why is there no brief explanations in this code why each function is banned?