The Use of assert() in SQLite (sqlite.org)
139 points by henning on April 27, 2019 | 99 comments



I’d like to mention Go. This seems to me like a pointless quibble about how to catch and handle errors, the kind of argument you see buried in the HN comments and gloss over because you’ve heard it a bunch of times before. The assert() macro in C is roughly just if ... { panic() } in Go. You want that, use it. So the idea that getting rid of assert is “essentially telling developers that they are not allowed to do run-time verification of invariants” is completely nonsensical.

The Go philosophy against assert() and assertion libraries is largely that the error messages produced by assert() are low quality. The idea is that you should not just have a stack trace, you should have a good error message. The assert() macro can give you a stack trace with some effort, but rarely gives you a good message. Good use of panic() is easy and gives you both. It’s also used mainly for runtime invariant checking, although there are some other ways it can be used (since it can be caught).


I don't know about Go, but in C or C++, it's a common idiom to do something like:

  assert(condition && "static reason why we expected this condition to be true");
Unfortunately, this doesn't allow for a dynamic string, but does generally present a message about the assert. Also, if properly configured, a failed assert results in an abort and should generate a core/crash dump. Pretty useful to be able to load in a debugger for further inspection.

But to the point about providing a good message: that's on the developer, and it's going to be true no matter the language.
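
A minimal sketch of the `condition && "message"` idiom, assuming a hypothetical `average` helper that is not from the article. A string literal is never a null pointer, so the `&&` leaves the truth value of the condition unchanged, and the text becomes part of the stringified expression that implementations typically print on failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only. */
double average(const double *values, size_t count) {
    /* A string literal is never null, so `cond && "text"` is true exactly
       when cond is true; the text shows up in the failure message. */
    assert(values != NULL && "caller must pass a valid array");
    assert(count > 0 && "average of zero elements is undefined");

    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum / (double)count;
}
```

Both checks disappear when the file is compiled with NDEBUG defined, so they should guard against programmer error only, not against bad user input.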


    if (argc != 2)
        throw std::runtime_error("Expect 1 argument. Actual "s + std::to_string(argc - 1));
Don't be scared of exceptions. If you never catch the exception, it is mostly abort() with a dynamic string.


This will unwind the stack, executing destructors. Which is the last thing that you want to do in a scenario where conditions that were supposed to be invariants are broken. If your invariants don't hold, you don't really know what's going on - think of it as high-level undefined behavior. The faster you stop executing code within the security boundary (usually process), the better. This is exactly why assert() calls abort(), and why you should do so as well as soon as you've logged enough data to diagnose the issue.


It still incurs a nontrivial runtime cost, unless you've ifdef'd it away in release builds, but even so it is a heavy price to pay in debug or semi-debug^ builds for things like container operator[] bounds checking. The best implementation of assert I've seen raises SIGTRAP on error, which neatly lets you attach a debugger and break when the error occurs, at which point you have the full stack trace and program state.

^ semi-debug in our case means minimal optimisations and debug asserts enabled.
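
Roughly what a SIGTRAP-raising assert could look like, as a sketch only. `TRAP_ASSERT` and `checked_get` are made-up names, and SIGTRAP is a POSIX signal rather than part of ISO C, so this assumes a POSIX-like platform.

```c
#include <signal.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical macro: report the failed check, then raise SIGTRAP so an
   attached debugger stops at this point. Without a debugger, the default
   action for SIGTRAP terminates the process. */
#define TRAP_ASSERT(cond)                                              \
    do {                                                               \
        if (!(cond)) {                                                 \
            fprintf(stderr, "%s:%d: check failed: %s\n",               \
                    __FILE__, __LINE__, #cond);                        \
            raise(SIGTRAP);                                            \
        }                                                              \
    } while (0)

/* Hypothetical bounds-checked accessor. */
int checked_get(const int *data, size_t len, size_t i) {
    TRAP_ASSERT(i < len);
    return data[i];
}
```

Under a debugger, execution stops inside `raise`, one frame below the failing check, with the full program state still intact.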


Bounds checking is one of those things that's easy for a compiler to hoist out of a loop and for branch prediction to guess right when not. It has very high value to cost ratio.


> It still incurs a nontrivial runtime cost,

why do you care about the runtime cost of something that makes your program crash ?


>why do you care about the runtime cost of something that makes your program crash ?

I think both parent commenters (htfy96 and Asooka) got on a tangent because htfy96 substituted a throw exception for an assert() in gp's (hermitdev) code -- as if they were equivalent. However, exceptions have fundamentally different semantics from asserts and are not substitutes for each other. (For anyone unfamiliar with the different purposes of exceptions vs asserts, please read the top 2 answers in this Stackoverflow question.[0])

Things like array bounds checking (used as a runtime safety check instead of an "impossibility" check) would not be tested with assert().

To answer your direct question, programmers who use assert() correctly for "testing their understanding of the world by testing an impossible condition" don't want to pay a runtime penalty for it.

[0] https://stackoverflow.com/questions/1467568/debug-assert-vs-...


Exceptions have a setup cost. You pay that setup cost every time an exception frame is created.


That is not true on most platforms in use today. The Itanium ABI uses "zero cost" exceptions which do not have any setup or runtime cost when an exception is not thrown (at the cost of making throwing exceptions much more expensive than other schemes). 64-bit Windows and arm64 use similar schemes.


I guess I have to use weasel words in every post now.

Let's pretend my previous post started with

"In most cases, "


But that's the point, it's not most. GCC and Clang default to zero-cost DWARF. Add Windows and Arm64 and that's almost everything.


The only "relevant" platform where exceptions aren't zero cost nowadays is 32-bit Windows, not even the 64-bit one. So no, absolutely not "in most cases".


It is easier to embed the file and line number, and possibly the stringified assert expression, via a macro. The string approach might help as a comment when reading the code, but suffers from all the problems comments do.


I usually just make my own AssertSz() macro that can take printf args. Works great for those pernicious asserts that need more data.
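
A sketch of how such a macro might be put together, assuming made-up names (`ASSERT_MSG`, `assert_fail`, `percent_of`); it is not the AssertSz() mentioned above, just one plausible shape for it. The file, the line and the stringified expression are captured at the call site, and the remaining arguments go through printf-style formatting.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reporter: prints the location, the stringified expression
   and a printf-style message, then aborts so a core dump can be inspected. */
static void assert_fail(const char *file, int line, const char *expr,
                        const char *fmt, ...) {
    va_list args;
    fprintf(stderr, "%s:%d: assertion `%s` failed: ", file, line, expr);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fputc('\n', stderr);
    abort();
}

/* __FILE__ and __LINE__ expand where the macro is used, and #cond
   stringifies the expression, so the report points at the caller. */
#define ASSERT_MSG(cond, ...)                                          \
    do {                                                               \
        if (!(cond)) {                                                 \
            assert_fail(__FILE__, __LINE__, #cond, __VA_ARGS__);       \
        }                                                              \
    } while (0)

/* Hypothetical use: the message carries the runtime values. */
int percent_of(int part, int whole) {
    ASSERT_MSG(whole > 0, "whole=%d part=%d", whole, part);
    return part * 100 / whole;
}
```

The format string is required here; a variant that allows a bare `ASSERT_MSG(cond)` would need a second macro or compiler extensions.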


> The assert() macro in C is roughly just if ... { panic() } in Go

The article very clearly says that (unlike normal C behaviour) it's a no-op in release builds of SQLite, because otherwise they get a ~3x slowdown. Does Go compile away those `if` statements?


> Does Go compile away those `if` statements, or did you not read the article?

The HN guidelines specifically request that we not make comments like this.

> Please don't insinuate that someone hasn't read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

https://news.ycombinator.com/newsguidelines.html


You're right, it was out of line. Edited, and thanks.


Well, thank you for taking criticism so well! We all get a little snippy at times - goodness knows I've done it myself more than I want to admit.

I was about to delete my comment since it was no longer relevant, but maybe I will leave it here since your reply and edit are such a great example of how to be constructive and civil. (OTOH, let me know if you'd prefer for me to delete it.) Thanks!


I find the fastest way to introduce bugs into programs is to compile programs differently in development than in production. It also warps developers' perspectives of how their program performs. As they say at NASA, "Test what you fly, fly what you test."

That said, you can do this easily enough in Go. Define a package:

  package debug

  const Debug = true

  func Assert(cond bool, msg string) {
        if Debug && !cond {
                panic(msg)
        }
  }
Have your build system generate the file that contains Debug = true. In optimization mode, make it Debug = false. As it is const, the compiler will propagate it, Assert will be inlined and the dead code removed. If you don't like relying on the compiler, have your code generator generate empty Assert functions. If you are working in a very bad code base that catches all panics, you'll need an extra 5 lines of code in the implementation of Assert, and your stack traces won't be pretty. (Roughly: put the condition check in a goroutine, block the Assert call on a channel signal from that goroutine.) I'd recommend the first project be to remove those panic catch-alls.

But really, all of this is only appropriate when you are far out in the weeds. Ship your program with those asserts. If they are too slow for production they are too slow for development.


"Test what you fly and fly what you test" is what we live by at SQLite. Testing is done three times:

(1) One test uses compiler and preprocessor options set up to measure coverage. This is not so much a test of SQLite itself as it is a test of the test logic, to verify that the testing provides 100% MC/DC.

(2) The second test runs with assert(), ALWAYS(), and NEVER() enabled. This is like a unit test. It verifies internal assumptions and state, at the expense of running 3x slower.

(3) Finally, we build as for delivery and test once again. This is the string test, where we "test what we fly".

All three test runs must give the same answer (modulo performance) before a release.

During day-to-day development work, we usually run (2), but occasionally toss in a (1) or (3) just to confirm that we haven't introduced any gaps in test coverage, or bugs that are masked by the debugging logic.

So you are correct that you should not do all your testing using one configuration and then deliver a different build. But that does not mean you can't have extra logic in your code that helps do unit testing and debugging during development and which is excluded from release builds. You just need to make sure that you rerun all tests in the release configuration.


That will still evaluate the cond in all cases. Passing a func will make it lazy but might have performance implications (I’d have to test it).


It won't, because of inlining and dead code removal. (At least, it didn't back around Go 1.11 when I last tested this.)


If you do Assert(ExpensiveOp()), removing all of it is only a valid optimization if ExpensiveOp() is 100% side-effect free. But even when that is the case, the compiler won't inline ExpensiveOp() and go through the effort of proving that it's side effect free if it's sufficiently complex.


Maybe if the thing that is generating the cond at the call location is inline-able, but not if it's doing an expensive operation.

https://play.golang.org/p/Ts6QqRrWwuQ

I couldn't get that to not trigger the lock.

[later] I could get it to remove it if you pass in a func instead.


Your Cond() has a side-effect. It can never be optimized away.


Correct. Which is one of the differences between this approach & macros. Though the lazy approach appears to work.


It’s not the if statements: for an assertion the panic branch should be ice-cold and likely proven by invariants up the stack frame (other assertions or plain old conditionals), so in theory it should be optimized away every time. It’s the expression in the condition that’s often expensive. The assert macro eliminates these trivially with the preprocessor, but Go cannot as easily because the condition expression may have side effects.
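
The difference can be sketched in C itself by comparing a disabled macro with a disabled function. All names here are hypothetical, and the counter stands in for any observable side effect of an expensive check.

```c
static int check_calls = 0;

/* Stand-in for an expensive consistency check with a visible side effect. */
static int expensive_check(void) {
    check_calls++;
    return 1;
}

/* Disabled macro: the argument tokens are discarded by the preprocessor. */
#define ASSERT_MACRO_OFF(cond) ((void)0)

/* Disabled function: the argument is still evaluated before the call. */
static void assert_func_off(int cond) { (void)cond; }

int calls_with_macro(void) {
    check_calls = 0;
    ASSERT_MACRO_OFF(expensive_check());
    return check_calls;
}

int calls_with_function(void) {
    check_calls = 0;
    assert_func_off(expensive_check());
    return check_calls;
}
```

`calls_with_macro()` returns 0 and `calls_with_function()` returns 1. An optimizer may inline the empty function, but it has to keep the call to `expensive_check()` unless it can show the call has no observable effect.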


Help me understand why this matters? Is it really Hipp's argument that Go is unsuitable for serious systems work because invariant signaling in the code is too slow? That seems pretty nitpicky, and also ironic since the "serious" systems language to which he's comparing Go is C.


Not being able to have code that compiles in debug but doesn't in release is a pretty serious issue for CPU-bound performance-critical code. Otherwise, you're going to be commenting things out a lot or using preprocessors, both of which are annoying and slow down development velocity.

As an example, checking errors on GPU is enormously slow because you stall the CPU until the hardware FIFO drains every time you do. You can't ship software that checks for errors after issuing each DirectX or OpenGL call. But writing graphics code is also really annoying without a debug mode that checks for errors after each render command, because the failure mode of graphics programs tends to be "black screen" or "screen full of garbage".


Isn't a preprocessor the right place for something like this? It doesn't have to operate at the level of source code but if you want your actual code/AST, not just the behavior, to be different from dev to prod then you need a language that compiles to Go?

Baking that into the language seems like it just replaces Go with that meta-language and hides the Go underneath like how "C" is really CPP-lang with a C compiler underneath.


I would take anything Dr Hipp says very seriously. While he may be wrong, I wouldn't discard his thoughts without some facts to back it up. SQLite, both its functionality and amazing test suite, as well as his other achievements, certainly make him worthy of my respect.


Not to be snarky, but because it's relevant to the discussion:

https://www.zdnet.com/article/sqlite-bug-impacts-thousands-o...

This stuff doesn't happen in Go and Rust programs.


What does that specifically have to do with golang or Rust?


Hipp's argument --- which comes kind of out of nowhere in his post --- is that Go is unsuitable as a serious systems language because it lacks compile-time asserts. Which is, I think, ironic, because his language is C, and so produces memory corruption vulnerabilities that languages like Go and Rust preclude.

I'd rather have Go's guarantees than C's.


In go you can use build tags to achieve this. The compiler will drop function calls to empty function implementations (tested on go 1.11.5).


Error handling is quite different in Common Lisp than in other languages. Assert is also different. An assert has up to three parts:

1. The expression to be asserted

2. A set of places (ie variables/references)

3. A format datum and arguments (like sprintf)

When an assert fails you see the formatted message from (3) and get some options:

- invoke a restart to some place higher up the control stack

- invoke the restart generated by the assert which asks for new values for the set of places in step 2, and then retries the assert

- abort

- crash

These options may be invoked by the program itself or by the user if the debugger is invoked


I don't think assert got used correctly if it needs a meaningful error message, personally. When an invariant of my algorithm fails, I've generally found I have a fundamental logical error with no particular relationship to the point in the code where it fails.


Does panic() in Go give you source code line numbers and file names?

As far as I'm concerned, assert() has two jobs, both essential: Killing the program AND giving an error message which gives the developers knowledge of precisely where the deadly assert() is. Doing only one of those things leaves the job undone.


panic gives you line number and stacktrace


The Go compiler has `assert` and uses it for checking invariants, as mentioned in the article. I love Go, I'm down with the decisions the designers have made for the language but not exposing a proper and simplified assertion statement/expression while expecting the user to make their own doesn't sit well with me.


Isn't go compiler written in go? Is it different version of go that has `assert`?


They already have precedent in this, having a special version of the language only the golang authors can use (e.g. generics and asserts), and then claiming that they're not useful or too complicated for others to use.


> The CORRUPT_DB macro is used in many assert() statements. In functional testing builds, CORRUPT_DB references a global variable that is true if the database file might contain corruption. This variable is true by default, since we do not normally know whether or not a database is corrupt, but during testing while working on databases that are known to be well-formed, that global variable can be set to false.

This is interesting - this really reveals their philosophy where they assume that until proven to be good, stuff is considered to be broken.


Not only does Zig embrace this philosophy, it introduces the concept of "language-level assertions". For example, integer operations, both signed and unsigned, assert that no overflow occurred. Likewise many of the casts, such as non-truncating integer conversion, assert that the numerical value is unchanged.

In safety-protected build modes assertion failures trigger a panic, which is globally overridable, but the default behavior is to print a full stack trace.

Finally, the assertion behavior can be modified at any scope. So one can identify the performance bottlenecks and turn assertion failures into undefined behavior.


For reference, they use this code instead of assert:

    #if defined(SQLITE_COVERAGE_TEST) || defined(SQLITE_MUTATION_TEST)
    # define ALWAYS(X)      (1)
    # define NEVER(X)       (0)
    #elif !defined(NDEBUG)
    # define ALWAYS(X)      ((X)?1:(assert(0),0))
    # define NEVER(X)       ((X)?(assert(0),1):0)
    #else
    # define ALWAYS(X)      (X)
    # define NEVER(X)       (X)
    #endif
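
A sketch of how these macros wrap a defensive branch, using the debug and release definitions quoted above with the coverage branch left out. `safe_length` is a hypothetical function, not SQLite code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Debug and release forms as quoted above (coverage branch omitted). */
#ifndef NDEBUG
# define ALWAYS(X)      ((X)?1:(assert(0),0))
# define NEVER(X)       ((X)?(assert(0),1):0)
#else
# define ALWAYS(X)      (X)
# define NEVER(X)       (X)
#endif

/* Hypothetical function: callers are believed never to pass NULL. */
size_t safe_length(const char *s) {
    if (NEVER(s == NULL)) {
        return 0; /* defensive path, still present in release builds */
    }
    return strlen(s);
}
```

In a debug build a NULL argument trips `assert(0)`; in a release build the same call quietly takes the defensive branch and returns 0.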


ALWAYS and NEVER don't pass through their values consistently as claimed: in debug builds they pass through truthiness (1 or 0), while in release builds they pass through the actual value. Perhaps this would be better for debug builds:

    #define ALWAYS(X) (assert(X), (X))
    #define NEVER(X) (assert(!(X)), (X))


That would duplicate X
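
To make the duplication concrete, here is a sketch that counts evaluations. `ALWAYS_DUP` copies the proposed definition; `next_token` and `evaluations_per_use` are made-up names, and the count assumes assert() is enabled (NDEBUG not defined).

```c
#include <assert.h>

/* The proposed debug form: X appears twice in the expansion. */
#define ALWAYS_DUP(X) (assert(X), (X))

static int next_calls = 0;

/* Hypothetical operand with a side effect. */
static int next_token(void) {
    next_calls++;
    return 1;
}

/* Returns how many times next_token() ran for a single ALWAYS_DUP use. */
int evaluations_per_use(void) {
    next_calls = 0;
    if (ALWAYS_DUP(next_token())) {
        /* normal path */
    }
    return next_calls;
}
```

With assertions enabled the operand runs twice, and with NDEBUG defined it runs once, so any operand with side effects would behave differently between the two builds.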


In addition to, not instead of.


I've always found the name `assert()` to be particularly bad.

Many people do not know that their compilers optimise away assertions in `-O` ("release") builds.

They use assertions for control flow and input validation, which is very wrong.

I believe that could be trivially fixed by giving functions proper names that indicate what they do or what they are intended for. For example, calling such an optimised-away assertion function `debug_only_assert()` would immediately rule out such unintentional misuse.


> I believe that could be trivially fixed by giving functions proper names that indicate what they do or what they are intended for. For example, calling such an optimised-away assertion function `debug_only_assert()` would immediately rule out such unintentional misuse.

That is Rust's tack:

> [assert!] assertions are always checked in both debug and release builds, and cannot be disabled.

> Unlike assert!, debug_assert! statements are only enabled in non optimized builds by default. An optimized build will omit all debug_assert! statements unless -C debug-assertions is passed to the compiler.


This is misinformation: assert() is disabled if NDEBUG is defined, which is usually done on the command line.

In sane build systems the command line is visible by default.

Non-sane build systems display wrong percentage numbers instead, but that is a different topic.


Ha, reminds me of the time I was working on a project that worked great in debug mode and then crashed and burned when they went to compile it in release mode. Turned out someone had put the call to the initialization code inside an assert function. The bug took some time to figure out because all the crashes happened in code that was nowhere near the actual problem and of course when you turned on debugging the problem disappeared.
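
That failure mode is easy to reproduce. A minimal sketch, with a hypothetical `init_subsystem` standing in for the real initialization code:

```c
#include <assert.h>

static int initialized = 0;

/* Hypothetical initializer: returns 0 on success. */
static int init_subsystem(void) {
    initialized = 1;
    return 0;
}

/* Buggy: with -DNDEBUG the whole expression vanishes, so init never runs. */
void start_buggy(void) {
    assert(init_subsystem() == 0);
}

/* Safer: do the work unconditionally and assert only on the result. */
void start_safe(void) {
    int rc = init_subsystem();
    assert(rc == 0);
    (void)rc; /* avoids an unused-variable warning when NDEBUG is set */
}

int is_initialized(void) {
    return initialized;
}
```

The rule of thumb is that the expression inside assert() must be free of side effects the program depends on.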


While I'm still a fan of the name `assert()`, one alternative that I like is `invariant()`. It seems to capture the idea a little better.

The debug vs release build is a bit of a problem still but I think part of the issue is around the idea that debug builds are too slow to use at all or take so long to build anyway so you might as well use release optimizations. This is something to address with faster compilers and better teaching (and perhaps encouraging folks to try learning how to use debuggers effectively).


That’s exactly what Rust does (debug_assert() is stripped from release builds, assert() stays).


Huh? I was worried I had been doing this wrong this whole time, but I haven't been able to replicate that with release builds.

https://gist.github.com/technion/7c74fad9efd1f14244e25a1cee3...


In many settings, "release build" implies passing -DNDEBUG, which disables the asserts.


> The SQLite developers believe that the lack of assert() disqualifies Go as a language for serious development work.

Great comment.


It seemed random and out of place to me. Like some sort of Wiki vandalism.

Why do I care about their opinion of Go in SQLite dev docs?


Whoever wrote that was probably shooting from the Hipp.


> Why do I care about their opinion of Go in SQLite dev docs?

Apparently some people care, considering the number of comments on how Go can, does and should handle this use case.


I did not understand this comment, as they also wrote ALWAYS, NEVER and testcase. Why could you not write your own assert in Go? (I have not programmed in Go)


I dunno either, because it's incredibly trivial.

    func assert(cond bool) {
      if !cond {
        panic("whoopsies")
      }
    }


And then you can just #define assert(...) to eliminate it...oh wait, Go doesn't have a macro-preprocessor.


To play along....

Go does have build tags, so one could have 2 files for the assert function, each with opposite build tag logic. It does allow for an effective NOP (it would be a stub function and I believe the compiler would eliminate that).


I tested this and it does indeed eliminate that code.


I'm wondering if it inlines it (and thus eliminates it).

But this would come with a performance penalty: it makes the function you assert() in no longer a leaf function (it calls assert()) and AFAIK, Go only inlines leaf functions. (So, any function in which you call assert() is no longer eligible for inlining.)


chrchang523 is correct. In my test the caller function can still get inlined.


deathanatos: Go 1.12 can inline non-leaf functions, and I’d be surprised if it still fails to do the right thing here.


It would still evaluate the condition in both cases, unless it can guarantee that the condition is a pure function with no observable side effects.


You can always log to stderr with a stack trace and exit with a non zero static code. I’m guessing they’re referring to the fact that the built in panic keyword can be “caught” further up the stack a la an exception.


IMO, far better to log, then abort/core. I want to be able to inspect the state of the failed case, not just know that it failed.


It’s far better to let the caller decide. As a user of a library, I absolutely do not want a library terminating my process. Sure, some programs can benefit from this, but some programs need to guarantee that they keep running. Being able to recover (Go’s counterpart to panic()) allows the caller to decide which behavior they want.


I'm not sure I agree. There's a very, very small set of conditions under which a client can recover from something that would trigger an assertion failure. Don't want an abort? Then use the release libs where those have been preprocessed out.

Conversely, at a former employer we used a set of 3rd party database libs for ODBC on Linux that, for the smallest config error, would emit nothing to stderr and call exit(1) from a shared lib. That was beyond annoying.


I was afraid someone would mention preprocessing out the asserts :)

That also has its advantages, but I would argue that it makes it impossible to selectively opt in or out.

Honestly, for a well tested piece of code, the point is moot, since the assert(s) will never evaluate false anyway.


> I was afraid someone would mention preprocessing out the asserts :)

Problem is, you need to understand what you're using, and like it or not, assert is defined as a macro by the C standard. And, assert is conditionally defined (and it's not the only conditionally defined macro).

I know it's long, verbose, and dry; but if you want to understand C (or C++ for that matter), you really need to read the ISO standards that correspond to the implementation.


You can also easily enable runtime failures so the point is indeed moot.


If you're in the same process as the code that had its invariant fail, how can you ensure that your process is in a safe state afterwards? A failed invariant, by definition, means that you're in uncharted territory - any anticipated failures are (assuming proper design) reported via the usual error mechanisms.


To draw_down: your comments are showing up as dead.

As for assert in Go... couldn't you just make a function that takes a Boolean and panics?


You could, but then when you get an error, you'll see the file and line number of the assert function rather than the code that actually produced the error.


The runtime package allows you to grab the current trace, you can just pop the last frame off and print that.


True. The only other challenge is that in production software you rarely want to call panic()


As another comment mentioned you can just use build tags to make assert a no-op in release builds


Also, there is no way to eliminate the assert clause in a high-performance (release) build.


I haven’t worked with a lot of code bases, but am I correct in assuming this level of precision with error handling is fairly rare? Impressive stuff.

Orthogonal: one of my favorite talking points about Erlang is the fact that nearly every line of code is an assertion, and that those assertions are always enabled, even in production.

You’d not want to use Erlang for truly high performance code, granted.


Interesting point about Erlang! Where can I find more info on that specific aspect?


If you can tolerate a video, my talk from Midwest.io covers it to some degree.

It’s really just a happy byproduct of the overall design of the language and VM. The two primary features that make it happen are immutability and ubiquitous pattern matching.

https://youtu.be/E18shi1qIHU


Ooh, I love the idea of `testcase(X)` to instrument test coverage.


Interesting how it's different to branch coverage. If you have

    if(X) {}
    else {}
    if(Y) {}
    else {}
"normal test coverage" is weaker than SQLite tests with the following lines added:

    testcase(X && Y);
    testcase(X && !Y);
    testcase(!X && Y);
    testcase(!X && !Y);
I wasn't sure when reading the article, to be honest, but making sure you test things you don't explicitly branch on is kinda cool.
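
A sketch of the difference, using a made-up tally instead of SQLite's real mechanism. `TESTCASE`, `classify` and `combos_seen` are hypothetical names; the point is only that two calls can cover every branch while leaving two of the four combinations untested.

```c
#include <stdbool.h>

/* Hypothetical tally, not SQLite's actual mechanism: records which
   conditions have been observed true at least once. */
static bool seen_true[4];
#define TESTCASE(id, X) do { if (X) { seen_true[(id)] = true; } } while (0)

int classify(bool x, bool y) {
    int result = 0;
    TESTCASE(0, x && y);
    TESTCASE(1, x && !y);
    TESTCASE(2, !x && y);
    TESTCASE(3, !x && !y);
    if (x) { result += 1; } else { result += 2; }
    if (y) { result += 10; } else { result += 20; }
    return result;
}

/* Number of the four combinations exercised so far. */
int combos_seen(void) {
    int n = 0;
    for (int i = 0; i < 4; i++) {
        if (seen_true[i]) { n++; }
    }
    return n;
}
```

Calling `classify(true, true)` and `classify(false, false)` takes both sides of both `if` statements, yet `combos_seen()` reports only 2 of 4 until the mixed cases are run as well.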


I'm probably being a blockhead but I didn't quite get how the testcase macro is used in instrumenting coverage. Can you explain that?


The testcase macro is the following:

  # define testcase(X)  if( X ){ sqlite3Coverage(__LINE__); }

If testcase(condition) always evaluates to true, the code coverage analyzer complains that the if( X ){ sqlite3... } statement never takes its false branch, and branch coverage drops below 100%. The converse holds if the condition never evaluates to true. Coverage stays at 100% only if the code containing the testcase macro is exercised so that the condition is sometimes true and sometimes false.

sqlite3Coverage() does some dummy work so that the call will not be optimized away by the compiler.


“The SQLite developers believe that the lack of assert() disqualifies Go as a language for serious development work.“

Spicy!


> In SQLite, the presence of assert(X) means that the developers have a proof that X is always true.

Almost impossible to prove anything in a sufficiently complex system such as SQLite. The difference between assert and always seems arbitrary IMHO


You're only using it locally for specific invariants. That doesn't have to depend on the system's complexity. For example assert(2+2==4) works the same, regardless of whether it's in a one-line app or in the middle of an OS kernel.


The difference is that assert has been “proved” to be true, while ALWAYS is merely thought to be true.


I’m still uncertain what they mean by “prove” though.


Probably the same mechanism by which most proofs work, not fully formal but it seems to follow. Without the proofs in comments (or transitively through other asserts) it's up to the reader to recreate the argument, though. The purpose is to be more explicit than a comment, I suspect, using the same notation as the code.


sqlite has extremely thorough test coverage with something approaching 100% branch coverage and such wide production usage it's basically proven that a lot of this stuff works.


Most (all?) of the production usage probably doesn't enable the assert though.



