"SQLite is extensively tested with 100% branch test coverage. We discovered this vulnerability despite the tests, which raises the question: how did the tests miss it?"
Tests only validate known behavior. They didn't have a test that exercised a large string. I don't know how their test suite works, but I'm curious whether QuickCheck might have found this, or whether the 32-bit viewpoint of the programmer at the time would also have limited the generated values.
I'll offer another explanation: branch coverage != path coverage
You generally cannot cover every possible code path through a program: for n branches the number of possible paths scales as 2^n, and with loops it's even worse.
Once you have 100% branch/statement coverage in your tests, all bugs that are left are those that arise from the interaction of two or more statements, in a way that wasn't covered in the tests.
> Once you have 100% branch/statement coverage in your tests, all bugs that are left are those that arise from the interaction of two or more statements, in a way that wasn't covered in the tests.
A bit tangential, but this statement is not accurate. You can have complete branch/statement coverage and still have single-line bugs due to not fully testing the domain/range of all relevant variables; the simplest example is the potential for divide-by-zero bugs in otherwise fully tested code paths.
100% branch/statement coverage is necessary for real confidence, but it is sufficient for very little.
With (a,b) as (true, false) and (false, true), you satisfy every outcome of each decision test, but you never tested "doA()" and "doB()" at the same time.
I think there is some interpretation of that last bullet, "Each condition in a decision is shown to independently affect the outcome of the decision", that is in line with the idea. But that sounds more like saying that, for a statement like:
if (a && b)
that you test changing the values of a and b independently to show that either one of them can cause the check to pass.
By "this" I meant what I linked, not "what you just said". I realize in retrospect that I was unclear. I'm not sure if their approach would imply satisfying the (true, true)/ (false, false) case given that they are separate branches.
Right, but I think that is what the original poster meant. There is a "path" through my example that hits both "doA()" and "doB()." And having 100% branch coverage doesn't necessarily get you there. Even under the stricter rule that they are using.
Unsure if Property Testing would actually have found this issue, 2 billion bytes is a very large string and usually you wouldn't put the limit so high, the tests would run for too long.
Fuzzing would be a more appropriate way to find this issue, since fuzzing is usually done with no limit on length/size; expanding the scope of what you're testing is the entire point of fuzzing.
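As a sketch (assuming a libFuzzer-style toolchain; `parse` is a hypothetical stand-in for whatever API is under test), a fuzzing harness is just an entry point that the engine drives with coverage-guided inputs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the API under test. */
static int parse(const uint8_t *data, size_t size) {
    (void)data; (void)size;   /* real target code would go here */
    return 0;
}

/* A coverage-guided fuzzing entry point in the libFuzzer style.  The
   engine calls this millions of times, mutating `data` and keeping any
   input that reaches a new branch, so deep conditions are discovered
   incrementally rather than guessed at random.
   Build with something like: clang -g -fsanitize=fuzzer,address harness.c */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse(data, size);   /* sanitizers turn silent corruption into crashes */
    return 0;
}
```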
Regarding test coverage, I'd argue that even caring about test coverage gives people a false sense of security. It's easy to achieve 100% test coverage, but that doesn't say you're testing the right things; it just says that at some point that line of code was executed, which says nothing about correctness.
> SQLite defines certain limits on its operation, such as the maximum number of columns in a table, the maximum length of an SQL statement, or the maximum value of an integer. The TCL and TH3 test suites both contain numerous tests that push SQLite right to the edge of its defined limits and verify that it performs correctly for all allowed values. Additional tests go beyond the defined limits and verify that SQLite correctly returns errors. The source code contains testcase macros to verify that both sides of each boundary have been tested.
It sounds from the article that "-fstack-protector-all" might remove everything except the denial of service.
"...arbitrary code execution is confirmed when the library is compiled without stack canaries, but unconfirmed when stack canaries are present, and denial-of-service is confirmed in all cases."
Everything should be compiled with all available compiler defenses:
It is absolutely not easy to achieve the kind of coverage that sqlite has. You need massive amounts of tests for that. Of course coverage is not sufficient: you can exercise code without checking any properties at all. But saying it gives a false sense of security is too harsh.
> It is absolutely not easy to achieve the kind of coverage that sqlite has.
It is though. Give me one project and I'll send you a test suite that "tests" 100% of executable lines. It won't have any actual assertions, but that's not what code coverage is about anyways, so hopefully you won't mind.
Number of assertions per line of code is about as good a metric for how well tested something is as code coverage, which is to say, just about nil.
Compliance with the DO-178B standard was not easy for the SQLite maintainers. It required several years to achieve.
By adhering to the DO-178B standard, the testing is confirmed to be extensive and to satisfy the requirements for use in avionics.
"Rockwell Collins... introduced me to this concept of DO-178B. It’s a quality standard for safety-critical aviation products, and unlike so many quality standards, this one’s actually readable... one of the key things that they push is, they want 100% MCDC test coverage.
"Your tests have to cause each branch operation in the resulting binary code to be taken and to fall through at least once.
"At the machine code level... actually, MCDC’s a little stricter than that. Let’s not dwell on the details, but I had this idea, I’m going to write tests to bring SQLite up to the quality of 100% MCDC, and that took a year of 60 hour weeks. That was hard, hard work. I was putting in 12 hour days every single day. I was just getting so tired of this because with this sort of thing, it’s the old joke of, you get 95% of the functionality with the first 95% of your budget, and the last 5% on the second 95% of your budget. It’s kind of the same thing. It’s pretty easy to get up to 90 or 95% test coverage. Getting that last 5% is really, really hard and it took about a year for me to get there, but once we got to that point, we stopped getting bug reports from Android."
How much do you charge? Because adding assertions to tests that give MC/DC coverage is a lot cheaper than writing them. I know a couple of big companies who might be interested.
> Give me one project and I'll send you a test suite that "tests" 100% of executable lines.
I don’t think you can do that with every project. For example, a function f may do:
if (p == null) {
abort();
}
If all callers also do that test, there’s no way to hit that abort() line in the program. Your test suite may not be able to hit it either. Maybe f is a private function, or that test occurs on one of its intermediate values, or f accidentally has two identical tests:
if (p == null) {
abort("p is null");
}
if (p == null) { // copy-paste bug
abort("q is null");
}
if (r == null) {
abort("r is null");
}
Even if it is possible to hit that line, finding a path through the code that passes in a null value can be extremely hard.
As a simple contrived counterexample, try getting 100% coverage on a function that does if(secure_hash(foo) == 0) bar();
What is fair in testing and what is being tested? If I can stub `secure_hash` and `bar`, that's a pretty easy test to write. Assuming the symbols are visible at test time, stubbing those is relatively straightforward in C, Java, Bash, Ruby, JavaScript, etc.
Even in the above examples, a linter should point out that the second `p == null` is inaccessible. I’m very pro test coverage, but static analysis shouldn’t be thrown out just because unit tests are available.
If code hasn’t been written to be testable, 100% coverage is not _easy_, by any stretch. Any legacy code base written without unit tests that I’ve come across required significant refactoring to get any basic assertions I place, let alone any reasonable coverage.
I also think there is some value in high coverage, even without assertions. At a minimum it’s a guarantee that code has been run before production/distribution. That’s probably a lot more than can be said about a lot of code.
Yeah, exactly. People tend to misunderstand fuzzing, and think of it as "feeding random garbage at an API". Fuzzing, done with proper tooling, is automated coverage testing. You have a bounds overflow that depends on input data but only if condition A and B and C are true in the runtime state? Fuzzing will find it, almost always, by detecting the branching behavior of your code.
So all the replies here saying "the tests didn't find it because they didn't cover the case" are missing the point. There are test paradigms that don't rely on humans to enumerate cases.
> Unsure if Property Testing would actually have found this issue, 2 billion bytes is a very large string and usually you wouldn't put the limit so high, the tests would run for too long.
Yet, this is a textbook case of issues caught by static analysis.
Coverage testing is just a special case of a more general concept: mutation testing (where the mutation is "if you reach this line/branch, blow up.")
Perhaps mutation testing could have revealed the test suite was lacking in a test detecting that bug (for example, a mutation that changed a type from unsigned to signed or vice versa.)
My team at work took it out of our product. :( I couldn't make strong arguments that it was providing value, as it had turned into the slow test that always passed, by the time new folks joined. I do think it has a lot of merit, though.
To be clear... mutation testing is a way to assess the power of a test suite, not a way to test software itself (in this it is analogous to using statement/branch coverage to assess a test suite). Do you mean the tests added to achieve high mutation coverage were not worth the time it took to run them? The mutation process itself is not something that would be part of CI. Doing all that mutation and generating the tests to kill non-equivalent mutants is very time consuming, I will admit.
I'm guessing these tests were run constantly, in CI, not periodically (say, daily, or even weekly, or before release). SQLite has tests that take many hours that must pass before release.
There are testing modes that are worth doing, and I think should be part of a development process, but shouldn't be run at every check-in. Long soaks of random input property-based testing is like this, for example.
We run a lot of tests constantly. So, yeah. That said, as soon as it is in the release pipeline, I really encourage the team to not directly watch it. If we don't trust our tests to allow a change to go out, we need to fix the tests. Not train members to watch the deployment.
It's not a matter of watching a deployment, it's a matter of having separate testing modes used in appropriate ways. Rapid tests should catch most bugs in CI, but more laborious tests could catch other bugs that would escape CI.
Personally, there's a free software effort where my contribution is running tens of millions of randomized property-based tests each day and registering bugs when failures occur. This supplements and complements the fast tests that all must pass before changes are merged to master.
Apologies, I meant my point to be tangential. I agree with your general point. Our deployment pipelines go from build to prod though, with a general cadence on my current team of less than an hour. Adding 6-ish minutes to that is not a concern.
Even when I was on teams where the general cadence of getting changes out all the way to prod was on the order of a week, we still had changes getting to our beta environments in less than an hour. And that was on a team that wanted mutation testing, so we had it.
Edit: So my point was team mates that were worried about the 6 minutes added by mutation testing there, were the ones that would commit and then constantly refresh the page so that they could "approve" a change as it goes out. I pushed for all approvals to be automated, so there was nothing for them to watch.
Boundary Value tests are standard practice when doing extensive testing.
Bugs rarely occur for typical input values but more often occur at the "boundaries" or limits of the input domain.
That being said, I don't really fault SQLite's testing here.
The key takeaway for me is that even with exhaustive testing you can still have security issues like this, so perhaps what is needed is a language or methodology change.
If SQLite, a quality codebase with exhaustive tests, can have memory vulnerabilities, maybe we need to leave C behind.
Probably every Airbus A350 XWB and a lot of GM, Nissan, and Suzuki cars. Maybe also a few or a lot of General Electric products. [1]
The Library of Congress also recommends it for preservation of digital content, which would be harder if Rust compilers were lost. If all C89 compilers were lost, it would be much simpler to create a new one to compile SQLite than it would be to recreate a Rust compiler.
I think this thinking is short-sighted. Tests increase your confidence that code is correct, and they are much cheaper to do than formal verification, which itself still requires a leap of faith that there is no bug in the formal verification system, its implementation in the compiler toolchain, or the assumptions made about the underlying hardware.
You can cast doubt on anything. Will you be alive 30 seconds from now? You don't know for sure! But you can be pretty confident. Similarly, testing can increase confidence in code correctness. SQLite has an excellent security history overall, and its testing strategy is part of the reason for that success.
You can make things as simple as possible, but no simpler than that.
SQLite is a complex program, inherently. It's designed to allow users to do complex things, and as such cannot be simpler than its purpose allows.
I'm no formal verification expert—so I don't know if SQLite is above or below that fuzzy threshold—but "reduce complexity" is not always an available tool.
And to be specific, they generally look for _known_ unexpected behavior; in that you need to have a test for it.
Though clearly there's still unknowns, like when you're throwing random data at an api to see if you can find bugs. But even still, you know what _ways_ you're doing that. Tests generally won't turn up things nobody has thought to test.
That's my point, coverage (whatever your chosen metric) can only prove the absence of tests, not their presence
e.g. if your coverage is 80%, then you know (100-80)% = 20% is definitely not tested since it never even ran in your tests. It doesn't tell you much about the 80%
> Tests only validate known behavior. They didn't have a test that exercised a large string.
Which is the classic thing with no-code and with Copilot-style assistance: writing the function that meets the spec is easy. Writing a comprehensive and correct spec is the hard part.
It's like thinking someone is a better programmer because they type faster or have a better keyboard or whatever: the words aren't the hard part, typing is maybe 10% of the time you spend programming at most. It's logical correctness in composing those functions, and being able to "hold the entire sculpture in your mind" so to speak.
Proving that a code always works means that there doesn't exist any case in which it breaks. You can't really prove something doesn't exist, unless the input domain is of fixed size and you can test all of it. In self-hosted technology, it is even trickier to prove something won't break, as there is so much potential variation in the hardware/software that your program runs on. There might even be stupid edge-cases, like the program failing only when the device it runs on is upside-down, or placed at high altitude, etc.
You can prove things even if then input domain has infinite size. For example n < n + 1 is always true. It's trivial to prove this. Yet the n can have infinite different values.
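For instance, that exact claim can be checked mechanically (for mathematical naturals, that is; fixed-width machine integers are a different story, as this thread illustrates). In Lean 4:

```
-- A proof covering an infinite domain: every natural number n
-- satisfies n < n + 1, with no case enumeration needed.
theorem n_lt_succ (n : Nat) : n < n + 1 := Nat.lt_succ_self n
```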
"This bug is an array-bounds overflow. The bug is only accessible when using some of the C-language APIs provided by SQLite. The bug cannot be reached using SQL nor can it be reached by providing SQLite with a corrupt database file. The bug only comes up when very long string inputs (greater than 2 billion bytes in length) are provided as arguments to a few specific C-language interfaces, and even then only under special circumstances."
> The bug is only accessible when using some of the C-language APIs provided by SQLite.
I assume the SQLite CLI (sqlite3) uses these APIs. Could the bug be triggered indirectly by input to sqlite3? The article doesn't appear to address this question.
The CLI does use the relevant APIs, however I don't know of a way to reach this bug using a script input to the CLI. If there is such a path that I don't know about, it seems like it would require at least a 2GB input script.
This, and also there is the saying: "Never trust the frontend"
So in any case a string length check in the backend should help here. Also, if you get strings from a browser frontend, it seems unlikely you could create a 2GB payload, for several reasons, one being string length limitations on the browser side: https://stackoverflow.com/questions/34957890/javascript-stri...
I vaguely remember that there were a few more places in the SQLite codebase that could benefit from proper use of the size_t type. Catching increment overflows is not straightforward; a review specifically for such cases might help SQLite get even more robust.
The root of this problem seems to be the widespread availability of 64-bit CPUs.
> When this code was first written, most processors had 32-bit registers and 4GB of addressable memory, so allocating 1GB strings as input was impractical. Now that 64-bit processors are quite common, allocating such large strings is feasible and the vulnerable conditions are reachable.
One has to wonder how much code written in the past 30 years has safety checks assuming 32-bit processors which now have vulnerabilities like this one. I'd bet quite a lot.
> 100% branch coverage says that every line of code has been executed, but not how many times
Actually, 100% branch coverage says every line has been executed but also that every branch has been both taken and not taken. So e.g. every non-constant if statement is both taken and not taken.
void foo(int arg) {
int *x = malloc(sizeof(int));
if (arg) {
free(x);
}
}
If your test only ever calls `foo` with a non-zero `arg`, you have covered 100% of the lines in the function, but you never discover the memory leak bug in the case where `if` body isn't executed.
An if statement without an else is a case where line coverage can miss the "branch not taken" side of branch coverage. So if not taking that branch results in a bug, line coverage alone will not pick it up -- for example, if you initialize a value in an if statement's then branch but don't have an else branch to provide a default.
Or even if you have a ternary operator (which has an implicit if-then-else take/don't take branch) that is usually only one line.
I wonder what would be the simplest/fastest kind of test that could detect this flaw considering that SQLite is extremely well-tested. Wouldn't mutation tests catch it?
When using respectable language, is it guaranteed that these types of overflows are impossible? Suppose for example I call C++'s std::string::string(const char*) ... is it fundamentally impossible for it to fail in this way, or is it just UB if you pass it a huge string?
The constructor is guaranteed to either construct a valid `std::string` or throw an `std::bad_alloc` exception if the allocation fails. If there were a size limit past which calling the function is undefined, that would have to be noted in the function's documentation.
Honestly I don't see anything in the chapter and verse that supports your argument. It says that the constructor has a precondition that [s, s+traits::length(s)) is a valid range, which throws just a little uncertainty over the question without resolving it. It seems easy to create a range that is not valid (consider a very large starting position s, for example).
The requirements are perfectly well-defined. I'll be working with the N4868 draft of the C++20 standard [0]. First, `std::string::string(s)` is equivalent to `std::string::string(s, std::char_traits<char>::length(s))` (§21.3.3.3). Next, `std::char_traits<char>::length(s)` is equal to the smallest `size_t` value `i` such that `s[i] == 0` (§21.2.2, §21.2.4.2).
From here, we yield our only preconditions for the call being valid: `s` must point to either a `char` equal to 0 or an element of an array of `char`s that is eventually followed by a 0 (§7.6.1.2, §7.6.6), and there must be no thread or signal handler concurrently modifying it (which would cause a data race, §6.9.2.2). These are necessary for each `s[i]` access in the length computation to be valid.
Letting `n` be the computed length, we have our final precondition, that `[s, s + n)` is a valid range (§21.3.3.3). "Valid range" has a specific meaning in the standard (§23.3.1):
> A sentinel `s` is called reachable from an iterator `i` if and only if there is a finite sequence of applications of the expression `++i` that makes `i == s`. If `s` is reachable from `i`, `[i, s)` denotes a valid range.
Since `s` must point to an array of `char`s of length greater than `n`, `s + n` is clearly reachable from `s`. Therefore, `[s, s + n)` denotes a valid range, given our preconditions.
The only length restriction in the standard is that the source string must be no more than `SIZE_MAX` characters long (excluding the null terminator), since otherwise `std::char_traits<char>::length(s)` would have no correct return value.
There is another thing I found the string constructor able to do: throw std::length_error. But that's consistent with your statement that it either succeeds or throws.
So, going back to my question, it seems like the small leap from C to C++, if the author embraces the STL, does categorically prevent this class of overflow.
> So, going back to my question, it seems like the small leap from C to C++, if the author embraces the STL, does categorically prevent this class of overflow.
"Categorically" is a bit strong; if any index were ever truncated to a 32-bit integer before writing to s[i], that could also cause major issues.
Also, the C++ standard library does not have a monopoly on safe interfaces; all of the relevant APIs can be trivially reimplemented in C. It's just that C libraries have not historically made safe internal interfaces a priority.
That basically means that s+traits::length(s) must be defined behavior. If s is not a pointer to a 0-terminated string then the behavior of this constructor is undefined.
It's bugs like this one that lead me to believe that yes, "rewrite everything in Rust" is in fact the solution.
I have the greatest respect for the quality of the SQLite codebase, which is one of the best-tested open source projects in existence, but at the end of the day, if formatting a string can lead to arbitrary code execution, it's time to question the foundations you're building on.
As long as C and C++ are considered acceptable languages for writing mission critical software, these vulnerabilities will never go away. We have incredible static analysis tools and hyper-advanced fuzzers driven by genetic algorithms, and yet this continues to happen despite every precaution imaginable being taken.
It's time to fix the problem at its root. Languages that are memory unsafe by default need to be phased out sooner rather than later.
Sqlite is tested so thoroughly I doubt Rust's borrow checker (which is essentially just another test) would benefit it: https://www.sqlite.org/testing.html
Sqlite also has never had a severe CVE, from the parent's link:
> All historical vulnerabilities reported against SQLite require at least one of these preconditions:
> The attacker can submit and run arbitrary SQL statements.
> The attacker can submit a maliciously crafted database file to the application that the application will then open and query.
The OP's vulnerability requires passing an insanely long string to a C api, which isn't something you'd expect to be secure.
It's not about the borrow checker in this case. The borrow checker deals with ownership, which mostly protects against thread safety issues, data races, and use-after-free.
Here we are dealing with a memory safety problem of a type that cannot occur in Rust unless you use `unsafe` blocks. Rust doesn't allow unsafe memory access such as pointer arithmetic or out-of-bounds operations (those produce a runtime panic). C happily accepts those, and the effects may include buffer overflows leading to arbitrary code execution.
This should never, ever be possible when writing a normal program. I love that Rust has named the keyword that enables such things "unsafe", because that's precisely what it is. That C has this behavior as a default is nothing short of madness, though of course C is an artifact of a time when such issues weren't understood as well as they are today.
The bug relies on undefined behaviour in the C language, but it's exploitable due to the way memory access operations are written in C.
Equivalent Rust code would not be exploitable at the language level, because that type of raw pointer array access isn't possible without resorting to unsafe blocks. The undefined behaviour that caused the bug also does not exist in Rust.
Yes. If you index out of bounds in a Rust program, it will raise a panic. This is infinitely better than causing memory unsafety and creating a remote code execution exploit, and the best you can possibly do.
If the code were entirely analogous, that is correct, unless the Rust code used some kind of `unsafe` tricks in order to increase performance (this is fairly common in advanced Rust code).
That being said, whether a runtime panic leads to DoS depends on the execution model. In web applications, threads are usually mapped to connections in some way, and a runtime panic only unwinds the thread it occurs in, so this wouldn't necessarily mean that other users experience service disruptions.
More to the point, I'd expect modern application and/or library code to include default safety checks that prevent scenarios like absurdly large queries from happening in the first place, and provide recoverable errors if such pathological inputs are encountered.
I have bad news for you: both Chromium and Firefox allow any page to submit arbitrary SQL statements by default, which has caused several actually-for-real exploits, most of them related to buffer overflows due to integer overflow or use after frees, both of which are mitigated in Rust. https://bugs.chromium.org/p/chromium/issues/detail?id=900910 is a series of security SEV:HIGH bugs in Chromium and Chromecast, for example. https://bugs.chromium.org/p/chromium/issues/detail?id=116060... is a bug from December 2020 that is exclusively from sqlite.
In this case Rust (or Go or Java or almost anything other than C/C++) would've caught this error with array/slice bounds checks. The borrow checker is usually more concerned with things like use-after-free, where other languages would lean on garbage collection.
> Sqlite is tested so thoroughly I doubt Rust's borrow checker (which is essentially just another test) would benefit it
You may well be correct that it wouldn't help here, but the borrow checker provides proofs, not tests; a test can only show the presence of a bug, while a proof can show the absence of a certain kind of bug.
You might be under the impression that the borrow checker is executing at runtime in a compiled program. But it's a purely compile-time thing, like a type checker.
Sure. But that bug only needs to be fixed once, whereas bugs like the one described in the link need to be fixed again and again and again and again and...
I said this before [1], but one severe CVE in 20 years is no reason to rewrite SQLite in a language that has no compiler for some of the platforms it is deployed on.
Sure, write new stuff in Rust, but SQLite needs to stay in C at this point.
I would like them to try, though, because I suspect, barring a Bellard Event, no-one will be able to get close to SQLite's functionality[1], reliability, etc., in a new Rust implementation within 5 years (possibly even 10 years.)
[1] including being able to load extensions, integrate into basically everything as library, etc.
If there's no way to get close to it in a certain amount of time by starting from scratch, chances are there's no chance to do it with a rewrite in the same amount of time.
And again, SQLite is used in places that Rust has no compiler for. This is a hard constraint that precludes using Rust.
I wish people would abstain from that sort of insipid comment and instead offer suggestions about how to replace 40+ years of programming (and that's just for C, BTW) and teralines of code in less than a century, and who's going to pay for it. Because that's the crux.
The stakeholders are going to pay for it. It's not an accident that companies like Amazon and Microsoft are betting big on Rust. The cost of code quality management and vulnerability mitigation is already astronomical. If Rust can lessen that burden even a little (and it certainly can), the effort of rewriting core technologies quickly pays for itself.
And most of those "teralines" of code are legacy garbage. The core OS, a basic userland, a database and a webserver are 90% of the world's critical IT infrastructure. That doesn't take a century to rewrite.
I don't think you lose everything in a rewrite in general - you still have all of the edge cases and business logic in the old code that you translate to the new code. It's not like you are implementing whatever it is from scratch.
Sure, there are opportunities for the translation to introduce regressions, but that's a vastly different argument (and I think a much more minor downside).
I’m not sure what you’d gain is worth the cost. What’s the cost in dollars and cents of SQLite cves over the past 20 years, and what would the cost be of rewriting that entire codebase?
At some point you have to come to terms with the fact that some bugs shouldn’t be fixed.
I think the cost savings is more than just the cves. You also get faster features and have to write fewer tests. You can use higher level abstractions with fewer rough edges. You may even be able to take advantage of the ecosystem more.
And you don't have to rewrite everything at once, you can do it at the function or file level bit by bit. Perhaps the parts that are changing the most are done first, or perhaps the parts with the most bugs.
I'm still not saying rewriting is always right or wrong, but I think it isn't as cut and dry as some make it sound.
As much as I love rewriting things in Rust, SQLite is of the least concern. With enough effort C can be made reasonably secure, and SQLite puts that effort. This particular bug wasn't a low-hanging fruit.
The people who think this is the solution are the ones who will need to take up the work.
It's not like the people who don't think this is the solution will do it.
Get ready to work hard for 5, 10, 20 years... rewriting sqlite will be a monumental effort and take the laser focus of a strong development team for years, if not decades. It's fraught with risk. Think about it... how many years would it be before a stable version of sqlite based on rust was available with comparable performance and fewer bugs than the c-based version?
(Seems likely to me that it would never happen.)
I think you'd be better off creating a sqlite competitor that could start off much smaller, with a much smaller feature set and much less stringent performance requirements, but that nevertheless has enough for many common cases. Of course, you'd still have to convince people to use it over sqlite. If your main selling point is "significantly fewer bugs than sqlite" I don't think it's going to be very easy to make that true, rust or not.
Yeah, but that is mostly the problem with C++ (and I am a big C++ fanboy): too many people write C-style code with C++ compilers, to the point that Bjarne keeps giving tireless keynotes about how to write idiomatic C++.
Just the other day I found a relatively recent C++ tutorial on YouTube using Turbo C++ for MS-DOS; imagine how idiomatic that looked.
You can create a dynamic library, to be linked to just like any C library, from Rust. Doing so was actually some of the first production uses of Rust ever.
everywhere depends on what you mean; rustc does have support for fewer platforms than gcc. But it's still quite a few.
Fair point, but the Rust compiler is not there at runtime. It also only needs to be written once for all the Rust programs, so assuming Rust is safer, you restrict the unsafety to the compiler (and the unsafe constructions of Rust, which are clearly marked as unsafe if memory serves).
(the Rust compiler is written in Rust, but the LLVM backend it uses (and soon GCC, IIRC), is still written in C++)
Like C was used to write C and OCaml was used to write OCaml, Rust is written in Rust. Mostly, anyway; the GitHub repository also seems to have JavaScript, Python and Shell.
Though to actually get binaries out of it LLVM is used, which itself is written in C++.
"SQLite is extensively tested with 100% branch test coverage. We discovered this vulnerability despite the tests, which raises the question: how did the tests miss it?"
Tests only validate known behavior. They didn't have a test that exercised a large string. I don't know how their test suite works but I'm curious if quickcheck might have possibly found this or if the 32-bit viewpoint of the programmer at the time would have also limited possible generated values?