How is LLVM tested? (lowlevelbits.org)
137 points by AlexDenisov on March 25, 2016 | 22 comments



"Amount of unit tests is pretty small comparing to regression tests. One reason behind that decision is that LLVM internals constantly change all the time. "

Well, no, and I'm not sure where they got this. It's not a "decision". BTW, this also assumes that all of the tests in the unittests/ dir are unit tests, and all the tests in test/ are regression tests.

This is explicitly false.

The general viewpoint of LLVM is that most things people write (analyses, optimizations) should be tested by checking the IR or dumps. That is, we want unit tests that are easy to follow and modify, not just ones written as compilable C++ code.

Thus, you will find a lot of unit tests in test/ instead of unittests/, because they are testable using IR printers.
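
To make that concrete, here's a rough sketch of what such a test under test/ looks like (an illustrative example, not one taken from the tree; the pass choice and CHECK lines are mine, and the exact flags vary by version): a RUN line invoking opt, some IR, and FileCheck patterns over the printed output.

    ; RUN: opt < %s -instcombine -S | FileCheck %s
    ; InstCombine should fold away the add of zero; we check the printed IR,
    ; not the C++ API.
    define i32 @add_zero(i32 %x) {
    entry:
      %r = add i32 %x, 0
      ret i32 %r
    }
    ; CHECK-LABEL: @add_zero(
    ; CHECK-NOT: add i32
    ; CHECK: ret i32 %x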

Those things that are APIs should be tested by unit tests, and work is underway to increase coverage. The reason some things aren't covered is simply that in the earlier days of the project, not all of this got unit tests, and it's only in the past couple of years that people have become sticklers about it.

So that one would be knowingly "not enforcing good development practices", not "LLVM internals change all the time". The fact that internals change all the time is in fact a good reason to have unit tests.


The lit tests are not really "unit tests" per se, even if we try to treat them as such. We write them to stress a specific part of the compiler (can we vectorize this specific construct, for instance), but they still involve a large chunk of the compiler. The reason we don't have more real unit tests is IMO because:

- they are hard to write (you need to mock up the analysis result that is used by the vectorizer if you want to test it in isolation);

- they add a lot of maintenance burden/overhead;

- they don't add much value for the compiler over the lit tests.
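
For illustration, a lit test stressing the vectorizer might look roughly like this (a hand-written sketch, not an actual test from the tree; the flags and the checked output are approximate and version-dependent):

    ; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S | FileCheck %s
    ; Ask: can we vectorize this simple loop? Check the emitted IR for a
    ; vector store rather than probing the vectorizer's C++ API directly.
    define void @fill(i32* %a) {
    entry:
      br label %loop
    loop:
      %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %p = getelementptr inbounds i32, i32* %a, i64 %i
      store i32 7, i32* %p
      %i.next = add nuw nsw i64 %i, 1
      %cmp = icmp ult i64 %i.next, 1024
      br i1 %cmp, label %loop, label %exit
    exit:
      ret void
    }
    ; CHECK-LABEL: @fill(
    ; CHECK: store <4 x i32>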


"The reason we don't have more real unittests is IMO because: - it is hard to write (you need to mockup the analysis result that is used by the vectorizer if you want to test it in isolation)"

Right, and it's infinitely easier to just dump the result of the analysis and check that it says what you want using lit.

We even have a framework explicitly to do that: -analyze for opt is specifically meant to run analyses and dump their output.

This doesn't test API boundaries, but the dumpers and printers are usually forced to be written in terms of the API.

(and folks are starting to get religion about testing the other API boundaries)
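
A minimal sketch of that dump-and-check style (illustrative only; the exact printed format differs between LLVM versions):

    ; RUN: opt < %s -analyze -loops | FileCheck %s
    ; Run LoopInfo via -analyze and match its printed summary instead of
    ; calling the analysis API from C++.
    define void @f(i32 %n) {
    entry:
      br label %loop
    loop:
      %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
      %i.next = add i32 %i, 1
      %cmp = icmp slt i32 %i.next, %n
      br i1 %cmp, label %loop, label %exit
    exit:
      ret void
    }
    ; CHECK: Loop at depth 1 containing: %loop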


If you're interested in the testing techniques used by established open source software, here are a couple of projects that appear to have far more extensive test coverage than LLVM:

SQLite ( https://sqlite.org/testing.html )

Opus ( https://www.ietf.org/proceedings/82/slides/codec-4.pdf )

I don't mean this as criticism of the LLVM project, as I haven't worked with it and can't say whether its tests are sufficient. Only that, from reading this overview, its testing techniques do not seem particularly elaborate.


Thanks too.

Opus testing is impressive:

> Run thousands of hours of audio through the codec with many settings

- Can run the codec 6400x real time

- 7 days of computation is 122 years of audio


LLVM wrote a state-of-the-art fuzz tester. They have HUGE pieces of software compiling successfully and passing their unit tests.

So, yes, the testing of LLVM is *quite* elaborate.

Oh, and SQLite uses this fuzzer to test itself.


Thanks for the links. You are right, these projects seem to have quite extensive test coverage.


Very nice. Could anyone point me to LLVM's code coverage, and to whether they use TDD?

"Amount of unit tests is pretty small comparing to regression tests. One reason behind that decision is that LLVM internals constantly change all the time. Supporting tests under such conditions is very time consuming."

Isn't the whole promise of testing easier maintenance in the long run? Also, the regression tests seem to run much faster; is that because of Google Test?


Yes, but if you're not wise about how you test, it's easy not to get the bang for your buck. Cargo cult testing is a very real problem.

Example: I was extending some code the other day that had 3kloc of thin wrapper code and 30kloc of mocks and unit tests which effectively did nothing but ensure that the 3kloc passed things through unaltered. Meanwhile, the code is widely acknowledged to be frustrating to interface with because it uses a string-based API that isn't documented (and no, those unit tests won't tell you the semantics), versioned, or maintained with an eye towards compatibility. Naturally, there are no integration tests with the layer above or below it, and those interfaces are where all the actual bugs happen. So the tests increased the cost tenfold and had zero return.

Also see: cargo cult comments. You know, these guys:

    /**
     * Gets the foo.
     */
    Foo getFoo() {return foo;}
Anyway, my personal rule of thumb is to start with integration tests and only drop to the granularity of unit tests if I've got a "logic hairball" on my hands where the interfaces are much simpler than the internal logic (parsing and algorithm implementations, usually). I'd be happy to hear other opinions though!


My experience with writing integration tests first is that a handful of use cases get stuffed into those test cases, but it's harder to figure out what you are actually testing. It's good to test that everything comes together and works in this one cherry-picked case, but I'd rather be able to give the more granular guarantee that all of my components hold up their individual contracts.


Doesn't it make sense to focus on contracts at a higher level of abstraction though? Wouldn't it be better to put the contracts at the level of the user rather than setting contracts for most/all of the functions in your code? If the mocked user input resulted in the correct output and there were no other side effects, wouldn't that be sufficient in many cases?


There might be confusion about terminology here. I think of unit tests as testing contracts of individual units and then each appropriate level of abstraction has its own set of unit tests. Integration tests are those that use the external user interface and represent a use case, BUT a unit test of layer N+1 could effectively be an integration test of layer N (I just wouldn't call it that if it was using a mocked layer N interface).


> ... BUT a unit test of layer N+1 could effectively be an integration test of layer N (I just wouldn't call it that if it was using a mocked layer N interface).

Fixing the problem of unit tests becoming de facto integration tests is exactly what mocks are for. If you don't mock your dependencies, then you are in fact executing an integration test. The problem becomes that your mocked dependency and the real dependency can now drift, because there's nothing tying them together. So the unit tests pass, but you still have an integration bug.

Personally, I think both unit and integration tests are really important, but integration tests tend to get a bit overlooked with all the current zeal for unit tests. Testing a component's contract is obviously necessary, but testing that your components expect the correct contracts from others is also important.


I agree, both are really important. I think integration tests get passed over because they require a higher-level view of your software. Personally, in my day job I'm not extremely familiar with the domain, so I'm pretty ineffective at writing sensible integration tests. I can, however, pass my unit tests/contracts off to the guys who are familiar with the domain so they can shape up an integration test.


My experience with production/enterprise/whatever code has been that bugs at interface boundaries exceed bugs interior to an interface, so I like to check that A plays nice with B plays nice with C, rather than ensure that A, B, and C are individually robust players in the game of blame volleyball; that won't save them from inter-component misunderstandings.

As always, there are exceptions: algorithms, logic hairballs, parsers, anything with a sufficiently low surface-area-to-volume ratio is likely to generate significant bugs in its interior, so it can likely benefit from unit tests.


I'm writing a translation wrapper (very high surface-area-to-volume ratio), but I might have the advantage that A and B are in my project, so I can have a good idea of how they behave. C isn't, but I have an additional test suite for the small bit I use, just to make sure it behaves reasonably well, and that if it doesn't, I recognize something is wrong.


Yeah, that's reasonable -- haven't had to do that yet in production but I'd probably do the same.


> Isn't the whole promise of testing easier maintenance in the long run?

Well, the point isn't testing vs. not testing; it's about the balance of what kinds of tests LLVM should have.

LLVM and Clang have extraordinarily well-defined interfaces at their boundary (C/C++/LLVM IR|executables), so writing regression tests is bound to reap great value over time. Writing unit tests, less so. The benefit of unit tests over ones with larger scope is that it's often easier (or possible at all) to write tests that cover uncommon use cases/environments/errors.

A less tangible benefit of unit tests is that they help guide sufficient decomposition of the design under test. From what I've seen, LLVM doesn't need any help in this area; they're extraordinarily well disciplined here.


> Isn't the whole promise of testing easier maintenance in the long run?

It depends on the kind of maintenance. It makes it easier to add or refactor code without changing the semantics of what exists, but it makes it harder to modify the semantics.

A test suite that's too tight can be an impediment to change.


> "Isn't the whole promise of testing is easier maintenance in the long run?"

Yes, but the structure of the tests matters. Generally speaking there are two major problem areas to avoid with unit tests:

1. Unit tests that are too close in design to the specific implementation. What this means is that instead of testing the desired behaviour in an abstract way, you mirror the design decisions made in the code being tested. When the code being tested changes, this can break the unit test, even if the new code has the desired behaviour.

2. Test-induced design damage. Unit testing relies on being able to isolate the code being tested. This can lead to an increase in code complexity, as well as a less flexible design.

Both of these issues can be overcome, but it does mean that you have to consider what to test and how to test it, as badly applied tests can cause maintenance issues.


I suspect they mean that unit testing internal stuff that changes regularly forces you to rewrite the unit tests against the new internal implementation details.

In contrast, the regression tests don't depend on implementation details, just the interface.


Very interesting article, thanks!



