Google Is Uncovering Hundreds of Race Conditions Within the Linux Kernel (phoronix.com)
603 points by pjmlp on Oct 3, 2019 | 322 comments



Yay Konstantin Serebryany and his team! The dude behind these sanitizers is quite brilliant. The Universe should know its heroes.


Race conditions create meta-stable states. Fixing them always increases predictability of the code. It can also result in fixing formerly "cosmic ray" type bugs that occurred once or twice and were never seen again.

This is because one of the sources of very hard to reproduce bugs is a set of race conditions aligning just right.


Meta-stability [0] is a hardware issue, surely? Race conditions can create some crazy effects (BTDT) but at least they are fixable in software.

[0] https://en.wikipedia.org/wiki/Metastability_%28electronics%2...


With all due respect to Wikipedia, the term meta-stability is applicable to any system issue where the behavior of the system is undefined when one or more of its inputs can be undefined.

The inputs to "software" finite state machines typically take the form of variables. Those inputs define which branches within the routine will be taken during processing.

You can model a subroutine as an FSM whose "outputs" are its state: the FSM takes inputs, processes them, and sets new outputs or a new state.

We use fuzzing to expose the state machine to all possible combinations of inputs and this identifies all possible exit states from all possible input states, but assumes the input states are stable.

Multi-processing introduces the possibility of race conditions. In a race condition, one input state is present when the state machine is entered, but during execution the race completes which changes the input value.

We use 'stable' to define an input that has the same value across the entire duration of the state machine's state execution.

During a race condition, an input may change value one or more times across the time interval of the state machine's state execution. Thus at any instant in time the input has a single value, but over an interval in time it may take on many different values. These inputs are meta-stable.

And yes, you can fix meta-stability in software with things like mutexes and execution exclusion. Just like you can fix meta-stability bugs in hardware signals by adding a clock synchronization domain that spans the widest period of meta-stability possible for a signal.
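
To make that concrete, here's a minimal userspace sketch in C of the kind of unstable input described above. The names (shared_limit, count_up_*) are made up purely for illustration:

  #include <pthread.h>

  /* Hypothetical shared input, written by another thread at any time. */
  static int shared_limit = 10;
  static pthread_mutex_t limit_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Racy version: the "input" is read twice and can change between the
   * two reads, so the bounds check and the loop may disagree. */
  int count_up_racy(void) {
      int n = 0;
      if (shared_limit > 0) {                     /* first read */
          for (int i = 0; i < shared_limit; i++)  /* second read */
              n++;
      }
      return n;
  }

  /* Fixed version: take one snapshot under the lock so the input has a
   * single, stable value for the whole duration of the call. */
  int count_up_fixed(void) {
      pthread_mutex_lock(&limit_lock);
      int limit = shared_limit;                   /* stable copy */
      pthread_mutex_unlock(&limit_lock);

      int n = 0;
      for (int i = 0; i < limit; i++)
          n++;
      return n;
  }

  /* Writers update shared_limit while holding the same lock. */
  void set_limit(int v) {
      pthread_mutex_lock(&limit_lock);
      shared_limit = v;
      pthread_mutex_unlock(&limit_lock);
  }

(Build with cc -pthread. The racy version is also exactly the shape that makes trace logs lie to you: the value logged on entry may have nothing to do with what the loop actually saw.)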

One of the things that makes race condition bugs so hard to debug is that your typical tracing facility assumes that the inputs passed to a function are stable and don't change. Thus you'll see a trace record of a function call, with its parameters, and walk through the code and say "Wait, with those parameters this code could never do what it just did." Or conversely, that variables that have shared write status across execution domains will be the same throughout a function.

Does that make it clearer what I was talking about? Race conditions suck :-)


I think Wikipedia would benefit from your contribution!


I think the parent meant (emphasis on the scare quotes) “cosmic ray” type bugs as the kind of bugs we programmers sometimes attribute to cosmic rays, despite actually being a complex race condition.


Good point. As it happens, I had one today that appeared impossible, yet (of course) it was easily understandable once the root cause was found.

The Zebra test [0] is a good friend.

[0] https://en.wikipedia.org/wiki/Zebra_%28medicine%29


This is why open source for lower-level systems is amazing!! That is why graphics card drivers should be open sourced


They are for 2 of the biggest GPU vendors.


Is nVidia no longer one of the two biggest GPU vendors?


I don't think that's what was meant. "2 of the biggest GPU vendors" !== "the 2 biggest GPU vendors".


Nice distinction! Thank you.


Depends on what you count. If mobile GPUs are included, I wouldn't be surprised if one of PowerVR, Mali (ARM), or Adreno (Qualcomm) was on top.


I'm not surprised that there are vast numbers of latent low-probability race conditions in the Linux kernel or any major software project. Having seen the testing process in both hardware and software projects, the two are not even comparable. The testing process in most software projects is dominated by fully deterministic unit tests that are very simple in nature, and make large numbers of assumptions about the behaviors and interactions with other components. Low-probability race conditions between different components are exactly the outcome I would expect from this testing process.

https://software.rajivprab.com/2019/04/28/rethinking-softwar...


Maybe this will do a better job of identifying the race condition that made the AMD card in my laptop unusable. About 5 years ago a bunch of changes evinced a change in behavior when trying to switch hybrid graphics controllers. I worked on a bug report for a long time after I and a few others fingered it as a race condition (it failed maybe 1/10 switches). A cluster of other changes meant it was difficult to bisect (it broke one way at a few points and broke differently at other points, but in a way that made it difficult to identify whether the bug we were triaging existed at that point.)


Solid contribution. Will hopefully improve stability even further.

Does this automatically generate fixes too, or does someone need to invest time investigating each one by hand?


It needs investigation by hand. I don't see how that can be automated.


I would have thought so too, but the GitHub repo says:

>kcsan-with-fixes: Contains KCSAN with various bugfixes for races detected; the commit messages for those bugfixes include the KCSAN report as-is.

Which to me seems to imply some sort of automatic mitigation. Maybe I'm reading too much into it


Nope, that's just the repository with manual fixes.


For those interested, here are some of the bugs which their tool has found: https://github.com/google/ktsan/wiki/KTSAN-Found-Bugs


No, that's KTSAN, which is different from KCSAN, although they do find similar bugs.


Are there any HN readers here that use MINIX or possibly another OS that does not have the Linux kernel? I'm not about to argue pros/cons but would like to see people's use cases; my own use cases have not necessitated using anything more complicated than Ubuntu, and quite happily so.

Still, I think from my interest point of view it would be interesting to not only understand the Linux kernel better, but also other OS design paradigms.


I swear by OpenBSD. I use it on my home server. I use it to write C code (their C standard library has a few very nice things, such as arc4random, strl{cat,cpy}, explicit_bzero, timingsafe_memcmp, libtls, and other things that I run into often enough that I don't want to think about them before I start adding portability stuff).
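
For anyone who hasn't run into them, a tiny sketch of the libc additions mentioned above, written against OpenBSD's headers (on Linux you'd typically need libbsd or your own copies); the demo function itself is made up:

  #include <stdint.h>
  #include <stdlib.h>   /* arc4random_uniform (OpenBSD) */
  #include <string.h>   /* strlcpy, explicit_bzero, timingsafe_memcmp (OpenBSD) */

  void demo(const char *name) {
      char buf[32];

      /* strlcpy always NUL-terminates and returns the length it tried
       * to create, so truncation is trivial to detect. */
      if (strlcpy(buf, name, sizeof(buf)) >= sizeof(buf)) {
          /* handle truncation */
      }

      /* Uniform random number in [0, 100): no seeding, no modulo bias. */
      uint32_t roll = arc4random_uniform(100);
      (void)roll;

      /* Compare secrets without leaking timing, then wipe them in a way
       * the compiler is not allowed to optimize away. */
      char key[16] = {0}, expected[16] = {0};
      if (timingsafe_memcmp(key, expected, sizeof(key)) == 0) {
          /* keys match */
      }
      explicit_bzero(key, sizeof(key));
  }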

Of course, it's not much of an option for anything for which I require proprietary software.


OpenBSD has opened my eyes to how deficient documentation is in Linux.


Meaning it has good documentation?


+1 for OpenBSD. For C, it’s the nicest platform around. I love the system man pages, and like you said, they have a lot of nice system and library functions to use as well.

Also it’s pretty easy to copy the C stuff to new OSes as well for portability.


> Are there any HN readers here that use MINIX or possibly another OS that does not have the Linux kernel?

Like... Windows?


To be fair, Windows has the Linux kernel (WSL 2).


Which is basically a VM. The Windows kernel is still running even when using WSL.


It's just a compatibility layer satisfying the kernel API (= syscalls). It's an API on top of the Windows kernel.


That's WSL 1 you're thinking about.


Crazy, wasn't aware of this. I stand corrected


WSL 2 has a complete kernel, not just syscall translation


I’m setting up a SmartOS server on a little NUC right now and loving it. Zones are very nice, pkgsrc package manager is better than I expected, man pages wonderfully written, even ipfilter feels more elegant than iptables. Service and zone admin very well thought out. It feels much more like a cohesive /system/ than say fedora or Ubuntu.

Can run Linux without virtualization (native performance via syscall translation) in a zone if I hit any issues, which I’ve not yet (simple web stack).

Google around for some of Bryan Cantrill’s talks on SmartOS; it's a Solaris derivative. He’s been quite critical of several Linux engineering decisions and what he frames as sloppy thinking in the kernel community.

I found this helpful/inspiring - https://timboudreau.com/blog/smartos/read - if you follow his example be sure to change dataset_uuid to image_uuid in the zone manifest.


> Are there any HN readers here that use MINIX or possibly another OS that does not have the Linux kernel?

I have both macOS and Ubuntu on my MacBook. I have always predominantly used macOS for all my day-to-day uses and only use Ubuntu to test software that needs to run on Linux before releasing.

I have known many people who use FreeBSD or OpenBSD daily and I have even heard some people installing Haiku as a secondary OS because they have now found it 'very interesting' to use.

I don't know anyone using MINIX though, but I believe that Fuchsia looks like one of the most technically interesting new OSes to be developed in terms of OS design paradigms.


> Are there any HN readers here that use MINIX or possibly another OS that does not have the Linux kernel?

I use macOS all the time. My use-case is systems programming.

I also use iOS on my phone.

Or is that not what you meant?


I don't know if you go through old comments, but for your reference:

"Mac OS X is sort of microkernelish. Inside, it consists of Berkeley UNIX riding on top of a modified version of the Mach microkernel. Since all of it runs in kernel mode (to get that little extra bit of performance) it is not a true microkernel, but Carnegie Mellon University had Berkeley UNIX running on Mach in user space years ago, so it probably could be done again, albeit with a small amount of performance loss, as with L4Linux. Work is underway to port the Apple BSD code (Darwin) to L4 to make it a true microkernel system." [1]

[1] https://www.cs.vu.nl/~ast/reliable-os/


I assume you mean Open Source, and that would exclude Windows and macOS. Apart from MINIX, which we now know lives on every Intel chip, there are the BSDs: NetBSD on the Apple AirPort, FreeBSD on the Netflix Open Connect Appliance and the PlayStation. Even Solaris is still being worked on. L4 is used in every Apple SoC with Secure Enclave.

And many many others.


There's always the BSDs.


I'd be shocked if something written in C that's as complex as the Linux kernel didn't have race conditions.


The language a program is written in has no bearing on whether or not the program is susceptible to race conditions. You can have data races in Python, Javascript and Rust as easily as you can in C.


You cannot have data races in safe Rust.


Race conditions aren't necessarily data races. This article and the parent comment were about race conditions in general.


The parent comment literally says

> You can have data races in Python, Javascript and Rust as easily as you can in C.

Don't use "race conditions" and "data races" interchangeably if you understand the difference...


That's a false equivalence. Some languages make it much easier to create race conditions than others. For example, you have to work hard to avoid them in C, but you'd have to actually try to introduce them in Clojure.


I wonder if any software relies on these bugs

EDIT: windows famously had bugs and had to add special code to preserve the buggy behaviour to keep certain applications that relied on them working https://www.joelonsoftware.com/2004/06/13/how-microsoft-lost...


Seems very tricky, almost by definition, to rely on a race condition?


My previous company preserved bad server caching behavior for one or two customers, as they had custom code depending on it. Given that it was a cache, it was a race condition, just not one likely to trigger (would only get evicted if there was a storm of requests).


You assume every race is a bug.

Part of the job of a kernel is to resolve races. Requests come to it from multiple threads and processes in a parallel fashion. Sometimes your timing will be one way and get you one result, sometimes it will go another way. That's OK. It's the nature of the beast.


This is a misunderstanding. A situation where "either process A or process B gets a resource first" is not a race condition. A situation where both of them get it at the same time is.


Ok, I was being imprecise.

Even if there is parallelism and multi-core, it is still a question of ordering, and from there we can use language like who gets there "first" [indeed "race" suggests this]. Or, did this read of a machine word see somebody else's write, etc.


This tool finds data races. All data races are bugs.


You cannot say all data races are bugs independent of what the data represents or how it is handled.

Perhaps a read not seeing someone else's write in a timely fashion is OK in your situation. Perhaps you will resolve conflicts later using proper synchronization.


Data races are undefined behavior, hence bugs, under the C standard. You do have a point in that the Linux kernel is not written against the C standard and could be declared to be written in a dialect of C where data races are defined...


Undefined behavior is not necessarily a "bug", even in the context of the C standard. Undefined behavior is undefined behavior--the whole point of the term is that the standard has nothing to say about the actual behavior. Undefined behavior includes behaviors about which the standard literally says nothing, like the effect of a supernova. But the term is normally used to distinguish certain behaviors on which, for various reasons, the standard chooses (and chooses loudly) not to impose any requirements even though in context it could easily choose to do so; behaviors people might otherwise mistakenly assume are defined (or "natural") in reliance on their own outside experience and knowledge.

Considering that C didn't support threading or even an atomics API until C11, and that most C code was written well before C11, by your definition the entire Linux kernel has been one giant "bug" ever since it added SMP support.

Are data race conditions bugs? Usually--the term itself is conclusory of that fact. They're bugs not because they constitute undefined behavior as defined by the C standard, but because neither the hardware, compiler, application logic, or anything else guarantees consistent behavior; and if you have no guarantee of behavior then your software is ipso facto incorrect. If something were to guarantee the behavior, such as the notably generous cache coherency semantics of the x86 ISA, and nothing else conflicts with that guarantee, like an aggressive compiler, then it wouldn't be a bug, at least to the extent you knowingly relied on that guarantee.

Undefined behavior is not an epithet. Just because a language specification doesn't use the term doesn't mean it doesn't have undefined behavior. Gobs of code in languages like Python and PHP rely on undefined behavior because those languages have very loose specifications. Even Java has undefined behavior despite its otherwise extremely rigorous specifications. (What Java does have is a very generous memory model which usually cabins the consequences of undefined behavior. Even then you can never rule out undefined behavior resulting in nasal daemons through some cascade of unexpected code paths in the application.)

One of the reasons the term is so important in C and C++ is because they're two of the few languages with numerous, diverse implementations. The odds of two implementations diverging while programmers unwittingly rely on one of several potential behaviors is much greater. If there are only a few relevant implementations then there's less pressure to explicitly distinguish undefined behavior. Behaviors will tend to converge through informal communication. There's almost no pressure beyond the good diligence of the engineers if there's only a single implementation. Any accidental semantics relied upon by users can be made retrospectively well defined, and if there's no reliance then behaviors silently change (sometimes breaking code).

For reference, here's what undefined behavior means, per C11 3.4.3 (N1570):

  1 undefined behavior
    behavior, upon use of a nonportable or erroneous
    program construct or of erroneous data, for which this
    International Standard imposes no requirements

  2 NOTE Possible undefined behavior ranges from ignoring
    the situation completely with unpredictable results, to
    behaving during translation or program execution in a
    documented manner characteristic of the environment
    (with or without the issuance of a diagnostic message),
    to terminating a translation or execution (with the
    issuance of a diagnostic message).

  3 EXAMPLE An example of undefined behavior is the
    behavior on integer overflow.


> You do have a point in that Linux kernel is not written against the C standard and could declare to be written in the dialect of C where data races are defined...

You had my retort already written. Thank you.

And in fact the linux kernel does rely on very specific compilers, and breaks the standard's machine abstraction quite a bit.


It is not a given that those compilers will support the unspecified memory model forever. And in fact the linux kernel has been slowly moving to the C++11 memory model for a while.


Is this actually what the system was attempting to observe? I was under the impression that most of these race conditions were unknown and indeed were supposed to be implicit.


I am wondering if machine learning can be used to solve these problems. I have heard that Google uses machine learning to automatically find bugs and raise merge requests against them. Someone from Google can confirm.


You don't need machine learning, you need a better architecture. Micro kernels can be made in such a way that they are guaranteed race condition free.


If you're willing to throw a large part of the existing code away to move to a microkernel architecture, you might as well rewrite it in a language that doesn't have race conditions by design.


Race conditions can be constructed in almost any language. It is more of a systemic thing than a language artifact though some languages are more race condition prone than others.

You can even have race conditions in hardware. Multi-threading or distributed software are excellent recipes for the introduction of race conditions, sometimes so subtle that the code looks good and the only proof that there is something bad going on is the once-in-a-fortnight system lockup.


One of my professors in processor design had a story about debugging a register leak in an out of order processor with register renaming.


I believe the undocumented instructions in the 6502 were caused by race conditions in the CPU. Some of these instructions are actually usable while others give random results.


I wrote a 6502 assembler and had a really hard time not giving $88 its own mnemonic in the main table rather than loading it at the start of every file from an include. It was just too useful :)


Why wouldn't you? These opcodes seem to be used by most people these days, so there seems to be no reason not to treat them just like any other opcode?


Ah, no. Those are caused by don't care states in the logic of the PLA instruction decode block.


Race conditions can be created by I/O paths. I'm willing to state that unless the said language abstracts away all I/O, it will always be possible to introduce a bug :)


Haskell Monads!

#Sarcasm


The funny thing about this being that Linux was originally announced on comp.os.minix.



How does Rust negate data race conditions? These aren't memory errors.


The borrow checker prevents it. Race conditions that aren’t data races are not prevented, though.


It's an OS with microkernel written in Rust. It's literally the first thing you see on the website. So it's both microkernel, and Rust, so it is an example of people doing what parent mentioned.


Yikes.


Yeah, I exploded a bit. Sorry about that :(


Is the trade off efficiency? Because I'm sure a language that is willing to wait 5 seconds between operations can do pretty well at eliminating race conditions.


Race conditions do not have anything to do with absolute time; a race is simply an assumption about the sequence in which things are going to happen, combined with finding out in practice that the implementation allows other sequences whose output is undefined.


Most absolute statements about anything are fallacies. The truth is that they can. Hell, I see Apple devices that don't update anything, including the time, until you take them out of screen lock, and you see the previous date and charge amount sometimes even after unlock. That is an efficiency-based form of race condition.


Perhaps the microkernel is free of race conditions. But now you've introduced a zoo of interacting processes (drivers, servers, etc.) and with that a new class of race condition in their interactions, which may or may not be harder to debug than what you started out with.


That zoo will still have a lot of advantages over a monolith, and one of those advantages will be that if the components stick to their defined interfaces for interaction, your chances of race conditions are substantially limited.


What magic in microkernels prevents race conditions?


That they allow for reduction of scope to the point that race conditions can be mostly eliminated. A macro kernel has untold opportunities for race conditions because the kernel itself is multi-threaded. Effectively every driver is a potential candidate.

In a - properly implemented - micro kernel you will have each of those drivers isolated in its own user process, and if there is an issue (memory corruption, crash, logic error or race condition) the possibility of its effects spreading is limited.

Isolation is half the battle when dealing with issues such as these.


> Isolation is half the battle when dealing with issues such as these.

having had to deal with race conditions between processes running at opposite sides of a continent, color me unimpressed.


That's a sign of a very poor architecture, see also: the fallacies of distributed systems.


Do you mean that each driver is forced to be single-threaded, somehow? Or just that the drivers are isolated, so they can have as many race conditions as they like, but the microkernel will keep running?

Because I don't see the advantage here. So what if my filesystem driver is isolated, a race condition in there is still going to corrupt all my data and I'm not going to be cheered up by the fact that the kernel can keep on ticking.


That's one way of looking at it.

Another is to think of this as layers of essential services: kernel, memory management, networking services, file services and so on. For a layer to be reliable the layers below it have to be reliable too. Isolating the layers in processes allows for faster resolution of where a problem originates which in turn will help to solve it.

And being able to identify the source of a problem allows for several other options: restart the process, proper identification of a longer term resolution because of the ability to log the error much closer to where it originated.

These are not silver bullets but structural approaches acknowledging the fact that software over a certain level of complexity is going to be faulty.

In my opinion the only system that gets this right is Erlang. (not the language, but the whole system)


I take your point about Erlang, and its systems for restarting/replacing failed code & processes, but I don't see how this eliminates race conditions or their consequences.

One problem here is that it's not always a simple layer on layer order of dependencies, they can work both ways. Take swapping/paging for example. It can't be isolated as a separate driver or service. The kernel relies upon it, and so do processes, and yet the service itself relies upon filesystems, a separate layer again. A race condition in the swap system (like one of the bugs found by the tool in question) would break multiple layers, or a race in the filesystem could cause the swap system to fail. There is no real isolation here that saves an OS from failures.


It's not 'magic', but will eliminate a whole swath of possibilities.

One reason why race conditions in monolithic kernels are very common is due to the requirement that code is re-entrant because of multi-threaded execution of the kernel itself. In a micro kernel situation the multi-threading can be avoided because you have enough processes to spread the cpu cores over to have a number of them active at the same time without having two or more of them active within the same process.


> Another is to think of this as layers of essential services: kernel, memory management, networking services, file services and so on. For a layer to be reliable the layers below it have to be reliable too. Isolating the layers in processes allows for faster resolution of where a problem originates which in turn will help to solve it.

is there any reason to believe that most (or even just a considerable part) of race conditions in a kernel are cross module?


Not necessarily, but the effects of a race condition (for instance: a privilege escalation) would be limited to the process the race was found in. That would substantially limit the potential for damage.


They're small :P!

Less facetiously, some micro kernels have been proven formally correct. You need a small codebase to prove it formally correct.

And even on an intuitive level, if the code is small I can "hold" it all in my head - really grok it in a way you can't with 1E6 LOC.


why would a microkernel be smaller than a monolithic kernel, given the same feature set? Unless you claim that async message passing leads to shorter code than procedure calls.


Because it gets spread around over multiple processes you get process based isolation for each module that normally all lives within the same address space. So you may be able to break a module but you can't use that to elevate yourself to lord of the realm.


I very much agree that a microkernel can be both more robust and secure. I'm arguing against being easier to understand. Modularity does not require address space separation, which is a runtime property, not a code organization feature.


The one tends to go hand-in-hand with the other, in my experience. Of course, that's anecdata, but the more complex projects that I've worked on that used message passing separated much more naturally and cleanly into isolated chunks of code that produced their own binaries. Essentially message passing and micro kernels dictate a services based approach, which is an excellent match for OS development.


It's very likely. Code linting has been around for a long time. In the security world, static code analysis tools are used to find security vulnerabilities. It makes sense that Google would have something similar for identifying all manner of code defects.


It'd be a really interesting project to data-mine a large number of IDEs at a company like google. Use machine learning to predict fixes to lint/compile errors, and surface those suggestions in the IDE. Would also be fun to do it with code reviews to see what kind of garbage suggestions you can generate there. Probably a lot of "I don't understand what this does" and "please write tests for this."

Would also be cool for auto-complete suggestions. I'm thinking more along the lines of extracting larger patterns than for loops, such as stubbing out classes that follow a project's existing style. For example, adding an auth check at the start of a web request handler if that's what it sees elsewhere.


I really like the development of all the sanitizers we got in the last decade or so. I only wish that more could be done at compile time as opposed to these runtime checks.


That's the killer feature that made me give F# a chance instead of my usual C# - it 'does more for me' at compile time vs. run time.

Yes, these are both much higher level than C, but if what I'm reading among other comments is correct, Rust accomplishes an equivalent improvement in compile time checks while still being a systems language.


and Rust will prevent data races, which F# won't.


As someone who has personally written Rust code that deadlocked under specific circumstances, I think I can say that Rust does not prevent deadlocks. Not sure about specific types of race conditions, but I don't think it's possible for the compiler to detect all race conditions at compile time.

It is a heck of a lot easier to not make them in Rust than in something like C and C++. But Rust is not a panacea for asynchronous code.


Rust does prevent:

* Data races

Rust does not prevent:

* Deadlocks

* Race conditions

* Memory leaks

* Logic bugs

That said, it does make some of these things harder to accidentally introduce, but strictly speaking does not prevent them, it's true.


I thought resources were freed when they went out of scope preventing any memory leaks in Rust.


You can still introduce them via circular strong reference-counted pointers. The solution of course is to have only one strong reference and many weak ones, but the compiler in principle will let you create a cycle that will never get deallocated.


That's the common case, but you can still make a reference counting cycle, or have a thread hang while holding onto something, or even just call `mem::forget` on an allocation.

This is different from Rust's guarantees about memory and thread safety, which you can't break without `unsafe`.


They are, and in a typical program you don't need to worry about leaking memory.

However, it is possible to leak, e.g. if you use a reference-counted type and create a cycle. There's also Box::leak() that does what it says (it's useful for singleton-like things).


Rust would have tradeoffs here, though. For example, in C# you'd be able to do (non-leaking) graphs and other structures in 100% safe code, while Rust would need unsafe.


You don't need unsafe. This point is brought up a lot, though I understand where the misunderstanding can come from.

Rust is perfectly capable of expressing graphs with zero unsafe code, just currently not in the most optimum implementation.


Is it fine, or suboptimal to the point of being impractical? Genuine question: how bad is this, if quantified?


You need to use reference counting or a scoped memory pool. It's totally fine in almost all cases.

Refcounting in Rust is still faster than refcounting in ObjC or Swift (because Rust can avoid using atomics or increasing counts in many cases), but people who use Rust tend to insist on zero overhead, and a truly flexible zero-overhead solution can't be verified statically.


It is fine and practical.


Adjacency matrices are trivial to implement in Rust. I leverage graphs constantly.


It's slow.


You don't even need "not the most optimum implementation" (assuming you're talking about index-based graphs or something).

You can make non-leaking pointer-based graphs in safe Rust using reference counting (basically what C# does, if you squint) or arenas (if your nodes' lifetimes fit that pattern).


It doesn't need to be unsafe or unoptimal, just use indices into flat memory instead of pointers.


There are sound static analyzers, like Trust-in-Soft, that can statically guarantee the absence of undefined behaviors in C programs, but require interactive work to get rid of the false positives.

But the distinction between "runtime" and "compile-time" is not entirely binary. What you call "compile time" checks can be viewed as an abstract interpretation [1] of the program, or as running the program in some abstract domain -- e.g., where a concrete value could be 3, its abstract value could be called `int`, and the + operation could be interpreted as `int + int = int`; type inference could be viewed as abstract interpretation. The problem is that sound abstract interpretation (i.e., one without false negatives -- if it tells you your program is "safe" then it will definitely be safe, for some appropriate definition of safe) is limited, and often comes with big tradeoffs, e.g. either false positives given by sound static analyzers or the pain something like Rust's borrow checker sometimes causes -- these are both instances of the same underlying difficulty with abstract interpretation.

One of the most promising directions in formal methods research is "concolic testing" [2], which means a combination of concrete (i.e. non-abstract, or "runtime") execution and symbolic execution (an instance of abstract interpretation). It is not sound, but can be made quite safe, and at the same time it can be very powerful and flexible, checking properties that can be either impossible or extremely tedious, to the point of infeasibility, with abstract methods like types. It might prove to be a good sweet spot between the power, expressivity and cost of concrete (AKA dynamic, AKA runtime) methods and the soundness of abstract (AKA static, AKA compile-time) methods.

Even dynamic methods are often "stronger" (more sound) than just checking assertions during testing. For example, fuzzers can sample the program's state space, offering more coverage, either by analyzing the program (whitebox) or just randomly (blackbox), and various sanitizers sample the scheduling space by fuzzing thread scheduling.

[1]: https://en.wikipedia.org/wiki/Abstract_interpretation

[2]: https://en.wikipedia.org/wiki/Concolic_testing
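
If it helps to see the "run the program in an abstract domain" idea in code, here's a toy interval-domain evaluator for a single expression. It's purely illustrative, unrelated to any real analyzer, and all names are invented:

  #include <stdio.h>

  /* Abstract value: an interval [lo, hi] over-approximating the set of
   * concrete values a variable can take. */
  typedef struct { long lo, hi; } interval;

  static long min4(long a, long b, long c, long d) {
      long m = a;
      if (b < m) m = b;
      if (c < m) m = c;
      if (d < m) m = d;
      return m;
  }

  static long max4(long a, long b, long c, long d) {
      long m = a;
      if (b > m) m = b;
      if (c > m) m = c;
      if (d > m) m = d;
      return m;
  }

  /* Abstract transformers: what + and * mean in the abstract domain. */
  static interval add(interval a, interval b) {
      return (interval){ a.lo + b.lo, a.hi + b.hi };
  }

  static interval mul(interval a, interval b) {
      long p1 = a.lo * b.lo, p2 = a.lo * b.hi;
      long p3 = a.hi * b.lo, p4 = a.hi * b.hi;
      return (interval){ min4(p1, p2, p3, p4), max4(p1, p2, p3, p4) };
  }

  int main(void) {
      /* "x is anything in [0, 10], y is anything in [-3, 3]": evaluate
       * z = x * y + 1 abstractly, without running on concrete inputs. */
      interval x = {0, 10}, y = {-3, 3};
      interval z = add(mul(x, y), (interval){1, 1});
      printf("z is in [%ld, %ld]\n", z.lo, z.hi);   /* prints [-29, 31] */
      return 0;
  }

The result is sound (every concrete execution lands inside the interval) but imprecise, which is exactly the false-positive/expressivity tradeoff described above.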


Sure, just look at the languages that have more stringent type systems.


They don't really help me make the mountains of C we have in the world less buggy.


C code doesn't have enough information in it to do a whole lot of mechanical analysis. Runtime analysis of its behavior is what you've got.


Have you never heard of program analysis for security? Genuine question. RVI, the Mayhem developers, and many others would strongly disagree with you:

https://runtimeverification.com/match/1.0-SNAPSHOT/docs/benc...

https://spectrum.ieee.org/computing/software/mayhem-the-mach...

The field evidence would support their position, too. Your claim about C is true just enough to encourage people to not use it if they want to get more out of program analysis. Yet, current tools make up for it enough to find most or all bugs in benchmarks with Mayhem fixing them, too.

The only problem is that almost nobody that cares about FOSS security is working on those tools that automate it. The companies doing it end up black boxing, patenting, etc the tools that they sell for exorbitant prices. Something like RV-Match, well integrated with repos, either FOSS or just priced to scale toward mass adoption over high profit, would be a game changer. Especially if 3rd parties could use it on repos.


Aren't those runtime tools like I just suggested? The other ones in the 1st link are theorem provers (at least the one that I recognized, frama-c) that require a lot more than off-the-shelf C code for their inputs. You can't just run CC=my-compiler-frontend make to suddenly get better diagnostics.


Companies are mixing static and runtime analysis more than they used to. Most static analyzers look at just the code itself, build models of it, check their properties, and report the presence/absence of errors. The first links were more about effectiveness on C. I just dug around in search to find you something that explains it, shows its effectiveness, and threw in a case study you might enjoy:

https://www.cs.colorado.edu/~kena/classes/5828/s12/presentat...

https://cacm.acm.org/magazines/2010/2/69354-a-few-billion-li...

Here's the main competition evaluating them so you can see criteria and current state-of-the-art:

https://www.sosy-lab.org/research/pub/2019-TACAS.Automatic_V...

The provers do something similar but with more manual effort and logic-based approach. Frama-C and SPARK Ada are interesting in that they use programmer annotations plus supporting functions (eg ghost code). These are converted into logical representation (eg facts about the program), fed into logic solvers with properties intended to be proved, and success/failure maybe tells you something. Like with static analyzers, you get the results without running the program. A recent, great addition is that, if proof is too hard, the condition you didn't prove can be turned into a runtime check. You could even do that by default only proving the ones that dragged performance down too much. Flexible.

The best methods are a mix. I've always advocated mixing static analyzers, dynamic analysis, fuzzing etc since each technique might spot something the other will miss. They also cover for implementation bugs each might have. That adding runtime analysis can improve effectiveness does not negate my original claim that static, non-runtime analysis of C code can tell you a lot about it and/or find a ton of bugs. I also agreed with one point you made about it not conveying enough information: these clever tools pull it off despite C lacking design attributes that would make that easier. SPARK Ada is a great example of a language designed for low-level, efficient coding and easy verification. Although, gotta give credit to Frama-C people for improving things on C side.
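
To give a flavor of the annotation style being described, here's a rough ACSL-style sketch (the contract language Frama-C uses). Treat it as illustrative rather than a polished, machine-checked proof:

  /* The ACSL contract becomes proof obligations that Frama-C/WP hands
   * to SMT solvers -- no execution involved. */
  /*@ requires n > 0;
    @ requires \valid_read(a + (0 .. n-1));
    @ assigns \nothing;
    @ ensures \forall integer i; 0 <= i < n ==> \result >= a[i];
    @ ensures \exists integer i; 0 <= i < n && \result == a[i];
    @*/
  int max_elem(const int *a, int n)
  {
      int best = a[0];
      /*@ loop invariant 1 <= i <= n;
        @ loop invariant \forall integer k; 0 <= k < i ==> best >= a[k];
        @ loop invariant \exists integer k; 0 <= k < i && best == a[k];
        @ loop assigns i, best;
        @ loop variant n - i;
        @*/
      for (int i = 1; i < n; i++)
          if (a[i] > best)
              best = a[i];
      return best;
  }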


This is really interesting stuff, thanks. I've assumed that without source annotation, the only things that you can verify statically are violations of the runtime model (e.g., stack overflow, heap corruption). Mostly through inferences on possible value sets of function arguments and return values.

I'm going to have a look at your links more. I guess the question I don't have answered quite yet is how to tell the tool what to look for beyond the language semantics.


What you've assumed is mostly correct. Almost all the annotation-free tools look for violations like you described since that's what most people want. The correctness-oriented work starts with annotated code. The only other thing I know that works without annotations, but sort of similar, is creating the specs from example inputs and outputs. Then, the code can be analyzed against them.

What pro's often do, though, is prove the algorithm itself in a higher-level form that captures its intrinsic details, create an equivalent form in lower-level code, and prove the equivalence. That lets you deal with the fundamentals first. Then address whatever added complexity comes in from lower-level details and language semantics.


For some reason this feels like a rediscovery of the virtues of leaving asserts in production code to me. Though I need to give these a read. Thanks!


That's a good comparison given asserts are specs. If you like asserts, the easiest, high-ROI method of getting into verification is Design-by-Contract. Here's both a way to sell it to your boss and some articles highlighting it:

https://www.win.tue.nl/~wstomv/edu/2ip30/references/design-b...

https://www.hillelwayne.com/post/contracts/

https://www.hillelwayne.com/post/pbt-contracts/

Many of us independently converged on a consensus about how to do it quickly, too. You put the contracts into the code, do property-based testing to test them, and have them turn into runtime or trace checks combined with fuzzers. Eclipser is one that runs on binaries finding problems way faster than others. This combo lets you iterate fast knocking problems out early. If you want more assurance, you can feed those contracts (aka formal specifications) into tools such as Frama-C and SPARK Ada to get proofs they hold for all inputs. Obviously, run lots of analyzers and testing tools on it overnight, too, so you get their benefits without just staring at the screen.

https://github.com/SoftSec-KAIST/Eclipser
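
At its most bare-bones, contracts-in-code in plain C can just be asserts marking the requires/ensures clauses, with the property-based tests supplying random valid inputs and the asserts acting as the oracle. A made-up example (real DbC libraries and SPARK/ACSL give you far more than this):

  #include <assert.h>

  typedef struct {
      int items[64];
      int count;
  } stack;

  int stack_pop(stack *s)
  {
      /* requires: the contract callers must honor */
      assert(s != NULL);
      assert(s->count > 0 && s->count <= 64);

      int old_count = s->count;
      int value = s->items[s->count - 1];
      s->count--;

      /* ensures: the promise the routine makes back */
      assert(s->count == old_count - 1);
      return value;
  }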

Another lightweight-ish, formal method you might like was Cleanroom Software Engineering. It was one of the first to make writing software more like engineering it. Stavely's introduction is good. His book was, too, with the chapter on semi-formal verification being easier to use and high ROI having excellent arguments. Fortunately, the automated and semi-automated tools doing formal analysis look good enough to replace some or all of the manual analysis Cleanroom developers do. Languages such as Haskell with advanced type systems might let it go even further. I advise coding in a simplified, hierarchical style like in Cleanroom to ease analysis of programs even if you do nothing else from the method. If you're curious, that technique was invented by Dijkstra in the 1960s to build his "THE Multiprogramming System."

https://web.archive.org/web/20190301154112/http://infohost.n...

https://en.wikipedia.org/wiki/THE_multiprogramming_system

Have fun with all of that. I'm pretty sure straight-forward coding, careful layering, DbC, and PbT will help you on realistic projects, too.


Are you Santa? Isn't it a bit early for Christmas?

Thank you so much! I've been looking into building a foundation with regards to formal methods and safety-critical systems, and besides getting familiar with the regulations and Standards around that type of software, I've also been looking to build up some familiarity with the tools to achieve those higher levels of provability!

This all looks like it fits the bill perfectly! Thank you! Thank you! Thank you!


"Are you Santa? "

Shhhhhh!!

"safety-critical systems"

Interestingly enough, the safety-critical field doesn't actually use formal methods much. It's more about lots of documentation, review, and testing. Mostly just human effort. They do buy static analyzers and test generators more than some other segments. There's slowly-slowly-increasing adoption of formal methods in that space via SPARK Ada and recently-certified CompCert. The main incentive so far is they want to get bugs out early to avoid costly re-certifications.

DO-178C and the SILs are the main regulations driving it if you want to look into them. DO-178B, the prior version, is also a good argument for regulating software for safety/security. It worked at improving quality. We even saw our first graphics driver done robustly. :)


It does suggest we should delete it, which is pretty helpful. It's a terrible liability.


I have an alternate opinion, which is that compile times should be fast because compilation is done often, whereas sanitization is something you only need to do in later release QA stages. So it can be slow and that's okay.


You can have different "thoroughness" levels while compiling, though.


Is there even an approach to doing that? Intuition tells me that's a mathematically intractable problem.


Rust prevents data races at compile time. Note that data races, while a common form of race condition, are not the only kind, and Rust won't prevent the others.


You're right to an extent. I think that because of the halting problem, a number of useful properties a computer could tell you about itself are limited, because they boil down to solving the halting problem.

However, one way to avoid such a problem is to settle for less accurate, but still strict, subproblems. I.e., Rust doesn't stop all race conditions, but it does solve the subset that are data races.

In other words, you reject some valid programs (false positives?) in order to make sure that every program you do accept is really valid (no false negatives?).


The ATS language. It compiles to C, and I believe I have even seen that, experimentally, a kernel module was developed in it.

Here are some resources:

The Wikipedia page: https://en.wikipedia.org/wiki/ATS_(programming_language)

"A (Not So Gentle) Introduction To Systems Programming In ATS" by Aditya Siram at the StrangeLoop 2017: https://www.youtube.com/watch?v=zt0OQb1DBko

Introduction to ATS, a series of screen casts: https://www.youtube.com/playlist?list=PL6BIXG1a4elsauhh56i5n...

I do not program in ATS because I am not doing anything that needs to be as efficient, so that I have the luxury to enjoy programming and learning Haskell, in the hope that by the time when I would need to write some super secure and performant program that has to run without interruption, and is used by more users than myself, well, I hope that by that time Haskell will evolve enough to allow me to write such a program. Otherwise I might end up using ATS, though honestly the syntax is so ugly.


Yes, just probably not with a language as semantically ambiguous as C. Rust for example has guarantees against data races.


The good news is it's also the golden age of static analysis right now. There are open source and commercial tools that find most of the problems. Some are scaling up on codebases the size of the Linux kernel.

So, not only can you do that: it's standard practice for some developers already in and out of commercial space. Mostly commercial developers doing it that I see, though.


Note that this is a dynamic analysis, and precisely because equivalent static analysis won't scale to the Linux kernel.


It is dynamic analysis. I don't think it's that reason. Both Saturn and Facebook's tool scaled analysis to the size of the Linux kernel. I think most developers and companies aren't using or pushing them like they could for cultural reasons. Look at the number of people that used sanitizers vs low-F.P. static analyzers even if free. The latter would've gotten them results sooner. It wasn't empirical evidence or effectiveness that led to that decision. Human factors trump technical factors in adoption decisions most of the time.

http://saturn.stanford.edu/

Note: Linking to Saturn since it's old work by small team. I theorized big investment could achieve even more. Facebook's acquisition and continued investment with excellent results proved it.

If tech and money is the limit, tell me why Google hasn't straight-up bought the companies behind RV-Match and Mayhem to turn their tech on all their code plus open-source dependencies. Even if over-priced, it might actually turn out cheaper than all the bug bounties adding up. Maybe re-sell it cheap as a service Amazon-style. The precedent is that Facebook bought one that built a scalable tool they're applying to their codebase. Then, they were nice enough to open-source it. Google, Microsoft, Apple, etc could get one, too.

What's Google's excuse? They sure aren't broke or dumb. Gotta be cultural. Likewise, Microsoft built lots of internal tools. They do use at least two static analyzers: one for drivers, one for software. I heard they even ship 2nd one in VS. They don't use most of the MS Research tools, though. The reason is cultural: Operations side knows they'll keep making money regardless of crap security.

As far as FOSS developers go, most don't care about security. Those that do tend to follow both what takes little effort (naturally) and what's trending. The trending topics are sanitizers, fuzzers, and reproducible builds. Hence, you see them constantly instead of people learning how to build and scale program analyzers, provers, etc with much better track record. Note that I'm not against the others being built to complement the better methods. I just think, if this is rational vs emotional, you'd see most volunteer and corporate effort going to what has had the highest payoff with the most automation so far. For a positive example, a slice of academics are going all-in on adaptive fuzzers for that reason, too. They're getting great results like Eclipser.


I know a bunch of the people at Google who are responsible for program analysis. The idea that they haven't bought some company doing symbolic execution because of some internal vendetta against static analysis is just ridiculous.

I also find it weird that you reference Saturn, since Alex Aiken hasn't been driving heavy SAT stuff for a while. Clark Barrett is at Stanford now and is doing more things related to what you want. And... oh wait he spent a few years at Google at few years ago.

Fuzzing plus sanitizers work like mad. They aren't magic and interprocedural static analysis provides value too. But your claim that Google is ignoring these techniques and doing so because of cultural idiocy just isn't correct.


I know they do static analysis thanks to this article that we discussed a year ago on Lobsters:

https://cacm.acm.org/magazines/2018/4/226371-lessons-from-bu...

I'm saying they didn't care enough to do the kind of investment others were doing which would've solved lots of their problems.

The article also indicates they couldn't get developers to do their job of dealing with the alerts despite false positives. If Coverity's numbers are right, there's over a thousand organizations whose managers did it better.

Since they didn't address it, Google would be about the best company to acquire expensive tech like Mayhem that finds and fixes bugs so their developers can keep ignoring them. Alternatively, start double-teaming the problem by investing in FB's tool, too, moving in the best advances from SV-COMP winners.

I mean, their size, talent, and finances don't go together with results delivered (or not) in static analysis. They could do a lot more with better payoff.

EDIT: Forgot to mention I referenced Saturn because parent said the methods couldn't scale to the Linux kernel. And Saturn was used on the Linux kernel over a decade ago. A few scale big now.


That's just one team. There's entire other teams at Google that aren't represented in this article.


You keep missing my point. I keep giving examples of large-scale, catch-about-everything systems. I'm not talking merely has a few teams on something. This is Google, not a mid-sized business. You win if they already have their own version of Mayhem that fixes their bugs for them. Otherwise, they're behind Facebook in terms of financial effort they'll put in to get great capabilities. I'm also going to assume they're playing catch-up to Infer unless they've released their analyzers at least for peer review.


The infer team is only like 15 people, last I checked. That's certainly not more than "a few teams on something". Also, separation logic and symbolic/concolic execution are such wildly different approaches that it seems odd to pivot to Infer here.

I obviously won't be able to convince you since they aren't publishing at the same rate as facebook. So you'll just have to take my word that Google doesn't have a vendetta against static analysis.


"The infer team is only like 15 people, last I checked."

The Infer team is claiming to get results on C/C++ with both sequential and concurrency errors via a tool they open sourced. I value scalable results over theories, team counts, etc. Does Google have a tool like that which we can verify by using it ourselves? Even with restrictions on non-commercial use? Anything other than their word they're great?

"that it seems odd to pivot to Infer here." "So you'll just have to take my word that Google doesn't have a vendetta against static analysis. "

You really messed up on the 2nd sentence since I linked an article on Google's static analysis work. I told people they're doing it. I mentioned Infer was Facebook pouring a lot of money into a top-notch, independent team to get lots of results. I mentioned Google could do that for companies like the one behind Mayhem that built the exact kind of stuff they seem to want in their published paper. They could do it many times over. If they did it, I haven't seen it posted even once. They don't care that much.

Your claims about team size and "vendetta against static analysis" are misdirection used to defend Google instead of explain why they don't buy and expand tech like Infer and Mayhem. And heck, they can use Infer for free. My theory is something along the lines of company culture.


golden age of static analysis PR maybe.


There are ways to achieve some of this statically with annotations:

https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
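
Roughly, those annotations let Clang's -Wthread-safety analysis check lock discipline at compile time. A hedged sketch in C: the attribute names follow the Clang documentation, but real code normally hides them behind macros, the wrapper type and function names here are invented, and some features may need the -Wthread-safety-beta flag:

  #include <pthread.h>

  /* Wrap the mutex in a type tagged as a "capability" so the analysis
   * can track who holds it. */
  typedef struct __attribute__((capability("mutex"))) {
      pthread_mutex_t mu;
  } checked_mutex;

  void checked_lock(checked_mutex *m) __attribute__((acquire_capability(m)));
  void checked_unlock(checked_mutex *m) __attribute__((release_capability(m)));

  checked_mutex balance_lock;

  /* Compile-time warning on any access made without holding balance_lock. */
  int balance __attribute__((guarded_by(balance_lock)));

  void deposit(int amount)
  {
      checked_lock(&balance_lock);
      balance += amount;        /* OK: lock is held */
      checked_unlock(&balance_lock);
  }

  void bad_deposit(int amount)
  {
      balance += amount;        /* warning: writing 'balance' requires
                                   holding 'balance_lock' */
  }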


Linting for concurrency bugs is very hard. Modifying thread scheduling to make rare manifestations much less rare is much simpler.


Doesn't the kernel make use of some lock free data structures? Like RCU? I would be surprised if there are no races there, but the point is that they are harmless and handled.


Lock free data structures still use atomics, which this tool understands. All bugs found really are data races.


The tool doesn't understand all lockless techniques, so there will be some false positives. For example, there are some cases where people have used the low-level primitives barrier() and cmpxchg(), which this tool (not possessing human-level intelligence) can't analyze.

Also, not all data races can be exploited into serious bugs (in some cases, some stats might just be incorrect, for example).

That doesn't make the tool useless, of course! Just that one should take the numbers with a grain of salt.


I don't see how any lockless techniques can cause false positives on this tool, since this tool ignores them. The tool instruments plain memory accesses. For each access, with some probability, 1. setup watchpoint, delay, and then delete watchpoint, or, 2. check watchpoint. If there is a matching watchpoint, two threads made "simultaneous" accesses, hence data race.
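
Here's a toy userspace rendition of that idea, just to show the mechanism. It is heavily simplified and is not how KCSAN is actually implemented (one watchpoint slot, rand()-based sampling, a fixed delay, and it assumes the watched object is a long):

  /* build: cc -O2 -pthread toy_watchpoint.c */
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  static _Atomic uintptr_t watchpoint;    /* 0 = no watchpoint armed */

  static void check_access(void *addr, bool is_write)
  {
      uintptr_t a = (uintptr_t)addr;
      uintptr_t armed = atomic_load(&watchpoint);

      /* Another thread armed a watchpoint on this address and is stalled
       * below: two concurrent accesses, so report a race.  Consume the
       * watchpoint so each window reports at most once. */
      if (armed == a && atomic_compare_exchange_strong(&watchpoint, &armed, 0)) {
          fprintf(stderr, "race on %p (racing %s)\n",
                  addr, is_write ? "write" : "read");
          return;
      }

      /* Like KCSAN, sample only a small fraction of accesses. */
      if (rand() % 128 != 0)              /* rand() is fine for a toy */
          return;

      uintptr_t expected = 0;
      if (!atomic_compare_exchange_strong(&watchpoint, &expected, a))
          return;                         /* the single slot is busy */

      long before = *(volatile long *)addr;
      usleep(100);                        /* stall, waiting for a racer */
      long after = *(volatile long *)addr;

      /* Value changed while stalled: a concurrent write happened even if
       * the other side was not instrumented. */
      if (before != after)
          fprintf(stderr, "race on %p (value changed)\n", addr);

      atomic_store(&watchpoint, 0);
  }

  static long counter;                    /* shared, deliberately racy */

  static void *worker(void *arg)
  {
      (void)arg;
      for (int i = 0; i < 200000; i++) {
          check_access(&counter, true);   /* the "instrumentation" */
          counter++;                      /* the actual plain access */
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %ld (lost updates expected)\n", counter);
      return 0;
  }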


1) data races are different from race conditions. and data races are what this sanitizer detects.

2) data races are undefined behavior in C, but:

3) they don't necessarily translate to bugs in practice. for example:

https://godbolt.org/z/Z9RvpB

this is a data race, but in practice everything works as expected.

in cases like this, fixing the data race adds overhead without giving us much benefit.
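
For readers wondering what "fixing" such a race even looks like: a common shape is a flag one thread sets and another polls, and the C11-level fix is just to mark the accesses atomic. A sketch with relaxed ordering (names invented; whether the cost is worth it is exactly the tradeoff being debated here):

  #include <stdatomic.h>
  #include <stdbool.h>

  /* The plain version --  static bool done;  with  done = true;  in one
   * thread and  while (!done);  in another -- is a data race and UB in
   * C11, even if it "works" on today's compilers and hardware. */

  /* Marked version: same flag, same intent, but the accesses are
   * annotated.  With relaxed ordering the loads and stores are still
   * plain MOVs on x86, and the compiler may no longer hoist the load
   * out of the loop or otherwise assume the flag never changes. */
  static atomic_bool done;

  void signal_done(void)
  {
      atomic_store_explicit(&done, true, memory_order_relaxed);
  }

  void wait_for_done(void)
  {
      while (!atomic_load_explicit(&done, memory_order_relaxed))
          ;   /* spin; use acquire/release if data is published too */
  }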


Any chance of this sanitizer being generalized for use outside of the kernel? Someday could we use this like UBSan?


I'm not sure about other platforms, but in Xcode, there is the ThreadSanitizer option in the Scheme diagnostics. Sounds similar to the Kernel Thread Sanitizer mentioned in the article. I assume it's part of clang, but am not positive of the under-the-hood implementation. If so, it may be available on other platforms. (I mention it because Xcode also has options to use Address Sanitizer and Undefined Behavior Sanitizer, so perhaps it's the same thing as the Kernel Thread Sanitizer but for user-space stuff?)


Could you please stop creating accounts for every few comments you post? We ban accounts that do that. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.

HN is a community. Users needn't use their real name, but do need some identity for others to relate to. Otherwise we may as well have no usernames and no community, and that would be a different kind of forum. https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...


Reading the article, it seems like these are from an automated tool. I wonder how many of these are actual race conditions, vs false positives because of logic that the tooling can't decipher.


This is a non-issue and really a false narrative blown up into something that isn't really an issue. Any concern of society can be fed into an AI and out comes a desired result. It's a computer program.


Boy, there are a lot of negative comments here. Google is contributing to the Linux kernel and released their tool as open source. Why all the hate?

Anyway, back on topic: would fixing these race conditions be only beneficial to stability, or would this also improve performance/responsiveness?


Fixing races generally means stricter synchronization, which makes for slower code. The world is complex and there are always exceptions, but I'd expect performance to decrease.

Edit: it goes without saying, but it's much better to fix the race conditions, regardless of what it means for performance. Some people seem to think I'm advocating for not fixing them.


As far as I understand, performance is not a valid concern.

For example, from dvyukov's KTSAN wiki [1]:

  Given a sufficiently expressive atomic API and a good
  implementation, you pay only for what you really need (if
  you pay just a bit less, generated code becomes incorrect).
  So performance is not an argument here. 
[1] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONC...
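
For anyone not familiar with the primitives being referenced: READ_ONCE/WRITE_ONCE mark intentional lockless accesses so the compiler emits exactly one, untorn access (and so KCSAN knows the race is deliberate). A simplified, scalar-only sketch of the idea -- the real kernel macros are considerably more involved:

  /* Toy stand-ins for the kernel macros: force exactly one access
   * through a volatile-qualified cast. */
  #define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
  #define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

  static int need_resched_hint;   /* invented example variable */

  /* Writer: a plain "need_resched_hint = 1;" would be flagged as a data
   * race; WRITE_ONCE documents that the lockless store is intentional
   * and keeps the compiler from tearing or duplicating it. */
  void set_hint(void)
  {
      WRITE_ONCE(need_resched_hint, 1);
  }

  /* Reader: without READ_ONCE the compiler may legally load the value
   * once, keep it in a register, and spin forever. */
  void wait_for_hint(void)
  {
      while (!READ_ONCE(need_resched_hint))
          ;   /* the kernel would use cpu_relax() or sleep here */
  }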


You're talking about the sanitizer performance. We were talking about the kernel performance in the real world. Fixing these race conditions will have some effect.


No, that page is about kernel performance, not sanitizer performance.

The idea is that if you omit READ_ONCE/WRITE_ONCE when they are needed, it doesn't matter if the code is faster because it is wrong.


"brakes make my car slower, so I'd expect my average velocity to decrease if you fixed them"


"brakes make my car slower, so I'd expect my average velocity to decrease if you fixed them"

Just to be pedantic... in some contexts better brakes actually allow you to go faster. Consider racing on an oval track... a car with better braking ability can maintain speed longer as it approaches a corner, then scrub off speed more quickly to navigate the corner. A car with inferior brakes has to start slowing down sooner or risk crashing by taking the corner too fast. So better brakes can lead directly to faster lap times.


>Just to be pedantic... in some contexts better brakes actually allow you to go faster.

Yes, and in other contexts, better brakes make you go slower. Better brakes means bigger brakes: larger rotors (discs), larger and heavier calipers, etc. Larger brake rotors take more energy to accelerate (they're basically a bigger flywheel), so they decrease the car's acceleration. So if your goal is to have a car that accelerates as fast as possible, better brakes are actually a big detriment.


Indeed, disk brakes were first used on race tracks before being available on street cars.


And some people now argue that disc brakes on road bicycles are more dangerous because you learn that you can brake later and more suddenly, i.e. you will more frequently approach the limit of friction between the (very narrow) tyre and the road.


>And some people now argue that disc brakes on road bicycles are more dangerous

Some people argue that the Earth is flat. That doesn't mean they have a good point.

The morons you're referencing probably think cars should all go back to drum brakes and bias-ply tires so people think more about brake fade and tire grip. It's an idiotic argument. Disc brakes on bikes are better in every way, except weight (they add about 1 pound, maybe). Reducing performance available to a cyclist doesn't make any sense at all; you never know when you're going to need to stop suddenly in the real world.


I had not heard that one. I always thought the main argument against disc brakes on road bikes was that they aren't good for long descents where you're riding the brake for long periods of time... due to heat fade or whatever.

Of course both things could be true...


>I always thought the main argument against disc brakes on road bikes was that they aren't good for long descents where you're riding the brake for long periods of time... due to heat fade or whatever.

No, you have it the other way around. Rubber brake pads on wheel rims fade with heat; disc brakes are able to dissipate far more heat, and are basically essential for long descents.

The only valid argument against disc brakes on bikes is that they weigh a little more (maybe 1 pound). I suppose you could also argue that brake fluid is more trouble to deal with than a cable, and certainly not as easy to jerry-rig, but hundreds of millions of cars use hydraulic brakes without any trouble these days, and I wouldn't want to have jerry-rigged brakes anyway.


Indeed. The insane thing about a modern F1 or LMP1 car isn't the acceleration (or at least not as much; the hybrid systems in those cars change that) but the braking.

Some drivers have complained that tears are literally pulled out of their eyes under the deceleration.


I think dekhn may be making the same point, so no need for "Just to be pedantic." You raise an interesting factoid that many people probably haven't thought about, although it's pretty easy to intuit once you are encouraged to think about it.


No, I wasn't making that point. The side-pedants are talking about racing cars, which represent a tiny fraction of braking activity worldwide, so my statement about expecting the average velocity to go down still stands.

Also, I drift corners instead of braking, it seems to be a lot faster if you nail the tire orientation on the exit.


Also, I drift corners instead of braking, it seems to be a lot faster if you nail the tire orientation on the exit.

Fair point. As the old saying goes "looser is faster". The risk, of course, is that you wind up in the wall with your velocity = 0. :-)


In factorio the research you perform to get your trains to run faster is braking force research.


That helps that poster's argument.


That's fine by me. I don't have a dog in that fight, so to speak...


It's more like traffic lights being turned off.


I grew up near a city where they set up the traffic lights along one of the main roads in such a way that if you kept to the speed limit you would always arrive at the next light just after it turned green, so you never would have to brake or accelerate. I wouldn't be surprised if there are concurrency patterns analogous to that.


Ah yeah, groene golf (green wave [1]). I remember my parents using that when I was growing up. Except it never bloody worked.

[1] https://en.wikipedia.org/wiki/Green_wave


Wow, from that article:

> In the UK, in 2009, it was revealed that the Department for Transport had previously discouraged green waves as they reduced fuel usage, and thus less revenue was raised from fuel taxes.

Ugh...


Perverse behaviors like this are what make me worry about the calls to tax negative externalities without earmarking the funds for addressing those externalities. If you tax the externalities, but get to use the revenue for something else, what is your incentive to reduce the behavior causing the problems in the first place?


Yes! This is a point I raise frequently. My go-to example is cigarette taxes being used to fund public schools. They should be used to cover costs (borne by the relevant government) of people smoking cigarettes! Otherwise, as has happened, school funding can be hurt by people smoking less, which is not a good situation to have created.

The degree by which governments routinely violate the accounting principle of matching revenues to expenses is awful.


I think you're describing the problem of earmarking funds at all. In general it's just a stupid posturing maneuver. Spending is fungible; it just replaces the amount contributed from the legal body's general fund. As needs change, general fund money can be spent as needed. Earmarked funds are stuck.

A carbon tax on consumers (incl. businesses) is effective at producing good market behavior regardless of how the tax revenue is spent.


> A carbon tax on consumers (incl. businesses) is effective at producing good market behavior regardless of how the tax revenue is spent.

In the example I reacted to, a government agency actively worked to inhibit that behavior in order to preserve its revenue. Your point would hold if governments didn't have the power to do things like this.

> Earmarked funds are stuck.

If the revenue is being raised to pay for damage caused by society, those funds should be stuck to that purpose and should rise and fall based on the level of damage being done.


> In the example I reacted to, a government agency actively worked to inhibit that behavior in order to preserve its revenue. Your point would hold if governments didn't have the power to do things like this.

A governmental bad actor is basically orthogonal to the idea of a carbon tax. The government could equally try to increase sales of liquor and cigarettes for sin taxes, or go around murdering people for the estate taxes. The UK example is shameful but reflects on that government body rather than the specific tax on petrol.


This is very likely an urban myth.


The problem isn't earmarking funds in general. Money has to be allocated somehow and it's good to do so explicitly.

The problem is earmarking or allocating a variable set of revenue as funding for something unrelated, and in particular something unrelated that's otherwise valuable or important.

And NOT earmarking or allocating 'sin taxes' to pay for the supposed costs of those sins, borne by the public in the form of the government, seems pretty stupid (to me) if it's actually necessary or desirable for the government to 'manage' those sins or those sins' consequences.


Ave, Government, morituri te financiert.


It makes me really disappointed in the world every time I hear stories like this. Supposedly we chose 110v power over a higher voltage in the US so we could sell more copper in wires. And everyone can see Health Care is a two-faced profiteer-filled industry here.


Yes - but smart motorways are doing a pretty good job of handling congestion at peak times by keeping traffic moving at a slower pace, which ends up with everyone using less fuel ... so at least UK roads are becoming slightly better in that regard :)


Yes, well, the green wave is a good idea in principle, but it's sorta mistaking averages for reality.

There is this thing called the "waiting time paradox" [0] (or more generally the "inspection paradox") that says something surprising: the average time a randomly arriving observer waits for the next event is the same as the average time between events. This should be surprising, because naively you'd expect to wait only about half an interval on average; it means enough people experience waits LONGER than the average interval to balance out the people who happened to arrive just before an event.

This happens because longer intervals, when they occur, capture a larger share of the random observations than the shorter intervals do. More precisely and generically, when the quantity being observed affects how likely it is to be observed, the observations get distorted by that quantity and have to be reweighted.

In the case of a green wave, think of those rare times when someone sits too long at a green because they were looking at their phone, or at a naked person in a nearby window, and creates slowdowns and stalls in the pipeline. Those disruptions last longer, so there is a longer window in which you could experience them.

This actually comes up a lot in observations of things in nature. Imagine that instead of light changes or bus arrivals we were observing clicks on a Geiger counter, and you'll realize how fundamental this is to experiencing the world.

tl;dr: The Green Wave only works on average, and it can be correct even as your experience of it never working is also correct. :)

[0]: https://jakevdp.github.io/blog/2018/09/13/waiting-time-parad...
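If you want to convince yourself numerically, here's a quick toy simulation (mine, not from the linked post): buses arrive as a Poisson process with a 10-minute mean gap, and riders arriving at random still wait about 10 minutes on average, not 5.

  /* Toy Monte Carlo of the waiting-time paradox; not from the linked
     post. Buses arrive as a Poisson process with a 10-minute mean gap;
     riders show up at uniformly random times and wait for the next bus. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <math.h>

  int main(void)
  {
      const double mean = 10.0, horizon = 1e6;
      static double bus[200000];
      double t = 0.0, total_wait = 0.0;
      int n = 0;
      const int riders = 100000;

      srand(42);
      while (t < horizon && n < 200000) {
          t += -mean * log((rand() + 1.0) / (RAND_MAX + 2.0)); /* exponential gap */
          bus[n++] = t;
      }

      for (int i = 0; i < riders; i++) {
          double r = horizon * rand() / (double)RAND_MAX;  /* rider arrival time */
          int lo = 0, hi = n - 1;                          /* find next bus after r */
          while (lo < hi) {
              int mid = (lo + hi) / 2;
              if (bus[mid] < r) lo = mid + 1; else hi = mid;
          }
          total_wait += bus[lo] - r;
      }
      printf("mean gap: %.2f  mean wait: %.2f  (naive guess: %.2f)\n",
             horizon / n, total_wait / riders, mean / 2);
      return 0;
  }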


Here in Vienna, it works great. It's often quite amusing to me, as I ride a slow moped - about horse speed - and get passed by irate drivers all the time. However, I ride that green wave, while they are constantly stopping/starting at the lights ...


Works fine in my town (population around 100k). 50kmph (around 31mph) and it's all green for some 20+ crossroads.


There is a road near where I used to live that was the opposite. Speed about 10mph over the limit (55 in a 45) and it would be green all the way. Go the speed limit and you get stopped at every single light. It was shockingly reliable. The difference between speeding and going the speed limit was 15 minutes on a 20 minute drive. The only time it got broken up was the one intersection with a pedestrian crosswalk where people sometimes hit the button.

Of course traffic was heavy enough that you'd be stuck behind someone doing the speed limit most days, but sometimes you got to ride the wave.


As I wrote in another post, there's a highway near me where you almost always hit every red light, but it's designed such that going say 20% over the limit won't work either. I believe some people deliberately go 20% under to reduce the waiting.


I'd be willing to give that a shot; I drive the speed limit in my town (population estimated at about 15.6k), but the lights seem to respond more to current traffic at the intersection. So you catch a red light while going straight, and you stop and wait. Now when you approach the second light, because you and others were held back, no one was approaching that light, so it will likely be red or turning red when you get there. To actually get out of this loop, you basically have to speed up a bit, which seems like a behavior that shouldn't be rewarded.


Sounds like sensor-based lights and not timed lights? Regardless, Redmond, WA is full of sensor-based lights. Hence I catch myself speeding up to an otherwise empty lane and a green light. 'cuz if I don't hurry up, it'll turn red.

Timed lights work, I've seen it in action. Top of my head, Indianapolis 20 some years ago, Capitol Avenue going north I'd bet you could go from (what is effectively) Zero Street to 38th and not hit a red light.


Worked fine for my family near Drachten, but that had the advantage that they also had a sign showing your speed next to the road. I suspect one issue is that the speedometers in cars are not that accurate.


I live somewhere with lots of roadside speed signs (they flash your speed in red if you're speeding, green if under). In the tens of cars I've driven/ridden here, the speedometer is always within a small range (±2 km/h) of what the road sign says.


The thing is that you have to look at this from the perspective of the expected largest difference between different cars, not one speedometer and actual speed.

If two cars both say they drive at 50 km/h, and one is actually doing 45 km/h and the other 55 km/h, that is quite a large relative margin of error. And we only need two cars like that in the whole group to start causing problems.


But I don't think you have to look that hard for some technical error causing this variance. Drivers' speeds vary between 40 and 60 pretty regularly on residential roads. A speedometer error doesn't even have to enter into the conversation.

If you can't design a green wave to be tolerant of varying speeds, then it isn't useful.


There are some weird speed signs near me. One that particularly irritates me is set up to flash if you're over 25 mph. But the speed limit is 30! There's also a flashing sign by a school, that is left on at times that the regular sign specifically says the school speed limit is not in effect.


Roadside radar speed signs always read slightly lower than your true speed, since they can't be positioned directly in your line of travel (the cosine error).


In the situation I experience on my commute daily: I'm 40m back and the detector is 3m perpendicular off my line of travel (a reasonable approximation, as they're on a roadside light standard, the lane width is 3.5m, and the car is centered in the lane). That is a very minor discrepancy, on the order of 0.2 km/h.


Systolic matrix multiplication is roughly analogous to that.

Great Highway in SF: if you drive 35 mph you will just get greens on the evenly spaced lights. The worst is when you get stuck behind a person who drives 40, then brakes heavily, then drives 40, then brakes heavily, etc.



There's a road in Berlin (and it's probably not an exception) that has overhead displays showing how fast you should go to be on a green wave. At least that's how I understood it from driving there several times. Additionally, it has a variable number of lanes for each direction, signaled by lights.

Here's a street view of the road; it's not perfectly visible, because those are LED displays: https://goo.gl/maps/T7RzQCUS4nk3QVZN9


There's a highway near me with periodic lights triggered by the cross traffic such that you almost always have to stop. The limit is 55 mph, but if you see someone on the side street and speed up to, say 70, to try to beat the light, you will probably see it turn yellow within a second and have to stop anyway. There aren't that many lights, but you almost always have to stop for every one because of the cross traffic taking precedence.


Slow drivers must cause extreme road rage on that road


In my experience, it is the “fast” drivers that cause road rage by being slow. They race to the next light, stop, and because they are stopped, prevent everyone from riding the green wave because they now must accelerate from a stop. Sooooo frustrating. Just go the speed limit and none of us will need to stop, you idiots!

Edit: (or go double the speed limit if you must but please stop creating stops)


You run the risk of the computational equivalent of the speedo not being 100% accurate, e.g. clock inaccuracy.


There's a 3 or 4 mile stretch of road in San Francisco called the Great Highway that's like this (https://www.google.com/maps/@37.750302,-122.5062241,14.64z)

It was one of my little pleasures to drive along that at 35 (or 30, I forget what the speed limit was) and see cars zip ahead at each light at 45, only to watch them ride the brakes as they hit the next red.


When there’s not too much traffic, Ocean Parkway in Brooklyn also has a green wave.


They did this on a few roads around where I grew up. (Southern Tier of New York.)


The road I take to the gym is exactly like this, if you keep just under 40mph.


In both directions? How would that work?


You can actually do all 4 directions.

Imagine a translucent checker board of blue and black squares overlaid on top of a map of roads. You can see the color of the squares but you can also see the roads underneath them.

If a light falls on a blue square, make its principal north/south direction red at t=0 and its principal east/west direction green. For black squares do the opposite.

Calculate how long it would take cars to travel through each square unimpeded. It helps if this time is roughly the same regardless of direction. It also helps if the squares are large enough that it takes about one light phase's worth of time to travel through one. Call that time N.

At t=N, switch the lights to the opposite color, so now north/southbound traffic gets green lights on blue squares, and east/west gets red. Black squares get the opposite.

What you’ll see if you imagine yourself as a car driving through this, is that as you enter a black square as it turns green, the green lights will stay green until you enter the nearest blue square, at which point the phase shifts and now the blue square gets green lights as you drive through it.

This works in both directions, with both north- and southbound traffic.

Now, this falls apart when roads are diagonal, although it’s mitigated if diagonal roads have higher speed limits. It also means if you make a turn, you’re in the wrong light cycle, but it will correct itself after the next red light.

It also doesn’t work well at all if you have left turn arrows complicating your light cycles, so you’ll need to have something like a Michigan Left and the requisite wide medians to make this work.

It’s not as ideal as I described in real life, but the basic principle works.
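If it helps, the phasing rule above can be written down in a few lines (my own toy encoding of the description, not any real signal controller):

  /* Toy encoding of the checkerboard scheme described above. Each light
     sits in a grid square; the square's color is the parity of its
     coordinates, and the whole board flips phase every phase_len seconds. */
  #include <stdbool.h>

  /* True if the north/south direction is green at time t (in seconds).
     East/west green is simply the negation. */
  bool ns_is_green(int square_x, int square_y, int t, int phase_len)
  {
      bool blue_square = ((square_x + square_y) % 2) == 0;  /* "blue" square */
      bool even_phase  = ((t / phase_len) % 2) == 0;        /* current phase */

      /* Blue squares start with north/south red; every phase the board flips. */
      return blue_square != even_phase;
  }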


I'm guessing it doesn't work so well if the stretches of road between lights are of significantly different distances.


Or, worse, if the roads aren't on a rectangular grid. But for square blocks, it can work very well.


So lock-free datastructures such as ring buffers are like roundabouts, huh?


I see your point, though in racing brakes make a car faster, not slower. :)


Brakes slow a car down in terms of velocity over time. But they may make a car faster in the sense of distance over time (aka lap times).

I guess the real issue here is that 'fast' is ambiguous. In terms of racing it can be used for either lap times or velocity, and lap times and maximum velocity are not necessarily correlated.

</nerdsnipe>


You could say that brakes make the car faster in race conditions.


Daaaaaaaymn. Good one, holds up!


They often are correlated though. Better brakes mean you can stay on the accelerator longer before braking, which means you usually have higher maximum velocity and faster lap times (faster lap times are literally higher average velocities, so this makes sense).


Not really, if the brakes are broken, your average will approach 0 mph after the vehicle crashes.


Why are you comparing the Linux kernel and cars?


Because analogy is the foundation of understanding things?


So, so far away from the glory days of Slashdot where it wouldn't even be considered a legitimate comment without a car analogy.


But many race conditions are harmless. 'Fixing' those that are with extra synchronisation is not a good use of anyone's time.


By definition, data races in C and C++ are UB. Even if you believe that the compiler could not possibly miscompile your code [1], a large quantity of these false positives will make data race detection tools (like the one used by G) less useful. Just mark your shared memory locations as atomic; you can always use relaxed ordering and non-atomic RMW operations if you worry about the cost of the extra synchronization.

[1] these sort of assumptions of course tend not to age well.


Data races are. Race conditions are not.


True.


Note that Phoronix is being imprecise here and those hundreds are in fact data races (with no false positives), which are never okay.


> data races (with no false positives), which are never okay

This just isn’t true.

Here’s a concrete example - many JIT compilers use counters to work out when to compile a method. They allow data races in updates to the counters because performance is more important than correctness for them and the impact of a lost write is basically zero. It’s only a bug if you decide it’s a bug.


I've never heard the term "race condition" refer to something that was harmless. Can you point to some examples in the literature?


Even Wikipedia says it's not always a bug

> A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when one or more of the possible behaviors is undesirable.

Or read the canonical Encyclopaedia of Parallel Computing page 1693

> [A race condition is when there is] some order among events [where the] order is not determined by the program


Some race conditions are intentional, accepted with the knowledge that the data will not be perfect. For example, if you have a structure of different metrics and one thread updates them while another thread reads them, clearly the best way to handle that is to lock while writing and reading so you always get the intended results. However, if you don't necessarily care about all of your metrics being snapshotted at the exact same point in time, you can just have the other thread share the variables, knowing that this race condition will actually improve performance a little bit. Obviously you need to take care of things like word sizes that take multiple loads and stores.
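A sketch of that pattern (hypothetical metrics struct; written here with relaxed C11 atomics, which keep the "no lock, the snapshot may be skewed in time" behavior while avoiding the formal UB of plain shared accesses):

  /* Hypothetical metrics block updated by a worker thread and read by a
     reporting thread with no lock. Each field is accessed atomically, but
     the reader's snapshot of the *set* of fields can be skewed in time,
     which is the accepted trade-off. */
  #include <stdatomic.h>
  #include <stdint.h>

  struct metrics {
      _Atomic uint64_t requests;
      _Atomic uint64_t errors;
  };

  static struct metrics m;

  void worker_update(int failed)
  {
      atomic_fetch_add_explicit(&m.requests, 1, memory_order_relaxed);
      if (failed)
          atomic_fetch_add_explicit(&m.errors, 1, memory_order_relaxed);
  }

  void reporter_snapshot(uint64_t *req, uint64_t *err)
  {
      /* The two loads are not taken at the same instant; the counts may
         not correspond to the same moment, and that's fine here. */
      *req = atomic_load_explicit(&m.requests, memory_order_relaxed);
      *err = atomic_load_explicit(&m.errors, memory_order_relaxed);
  }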


Hmm. I thought data races due to concurrent thread access are actually undefined behavior (up to and including termination of the program).


Are you thinking of a particular programming language specification here? It seems like one programming language might define it this way, while another might simply say that the order of the events is not guaranteed (and may not be the one that the programmer expected), but not that the events themselves may do something other than what the normal language semantics provide.


C++ and C both explicitly say that data races are undefined behavior, not unspecified behavior.

Data races are one of those cases where "anything may go" undefined behavior is actually quite necessary. Actual hardware memory models (particularly on "weak" architectures like ARM or PowerPC, or especially the infamously weak Alpha) result in cases where the order of memory traffic is not a consistent view between different threads. Formally specifying the hardware memory model is actually surprisingly difficult, made all the more difficult when you want to give a hardware-agnostic definition [1]. And should you want to introduce hardware transactional memory, break down in despair at the complexity [2].

The great breakthrough in memory models is the definition of the "data-race-free" model. This model says that, if memory accesses are properly protected with locks (i.e., there are no "data races" [3]), you cannot observe a violation of sequential consistency, which makes reasoning much easier. Java used this model for its corrected memory model in Java 5, and then C++11 adapted Java's memory model, and C11 took C++11's model verbatim. The upshot of this approach is that, if you make data races undefined behavior, you don't have to try to figure out how typical memory optimizations, both by the compiler and by the physical hardware, affect visible memory. Trying to work out the effects of these optimizations on the memory system is extraordinarily difficult, and the benefits of doing so are manifestly unclear, since most programmers will need to properly synchronize their code anyways.

[1] The C++ memory model attempts to do this with atomic memory references. The resulting specification (in C++11) is generally considered to be wrong, and the question of how to fix it is still up in the air as of right now.

[2] PLDI 2018 had the first formal semantics that combined transactional memory and weak memory semantics, and went on to prove that ARM's hardware lock elision was broken. This should be showing just how difficult this is to grasp even for theoreticians pushing the boundary, let alone a language trying to make it accessible to typical programmers.

[3] There are two definitions of "data race" going on here. Vernacular definitions usually define it as accesses not protected by synchronization, which permit "benign" data races for regular loads/stores. C/C++ tweaks the definition so as to use it to proscribe undefined behavior, so "benign" data race is oxymoronic in those languages. However, the use of memory_order_relaxed is intended to indicate a permitted data race in the first sense but not the second sense.


Thanks for the tremendous detail on this point.

Could you or someone else please give a minimal example of an undefined access in C, just so that I can be sure of whether we're talking about the same phenomena here?


A minimal program (using C11 threads):

   #include <threads.h>
   int x;

   /* Both the spawned thread and main() write x with plain stores. */
   int do_thread(void *arg) {
     (void)arg;
     for (int i = 0; i < 10000; i++)
       x = i;
     return 0;
   }
   int main(void) {
     thrd_t t;
     thrd_create(&t, do_thread, NULL);
     do_thread(NULL);        /* races with the other thread's stores */
     thrd_join(&t, NULL);
     return 0;
   }
There is an undefined data race on x, since it is simultaneously accessed from two threads without any synchronization. Replace the declaration of x with `_Atomic int x;`, and the resulting code is well-defined.
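And if the only goal is to make the race defined without asking for any ordering, the store can be made explicitly relaxed, something like:

   /* Variant of the loop above with an explicitly relaxed atomic store.
      The threads' stores still interleave unpredictably (a "race" in the
      informal sense), but it is no longer undefined behavior, and no
      ordering beyond atomicity is requested. */
   #include <stdatomic.h>
   _Atomic int x;

   int do_thread(void *arg) {
     (void)arg;
     for (int i = 0; i < 10000; i++)
       atomic_store_explicit(&x, i, memory_order_relaxed);
     return 0;
   }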


thanks, this comment (point 3 in particular) did a great service in clearing up the disconnect.


no programming languages, just the hardware. I think the only way to reasonably think about data races is at the hardware instruction/memory level, not the programming language.


they very much are.


Can you point to a source on that? If you have two pthreads in C accessing the same variable, they will definitely not necessarily see the same load/store order, but it's perfectly valid as a form of lazy evaluation to assume the value eventually can be seen.


Can't check chapter and verse on the standard right now. Will you take cppreference[1] as authoritative? See the section on data races. The C11 memory model only differs minimally from C++11's.

[1] https://en.cppreference.com/w/cpp/language/memory_model


I'm aware of the C++ memory model, but this is exactly what I was talking about. The actual behavior is defined by the hardware; the programming language is an abstraction on top of that. For certain sizes, you will not get a load or a store that is only partially there. So on x64, you can safely share data between threads knowing that the observed order of that data is completely undefined. But in cases like looking at a view of statistics, that may be perfectly acceptable given the trade-offs.


> The actual behavior is defined by the hardware, in the programming language is an abstraction on top of that.

This is a very dangerous way to understand semantics of programming languages. Programming languages are usually defined via some sort of operational semantics on an abstract machine which occasionally bears some resemblance to how processor semantics are defined. But sometimes there is a massive disconnect between processor semantics and language semantics--and perhaps nowhere is that disconnect greater than memory.

In hardware, there is a firm distinction between memory and registers. Any language which approximates hardware semantics (which includes any mid-level or low-level compiler IR) is going to mirror this distinction with the notion of a potentially infinite register set (if pre-register allocation) and use of explicit instructions to move values to and from memory. But in every high-level language, there is no concept of a memory/register distinction: just values floating around in a giant bag of them, often no notion of loads or stores. You can approximate this by annotating values (as C does, with the volatile specifier), but the resulting semantics end up being muddy because you're relying on an imperfect mapping to get to the intended semantics (don't delete this load/store). Optimization that screws up the mapping of the language memory model to the hardware memory model is anticipated, even 30 years ago--that's why C has a volatile keyword.

It is possible, if you are very careful, to write C code that exposes the underlying hardware model that allows you to reason about your program using that model. But this is not typical C code, and instead requires modifying your code in such a way that you make the intended loads and stores explicit. And if you write your code in the typical way and expect it to map cleanly to hardware semantics, you're going to have a bad time.


Coming from a world where latency and performance is paramount, I think it's perfectly acceptable to write code targeted at a specific architecture you know won't change. I understand your points though, and it's definitely not a style you should use in 99% of programming.


I would call this sampling that could turn into a race condition if it stops being treated as sampling.


Random imagined example: Two files with different names being created in the same directory. The race decides which one gets which inode number, but both are created correctly and successfully. That would be a race condition, but as long as only the assignment of inode numbers changes between outcomes of the race nobody would care. But again, this is just a contrived example.


Examples in the literature? This is not a bloody academic exercise?!

I use the term 'race condition' about once a week to point out something may happen before or after something else, because the things are happening in parallel.

Simple example:

me: I just uploaded the file to our shared dropbox

you, refreshing: I don't see it yet

me: race condition


eventual consistency is not a race condition.


Of course it is, that's exactly the point. 'Eventual consistency' is the name for situations where a race condition is acceptable and even a feature. It doesn't matter whether it is threads, processes, or entirely different computers that are racing, or whether the data is shared via CPU cache, memory, disk, or a network. We don't call it "failed eventual consistency" when a data race occurs.


I would never call this a race condition, but mostly because it does not cause an unrecoverable / system crashing failure state (you and your coworker have a protocol to resolve the inconsistent state).


The phrase "race condition" is usually not used to describe the harmless asynchronous code you're referring to.


No, you're thinking of 'data races'. Almost all software that interacts with the outside world, or with clocks, like of course a kernel, is going to have many harmless 'race conditions'. Erlang, for example, is a language extremely prone to race conditions, but not to data races.


The tool specifically looks for data races.


I think by definition, you are wrong. If the race condition was harmless then it wouldn't be a race condition.


By whose definition? If you go by Padua's classic definition, it's any behaviour that depends on timing sequences. For example, HN has a race condition in that the ID your comment gets allocated depends on when it's submitted. But that's not a bug, is it? It only becomes a bug when it's undesirable. I think, like a few people here, you're confusing it with a 'data race'.


You still understand, though, that the colloquial definition being used here is what you're referring to as a "data race." Your comments are disingenuous in that I believe you know that you're being pedantic; I just don't know why you've brought it up multiple times.


I didn't realise something as specific as 'race condition' could have a colloquial definition, but ok.


For many terms of (programming) jargon, there's a strict academic definition, perhaps as defined in a paper in which the term was originally introduced, and the usage of the word in the broader (programming) community.

The latter is the colloquial definition.


I feel like this comment is a classic example of reading a wikipedia entry instead of understanding the topic.

You can use it to refer to any race, but in reality the race you are referring to exists only from the user's point of view. The server is generally processing all requests in the order they are received.

A race condition usually is referring to a multithreaded system where the threads interfere with each other, causing undesired effects.

Relaxing "race condition" to refer to all causality... is such a loose definition as to give it no meaning at all.


> I feel like this comment is a classic example of reading a wikipedia entry instead of understanding the topic

I did a PhD partially on irregular parallelism where understanding race conditions are essential for understanding the topic.


Didn't we go over this already? It's not any behavior that depends on timing sequences, it's only if your code assumes it can rely on timing sequences while it can't, i.e. a bug.


> It's not any behavior that depends on timing sequences,

That's literally how 'race condition' is defined in the industry.


Not true, it's pretty hard to encounter anyone in the industry using the phrase "race condition" without implying a bug.


What? You need to describe what you're saying better. How is an ID allocated to a comment a "race condition"? A race condition, desirable or not, has to be related to some sequence of events, i.e. it's a race with a particular set of outcomes. What you're describing sounds like basic cause and effect.


Event A is your comment. Event B is my comment. If your comment gets to the server first, it gets ID 1. If mine gets to the server first, it gets ID 1. Which ID you get depends on timing outside your control. They race to get the ID. A race condition.


So what? What's missing is that unless something depends on the ordering of those IDs, it really isn't a race condition. Just plain happening of events is the inevitable passage of time, not a race condition. What is interesting about what happens downstream with those IDs?

You're not wrong that not all race conditions are harmful, but I feel you're doing a pretty bad job of explaining what a race condition is.


Well, if we want true security, we need to accept that all computers must take MASSIVE performance hits.

It's the price to be paid for security. No more branch prediction, no more cutting corners with UB. It's time to do things the right way once and for all and that means setting back performance a good 10-20 years.


Would 'stricter' mean 'more' in this context?

Will the fixes consist of adding more synchronization mechanisms (mutexes, semaphores, etc) to prevent potential race conditions?


Would you rather live with a less secure but faster platform?


It is a good question, the answer is "it depends" but usually it would be "more secure", unless you know WTF you are doing. Usually, people don't, or they think they do but they don't.


No good deed goes unpunished. My OSS projects had a few haters recently and it's amazing the amount of vitriol you can get when you give someone something for free.

I think my follow on comment is to just suggest that they fork it and maintain the project themselves.



That is horrifying.


We're getting closer to "How Google got me fired for the 532 bugs it found in my code..."

EDIT: It was meant as a joke as to how media manages to make something that's good into a problem.


In any decent organisation, a bug that goes out to live isn't the fault of the developer who wrote the code.

It's a team fault: maybe the PO didn't spec correctly, then the developer implemented something wrong, then it was missed in the code review, then missed in team testing, then missed in testing before going live, then missed in the PO sign-off before live...

If anything I would hope that stuff like this by Google helps to encourage orgs to build better processes. If you succeed as a team, you fail as a team. No scapegoats!


> In any decent organisation

Yes, and in my physics classes I learned a lot about spherical, frictionless cows in a perfect vacuum.

In the real world, the shape of cows is not a sphere, nor even a closed-form equation. And real organizations care about blame, and as the saying goes, shit rolls downhill.


It's your responsibility to work on a team that handles bugs like adult engineers rather than school children or politicians. Engineers have a duty to be spokespersons for rational and humane engineering processes.

I've shipped more than my share of bugs live. I've never felt like I didn't have a team behind me. Obviously I shipped a lot of things that work, as well.

Working someplace humane and rational doesn't just happen, it requires work.

I expect anyone with "Senior" in their job title to do that work, both in terms of setting team culture and establishing post-mortem policies with management.


There are actually a lot of really good organizations. The ones that don't do basic engineering management well don't tend to keep their good engineers these days.


>In the real world, the shape of cows is not a sphere, nor even a closed-form equation. And real organizations care about blame, and as the saying goes, shit rolls downhill

The difference is that in the real world, organizations exist that don't merely blame the developer, and they aren't really that rare. To give you an idea, I simply don't work for managers that are focused on blame. I probe this during any interview I have, or for any position I'm considering. It's on my list of "Life is too short to put up with this."

I don't have trouble finding jobs.


What is a PO?


Product owner


Product owner


Depends. What if you represented yourself as the multithreading expert and were hired for that reason? The buck needs to stop somewhere.


Is this a joke, or a true objection to improving the kernel?


It's a joke, I welcome our robot overlords.


Sounds like you are still unhappy about a tool finding bugs.


I use linters, and having an ML one that can find even more difficult issues is more than welcome.

As long as they are done locally and not sending source code to Google.


What is sending which code to Google? It is a fricking open source analyzer.


Linux will be the mainstream kernel for anything eventually. And companies will bundle it with their proprietary solutions.


Not to mention there are 9K+ open source contributors on the KCSAN project.


Nope, that's because it is developed as a fork of the Linux kernel. Those 9K+ people are Linux kernel contributors.


I think it would be more helpful if Google invested in modeling and describing the Linux kernel with TLA+. I googled and found only one person trying to do it (https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/ker...)


Seriously...?

“Google is uncovering millions of gender preference in literature...” /sarcasm


"Eschew flamebait. Don't introduce flamewar topics unless you have something genuinely new to say. Avoid unrelated controversies and generic tangents."

https://news.ycombinator.com/newsguidelines.html


I can't help but feel most of these bugs and vulnerabilities in the Linux kernel could be avoided with a more robust foundation than C. It's nice to see that Rust is starting to be tolerated in the Linux kernel for modules.

It would be nice to see an effort to migrate some of the kernel core, but I can't see Rust gaining widespread acceptance in the kernel development community any time soon.


> It would be nice to see an effort to migrate some of the kernel core, but I can't see Rust gaining widespread acceptance in the kernel development community any time soon.

I don't think that a migration effort from C to Rust for the Linux kernel is even feasible, or worth it, given the scale of the project. At this point, you might as well start from scratch.

Google is already experimenting with Rust for OS development with their new Fuchsia operating system [0], which has some drivers written in Rust with a capability based security model.

[0] https://fuchsia.dev


Developing a new kernel or operating system in Rust is a good idea, but a massive undertaking, and it's hard to see it gaining adoption quickly. I think a more credible way to benefit from Rust in the short term is to rewrite small but critical components of existing kernels, or adopt Rust components from experimental Rust OS projects.

Similarly, Firefox's Project Quantum seeks to bring Rust to Firefox from the more experimental Servo project.


Rust for OS development has a pretty big following, even though there are no large production kernels (yet) obviously.

There are 2 reasons the Linux kernel in particular uses C and will continue to use C: 1) it was C originally and 2) it should compile without requiring some other compiler.

Anyone who sets out to write a new OS kernel (or similar) today, for platforms where Rust is available, would probably/hopefully use Rust over C. That doesn't mean it's a good idea to start making existing code bases "hybrids". No one wants a huge codebase where, to build it, you need N compilers for languages that were popular at various times during the codebase's history, and where you need to be an expert in N languages to maintain it.


I think one of the reasons OSs are written in low-level, shoot-yourself-in-the-foot-ready languages is that sometimes you need to do truly stupid things to make stuff work. Hardware doesn't do what it's supposed to, and getting around it via some nutty hack is the way the world works.


Rust wouldn't fix these problems. They exist, at least some subset of them, because the kernel needs to do low-level concurrency and lock-free algorithms. That's all unsafe code in Rust. So assuming you got the buy-in and managed to scrape together enough engineering talent for a mammoth task, it wouldn't help much with these kinds of issues.


Unless these linting tools go out there with source, I'm going to be skeptical of Google's magic code checking and bug finding process.

And IMHO, such large number of patches / fixes on Linux should always use a decent open audit process.


> Unless these linting tools go out there with source

The tool is released as open source.

> such large number of patches / fixes on Linux should always use a decent open audit process

All contributions to the Linux kernel are audited by the kernel development team. Look at kernel.org, you'll see that contributions are always signed off by at least one other kernel dev, and often also Linus himself.



Regardless of how the bug is found, you can verify if it is a bug or not without knowing how it was found.


Google is a corporation. The Linux kernel is [by implication] developed and maintained by Linus. Please don't do this. If there are race conditions, fix them following the process of kernel.org.


Something like 80+% of commits to the kernel are by people paid to work on Linux (Google employees among them). This is just part of the Linux development model and has been for a long time.


Off topic, but what companies, besides Red Hat or Google, will pay you to work on Linux? I imagine that would be an enjoyable job.


I don't understand, how is Google not following the official process?


The last sentence of TFA had an ominous cast:

> For those wanting to learn more, the code at least for now is being hosted on GitHub.


Where in the 'Linux process' does it mandate that 'code MUST NOT be made available on GitHub'?

Google basically announced their project and published the code. That's it. In their email to the LKML (https://lkml.org/lkml/2019/9/20/394) they specifically mention:

  In the coming weeks we're planning to:
   * Attempt to send fixes for some races upstream […]

  There are a few open questions:
   * How/when to upstream KCSAN?
So they intend to follow the standard contribution process of sending fixes upstream.


The sanitizer code is being hosted on GitHub.

How is this a sufficient point of confusion to lead to this comment chain? It’s a link. Click on it and see.


This sounds more like "on github for now instead of kernel.org".



