Try to make sudo less vulnerable to Rowhammer attacks (github.com/sudo-project)
179 points by trebligdivad 8 months ago | 140 comments



No.

Do not want.

Rowhammer is a hardware problem -- defective RAM -- not a software one.

The sooner everyone starts returning defective RAM and putting pressure on the hardware manufacturers to maintain correctness, the sooner we can stop this descent into insanity.

"They can always work around it in software" is the attitude that let Rowhammer exist, and continuing to fulfill that expectation will only make things worse.


This is misleading.

I recommend [1] as an introduction to the semiconductor physics behind the Rowhammer problem. Rowhammer is an instance of the "weird machine" problem behind many security problems, i.e. a mismatch between two abstractions: the abstraction we pretend describes the system, vs the reality of the system. In the case of Rowhammer, that is the abstraction of memory as a digital device, against the reality of storing bits with capacitors and wires, i.e. analog devices. Clearly a leaky abstraction. The denser you pack those capacitors and wires, the leakier it gets.

[1] A. J. Walker, S. Lee, D. Beery, On DRAM Rowhammer and the Physics of Insecurity. https://ieeexplore.ieee.org/document/9366976


I think it's important to differentiate "a mismatch between two abstractions" and "hardware bug". Because you can frame any sort of hardware (or even software!) problem like this:

"Capacitor plague of 2000 was a mismatch between two abstractions: the abstraction that capacitor actually provides datasheet-described amount of capacitance vs the reality of the system"

"Toyota unintended acceleration was a mismatch between two abstractions: the abstraction that ECU properly responds to accelerator pedal release vs the reality of the system"

Yes, digital systems are made of analog parts, but that's not a reason to accept systems behaving out of spec. For the last 50 years, the specifications for RAM have been pretty clear: as long as all datasheet requirements are obeyed, the only way to change stored data in one location should be to do a write to that location. If a memory chip does not act according to its own datasheet, it's not a "leaky abstraction", it's a hardware bug.

(Now, can this be fixed economically? I don't know, I could believe the answer is "no". However, the solution in this case is not software workarounds, but rather to make a new spec: "RH-RAM is like regular RAM but cannot tolerate certain access patterns")


I think if you (row)hammer hard enough, every DRAM will eventually flip a bit.


Read the original Rowhammer paper where they tested various manufacturers and years -- this only started showing up around 2009, and DRAM from before that time was entirely immune to it.


Sorry, I should have said: ... (row)hammer hard enough, every sufficiently dense/modern DRAM ...


> Rowhammer is a hardware problem -- defective RAM -- not a software one.

It always amazes me how people can be so confident yet so wrong.

It's a problem of physics - there are various ways to try to mitigate it, but the only way to completely avoid it would probably be to use SRAM, and that is going to be extremely expensive when talking 16GB, and not nearly dense enough.

It's not some conspiracy by "Big RAM"


> It's a problem of physics - there are various ways to try to mitigate it, but the only way to completely avoid it would probably be to use SRAM, and that is going to be extremely expensive when talking 16GB, and not nearly dense enough.

And yet, when the bit-flip problem caused by physics got so bad in DDR5 that it couldn't be ignored, they did fix it - by adding error correction codes. It wasn't that expensive to do so. Notice that HDDs hit the same problem as they got denser, and solved it in the same way (i.e., by throwing lots of ECC at it).

I agree with the original poster. It's a hardware problem, caused by the manufacturers pushing the limits. And it's their problem to fix, which they can do. DRAM that doesn't corrupt itself isn't a big ask.


There are plenty of problems of physics in RAM design, and it's the hardware designers' job to find the operating regime where those problems do not matter.

I mean, if you had a DRAM labeled "DDR4-3200" but it could only work at a much lower speed (say DDR4-2400), that would clearly be a problem of physics - the gate capacitance is too high, the driver transistors are not strong enough. And yet my reaction would be to take that RAM back to the store and get my money back, not to defend manufacturers that claim false things about their chips.


> there are various ways to try to mitigate it but the only way to completely avoid it would probably be to use SRAM

Why wouldn't storing a cryptographically-secure checksum on every RAM row work?


"It always amazes me how people can be so confident yet so wrong" could apply to your post too.

Yes it's a problem of physics and it's because they are trying to make DRAM too dense.

> It's a problem of physics - there are various ways to try to mitigate it but the only way to completely avoid it would probably be to use SRAM

> It's not some conspiracy by "Big RAM"

Look at the evidence. This didn't start showing up until around 2009-2010, and the industry managed to convince authors of widely-used memory testing tools to downplay the severity and/or not enable RH tests by default, because they didn't want the truth to be known that almost all RAM is defective. It might not be a conspiracy, but it sure is corporate greed.

Would you rather pay a little more for RAM that will work correctly under all access patterns, or RAM that is certain to produce bit errors under some conditions that can be encountered in practice? Unfortunately, with newer DDR3 and later generations, it seems you don't get a choice.


You get a "choice".

The choice just boils down to "buy pricey server boards/chipsets that have ECC RAM available" or "get bent".


Apparently ECC does not prevent this

https://www.vusec.net/projects/eccploit/


Awesome link - the ECC side channel seems obvious in retrospect.

ECC is still superior by a long shot, this information notwithstanding.


Came here to say the same thing... take my up-vote. ;-)



Not exactly; these constants in sudo are an enum of sorts (actually preprocessor macros). It's not just bool (and won't just be bool in many situations). It is cool to see GCC exploring automatic protection in this space; I just don't think it is relevant to what sudo did here.


Hardbool lets you use custom true and false representations with higher Hamming distances. The sudo patch uses custom representations for their enum that have higher Hamming distances. The only difference is that hardbool is for true/false and this patch is for AUTH_SUCCESS/AUTH_FAILURE/AUTH_ERROR etc. But that's irrelevant. It's the exact same technique.
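For reference, a sketch of what hardbool looks like, from my reading of the GCC 14 docs (the exact attribute syntax here is from memory, so check your compiler's manual):

    /* A typedef'd integral type becomes a hardened boolean: assignments
       convert like plain bool, and loading any other bit pattern traps. */
    typedef char __attribute__((hardbool(0x5a, 0xa5))) hbool;

    hbool authed = 0;  /* stored as 0x5a, the "false" representation */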


> The only difference is that hardbool is for true/false and this patch is for AUTH_SUCCESS/AUTH_FAILURE/AUTH_ERROR etc. But that's irrelevant.

It's very relevant! The problematic comparison in this code isn't true/false! A feature that only protects true/false does not help here.


Couldn't you combine bools for bitwise masks to get this?


Define the enum to be represented as a two's complement integer made of hardbools.


i enjoyed this part:

  #define AUTH_SUCCESS  0x52a2925 /* 0101001010100010100100100101 */
  #define AUTH_FAILURE  0xad5d6da /* 1010110101011101011011011010 */
  #define AUTH_INTR  0x69d61fc8 /* 1101001110101100001111111001000 */
  #define AUTH_ERROR  0x1629e037 /* 0010110001010011110000000110111 */
  #define AUTH_NONINTERACTIVE 0x1fc8d3ac /* 11111110010001101001110101100 */
going to see how i can work this into a project :)


I'm not sure how those values are derived. Yes, the Hamming distances between them should be maximized, but the current values don't seem to be optimized for that:

    SUCC FAIL INTR  ERR NONI
       0   28   20   11   16 AUTH_SUCCESS
      28    0   12   19   14 AUTH_FAILURE
      20   12    0   31   16 AUTH_INTR
      11   19   31    0   15 AUTH_ERROR
      16   14   16   15    0 AUTH_NONINTERACTIVE
Sure, AUTH_SUCCESS and AUTH_FAILURE have a Hamming distance of 28, but it takes only 11 or 16 bit flips to go from AUTH_ERROR or AUTH_NONINTERACTIVE to AUTH_SUCCESS. (AUTH_ERROR can only happen from an internal error, so I believe AUTH_NONINTERACTIVE is easier to trigger.)
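For anyone who wants to check the table themselves, a quick C sketch (assuming the GCC/Clang popcount builtin):

    #include <stdio.h>

    int main(void) {
        const unsigned v[5] = { 0x052a2925, 0x0ad5d6da, 0x69d61fc8,
                                0x1629e037, 0x1fc8d3ac };
        for (int i = 0; i < 5; i++) {
            for (int j = 0; j < 5; j++)          /* Hamming distance */
                printf("%5d", __builtin_popcount(v[i] ^ v[j]));
            printf("\n");
        }
        return 0;
    }
This prints the same distances as above.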

A quick Python search was able to find some alternatives:

    0x0f7b74c5 0x810d2b99 0x63a64616 0xcab4a865 0xbe705abb
    ...maximizes all distances (17--19)

    0x28d803a4 0x352ef6d3 0xdb61dce1 0xb3edf85c 0xe62f7508
    ...maximizes a distance from the first and others (21--22), disregarding other pairs (14--21)
It seems that fixing one element to be a bitwise negation of the first element is not a good search tactic in my short testing. Also as notpushkin noted, if you really want to disregard other pairs you should just make one pair with the maximal distance and derive every other code from them (say, -1 0 1 2 3 would work for this purpose).
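The search itself is nothing fancy; sketched here in C rather than Python, and simplified to pure random restarts keeping the best minimum pairwise distance (a hill-climbing pass on top finds better sets faster):

    #include <stdio.h>
    #include <stdlib.h>

    #define N 5  /* number of codes to find */

    /* minimum pairwise Hamming distance of a candidate set */
    static int min_dist(const unsigned *c) {
        int best = 33;
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++) {
                int d = __builtin_popcount(c[i] ^ c[j]);
                if (d < best) best = d;
            }
        return best;
    }

    int main(void) {
        unsigned cand[N], best[N];
        int bestd = -1;
        srand(1);  /* fixed seed, for reproducibility */
        for (long iter = 0; iter < 10000000; iter++) {
            for (int i = 0; i < N; i++)
                cand[i] = ((unsigned)rand() << 16) ^ (unsigned)rand();
            int d = min_dist(cand);
            if (d > bestd) {
                bestd = d;
                for (int i = 0; i < N; i++) best[i] = cand[i];
            }
        }
        printf("min pairwise distance %d:", bestd);
        for (int i = 0; i < N; i++) printf(" 0x%08x", best[i]);
        printf("\n");
        return 0;
    }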

By the way, finding a binary code with maximal Hamming distance is an open problem [1] [2].

[1] https://www.win.tue.nl/%7Eaeb/codes/binary-1.html

[2] https://math.stackexchange.com/questions/4288902/generation-...


> By the way, finding a binary code with maximal Hamming distance is an open problem [1] [2].

This was my next question. It'd be great if there were an algorithm for finding N codes as close to equidistant as possible.


> I enjoyed this part

Very nice indeed. Such a simple mitigation and it makes evil people sad, which makes me happy.


This is for local sudo privilege escalation.

If the attacker is already running code on your system, you've kind of lost anyway.


Not really. An example off the top of my head where this still might be useful: login nodes (used in many research clusters to let users log in and submit jobs) or shared web-hosting servers (a few of those definitely still exist). There, legitimate non-privileged users can run their programs, and the end goal is to prevent them from getting root.


The last time I looked at the statistics the majority of the internet was still running on PHP, mostly WordPress installs. I'm willing to bet those are mostly on shared hosting with accounts separated by nothing but their Linux user.


Last time I looked, 70% of the internet runs on cPanel, which uses my perl compiler.


Another common one is for example minecraft / source engine game server hosts as they commonly allow customers to install mods.


Most of the cloud works this way unless you are using bare metal / largest size instances.


That's the section that made me post this snippet; crazy isn't it?!


can't edit original post, but i just realized after lining up the monospace how

  #define AUTH_SUCCESS        0x52a2925  /* 0101001010100010100100100101    */
  #define AUTH_FAILURE        0xad5d6da  /* 1010110101011101011011011010    */
  #define AUTH_INTR           0x69d61fc8 /* 1101001110101100001111111001000 */
  #define AUTH_ERROR          0x1629e037 /* 0010110001010011110000000110111 */
  #define AUTH_NONINTERACTIVE 0x1fc8d3ac /* 11111110010001101001110101100   */
AUTH_FAILURE is still just ~AUTH_SUCCESS (and almost a palindrome)


It sorta is, but isn't actually, sadly. I mean, if they were 28-bit values, they'd be bitwise complements, but they're actually 32-bit values, so they're really:

  #define AUTH_SUCCESS        0x052a2925  /* 00000101001010100010100100100101    */
  #define AUTH_FAILURE        0x0ad5d6da  /* 00001010110101011101011011011010    */
Doing a '~' operation in C on one of them won't yield the other value unless you also zero out the top 4 bits. Close enough, though... I still enjoy the symmetry as you did.
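A quick check of that masking claim:

    #include <assert.h>

    int main(void) {
        /* ~AUTH_SUCCESS matches AUTH_FAILURE only in the low 28 bits */
        assert((~0x052a2925u & 0x0fffffffu) == 0x0ad5d6dau);
        return 0;
    }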

Anyway, I'm curious why three of the values they chose have all zeroes for the top 4 bits. I wonder if there's a security-related reason for that.


I think the theory is that this means the largest number of bits needs to be flipped.

Because rowhammer is attacking the physical memory structure, it can’t function at the level that knows what AUTH_SUCCESS is.

This attack just targets raw bits, so we need to protect these crucial state variables from bit-flips.


I don't get it, what's special about these numbers?


Takes many bit flips to go from one pattern to another.


If that is the only constraint, wouldn't the goal be to be as far as possible from the only success state?

the distance between success and failure is 28


Maybe, but in practice malware that makes sudo always fail is also bad. At the same time, getting 28 precise bit flips out of Rowhammer is basically impossible.


i think their bit patterns seem specifically chosen to mitigate rowhammer attacks (no repeating elements)?


I'm slightly bothered that the numbers don't have the same number of digits. The rows are not perfectly aligned!


I'm also wondering why the bits are alternated in the numbers. Why can't we just set AUTH_SUCCESS to 0xffffffff and all the denied / error states to mostly-zeros?


Because you don't want to have the wrong error code even if it's a failure code.

Thus you have to figure out how to otherwise handle an enum whereby you can be reasonably assured of its value even with a flipped bit, hence the Hamming distance.


This is deeply interesting.

I've sometimes contemplated the possibility of doing things like this to guard against memory errors causing mis-entry into particularly critical control flow paths - this is certainly an example of that. But I'd never heard of anyone actually trying it until now.

A "how to write rowhammer-resistant code" writeup would definitely be useful here - even if it is definitely something people cannot do for anything, I can certainly see cases where there is a case for it.


I remember someone making a Rust library for hardened bools, though with the idea of protecting against random bitflips, not targeted rowhammer attacks (though it should work about the same).

The feedback from the Rust subreddit was basically that protecting the bool but not the if statement is of limited use. Potentially it can even make things worse, since there is now more code being executed that might become bitflipped.

That inspired me to make a crate that periodically checksums your program code while it's running, to make sure it hasn't changed. Got it working on Windows and Linux, but then it ended up like most side projects. Maybe I should polish it up and publish it.
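The core of it is simple. On Linux, a C sketch in this spirit walks /proc/self/maps and hashes the readable executable mappings (the real thing also has to handle JITs, relocations, and the checker verifying itself):

    #include <stdio.h>
    #include <stdint.h>

    /* FNV-1a, just as a placeholder hash */
    static uint64_t fnv1a(const unsigned char *p, size_t n) {
        uint64_t h = 1469598103934665603ULL;
        while (n--) { h ^= *p++; h *= 1099511628211ULL; }
        return h;
    }

    int main(void) {
        FILE *maps = fopen("/proc/self/maps", "r");
        char line[512];
        while (maps && fgets(line, sizeof line, maps)) {
            unsigned long lo, hi;
            char perms[5];
            /* hash every readable, executable mapping */
            if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) == 3
                    && perms[0] == 'r' && perms[2] == 'x') {
                uint64_t h = fnv1a((const unsigned char *)lo, hi - lo);
                printf("%lx-%lx %s %016llx\n", lo, hi, perms,
                       (unsigned long long)h);
            }
        }
        if (maps) fclose(maps);
        return 0;
    }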


it's a long-known hazard in embedded and highly reliable systems; there are terms like "single-event upset" that might lead you in interesting directions


Single event upsets are well-understood and easy (although not necessarily cheap) to mitigate in hardware -- ECC for RAM, CRCs for data in flight, and voting (or lockstep if detection is sufficient) for computation. The challenge with Rowhammer is that it involves multiple, correlated bit flips; and depending on details of your system the correlations may be disguised by various remapping layers.


I thought that Rowhammer was a thing of the past. Out of curiosity I found code to test for this and ran it on some of my hosts. My old desktop - i7-4770K/DDR3 - was susceptible. My old server - Xeon X3460/DDR3+ECC - was not. I upgraded the desktop with components based on a Ryzen 7 7700X/DDR5. It tested not susceptible. I'm not sure if that's a result of RAM designed not to be susceptible or that (I think) DDR5 RAM uses modules with on-die ECC.

I was not able to test any of my Raspberry Pis because the test code used some facility available on the AMD64 architecture that is not available on ARM64 processors. The newer Pi 4B and 5 use modules with on-die ECC.

It seems to me that ECC should prevent Rowhammer susceptibility. That should prevent it on server grade H/W for anything still in service and newer consumer systems.

I have no idea if Rowhammer affects other architectures than AMD64.


RowHammer is not a thing of the past. In fact, modern DRAM chips are significantly more susceptible to RowHammer due to their increased chip density [1].

[1] https://arxiv.org/abs/2005.13121


In the "countermeasures" section of the linked paper[1], it mentioned that there are some new techniques available, but repeatedly mentioned that they are not yet available in consumer systems. Maybe rowhammer will eventually be a thing of the past despite the increasing chip density.

[1] https://arxiv.org/abs/2309.02545


Thanks for the link. I guess I thought wrong. But I have more questions.

> with RowHammer protection mechanisms disabled

I wonder what this means. Is it S/W mitigations, or does it include H/W factors like disabling on-die ECC?

It makes sense to me that, all other things being equal, higher density would lead to more susceptibility to Rowhammer. But as always, other things are not equal. I expect that on-die ECC would reduce susceptibility to Rowhammer, and AFAIK that is used for DDR4 and DDR5 RAM, but perhaps not exclusively. Or did disabling "protection mechanisms" include disabling that (if it is even possible)?


The mitigations are usually about limiting the number of times you can access without refreshing. ECC helps in detecting and correcting (obviously) but it doesn't solve the underlying issue that accessing a cell over and over can cause bit flips in neighbors. ECC can be defeated if uncorrectable errors are not fatal or if the attacker can just crash the system over and over. Being able to introduce memory errors is a fundamental and unmitigatable issue that must be resolved by making these errors impossible. This isn't a problem software can solve.


Rowhammer is not architecture specific, since it's the DRAM rather than the CPU.

The paper linked in the patch references other works showing every defence against Rowhammer can be bypassed somehow (I've not followed them all) - e.g. it specifically says that ECC and the like can be bypassed.


Interesting. I did not expect that Rowhammer was architecture specific, only that the test I found was.

I also did not expect that the various defenses, including ECC, could be bypassed.


It cannot be completely bypassed.

The attacker cannot control precisely which bits will be erroneous. When much more than 2 bits become erroneous, in a small fraction of the cases no error will be detected but a wrong value will be read at the next access.

However, in the majority of the cases an error will be detected, either non-correctable, or correctable in which case the corrected value will be wrong.

Despite the fact that wrong corrections are possible, in a system with ECC that is configured correctly it should be impossible for a RowHammer attack to escape detection, unlike for a system without ECC memory.

On a computer that is not defective, memory errors happen very seldom, typically one error after many months. Even only 2 correctable errors that happen in the same day represent an event that can be explained only by either a RowHammer attack or by a memory module that has become defective.

Therefore, a well-configured computer with ECC memory should immediately alert its administrator when 2 or more errors happen in the same day, even if they were correctable errors, because this requires immediate action: either stopping a RowHammer attack or replacing the defective memory module.

It would be pretty much impossible for any RowHammer attack to attain its target without triggering 2 or more ECC errors, which will reveal the attack attempt.

Only when there is no ECC can the attack proceed undetected for a time long enough to be successful.


Shouldn't an ECC non-correctable error trigger an immediate shutdown, because bad data could be committed to disk? (I guess unsafe shutdown could cause corruption elsewhere, but that seems like a reasonable risk) If attacks are a serious threat, then it would seem any alert that doesn't trigger immediate action would be risky (i.e. the attacker just erases alerts from logs).


LPDDR4 and above are supposed to have a feature to detect too many accesses to the same few rows and initiate a refresh cycle. Implementation quality may vary.


It has been possible to re-purpose such additional refresh cycles as an additional Rowhammer attack vector, see https://www.usenix.org/conference/usenixsecurity22/presentat...


So on RAM that needed 18,000 distance-1 accesses, they were able to mount an effective attack with 300,000 distance-2 accesses and 5000 distance-1 accesses.

That's not a particularly big assist, and doesn't sound hard to mitigate.

If TRR pushed the rows it refreshes 10% closer to triggering their own TRR, then those 300,000 accesses would have triggered multiple refreshes in the target row.


Have you ever seen any even moderately detailed specification of what the DRAM manufacturers do in this regard? I have not, and I looked. I am deeply sceptical...

I don't believe that Rowhammer mitigations happen inside the DRAM chips themselves, I think that they are being put into the memory controller that talks to DRAM. Since DRAMs with built-in Rowhammer defences would have to spend transistors on this defence, those transistors would be 'wasted' in situations where Rowhammer is not part of the attacker model.


It makes sense to put it in the DRAM controller for many reasons. One is that the DRAM silicon process is optimized for memory but terrible for logic. Also, a DRAM rank is several chips in parallel to get the data bus width, and they would all have to duplicate the logic.

The disadvantage is that the controller and memory are made by different companies, so standards are required to agree on what access patterns are acceptable.


Agree. The extreme secrecy of DRAM manufacturers about the innards of their chips puts additional obstacles in the way of memory controllers (MCs) implementing efficient Rowhammer defences. In particular, if the MC doesn't know which addresses correspond to neighbouring rows, how can an MC know with certainty that any concrete row is being attacked? (And, to the best of my knowledge, DRAM manufacturers don't give away this information.)


It might be good enough to detect a large number of accesses to any single row and then initiate a complete refresh. This wouldn't be triggered often by normal software. Most exploits have to use cache flush instructions, and with modern several-way-associative caches it would be rare for normal code to trigger it accidentally. In that case, the DRAM maker just has to specify the maximum number of accesses to any row.


My understanding from some previous papers was that many chips put in a small hash table to count accesses, and this table could be worked around.

It's easy enough to deal with if they stop cheaping out. Each row is so wide that a 10-bit counter to trigger neighbor refreshes would barely take any space.


Are you running the Ryzen DDR5 at stock speeds (4800 MT/s) or at some XMP profile?


Everything is at stock speed.


Someone in a comment suggested ...

> gcc -DRND1=0x$(openssl rand -hex 4) ...

That would cause grief for reproducible-build distro initiatives.

It is perfectly good enough for the error code enumeration to be statically randomized into hard coded constants. The attacker is very unlikely to flip every single bit of one valid value so that it resembles another valid value.

Even if the values were randomized at compile time, if the executable is readable to the attacker, the attacker can learn what those values are.

If the executable is not readable to the attacker, the attacker can just pull a copy of the executable from the distro package: executables are installed from widely used binary packages, not freshly compiled for every system.


> It is perfectly good enough for the error code enumeration to be statically randomized into hard coded constants.

A comment points out that they aren't randomized:

> The values used were chosen such that it takes a large number of bit flips to change from allowed to denied. Using random values doesn't really protect against this attack.


I don't understand, isn't this pointless? I could just change some other data structure or variable. Hell, I'll just change the sudo input buffer size and do a stack overflow, or turn a memcpy size into a heap overflow. Or what stops me from changing a jne (Jump if Not Equal) instruction to a jg (Jump if Greater) and bypassing the ifs?


I'd argue it's worse than pointless: at best it does nothing, and at worst it seems to make the code harder to understand and audit, which could result in more future vulnerabilities.


The associated paper abstract claims to have broken sudo by rowhammering register values. It stands to reason that these mitigations thwart the found attacks - the commit message points to the paper as the reason for these mitigations, after all.

Preventing known attacks is not pointless at all.


I think the point is that if your known attack is "target was shot in the right hand", making them wear a protective glove on their right hand isn't a good defense. You would want two protective gloves, a helmet, and a bulletproof vest.


Indeed. Trying to write code that can essentially work correctly with arbitrary memory corruption is not something that should even be attempted.


Feels like a language with opaque enums and pattern matching could implement this kind of thing behind the scenes.


Once you assign the values in the C enum, a switch statement is "opaque".

Just inefficient; a jump table optimization is impossible on the values. Speed is sacrificed for security. A jump table itself could be row-hammered to jump where the attacker wants!


Given that the most common use of sudo is to give yourself root to run a command, and malware looking to elevate to root can just rig up ~/.bashrc, what use is this patch? What use cases does it apply to and how common are they?


Sudo has much more fine-grained abilities for more surgical use cases, like giving users the ability to execute only certain commands as a certain user, with detailed logging and auditing. It has a pretty involved config file (the PDF documentation for it is 80 pages long), a plugin system, a separate log format and log server, etc.
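For example, rules along these lines (names hypothetical) are the kind of thing sudoers supports:

    # let the deploy group restart one service as root, no password
    %deploy ALL = (root) NOPASSWD: /usr/bin/systemctl restart myapp
    # let alice run anything as the www-data user, logged as usual
    alice   ALL = (www-data) ALL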

I also believe those use-cases aren't that common anymore since multi-user systems fell out of favor. There is an argument that most of us could use a vastly simpler tool instead to reduce the attack surface. But that tool wouldn't be sudo, because sudo is built around supporting all these use cases.


doas [0, 1] in OpenBSD is somewhat simpler.

[0] - https://man.openbsd.org/doas.1

[1] - https://man.openbsd.org/doas.conf.5


doas.conf makes it clear to me what I'm enabling.

And we have the OpenBSD folks focused on clarity and security.
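For illustration, a whole policy can be a couple of lines (rules hypothetical):

    permit persist alice as root
    permit nopass  alice as root cmd /usr/sbin/service args nginx restart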


Switched to doas a couple of months ago on my FreeBSD box; it’s been a seamless switch.


>and malware looking to elevate root can just rig up ~/.bashrc, what use is this patch?

Apologies for self promotion, but I wrote a relevant blog post that discusses this[0]. Is there any way of mitigating this trivial attack?

I feel like the Unix/Linux security model is broken.

[0]: https://cedwards.xyz/sudo-is-broken/


I’m not following your logic. How does the malicious-but-unprivileged user have write access to anywhere in the sysadmin’s PATH?


The 'exploit' runs under the sysadmin's user. It gets there when the sysadmin inadvertently installs something malicious under their own user, or when something they're running is exploited, for example.


Haha, I did exactly that as a joke in high school: https://github.com/Visgean/fakesudo


Interesting: it's the exact opposite of Gray code's goal. I suppose this has been studied; with fixed-size words, maximal-distance problems are tractable.


This got me thinking - why do we live with Rowhammer, and how seriously should we take it?

This is such a crazy hack to get some protection around key variables - by requiring 32-bit manipulation.

Why isn’t this just “done” on the phy layer, or somehow detected automatically in-flight? Does the compiler protect against it? At what performance penalty?

It’s really much nicer just not thinking too much about it and going on believing our bits.


Couldn't compilers be configured to use such values for any enum type? And maybe even auto-insert the appropriate check in the final unchecked else anywhere that enum type is otherwise exhaustively checked?


The problem with C and C++ is that enum values increment implicitly when the user does not specify a literal value for each one. So if anything depends on a specific value (e.g., disk or network formats), randomizing would break it. I think Rust enum values are similar, but I'm not a language expert.
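For reference, the implicit numbering that on-disk and wire formats end up depending on:

    enum color { RED, GREEN = 5, BLUE };  /* RED == 0, GREEN == 5, BLUE == 6 */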


Rust enums are incrementing too so that they can generate machine code with dense jump tables. If you also want the discriminant value for something, you opt into that with e.g. #[repr(u8)] and you can even override the values. Note that unlike in many other languages, casting from a number to the enum is a fallible operation because not all values are valid.

Making something like this into a panic is not a good fit for Rust as-is. Because enums are proven to have only correct values, not only is code written to assume pattern matches cannot panic, but compilers are free to optimize around only having valid values as well. That goes not only for the enum discriminant, but for any associated values being properly initialized values of their respective types.

In a sense, Rust lets you write code as if invalid values never happen, so there's less to check for in your code. It's understandable from the perspective of the abstraction needed for computer code to be "correct" and not just temporarily getting away with Undefined Behavior. There are simpler ways to violate it than just rowhammer, write straight to process memory for example, which can also violate invariants that compilers assumed while optimizing.

If you wanted to compile Rust (or anything else) with a hardening mode that does check what should be redundant values, it would be a lot slower and code that never panicked before would now panic, but it would probably be a worthwhile tradeoff for some programs to opt into. After all, if you built for CHERI or arm64e and got a machine exception from an unauthenticated pointer, you'd be thrilled you mitigated a vulnerability even if it violated your higher-level language model. Defense in depth and all that.

Maybe someone feels motivated enough to write an RFC and prototype for this. It just wouldn't stop at enum values, it should mean all sorts of other things too, such as not eliding any other checks that appear redundant given assumptions like immutability. That's what makes it slow and hard to reason about.


Yes it's possible, but it's not desirable. It wouldn't be backwards compatible, and it wouldn't be safe for shared libraries. It's better suited to a linter-type error/warning.


It's not plausible in C, for the reasons you mention, but it might be more possible in other languages -- Rust, for example, only guarantees specific representations when instructed and doesn't allow for shared libraries without a specified representation, so it wouldn't have either issue for most application code. Dynamically-typed languages similarly should be able to choose enum values at runtime in many cases.


> Dynamically-typed languages similarly should be able to choose enum values at runtime in many cases.

I wonder, is choosing random enum values at runtime more secure against Rowhammer than just having fixed values that were chosen randomly once and compiled in, since presumably the attacking code now has no way to know which bits it needs to flip? If so, it might even be desirable to implement this as a "secure enum" in a compiled language.


From the commit: “The values used were chosen such that it takes a large number of bit flips to change from allowed to denied. Using random values doesn't really protect against this attack.”

It would be neat to see an algorithm that generates suitable values.


The basic algorithm for the 2-enum case from the commit seems to just be `enum { A = rand(), B = ~A}`.

Although I'm not sure if it's optimal, the many case seems to be the same as the 2-case but repeated for every 2 items. I expect they double-checked that the number of bitflips is still pretty high.

Maybe a better algorithm for the many case would be something like the popcnt parallel patterns:

* 0b0101010101010101

* 0b0011001100110011

* 0b0000111100001111

* 0b0000000011111111

Since they would all have equal Hamming distance between each pair of entries.
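Generating them is mechanical. A sketch for 32-bit words (bit i of pattern j is set iff bit j of i is clear), giving five codewords with pairwise distance 16:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        for (int j = 0; j < 5; j++) {      /* log2(32) = 5 patterns */
            uint32_t v = 0;
            for (int i = 0; i < 32; i++)
                if (((i >> j) & 1) == 0)
                    v |= (uint32_t)1 << i;
            printf("0x%08x\n", v);         /* 0x55555555, 0x33333333, ... */
        }
        return 0;
    }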


The n=2 case also occurs in the commit: https://github.com/sudo-project/sudo/commit/7873f8334c8d3103...

And indeed, the two values are bitwise complements.


Yeah, I was specifically wondering about n>2. Your approach seems reasonable.


Yes; gcc and clang could, in principle, support an extension like:

  __attribute__((rand)) enum e { FOO, BAR, ... };
which randomizes the values, as an extension.

You only need this in specific places, like setuid programs.

Randomization can be bad because it wrecks build reproducibility; it would have to be tied to the GNU Build ID.

If such an enum is used in any interface between files, the randomization has to be the same in every translation unit.

Maybe the syntax could specify a seed: rand(42).


A comment in the commit notes that randomisation does not necessarily mitigate the issue.

Which is why only a couple of the values are random, with the others being those values, XOR'd with 0xff*


Peter Gutmann goes over this type of mitigation in the context of glitch attacks: https://www.youtube.com/live/IyeDSyvYvZs?si=wkapFNXp8-N28vEb...


Is there any reason for the void cast here? There's no return value in use.

    (void)strlcpy(des_pass, pass, sizeof(des_pass));


strlcpy returns a size_t, so just to silence the discarded return value warning

https://linux.die.net/man/3/strlcpy


Sure; it's just irrelevant to the diff's logical change.


I had the same question! Perhaps a comment would have helped, given the context of the PR.


Seems like an interesting compiler-level protection: a decorator for enums so that they are compiled to values with maximum Hamming distance between them.


Moron disclaimer: I may not be one entirely, but I frequently emulate one.

As a full-time Linux user, I haven't used sudo for years. Rather, I do 'su root'. However, I noticed several years ago (Debian) that upgrades would iterate twice, seemingly accommodating two accounts. Emulating a moron as I do, I never exerted the effort to learn why. I simply began, after 'su root', entering 'sudo su', which despite always having sudo disabled, seems to make me proper root.

I'll often use synaptic package manager when I want a cleaner, easier interface to explore packages. If I only 'su root' it won't open unless I append .... something similar to 'pkex' to the end, but if I do 'sudo su', I can run it using only 'synaptic'. Regardless, I refuse to use sudo otherwise, even when I'm emulating something sentient.

Be afraid. There are morons using Linux, and some are quite productive despite.


You should use 'su - root' to get a root login shell. Otherwise you will still keep your old environment, including $HOME and $USER.

(The man page recommends using --login over the single dash, but it also says they are equivalent. Maybe I'm too much of a moron to understand the difference, but the single dash is less typing)


This software mitigation technique for Rowhammer could also be useful for improving the reliability of programs running on microcontrollers in high radiation environments, e.g. satellite in Earth orbit.

By maximising the Hamming distance between binary values used in enumerations (enum variables) and boolean variables, one could detect if the code has entered an invalid execution path and then trigger a watchdog reset, e.g. an else statement or a switch statement with a fall through case that would not normally be reachable.
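A sketch of that pattern in C (names hypothetical, with abort() standing in for a real watchdog kick):

    #include <stdlib.h>

    enum status { ST_OK = 0x052a2925, ST_FAIL = 0x0ad5d6da };

    static void watchdog_reset(void) { abort(); }  /* stand-in for an MCU reset */

    int handle(enum status s) {
        switch (s) {
        case ST_OK:   return 1;
        case ST_FAIL: return 0;
        default:      /* unreachable unless memory was corrupted */
            watchdog_reset();
            return -1;
        }
    }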


Having to code like this everywhere would be hell!

Scroll to the top for the reference to the Mayhem attack it's trying to guard against.


Does anyone have any opinions on doas vs sudo? I've heard doas recommended as being more minimalistic, and various advantages that brings. What are the pros and cons between the two?


My opinion is to have neither. Requiring users to switch to an account that has different privileges is evidence of poor design of the operating system. Having a root user who has full privileges over the entire system is also poor design as it is the opposite of the principle of least privilege. If a user has the privilege to do something they should be able to do it with their normal account.


I'm happy that my mistakenly typed 'rm -rf /' fails when I use my non-root user (normal account), even though I also have root for when I do want to mess things up.


If your use of sudo's feature set is minimal, it's true that doas can act as a "reduced attack surface" replacement.

Cons: humans (and some software) will expect "sudo" to work while interacting with your system.


When do these values get into memory? Shouldn't they generally be in registers? Maybe it's when the program is loaded into memory to execute, but is that part rowhammerable?


If a task is preempted it will have its registers saved to memory. So you can never assume that some state won't live in memory at some point.


I wonder how this might work with bit-dense systems (e.g. databases like PostgreSQL) where every bit has meaning and it's wildly impractical to reassign bitpatterns like this.


Really struggle to understand the threat from Rowhammer.

It would seem to be particularly dangerous, but then I don't see everybody getting pwned.


Good job, though it doesn't really help when you have a buffer overflow leading to RCE in a suid binary (remember the sudoedit CVE?)


Even without an explicit attack, today's memory is pretty fragile, I'm regularly seeing bit flips.


I've seen bitflips on overclocked systems. Some ram is overclocked out-of-the-box; try disabling BIOS features like "XMP" (Intel Extreme Memory Profiles). Always use MemTest86 or similar to make sure your hardware is good when building a new system or changing BIOS settings.


> I'm regularly seeing bit flips.

How?


For example 7zip decompression CRC failures that resolve on a second try. It would take longer to explain how, but I tracked it multiple times to a single bit flip in the decompressed output.


Thanks, that's pretty interesting. I'd actually be really interested if you would care to explain how, I think others would find it interesting also. I'm not even sure how I would approach trying to do that.


Typically decompressing software deletes the output file if there is a CRC error.

But I use 7zip as a library, so if I get a CRC error I still have the output. Which in my case is JSON files. By careful diffing and going over them I could identify single bit flips like '{"base": 10}' being decompressed to '{"bbse": 10}'. I made this example up, letter 'a' becoming 'b' might be a multi-bit flip, but you get the idea.


I've also seen that, often enough that a hardware issue is extraordinarily unlikely. It's almost certainly just a boring old software bug.


Popular compressors like 7zip are some of the best tested software on the planet, because corruption is immediately detected.

And we know hardware bugs are real, that's the whole point of rowhammer.

You might want to look at Facebook research:

> “Silent data corruptions are not limited to rare one in a million occurrences within a large-scale infrastructure. These errors are systemic and are not as well understood as the other failure modes like Machine Check Exceptions.” A large part of the responsibility should be shared by device makers, Facebook says.

https://www.nextplatform.com/2021/03/01/facebook-architects-...


tfw hardware is becoming so unreliable that people start using 32-bit maximum-distance codes for enumerations with like four values.


This wikipedia article must surely be inaccurate:

https://en.wikipedia.org/wiki/Row_hammer

    The initial research into the row hammer effect, published in June 2014, described the nature of disturbance errors and indicated the potential for constructing an attack, but did not provide any examples of a working security exploit. [1]
[1] (June 24, 2014). "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors"

By my recollection there was a discussion of rowhammer and making it work on an (original) Freenode channel circa 2010 (or earlier), in response to a related thread on a reddit security hacking subreddit.

i.e. it was being discussed in public channels some four years prior to a paper cited as "initial research".

Addendum: Mind you, lots of things get kicked about and implemented before actual papers appear on them for the first time in public.


Well, that quote from Wikipedia says the published paper didn't include a working exploit. That might be true even if a working exploit was available after the paper was written and submitted but before it was published. (I don't know if this is the case here, but this sort of thing is common in scientific publishing.)


To clarify:

    Mind you, lots of things get kicked about and implemented before actual papers appear on them for the first time in public.
I was thinking of many examples I know where a technique is developed and used in industry (mineral exploration | remote imaging | secret spook stuff) and much later (five years or more) gets a first mention in academia .. where they may or may not get a robust working version happening .. just a rickety proof of concept.


The possibility of flipping bits in DRAM in a Rowhammer-like fashion was known in the DRAM industry since at least the 1990s (sorry, no reference handy), and Rowhammer-like access patterns were used in DRAM quality testing.

As silicon density increased, the issue became more urgent.


That matches my recollection - I started in broad STEM at university in the 1980s, and as chip sizes were pushed smaller and denser there was always thought given to signal bleed | harmonics from too many lines too close together.

I suspect "observed in fabrication lab | not disclosed" dates back some years before the paper .. once observed there's always a path to exploitation - but why would anyone broadcast that?

By the time it was chit-chat on IRC, the general feeling was that some TLA had a working exploit (obviously unpublished).


[flagged]


Rust makes a particular class of bugs harder to write. That’s it. It doesn’t magically eliminate all bugs. “Susceptible to rowhammer” is not in the class of bugs that Rust helps with.

No experienced Rust programmer actually believes it magically prevents all bugs or magically makes security-sensitive code immune to side channel attacks, so I don’t think anyone is being lulled into a false sense of security, no.


What about the non-experienced Rust programmer? A lot of open source code is written by inexperienced people who understand the nuances of computer science primarily through hype. I think those are the kinds of people OP was asking about.


An inexperienced programmer is much more likely to make security-critical mistakes writing C or C++ than Rust anyway, even if they’re aware of the concept of undefined behavior, so I think Rust still has an advantage in this case.

You should avoid using security-critical tools written by people who don’t understand security, regardless of language.


That's not a Rust problem. Misunderstanding and misusing a tool is an inexperienced-person problem. It's one thing you pay veterans more for.


I don't think there are going to be a large number of inexperienced Rust programmers. Inexperienced programmers write Javascript or Python, not Rust.


Yet the vocal majority of Rust users present themselves as being inexperienced programmers. I expect it is not just an act – that the vocal majority truly are inexperienced, and I expect the segment of users who are novices is much larger than you suggest.

The fact of the matter is that the novices have always been drawn to the 'hot new technology' and it is unlikely that Rust, being today's 'hot new technology', is the exception. Indeed, Python and Javascript had time in the sun when they were considered hot, and novices were attracted in that direction at that time, but time continues to march forward.


> the vocal majority of Rust users present themselves as being inexperienced programmers

No they don't.


They do. To be fair, vocal majorities generally are inexperienced, even outside of Rust communities. Of course they are. What would someone with experience gain from such vocal exchange?

The vocal majority of Rust users are especially vocal in tech circles right now, though, because of it being the "hot new technology" and thus most attractive to inexperienced programmers. This does make their inexperience stand out in an especially prominent way. All "hot new technologies" have gone through this pain period.

That doesn't mean that there aren't experienced Rust users, but experienced users have no reason to talk about it. Their experience has already collected everything that could be gained through vocalization. Talking about it becomes boring at that point.


Some novices do in fact write Rust, just like how many of them gain experience in C and C++ during undergrad. With any luck, universities will adopt something safer like Rust…



