RAM encryption for sensitive data is overlooked in so many applications; even "highly secure" applications like VeraCrypt [0] only recently started adding it.
In my opinion, server applications of all sorts should encrypt their private keys in memory by default; this makes cold-boot attacks and other memory-escape attacks much harder, since two totally unrelated memory chunks now have to be combined in order to retrieve the private key (in fact one could argue that the more memory chunks there are, the harder it is to correctly recover the original private key, although to me this has a security-through-obscurity smell).
In practice, however, this is rarely done. OpenSSL, for example, has a whole lot of memory-management functions [1], but none of them seem to include RAM encryption (AFAIK it is not present elsewhere in the source code either, but it's a large codebase, so I am not familiar with all of it. Related: some people even argue it's way too large for its own good [2]).
Better still would be to use the CPU cache for private keys, since that memory is even more difficult to access through any sort of cold boot. An interesting paper with much more information about this subject can be found here [3], for anyone interested.
Yup, we added this feature to Varnish Cache a few years ago: random key encryption. It generates a random key at startup and encrypts all memory with it. Since this kind of memory is only resident for the lifetime of the process, it works. We stored the random key in the Linux kernel using the crypto API [0], just because it's not safe storing any kind of keys in a memory space used for caching (Cloudbleed [1]). We then use the key to generate a per-object HMAC, so each piece of data ends up with its own key, which further prevents something like Cloudbleed. Since we used kernel crypto, overhead was about 50%. If you stay completely in user space, it's probably much lower.
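To make the per-object key idea concrete, here is a minimal sketch of that kind of derivation (my own illustration with made-up names, assuming OpenSSL's HMAC API in user space rather than the kernel crypto API; this is not the Varnish code):

    #include <stddef.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>

    /* Derive a per-object key from one random master key generated at startup:
     * key_obj = HMAC-SHA256(master_key, object_id).  Leaking one object's key
     * tells an attacker nothing about the keys of other objects. */
    static int derive_object_key(const unsigned char *master_key, size_t master_len,
                                 const unsigned char *object_id, size_t id_len,
                                 unsigned char out_key[32])
    {
        unsigned int out_len = 0;
        return HMAC(EVP_sha256(), master_key, (int)master_len,
                    object_id, id_len, out_key, &out_len) != NULL ? 0 : -1;
    }

Because the master key never sits next to the cached data, an attacker who can read cache memory still has to recover the master key from somewhere else before any per-object key can be reconstructed.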
Could this be implemented at the OS level, i.e. whenever a process launches, the OS generates a key that it will keep to itself and use to transparently encrypt all memory allocated by that process?
My first thought was to try to use 'containers' (cgroups) combined with AMD's Secure Memory Encryption to achieve this type of isolation using as much off-the-shelf hardware as possible.
From the quick description it sounds like this provides a way of encrypting, per memory page, based on a symmetric key that is backed by some level of hardware encryption. It was not clear (in a quick read) how or where to specify the key by which an individual page is encrypted. Understanding that would be critical to determining whether this could be used to encipher individual processes and further isolate memory. It sounds like it might be possible to establish per-process memory isolation, which is probably the best level of security possible without resorting to entirely isolated hardware.
Per-process keys aren't really possible because memory can change process ownership (vmsplice) or be shared across processes (fork, page cache, memfd). It might be possible for pages marked MADV_DONTFORK, though (see the sketch below).
Additionally, a per-process key does not help against Spectre-style attacks, where you would trick the process into speculating on protected memory.
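As an aside on the MADV_DONTFORK point above: on Linux a process can already carve out its key pages so they stay out of swap, core dumps, and child processes. A minimal sketch (my own illustration, assuming Linux's mmap/mlock/madvise; not taken from any existing implementation):

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>

    /* Allocate pages for key material that are locked in RAM, excluded from
     * core dumps, and not inherited across fork(). */
    static void *alloc_key_pages(size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        if (mlock(p, len) != 0 ||                   /* keep out of swap */
            madvise(p, len, MADV_DONTDUMP) != 0 ||  /* keep out of core dumps */
            madvise(p, len, MADV_DONTFORK) != 0) {  /* don't share with children */
            munmap(p, len);
            return NULL;
        }
        return p;
    }

None of this encrypts anything by itself; it just reduces the number of places a hypothetical per-process key could leak to.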
You'd probably want a hardware module to do that, lest performance plummet. Memory controllers can already deal with ECC efficiently; adding a simple cipher on top of that should definitely be feasible.
Possibly, but memory is accessed using plain CPU instructions, so it would be hard to transparently encrypt all memory for an application at the kernel level. You do have virtual memory, but I don't think that could be leveraged for this. But who knows what's possible there; maybe if you align and address each memory value at page boundaries and always force a page fault, you could have a really poor implementation :)
Transparent disk encryption is not a problem, since devices have filesystems which can implement encryption at that layer.
Modern Intel chips can encrypt memory on the fly without performance loss (SGX does this). However I think it's not exposed for non-enclave use. Perhaps it should be.
Note: inside the enclave there is a performance loss but that's due to MAC checks. If you just want encryption without integrity against tampering you don't need that.
Yes. Most research makes CPU modifications since that makes the most sense. Sometimes they try to use OS-level techniques. Here's a survey showing some of each:
Just to clarify my understanding, the reason for doing this is so that random sampling / leakage of the contents of the RAM stops being useful, you need to specifically get the key (and then presumably a whole chunk of encrypted RAM to decrypt?)?
Yup. When something goes wrong in these kinds of applications, you sometimes tend to just randomly dump memory, which is a huge data leak. Or even worse, if someone figures out a way to force a data leak, then you are completely compromised. Having each piece of data encrypted with its own key, where that key is derived from data outside of the process address space, drastically lowers the chances of data leakage and total compromise.
Of course you can, and there are plenty of ways to do so that don't break the site guidelines. Cheap, snarky one-liners are not the way. If someone's posting about their own work, there's no need to be disrespectful.
It's also not helpful to post such a clichéd dismissal of what someone else says or their work. That's in the site guidelines too.
We have no problem sharing our codebase with customers, especially if there are concerns like this. Shoot me a msg if you are genuinely interested in anything you have read.
I've tried implementing a couple of (toy) password managers over the years and dealing with private keys is genuinely complicated. Even something as trivial as making sure that the memory gets zeroed correctly when you discard the key is trickier than it seems. Compilers these days are very good at detecting "dummy" memsets to memory that's never read afterwards and optimize them away. You have to use some dirty tricks to get the compiler to do what you want (copious amounts of volatile helps).
There's also the problem of making sure that your internal API is sound and doesn't copy the key around. In languages like C it's relatively easy to achieve (because copying a buffer in C normally requires some explicit code) but in higher level languages you might easily mistakenly pass the key object by copy instead of reference, leaving some duplicate of the sensitive information in some other location in RAM.
I think there really should be some kind of industry-standard multiplatform library meant to deal with secret keys that would implement all that behind the scenes and offer a simple API. It's simply too easy to make a dumb mistake when implementing these things, and you won't notice anything is wrong until somebody attempts to actually recover the key somehow.
> Compilers these days are very good at detecting "dummy" memsets to memory that's never read afterwards and optimize them away. You have to use some dirty tricks to get the compiler to do what you want (copious amounts of volatile helps).
Or you can use explicit_bzero(), which is designed for that use case.
Sure, although this is a non-standard and potentially non-portable extension. memset_s in Annex K is standard, but all of Annex K is optional, and like the rest of Annex K, it has an awful interface.
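For what it's worth, the volatile trick mentioned above usually looks something like this (a sketch of the common volatile-function-pointer idiom, similar in spirit to what several crypto libraries ship; not a drop-in replacement for explicit_bzero):

    #include <stddef.h>
    #include <string.h>

    /* The compiler cannot prove that a volatile function pointer still points
     * at memset when the call happens, so it cannot elide the "dead" store. */
    static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

    static void secure_zero(void *p, size_t n)
    {
        memset_ptr(p, 0, n);
    }

It only prevents dead-store elimination; it says nothing about copies the compiler may have spilled to other stack slots or registers.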
I agree with you on a standard platform. I'd get a company like Galois to build it. They have already built and open-sourced the necessary tooling from their prior contracts.
Yes, ALL private keys should be encrypted and stored elsewhere. The onion/knot should be unwound from some remote place, so you have to get access to a FEW places in order to get any private keys.
Now, in MY opinion, the whole idea of servers storing OR MANIPULATING unencrypted user-submitted information is what’s wrong with our current Internet culture. This needs to stop. It has massive social and economic effects and creates honeypots for hackers.
The sooner we make end-to-end encryption a basic expectation, the sooner we change our society to have a far more private and safe environment for everyone, and correct massive power imbalances: an organization or user should have power and data and connections and ability to SPAM others because people VOLUNTARILY gave them those things, not because they happened to build proprietary software and run it on infrastructure that is needed to operate the application!!!
> in fact one could argue that, the higher the quantity of memory chunks is, the harder it is to correctly decrypt the original private key; although to me this has the security through obscurity smell
I am not sure this is a bad place for security through obscurity. ASLR is similar. The thing is, here we are not using this as a primary mechanism (that would be preventing RCE). Instead, this would be a secondary 'defense in depth' measure meant to make it harder to deploy an exploit after you have been compromised.
For such a second layer of defense in depth, obscurity is a decent option. Things go wrong if you start relying on your obscurity to do anything but slow down an attacker.
I feel like there should be a different name for that class of mitigations. Probabilistic security, maybe? Security through obscurity for me is "we won't publish our algorithm". But that's not the same as ASLR or memory encryption, which target a specific type of attack by adding probably-too-high randomness to the attack path.
Curiously, RAM encryption, and its relative - clearing secrets from memory when not in use - cannot be added to the Bouncy Castle library because it uses Java's BigInteger (unless reflection is used, of course).
You can't treat memory, even arrays, as singular memory locations in Java. Stuff moves around because collectors are compacting. If you want to "write in place" in Java you need to either use a non-moving garbage collector, use sun.misc.Unsafe to read/write memory directly, or mayyyyyybe take advantage of humongous allocations (arrays big enough that the JVM won't move them).
Yes, a direct byte buffer wrapped in a class that ensures you can never copy the data somewhere else. That would also require cryptographic routines that can operate on ByteBuffers or VarHandles, which is quite an obstacle since many crypto libraries consume keys as byte[].
once the operation finishes, the memory is cleared (and possibly subject to gc)
The problem is that some Java classes, for instance BigInteger, are not designed for cryptographic operations: their underlying arrays are not easily accessible (save for reflection).
In that code, after generateKey, during the sensitive operation, the system might need to do a garbage collection, at which point this array might have been copied to a different location in memory before your call to zero. You have to also "pin" (afaik this is the usual terminology for this) that array to a fixed location (which would then have to be a feature of that runtime and garbage collector) after allocating it but before generating the key.
Java also has no mfence or clflush support, so crypto.zero would never be secure. It might overwrite the key in the store buffer only, but not on the heap immediately, so prone to sidechannel attacks.
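In C on x86 the missing piece looks roughly like this (a sketch with SSE2 intrinsics, my own illustration of the mfence/clflush point; it assumes 64-byte cache lines and is obviously not portable):

    #include <emmintrin.h>  /* SSE2: _mm_mfence, _mm_clflush */
    #include <stddef.h>
    #include <stdint.h>

    /* Overwrite a secret, then fence and flush so the zeros leave the store
     * buffer and the stale cache lines are evicted from the cache hierarchy. */
    static void zero_and_flush(void *p, size_t n)
    {
        volatile unsigned char *b = p;
        for (size_t i = 0; i < n; i++)
            b[i] = 0;
        _mm_mfence();                          /* order the stores */
        for (uintptr_t a = (uintptr_t)p & ~(uintptr_t)63;
             a < (uintptr_t)p + n; a += 64)
            _mm_clflush((const void *)a);      /* flush each 64-byte line */
        _mm_mfence();
    }

Nothing equivalent is expressible from plain Java code, which is the limitation being described.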
In .NET you can pin memory in the GC heap, either using something like the fixed expression in C# or GCHandle. A quick Googling did not turn up something usable directly by Java, except ByteBuffer.
I believe he is referring to compiler optimisation which removes wasteful or extraneous operations. I don't write in Java though, so can't comment more than that.
No. The CPU cache can easily be read out with side-channel attacks via hyperthreading. It's a similar problem to unencrypted keys at an absolute location, which don't get cleared with explicit_bzero, and which e.g. libsodium refused to fix. Hopefully crypto maintainers will come to their senses eventually.
libsodium has the sodium_mshield()/sodium_munshield().
For libhydrogen, storing a large prekey can be a problem, so it may only be implemented on some platforms.
If I were writing a cryptographic algorithm in C++, how would I ensure the CPU cache was used for private keys? Would it have to be written in a lower level language, or does there exist a library for C/C++?
If everyone follows this advice, who will write the crypto code? If anything, we need a lot more people who are formally trained to write proper crypto code, find bugs in such code, etc.
Let me qualify that. You're right, we do need a lot more people. But the answer is, don't write your own, write as part of a team. Ideally, a public and peer-reviewed project. The short answer is many people will work on it together, but don't write your own.
When I first saw this post, I got excited that someone had finally addressed the side channel from 17 years ago, which gathered inter-packet timing from SSH packets (and the fact that passwords are not echoed) to a) detect when passwords were being typed, and b) use Hidden Markov Models on keystroke timing to try to recover the likely passwords.
There have been defenses proposed over the years, but none were accepted by OpenSSH AFAICT.
The proper defense is to not use passwords. It's 2019, nobody should be using passwords on SSH-enabled shell accounts.
Heck, I use secure tokens for authentication so this memory encryption hack is useless to me--even I don't know what my private key(s) are, nor does OpenSSH or any other software on my computers. But I appreciate that we're a long way from ubiquitous hardware-based authentication.
The solution there is to have the app use line buffering rather than a "raw" term mode that exposes inter-char timing on the network. How widely that's followed in practice, I do not know, but one would certainly hope that sudo does it.
If you really cannot use keys, then one mitigation is to use copy/paste to paste the entire password instead of typing it one character at a time. That can open some copy/paste vulnerabilities e.g. in X11 where any app can then read the password until you copy something else in its place. And a network observer may still determine the password length. But it closes the inter-key timing channel that permits direct character recovery.
If I understand correctly it's effectively a one-time constant cost when setting up an encrypted connection, so it should be negligible unless you have a use case for setting up many extremely short-lived connections.
SSH has multiplexing [1] specifically to speed up multiple connections... would be interesting to know whether it incurs the overheads once vs. on every connection.
Just keep in mind that multiplexing makes phishing risks significantly higher. ssh won't log multiplexed auths, and phished connections are auth-free: up to 9 more sessions by default.
Saying it's about the handshake is not a comment on performance impact, and doesn't need a patronizing "TFA". Especially when the article just talks about 'signatures', in a way that doesn't make it obvious exactly when this key is being used.
This code is probably not adding more than a millisecond, but it's not at all true that handshake speed is irrelevant. If you rewrote the RSA code in a slow secure way, going from 100ms to 5s on a slow chip, that would have very real effects.
Ssh isn't just for interactive terminal sessions. It's not that uncommon for people to use ssh tunnels in situations where proper VPNs etc. are blocked. Others use sftp or tunnels plus a SOCKS proxy in one specific application. In those applications, performance regressions are a legitimate concern precisely because it's not expected to be all that great to begin with.
I really don’t see a problem here. If you have some weird edge case that requires a less secure version of OpenSSHd then you should either stick with the old version or fork the new version.
I certainly wouldn't want scp to suddenly slow down compared to what I'm used to. I use scp and sftp rather heavily to transfer very large files and do backups, that might not be performance-critical but it's definitely performance-sensitive.
I haven't had to mess with that in a while but there was a time where you'd get a significant boost with scp on underpowered hosts if you used the arcfour cipher for instance (I believe that it's now fully deprecated, and for good reasons).
If you were willing to sacrifice best practices for performance, one obvious option is to accept only hmac-md5, which is very fast and still somewhat secure.
If the systems involved lack "AES-NI" native CPU opcodes, then you might revert to chacha20-poly1305, which is supposedly faster on CPUs that lack acceleration. This also overrides the hmac-md5 above, as it is an AEAD.
If you needed to go faster still, then you could instead choose ARCFOUR (RC4) as you say, but this has been removed from the latest versions of OpenSSH because it is not safe.
The worst of the above configurations are still not as bad as classic FTP.
Using LFTP's mirror sub-system plus SFTP can be faster than all of the unencrypted protocols. It gives you the behavior of rsync, but can work with sftp+chroot logins and can spawn multiple threads per job or even per file. Working demo [1]
I find it very useful when I want to give people the ability to transfer files quickly, but I don't want to give them a shell.
Why do you think that this specific change would make ssh not "work at all" as opposed to the hundreds of other changes that happen? Do you even pay close attention to those changes? I for sure don't.
Why are you concerned about this in the first place? Obviously there's always a general concern about any changes, but what makes this specific change a big deal to you?
This particular change got a hacker news article. It seems to be inherently more noticeable than others.
Seems reasonable to ask about the performance (and memory) implications of the change. If they're minimal, then this solution can be easily added to other situations where encryption is being used. If it's heavy, then different solutions would need to be developed on a case-by-case basis.
A vulnerability that has been shown to work should not be patched in the most widely used software in the world for connecting to all kinds of Linux/Unix servers and even other systems? They should wait for it to start getting exploited "in the wild"? I'm just glad that the security of my systems does not depend on people with this kind of attitude.
If you want to know whether it was found "in the wild" and think that this is a relevant factor in deciding if you should care, that alone is information enough about your approach to be confident it is wrong.
What I was trying to say was I think library users care to know how urgent it is. If this is being used in the wild then application vendors might need to provide out-of-band patches somehow, and end users should rush to get those patches. If OTOH this is known to be extremely hard to pull off and not known to be used, then it'd be nice for users to know that just the same. Nowhere was I trying to suggest they should avoid providing the patch altogether or something.
I agree with the idea that availability of information is good, and that information about the context for a security-related change should be made transparent. But how relevant is it? I would think relevant enough for FAQ or other reference information. I wouldn't include it in announcements, though.
The headline is "patch available, mitigating known exploit". "Not yet widely exploited" is barely a footnote. The release of a patch can bring enough attention to make the window between release and full deployment of the patch the single worst time to be vulnerable. If I tell you it wasn't being exploited yesterday, and you delay patching based on that information, and then the storm of exploits blows through ... I'd feel bad.
That's not how security works. If it can potentially lead to a software like openssh leaking secrets, it is of the highest urgency, period. It doesn't matter if it is thought to be hard to exploit and it doesn't matter if it was already found "in the wild".
Okay, but if 1 of my 3 highest urgency vulnerabilities is known to have been exploited in the wild, and is easy to exploit, then I may want to focus on that one over the other 2.
> That's not how security works. If it can potentially lead to a software like openssh leaking secrets, it is of the highest urgency, period. It doesn't matter if it is thought to be hard to exploit and it doesn't matter if it was already found "in the wild".
Really? This isn't how security works? Yeah, I guess I forgot security is a 100% binary thing. That's why you never read actual security bulletins advising you when vulnerabilities are actively being exploited in the wild. It's insane to think that should matter or raise the urgency of a patch. [1] [2]
None. There hasn't even been a real-world demo under normal conditions: a running sshd alongside other programs, where a running browser script exfiltrates a key with Meltdown.
Spectre probably isn't possible either, and that is the easy one. The load/store buffer attack seems completely impossible. Even the PoC essentially had to write a program specially crafted to be exploited.
The attacks are very interesting and neat, but I think things like this and other techniques effectively remove any last chance.
Don't underestimate what hackers with enough motivation can achieve. Especially when the stakes are so high and the geopolitical power coming from a software vulnerability can be significant.
I don't think that makes sense. For starters, threat model is everything. To grab an obvious example here, speculative execution attacks are only a concern if an attacker can cause execution of code on your physical host. A secure program facing the network will be secure even if it fails to defend against CPU side channels. Further, reducing to a binary de facto means that literally everything is "insecure", the end. (I defy you to show me a piece of software that never has had any vulnerabilities and never will, since of course undiscovered bugs still make the system insecure)
There is a lot of knee-jerking in this thread. I don't think anybody is arguing that this is a bad patch, but if it does have a notable performance impact (which it probably does not), it's still worth knowing. If only to be able to answer questions like "why do our backups suddenly take 10% more time to complete? Do we have a problem with our network infrastructure?"
It's not about tradeoffs, it's about understanding what's going on and anticipating potential problems.
By your logic we shouldn't have to discuss the performance regressions on CPUs that implement Spectre/Meltdown mitigations, because "it's irrelevant". Obviously these patches are necessary, but the performance impact is very relevant for many users.
By the parent's logic, there's no concept of improving security, because if there remains attacks, then it is still not secure. You can't take a position of "it's either secure or it isn't."
> It's not mentioned because it's irrelevant. There is no security / performance trade off for a secure program. It's secure or its not.
Everything in the world is about trade-offs. Security is no exception. Humans figured this out a long time ago with physical security. Somehow it hasn't sunk in for cybersecurity. I would recommend mentioning this to a well-known security expert if you ever come across one and seeing their reaction.
Then there is no concept of "improves" security. Either it is now secure or it is not. Do they have proof that there are now no side channel attacks that can be made on their software?
EDIT: sigh This comment is not arguing that they need a proof for their change. It is arguing that you can improve security without making something completely secure, which undermines the idea that code is 'either secure or it is not'.
This kind of mitigation really only makes sense on shared machines (such as servers). On a desktop OS, if an attacker is in a position to read memory from other processes, it's pretty much game over already.
Browsers implement Spectre/Meltdown mitigation on desktop OSes because without that, JS could read secrets from other JS contexts executing in the same process. One of the mitigations is in fact to just segregate JS contexts into different processes depending on the domain they belong to. But most apps don't execute untrusted code so most apps don't have this sort of in-process attacker to worry about.
It's only using the insecure freezero, which is using the insecure explicit_bzero. A simple compiler barrier only, no memory barrier. So it's unsafe against the advertised Spectre/Meltdown side-channel attacks; the secrets are still in the caches.
> Attackers must recover the entire prekey with high accuracy before they can attempt to decrypt the shielded private key, but the current generation of attacks have bit error rates that, when applied cumulatively to the entire prekey, make this unlikely.
It seems the real mitigation isn't the prekey size, but the temporal sparseness of the symmetric key -- since I would've imagined attackers would just try to obtain the symmetric key rather than the prekey. Weird to see that they didn't even mention this... I imagine attackers would try to find a way to get the symmetric key to stay in memory for a while.
The prekey is hashed into the symmetric key. Both the hash function and the symmetric cipher have avalanche effects that mean that N bit errors require the attacker to bruteforce 2^N combinations.
Unprotected RSA keys, on the other hand, have structure and are dense in memory. That means fewer bit errors and the ability to guess the missing bits faster than O(2^N).
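A rough sketch of how that shielding works conceptually (my own illustration using OpenSSL's EVP API; the actual OpenSSH code differs in its serialization, cipher plumbing, and error handling):

    #include <openssl/crypto.h>
    #include <openssl/evp.h>
    #include <openssl/rand.h>
    #include <openssl/sha.h>

    #define PREKEY_LEN (16 * 1024)  /* large: the whole thing must be recovered */

    /* Fill 'prekey' with fresh randomness, hash it down to an AES-256-CTR
     * key+IV, and encrypt the serialized private key with that.  A single bit
     * error in a recovered prekey avalanches through the hash and yields a
     * completely different symmetric key. */
    static int shield_private_key(const unsigned char *priv, int priv_len,
                                  unsigned char prekey[PREKEY_LEN],
                                  unsigned char *enc, int *enc_len)
    {
        unsigned char digest[SHA512_DIGEST_LENGTH];  /* 64 bytes: 32 key + 16 IV */
        EVP_CIPHER_CTX *ctx;
        int ok = 0;

        if (RAND_bytes(prekey, PREKEY_LEN) != 1)
            return -1;
        SHA512(prekey, PREKEY_LEN, digest);

        /* CTR is a stream cipher: no padding, one Update emits all output. */
        if ((ctx = EVP_CIPHER_CTX_new()) != NULL &&
            EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, digest, digest + 32) &&
            EVP_EncryptUpdate(ctx, enc, enc_len, priv, priv_len))
            ok = 1;

        EVP_CIPHER_CTX_free(ctx);
        OPENSSL_cleanse(digest, sizeof(digest));     /* drop the derived key/IV */
        return ok ? 0 : -1;
    }

The idea is that when a signature is needed, the prekey is hashed again to decrypt, the operation runs, and the key is re-shielded, so the plaintext private key and the derived symmetric key only exist for the duration of that operation.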
Oh yeah, but the temporal sparseness doesn't just apply to the symmetric memory encryption key. The most important part is that it also applies to the asymmetric host keys, which are the actual thing one wants to have protected.
You can donate to OpenSSH[0], whose "funding is generally done via the same donation framework" as the rest of OpenBSD, to which you can donate either directly[1] or via the OpenBSD Foundation[2]. If you're serious about donating obviously please check that these links are legitimate and I'm not a scammer. (I'm not affiliated with OpenBSD in any way.)
For your donation, you also received preliminary code for quantum-resistant key exchange.
$ sed -n '/NTRU/,/enabled/p' ChangeLog
sntrup4591761x25519-sha512@tinyssh.org using the Streamlined NTRU Prime
4591^761 implementation from SUPERCOP coupled with X25519 as a stop-loss. Not
enabled by default.
It could use the x87 floating-point register stack to store the encryption secret. These registers are not used unless there is some assembly in the SSH libs that accesses them.
That is true - it's probably more secure than storing them in the process, but several of these side-channel attacks can apply to kernel space (depending on hardware and security patches applied).
It's also not portable - OpenSSH runs on non-x86 architectures, and they might not have spare basically unused registers lying around.
Finally, I'm not sure the x87 registers have enough space to fit these keys. You have 8 80-bit registers, for a total of 640 bits. Your typical SSH private key might be 2048 bits or more.
So it's a fun and creative line of thinking, but probably not practical in this case.
Couldn't they also move the keys around in memory every second, or keep the bytes of the key separated (this seems like it would be similar to encrypting)?
These side channel attacks even under ideal conditions take a very long time and part of the problem is they basically need to guess at memory addresses. Even when data is in a known location, it is sketchy. Anything that slows down locating data would help immensely.
Hm, I wonder if symmetric encryption with a sophisticated cipher is really necessary in this scenario. The aim is to require an attacker to read 16 KB of memory in addition to the much smaller key data, i.e. for each bit of the key, the attacker needs X additional, random bits.
Wouldn't it be possible to block-wise xor the random data onto the key?
Maybe use a windowing mechanism, where the window is moved forward depending on the random data (e.g. for a 16 bit key, xor random bits 0 to 15, then move forward 4 to 16 bits, depending on the current window; iterate until the maximum number of window movements necessary to go over the whole random data is reached [leak less bits via timing]; if the end of the data is reached, again start at the beginning, but with some offset to avoid result_bit_0 = secret_bit_0 xor random_bit_0 xor random_bit_0).
You want every single bit error to avalanche to the whole key. You also don't want bit errors at some offsets modulo X to cancel each other out. Cryptographic hashes provide these properties. For any custom solution you would first have to prove that it has the security properties needed.
The asymmetric crypto of a connection setup takes the lion's share of CPU cycles. I don't think it's worth the risk just to beat the cheap (relatively speaking) symmetric algorithms.
I believe all consumer AMD Ryzen CPUs support Secure Memory Encryption (SME), but for some reason it's not enabled by default on most Ryzen devices/motherboards. It's a shame.
I'd also love to see AMD bring an updated and patched version of Secure Encrypted Virtualization (there have been some attacks against it, although still fewer than against Intel's SGX) to consumer Ryzen in the near future. With so many cores available in consumer AMD CPUs (up to 16 now), people will start to use VMs more. Even Windows 10 has the easy-to-use Windows Sandbox now, as well as the App Guard sandbox for Edge.
Not to mention they could use this as yet another "killer app" of their many-core CPUs, because otherwise people will eventually start to wonder why even get CPUs with so many cores over CPUs with fewer cores but higher singlethread performance. No different than say Verizon promoting high-quality 4k streaming on its new 5G network.
I would've already preferred to see this in Zen 2, but at least Zen 3, which will otherwise bring few performance improvements and remain on the 7nm node, should come with these as some sort of "security-focused generation of Zen".
Isn't AMD's encrypted memory meant more for the case of someone with physical access aggressively cooling a running system, then cutting power and removing the chilled memory for analysis (which will preserve contents much longer when cold than at normal temperatures)?
The case of an attack on the SSH Agent would take place within the CPU, perhaps even in the same core on a separate HT/SMT thread, where the memory would be cleartext.
But why? For security critical software, like this, they should assume as little as possible. In essence you want to make the algorithms immune to side channel attacks when possible.
Because the more complexity you have in software, the harder it is to keep it secure. Even security mitigations can potentially introduce other vulnerabilities. This is one of the reasons that, as a general rule, we should strive to keep software simple.
Given the nature of this software, it's natural to have mitigations against side-channel attacks. They have happened multiple times, and will happen again in the future, no matter how secure we believe the hardware is.
With that in mind, it's probably the better strategy to use slower and more complicated algorithms to protect the user. This would mean that when a side-channel attack becomes known, if the algorithm already protects against it, nothing has to be done. Unlike when a fix needs to be made after the fact, you do not face the problem you've outlined. I believe it's better to have better baseline security at the cost of complexity, because it means fewer hotfixes need to be released.
Side-channel attacks are only possible because the hardware is currently vulnerable. They are not a law of nature. Once you solve the vulnerability at its root and it becomes physically nonexistent, and there's no more running hardware on the market that has such a vulnerability, it would make no sense to keep such a software mitigation.
Clive Robinson on Schneier's blog predicted lots of these problems after arguing they were a law of nature. He said any form of matter or energy connecting two machines might create a side channel. He said we'd have to clock all the inputs and outputs, make them predictable, and then "energy gap" the systems. We both already knew about CPU leaks, since that was described as risky in the 1990s. Sure enough, air-gap-jumping malware and processor leaks showed up.
Later, getting a high-level view of hardware reinforced that it was a law of nature. First, there are all kinds of RF leaks that attackers might pick up from normal operation. Second, most systems aren't fault/leak-proof if attackers actively hit the system with different physical effects or RF. Finally, each process shrink increases how easily chips, including the mitigations on them, break. It looked like the stuff at 28nm was kind of broken by design, with fixes and such built in to delay failures the user would notice.
This all sounds like the laws of physics are a huge obstacle to computers (a) working at all and (b) keeping secrets. Achieving (a) takes hundreds of millions to billions in R&D each year. I can only imagine what (b) might take.
Well yes, but that's a pretty special case. Those running such ancient hardware keep their own software and patches, and you don't see many software vendors supporting them unless they are being very well paid. OpenBSD itself does not support even VAX anymore, and developers felt pretty happy when they finally deleted large portions of architecture-specific code. Ubuntu is talking about dropping i386. Given enough time, the burden of supporting old hardware outweighs the benefits for pretty much everybody, so it makes sense that it should fall on the shoulders of those who decided keeping the old hardware was a good idea.
[0] https://github.com/veracrypt/VeraCrypt/commit/321715202aed04...
[1] https://www.openssl.org/docs/man1.1.0/man3/OPENSSL_zalloc.ht...
[2] https://queue.acm.org/detail.cfm?id=2602816
[3] https://www.ieee-security.org/TC/SP2015/papers-archived/6949...