
Usually security nuts like to overwrite the clear-text string with zeros or random characters before calling free() on it. This way, if this chunk of data stays in memory (which is most likely the case with libc's free()), it cannot be read by exploiting a buffer overflow.

With a garbage-collected language, programmers don't know when their variable is "free()ed", since it could be held in multiple threads, and the last thread dying will release the memory for this variable. Since programmers often don't know when the password variable will be "free()ed", it is very unlikely that they scrambled the password before the memory was released. As a result, the password stays in some region of the program's memory, in clear text, exploitable by a buffer overflow & co. for an indefinite amount of time.




Another thing to note about garbage-collected languages (like Java with the G1 collector) is that blocks of memory will often be *copied* to other parts of the physical memory as part of the compaction phase [1] (see quote), in order to be able to provide large segments of sequential space. Since the GC is designed with performance in mind rather than security, the original copies are not zeroed as part of this compaction. Instead, their raw data will potentially stay in memory indefinitely until overwritten by another part of the application, and will not be zeroed upon termination of the application. Because of this, even with meticulous tracking of references and "raw" memory access only, GC'd applications can leave traces of secrets in memory. There may be workarounds for this by exclusively using "Unsafe" methods, but that drastically limits your API interoperability with libraries and such.

> G1 reclaims space mostly by using evacuation: live objects found within selected memory areas to collect are copied into new memory areas, compacting them in the process. After an evacuation has been completed, the space previously occupied by live objects is reused for allocation by the application.

[1] https://docs.oracle.com/en/java/javase/18/gctuning/garbage-f...


> Usually security nuts like to overwrite the clear-text string with zeros or random characters before calling free() on it.

If you are worried, you can store the password in a byte array and zero that out.
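A minimal sketch of that idea in Python, assuming the secret is kept in a mutable `bytearray` from the start. The `wipe` helper is a name made up for this illustration, and it only clears the buffer you hold, not any copies the runtime or other code may have made.

```python
def wipe(buf: bytearray) -> None:
    """Overwrite every byte of buf with zero, in place (hypothetical helper)."""
    for i in range(len(buf)):
        buf[i] = 0


password = bytearray(b"hunter2")   # mutable, unlike bytes or str
alias = password                   # second reference to the same buffer
wipe(password)
# Both names now see the zeroed buffer, because no new object was created.
```

Note that the `b"hunter2"` literal used to build the buffer is itself an immutable object that the sketch cannot erase; real code would want to read the secret straight into the mutable buffer.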

But further, a buffer overflow is practically impossible with a GCed language (especially a popular one). A programmer using a GCed language cannot write code which has a buffer overflow. That must come from a bug in the runtime itself. Not likely.

As far as I'm aware, every gced language has bounds checks on arrays. Certainly all the popular ones.

> the last thread dying will release the memory for this variable.

Really not how GCed languages work. Memory lifetime is not bound by thread lifetime except in the rare case when memory is bound to a thread (static/global variables).

The majority of GCs on the market are tracing collectors. Memory is periodically collected when it is no longer referenced (mark and sweep). What triggers that collection is a whole host of potential interactions. How frequently and how deeply it runs also depends on a lot of application characteristics.


> If you are worried, you can store the password in a byte array and zero that out.

if it's a copying collector this is no guarantee of anything

but the same is true of general memory allocated with the OS (swap/THP/...), unless you use something like mlock()


Fair point, but really one of those situations where if an attacker is in the position to look at a password stored in a tombstone, you've got bigger problems (for example, would be trivial in that scenario to simply intercept the password rather than looking for it in VM memory).


Could it still add some defense in depth for other scenarios?


> But further, a buffer overflow is practically impossible with a GCed language (especially a popular one)

Because the JVM had no buffer overflow? [1] Also, I purposefully wrote "buffer overflow & co" because buffer overflows are not the only possibility. Shellcodes could ptrace() and inspect the memory of the program.

> > the last thread dying will release the memory for this variable.

> Really not how GCed languages work. Memory lifetime is not bound by thread lifetime except in the rare case when memory is bound to a thread (static/global variables).

That was bad phrasing on my part. You should have read "the last thread releasing the variable reference".

[1] https://www.cvedetails.com/vulnerability-list.php?vendor_id=...


> Because the JVM had no buffer overflow?

Happens extremely rarely and is frequently not in JVM core code but rather something like the 2d renderer or applets. Code not likely to be executed on a server.

Take a deeper look into those CVEs and count how many are for Java 8+ and server code (it's a pretty short list).

You might as well argue the linux kernel is insecure because there's been buffer overflows in the various drivers.

> Shellcodes could ptrace() and inspect the memory of the program.

Certainly, and they can redirect socket traffic and inject a MITM for any process to directly intercept a password. Even if you are zeroing memory, there will be a period of time when a password is present in memory which means the ptrace attack also works with C.

The bad part of a managed language is that passwords stay in memory for longer, but that risk is somewhat moot considering exploiting requires a compromised system. In which case, there's little reason to pull out passwords by sniffing memory.


Checking whether things are referenced or not as the basis for collection is a bad idea.

Take 2 classes that reference each other, but nothing references either.

In your example they won't get cleaned up. You'll want a tree-based collector. If an object is no longer referenced from the tree it can be cleaned up.
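Collection of exactly this case, two objects that reference each other but are unreachable from anywhere else, is easy to observe in CPython, which pairs reference counting with a reachability-based cycle detector. A small sketch, assuming CPython; `Node` and `make_cycle` are names invented for the example:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Build two objects that reference each other, return only a weak probe."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)   # a weak reference does not keep `a` alive


gc.disable()                          # keep the cycle detector from running early
probe = make_cycle()
alive_before = probe() is not None    # the cycle keeps both refcounts above zero
gc.collect()                          # reachability-based pass finds the dead cycle
alive_after = probe() is not None
gc.enable()
```

Under these assumptions the pair survives plain reference counting and is reclaimed by the explicit `gc.collect()` pass.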


Funnily enough, if you are using the clipboard to enter your passwords most of this is pretty moot.


> With a garbage-collected language, programmers don't know when their variable is "free()ed", since it could be held in multiple threads, and the last thread dying will release the memory for this variable

And there is simply no way to deal with this, technology just isn't there yet.


> it cannot be read by exploiting a buffer overflow

What does the attack that does this actually look like? Lastpass reads text off an html page and decides when to inject auto-fill prompts and/or enter in a password. Is it possible for a buffer overflow to be exploited there, that lets an attacker (who controls the site) gain access to a password for a different site? How does that work?

Is there some other attack possible here? Ie: is it possible another user-space application can read passwords from memory used by the password manager? How does another app know it's a password? How does it trigger a buffer overflow?

(I'm ignoring apps running with kernel or privileged access: that seems like game over)


CPU side-channel attacks are a possibility. In the worst case, one could let an attacker cross OS process boundaries, making it exploitable remotely through JavaScript executed in the browser. Although such vulnerabilities would be hard to find and even more difficult to exploit at scale, the possibility seems more realistic than it did pre Meltdown & Spectre. It would be nice if something as high stakes as a password manager was prepared for this kind of scenario.


You should be able to know all the references to a variable, if you are careful when writing your program. Also, you could write a destructor that scrambles the memory location before the object is collected. However, you would still not have control over copies that the GC may decide to make, and it is a bit trickier to force a free, since the GC is under no obligation to free an object as soon as it has no more references (though you can probably force that to happen in most GCed languages).


In most GC languages, the String type itself is not a managed handle to a single memory buffer, but a reference into a whole copy-on-write datastore. So you can't even scramble the underlying memory - mutating the string to erase it will not zap the original, it will just create a copy.
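Python's `str` is a convenient illustration of the "erasing just creates a copy" behaviour, since it is immutable. A short sketch (the variable names are only for illustration):

```python
secret = "hunter2"
alias = secret                      # second reference to the same str object

in_place_failed = False
try:
    alias[0] = "x"                  # str does not support item assignment
except TypeError:
    in_place_failed = True

# "Erasing" via replace() builds a brand-new str and rebinds the name.
secret = secret.replace("hunter2", "xxxxxxx")
# `alias` still refers to the original, untouched characters.
```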

Your best bet is to ensure no references to the password string exist - including in library code you may use, which means constant revalidations whenever you update a library or your underlying language runtime. Once you do that, you can force garbage collection, either by some explicit language mechanism to request garbage collection, or by trying to allocate gobs of memory in some way that can't be easily optimized out.


You don't need to use the String type for your sensitive data, though. You can create your own type that acts the way you want.


There are strong incentives to use the built-in String class. APIs for UI will use it. The first Google result for "c# Clipboard" certainly uses string. It'll be a lot of extra effort that you might not have with a different language. That is, if you actually consider the risk worth it... I probably wouldn't.


To avoid the risk outlined here, knowing all the references to a variable is not sufficient; without an understanding of their lifetimes in every possible permutation of the program's use (and misuse), you cannot act on that information to minimize the exposure posited here.

I won't get into the question of whether a programmer should also know all that, as there is another consideration which renders it moot: Even with a non-GC'd language, a programmer could leave variables with sensitive information in memory for the remaining duration of the program (e.g. local variables left deep in the stack when a function returns or throws an exception - do you know when they will be overwritten?) What matters here is that the programmer understands the risks, knows what constitutes the most sensitive data, and acts accordingly - but, armed with that knowledge, the programmer can just as well ameliorate the risk in a GC'd language as one that is not.


> You should be able to know all the references to a variable, if you are careful when writing your program.

The issue is: what if thread A and thread B hold a reference to the password variable? And you don't know in which order they will execute. In which thread do you scramble the password before releasing the variable reference?

From what I understand, OP's point is: because of the nature of C, you have to know where your variable is "free()ed", because you have to do it yourself. Therefore you can scramble there. Even if you have two threads doing:

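    /* decr_reference() is assumed to atomically decrement and return the new count */
    n = decr_reference(password);
    if (n == 0) {
      scramble(password);  /* overwrite the secret before releasing it */
      free(password);
    }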
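
The same "last one out scrambles" pattern can be sketched in a GC'd language with an explicit holder count. This is only an illustration: `SharedSecret`, `view` and `release` are hypothetical names, and a lock stands in for the atomic decrement.

```python
import threading


class SharedSecret:
    """Hypothetical wrapper: the last holder to release() wipes the buffer."""

    def __init__(self, data, holders):
        self._buf = bytearray(data)
        self._refs = holders
        self._lock = threading.Lock()

    def view(self):
        return self._buf

    def release(self):
        """Drop one holder; return True if this call wiped the secret."""
        with self._lock:               # makes decrement-and-test atomic
            self._refs -= 1
            last = self._refs == 0
            if last:
                for i in range(len(self._buf)):
                    self._buf[i] = 0   # scramble before anyone lets go
        return last


secret = SharedSecret(b"hunter2", holders=2)
results = []
threads = [
    threading.Thread(target=lambda: results.append(secret.release()))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread releases last does the scrambling, so the order of execution does not matter; the memory itself is still freed whenever the GC gets around to it.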


That issue seems the same in a GC language as it is in a non-GC one. Either way you need to know when you're done with the data and it's time to scramble.


One possibility would be to always reference a password variable from the same thread. But even if you use multiple threads referencing the same memory location, you could make some of them weak references (which the GC doesn't take into account when deleting the object), or you could make sure you stop referencing them in both threads and force the GC to run on that object. The destructor could handle the scrambling.


> Also, you could write a destructor that scrambles the memory location before the object is collected.

This assumes you can write a destructor at all and that it's run deterministically.

> though you can probably force that to happen in most GCed languages

I'd actually be surprised if any GC'd language allows this. You normally need to resort to different kinds of tricks for RAII-like behavior.


Python has `gc.collect` which should work for user created types. Besides, it does have a reference counting GC. So, as long as you don't create cycles, you should be able to force the collection of objects similarly to C++'s shared_ptr.
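A small sketch of that reference-counting behaviour, assuming CPython (other Python implementations may run `__del__` later or not at all). The `Secret` class is hypothetical, and `peek` exists only so the wipe can be observed afterwards.

```python
class Secret:
    """Hypothetical holder whose destructor zeroes its buffer."""

    def __init__(self, data):
        self.buf = bytearray(data)

    def __del__(self):
        for i in range(len(self.buf)):
            self.buf[i] = 0


s = Secret(b"hunter2")
peek = s.buf     # extra reference to the buffer only, not to the Secret
del s            # last reference to the Secret is gone, so __del__ runs (CPython)
wiped = peek == bytearray(7)
```

If the object were part of a reference cycle, the wipe would instead be delayed until the cycle detector runs.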


Java has try-with-resources for this. Python has with. Unlike say C++ you do have to indent your code once for every such resource though which can be cumbersome.


Neither of these things are equivalent to destructors, however


They aren't equivalent but they can do the task in question:

    with prompt_password() as password:
        ...  # use the password here

Once the scope exits, the context manager can scramble the memory.
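
One way such a context manager could look, as a sketch only. Here `prompt_password` takes a hypothetical `read_secret` callable (not part of any library) so that the example does not depend on a terminal; whatever that callable returns is a separate copy that this code cannot erase.

```python
from contextlib import contextmanager


@contextmanager
def prompt_password(read_secret):
    """Yield the secret in a mutable buffer and zero it when the block exits."""
    buf = bytearray(read_secret())   # read_secret: hypothetical, returns bytes-like
    try:
        yield buf
    finally:
        # Runs on normal exit and on exceptions alike.
        for i in range(len(buf)):
            buf[i] = 0


with prompt_password(lambda: b"s3cret") as password:
    seen_inside = bytes(password)    # copy taken only to show what the block saw

# `password` is still bound here, but the buffer behind it has been zeroed.
```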


No, a lot of GCs copy all the known live variables to a new area, then free the old area in bulk (“but it consumes twice as much memory!?” yes, but it’s very efficient - doesn’t require tracking). Therefore you may have a constant char[] and it could still be in 2 places in memory.


An interesting point and I wonder if there are thoughts on if the JVM should implement flags that do overwrite before gc (obv worse perf but some cases have use for it).

Or at the least, let a program implement parts of the gc api and plug it in themselves.


You can with the JVM.

https://docs.oracle.com/javase/9/docs/api/java/lang/ref/Clea...

However, the issue isn't so much doing that; rather, it's that the GC will potentially never collect the memory. In the JVM, the GC primarily runs when enough allocations happen. It's not on some timer.

So the concern would then be having a password sitting in memory for hours (or even days) on end.


Typically you would be using something like JCE for these sensitive functions, and a JCE implementation could be written (potentially with VM support) to have such functionality. These also can do things like zero out binary data/keys after use.

This wouldn't help once you create your own objects however, e.g. JSON parse from a decrypted block.


There are definitely mechanisms to do this in the JVM, such as the `finalize` method or in more recent versions, the `java.lang.ref.Cleaner` class.

However, using them correctly in the context of memory leakage and safety is non-intuitive and is definitely a bit tricky.


they don't know when it's going to be freed, but they do know when they're not going to need it anymore and can overwrite it exactly like in C. or am I missing something?


Yes, unless of course you can't be certain that the buffer holding your secret wasn't copied at some point. Maybe by the text control you rendered it in.


Can we not simply overwrite the data when we're "done" with it and then not worry about when it's actually deallocated by the GC?


Depends on the type and language. For example, in Java strings are immutable. Therefore you can’t really write over them.


That's fair, a lot of GC languages use immutable strings. That said, all(?) of them have some form of mutable buffer, albeit some care would have to be taken to avoid accidentally turning it into a string.
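The "accidentally turning it into a string" pitfall is easy to show in Python, used here purely as an example of a language with immutable strings and a mutable byte buffer:

```python
buf = bytearray(b"hunter2")

# Decoding builds an independent, immutable str copy of the secret.
as_text = buf.decode("ascii")

# Zeroing the mutable buffer works in place...
for i in range(len(buf)):
    buf[i] = 0

# ...but the str created above is unaffected and cannot be overwritten.
```

Any logging, formatting or UI call that stringifies the buffer has the same effect as the explicit `decode()` here.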


Then for sensitive strings you wouldn't use the built-in string and have to build your own?

I'm not sure I see the issue, you just implement a memory-safe string class that manages arbitrary bytes, it's slower but that wasn't the goal.
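A rough sketch of what such a type might look like in Python. `SecretBytes`, `expose` and `wipe` are hypothetical names, not an existing library API, and the sketch does nothing about copies made by the runtime or by callers.

```python
class SecretBytes:
    """Hypothetical wrapper around a mutable buffer holding a secret."""

    def __init__(self, data):
        # Copies `data` into a buffer we control; the caller's original
        # object is not erased by this class.
        self._buf = bytearray(data)

    def __repr__(self):
        # Never include the contents, so logs and debuggers don't leak it.
        return "SecretBytes(<redacted>)"

    def expose(self) -> bytearray:
        """Return the live buffer (not a copy) for code that must read it."""
        return self._buf

    def wipe(self) -> None:
        """Zero the buffer in place once the secret is no longer needed."""
        for i in range(len(self._buf)):
            self._buf[i] = 0


token = SecretBytes(b"hunter2")
shown = repr(token)
token.wipe()
```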


I can’t remember off the top of my head which major Java library I was using a few years ago but that’s exactly how it took sensitive string params - via a char array instead of a String object. I was scratching my head why they would do that for a bit until I learned.


The GC may copy data around, though. So, when you scramble the data, there may already be copies of the previous value in memory. I think that's the biggest drawback.



