The string that is vulnerable is actually not meant to be user input, but a formatting string.
EDIT: nevermind, the issue apparently arises outside of formatting strings—though it would have been nice if the example had demonstrated this.
The issue occurs in incorrect logging code such as:
> logger.info("Data: " + data);
But the correct way of logging the above data is:
> logger.info("Data: {}", data);
It's analogous to using something like the following in C:
> printf(data);
In the incorrect cases (log4j or C), the user input is being used as a format string, and the user can likely cause an RCE. This is an issue in C for reasons that should be obvious. Java has historically been used very reflectively, so whenever there's some expression interpreter or deserialiser involved, there's a good chance it could be RCEd with arbitrary input.
It was a few months ago, but if I recall correctly, there were two overrides for info (and the other equivalent methods). info(String, String...) would do {} expansion like you mentioned, but info(String) would log the string directly, not doing format expansion on it.
I'm not sure how this interacts with the RCE issue reported here.
EDIT: That's because I was thinking of Slf4j, which has additional smarts here.
I'm not aware of that feature, but I guess it would mitigate this issue, since the problematic code:
> logger.info("Data: {}");
would effectively turn into something safe:
> logger.info("{}", "Data: {}");
And the issue would only arise if someone mixes the two patterns:
> logger.info("Data for " + username + ": {}", data);
Overall, I don't like the sound of that feature, since it blurs the line between correct and incorrect use of the logging API. The first argument should always be a constant formatting string.
Why though? It is not typical in Java to use format strings, unless you call String.format explicitly. It's not like C where printf-style APIs are common.
It should work one way or the other, not both. For current logging APIs, the format string is used. It actually turns out that the call using string concatenation corresponds to log4j1 rather than log4j2 (looks like this was an error in the post, though it's being fixed to use log4j2).
I guess aesthetically you could argue either way, but I think the main purpose of the formatting string method is that you can write:
> logger.trace("Updates: {}", longListOfUpdates);
and if trace logging is disabled (which can be done dynamically), it's not going to invoke `longListOfUpdates.toString()`, which is what happens when you perform string concatenation. If it didn't work that way, I suspect people would end up writing extra `logger.isTraceEnabled()` conditions around their logging code.
EDIT: nevermind, the issue apparently arises outside of formatting strings—though it would have been nice if the example had demonstrated this.
The issue occurs in incorrect logging code such as:
> logger.info("Data: " + data);
But the correct way of logging the above data is:
> logger.info("Data: {}", data);
It's analogous to using something like the following in C:
> printf(data);
In the incorrect cases (log4j or C), the user input is being used as a format string, and the user can likely cause an RCE. This is an issue in C for reasons that should be obvious. Java has historically been used very reflectively, so whenever there's some expression interpreter or deserialiser involved, there's a good chance it could be RCEd with arbitrary input.