I do think that "Face Screaming in Fear" is indeed the only appropriate reaction...

zerocrates · on Nov 1, 2022

The fact that Java allows and parses Unicode escapes anywhere in the source is a real surprise. That they use the same syntax other languages reserve for string-literal-only escapes, and look just like Java's own escape sequences in character literals, really adds to the surprise.
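A minimal sketch of the trick, assuming a stock javac (the class and method names are made up for illustration). The escape inside the comment is translated to a line feed before tokenizing, so the `//` comment ends there and the rest of the line compiles as an ordinary statement. An editor that does not translate escapes will color the whole line as a comment.

```java
final class HiddenEscape {
    // Returns true because the "commented-out" assignment actually runs.
    static boolean demo() {
        boolean flag = false;
        // Just a harmless note, or is it? \u000a flag = true;
        return flag;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```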

Is there any non-sneaky usage of this feature out there in the world? I suppose given the massive amount of Java, there probably is.

bazoom42 · on Nov 1, 2022

When Java was introduced, many text editors did not support Unicode directly. Escapes allow you to use Unicode identifiers with an ASCII-only editor. You probably wouldn't want to define your own non-ASCII identifiers, but a third-party library might have them.
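To illustrate the ASCII-only-editor use case, here is a minimal sketch (class and field names are hypothetical) in which a field whose name ends in U+00E9 is declared and referenced using escapes only. Reflection shows that the compiler recorded the real character, not the six-character escape:

```java
import java.lang.reflect.Field;

final class AsciiOnlyIdentifiers {
    // A field named "cafe" with an acute accent on the final e, written in pure ASCII.
    static int caf\u00e9 = 42;

    // Look up the field's name as the compiler recorded it in the class file.
    static String fieldName() {
        for (Field f : AsciiOnlyIdentifiers.class.getDeclaredFields()) {
            if (f.getName().startsWith("caf")) {
                return f.getName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String name = fieldName();
        System.out.println(name.length());        // 4: the escape became one character
        System.out.println((int) name.charAt(3)); // 233, i.e. U+00E9
        System.out.println(caf\u00e9);            // 42
    }
}
```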

cxr · on Nov 6, 2022

You can escape identifiers in JS, too. But what you can't do is escape just _any_ source character, the way Java defines it and the example uses it. Writing `foo = \u0022bar\u0022`, for example, is invalid.
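For contrast, Java does accept the equivalent line, because the escapes are turned into ordinary double quotes before the tokenizer ever sees them. A minimal sketch (the class name is hypothetical):

```java
final class EscapedQuotes {
    static String value() {
        // Both escapes become plain double quotes before tokenizing,
        // so javac reads this as an ordinary string literal.
        return \u0022bar\u0022;
    }

    public static void main(String[] args) {
        System.out.println(value()); // prints bar
    }
}
```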

usrusr · on Nov 1, 2022

Fixing the parser, as in properly switching back from comment coloring to code coloring? That won't tell you anything unless you look, and you might still miss it. The linter will ring alarm bells that someone is trying to hide code even if you never had the file in question on screen.

I feel tempted to bring up the counterargument "but someone should look at untrusted code thoroughly enough to not miss it anyways!", but that would be like abolishing safety belts and airbags because surely people would drive more safely.
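A rough sketch of the kind of check such a linter could run. This is not the actual Android lint implementation; the class and method names are hypothetical, and it assumes the rule only needs to flag escapes that decode to a line feed or carriage return. Flagging them anywhere in the file is reasonable because they cannot legitimately appear inside string or character literals: javac turns them into real line breaks, which is a compile error there. The backslash counting follows the eligibility rule in JLS §3.3.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapedLineBreakLint {
    // Hypothetical lint rule: return the index of every backslash that starts a
    // Unicode escape decoding to a line terminator (U+000A or U+000D).
    static List<Integer> find(String src) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            if (src.charAt(i) != '\\') {
                i++;
                continue;
            }
            // Measure the run of consecutive backslashes.
            int start = i;
            while (i < src.length() && src.charAt(i) == '\\') {
                i++;
            }
            // Per JLS 3.3, the last backslash of the run can start an escape
            // only if the run length is odd.
            if ((i - start) % 2 == 0 || i >= src.length() || src.charAt(i) != 'u') {
                continue;
            }
            int escapeStart = i - 1;
            // One or more 'u' characters may precede the four hex digits.
            while (i < src.length() && src.charAt(i) == 'u') {
                i++;
            }
            if (i + 4 > src.length()) {
                break; // truncated escape at end of input
            }
            int code = 0;
            boolean valid = true;
            for (int k = i; k < i + 4; k++) {
                int d = hex(src.charAt(k));
                if (d < 0) {
                    valid = false;
                    break;
                }
                code = code * 16 + d;
            }
            if (valid && (code == 0x000A || code == 0x000D)) {
                hits.add(escapeStart);
            }
        }
        return hits;
    }

    // Value of an ASCII hex digit, or -1 if the character is not one.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        String hidden = "int y = 0; // note \\u000a y = 1;";
        String benign = "String s = \"\\\\u000a\";";
        System.out.println(find(hidden)); // [19]
        System.out.println(find(benign)); // []
    }
}
```

The `benign` case matters: an escaped backslash followed by `u000a` is just text, so a check that ignored the backslash-counting rule would raise false alarms on perfectly ordinary string literals.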

lolinder · on Nov 1, 2022

The only reason why this makes any sense as a threat vector is because Java code highlighting parsers and javac work differently. Sure, someone can still try it, but if they knew that every code highlighter would render it correctly they wouldn't bother.

usrusr · on Nov 2, 2022

I'd assume that there are plenty of editors that just do general C-style comment highlighting without diving into Java-specific Unicode escapes. As long as you can't be sure that no bad actor could ever consider this obfuscation a net positive for their goals, the lint approach remains worthwhile. Worthwhile because its cost is so low.

lolinder · on Nov 2, 2022

True. I'm not saying the lint is a bad idea, just that fixing the parser is a better one.

cbhl · on Nov 1, 2022

My understanding is that the highlighting parser _is_ fixed to intentionally mis-parse to guarantee reasonable runtime complexity, and the lint check is a band-aid on top of it for a common type of adversarial input.

If you do the full parse you can end up with adversarial inputs that result in cubic or exponential run-time complexity (see, for example, Pygments CVEs for comparable examples in this domain).

cxr · on Nov 6, 2022

Correct handling of Unicode escape sequences isn't something that's going to increase time complexity to cubic-or-more.

kllrnohj · on Nov 1, 2022

Maybe JetBrains didn't want to fix the IDE parser for some reason? Still, the lint warning has value because lint isn't coupled to the IDE: you get the warning whether or not you're using Android Studio, and it also works in CI and similar setups.

origin_path · on Nov 2, 2022

Android Studio is a fork of IntelliJ, and Google doesn't seem to contribute much upstream in general. It seems to be more that JetBrains pulls changes from them occasionally, when it's possible.

slaymaker1907 · on Nov 1, 2022

I'm guessing it's because handling escape sequences like these would have to be done in a separate pass over the source code. The escape sequence could be part of a separate token so you need to first resolve these escape sequences and then do tokenizing.
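A minimal sketch of that separate pass, loosely following JLS §3.3: escapes are translated first, and only the translated text is tokenized or highlighted. It is a single left-to-right scan, so it runs in linear time, which bears on the complexity question raised earlier in the thread. Names are hypothetical, and the sketch skips corner cases such as escapes that themselves produce a backslash.

```java
final class UnicodeEscapePrepass {
    // Replace each eligible Unicode escape with the character it denotes.
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int i = 0;
        int backslashRun = 0; // contiguous raw backslashes immediately before i
        while (i < src.length()) {
            char c = src.charAt(i);
            boolean eligible = c == '\\' && backslashRun % 2 == 0;
            if (eligible && i + 1 < src.length() && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // one or more 'u' characters are allowed
                }
                int code = j + 4 <= src.length() ? parse4(src, j) : -1;
                if (code < 0) {
                    throw new IllegalArgumentException("malformed Unicode escape at " + i);
                }
                out.append((char) code);
                i = j + 4;
                backslashRun = 0;
                continue;
            }
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Parse exactly four ASCII hex digits starting at `from`; -1 if any is invalid.
    private static int parse4(String s, int from) {
        int v = 0;
        for (int k = from; k < from + 4; k++) {
            char c = s.charAt(k);
            int d;
            if (c >= '0' && c <= '9') d = c - '0';
            else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
            else return -1;
            v = v * 16 + d;
        }
        return v;
    }

    public static void main(String[] args) {
        String raw = "int y = 0; // note \\u000a y = 1;";
        String translated = translate(raw);
        // A highlighter working on the translated text sees the comment end at the
        // hidden line break, so the assignment after it is no longer "commented out".
        System.out.println(translated.substring(translated.indexOf('\n') + 1));
    }
}
```

Running the highlighter on `translate(raw)` instead of `raw` is essentially the parser fix discussed above; the cost is one extra linear pass plus a mapping back to original offsets for coloring, which this sketch does not attempt.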

moffkalast · on Nov 1, 2022

And is also the appropriate reaction to using Android Studio.

tadfisher · on Nov 1, 2022

I see you didn't use the Eclipse plugin (with Ant scripts, yay!).

Dang I'm old.