CVE-2024-4367 – Arbitrary JavaScript execution in PDF.js (codeanlabs.com)
207 points by todsacerdoti 7 months ago | 102 comments




Accepted answer:

"Potentially, there is a tiny possibility... Keep in mind, it's a web application, and worse it can do is XSS attack"

One sad thing about security is that everyone and their uncle is a security expert.


That answer goes on to say,

"If you serve untrusted PDFs in a PDF viewer and you are hosting at your location, it is better be located at different origin than your main app, www.example.org vs pdfviewer.example.org."

My understanding is that this CVE is an XSS attack, so isn't this advice sound? The RCE portion of this CVE is for Electron, where every XSS attack is twice as fun.

Is there something about his answer that is wrong that I don't see? I hardly think we can fault someone simply for having faith in the integrity of software that everyone else trusted until now.


Things have gotten so complex that it's difficult to reason about security (even for experts). This is especially true when we're talking about JS code that is running on the client and accepting untrusted input from the global Internet.

Is the origin right? Are all the security headers correctly set? Is it even possible to keep up with all the stuff that is published, today, to sort of try and secure a web app? I don't think so.

My approach is... no JavaScript (script-src 'none'). Just don't do it.


Your approach to sandboxing browser PDF rendering is to just not do it..?

You realize this particular problem requires JS to solve, right? It's not just there arbitrarily.

That's like saying your approach to web application hardening is to disable all inbound connections. You quite necessarily need those for a web application.

I can't see how this would be pragmatic or productive, or maybe it's not meant to be.


> You realize this particular problem requires JS to solve, right? It's not just there arbitrarily.

Most of the time, you get better results by just serving up the PDF as Content-Disposition: inline. 90% of JavaScript PDF viewers are crap (although PDF.js is decent), but the browser on all dignified platforms still does better than any fancy JavaScript I’ve ever seen. Loads faster, too.
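
For reference, "serving it inline" comes down to a couple of response headers. A minimal sketch, assuming a hypothetical helper (`inlinePdfHeaders` is a made-up name, not part of any framework) whose output you would hand to whatever server you already use:

```typescript
// Hypothetical helper (name and shape are illustrative, not a framework API).
// Builds the response headers that ask the browser to render a PDF with its
// own built-in viewer instead of downloading it.
function inlinePdfHeaders(filename: string): Record<string, string> {
  // Keep only a conservative set of characters so the quoted filename
  // parameter cannot be broken out of; everything else becomes "_".
  const safeName = filename.replace(/[^A-Za-z0-9._-]/g, "_");
  return {
    "Content-Type": "application/pdf",
    "Content-Disposition": `inline; filename="${safeName}"`,
    // Ask the browser not to second-guess the declared content type.
    "X-Content-Type-Options": "nosniff",
  };
}

const pdfHeaders = inlinePdfHeaders("spec sheet.pdf");
console.log(pdfHeaders["Content-Disposition"]); // inline; filename="spec_sheet.pdf"
```

The browser then picks its own viewer, and the page itself ships no viewer JavaScript.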


I mean firefox's built in pdf reader is PDF.js. Also the desktop version of adobe reader used to run javascript embedded in pdfs by default as a "feature". I'm not sure you can get away from it at this point, but it would be nice to have the option to just use a desktop pdf reader that doesn't even support javascript or running any code embedded in the pdf. Maybe my use case is limited but I don't see the point of it. If you want interactivity you can use a website so people can expect that sort of stuff.


People don't usually use PDF.js just to render PDFs. You can also extract text to use within a web application, preview a PDF, have users sign PDFs and fill forms in the browser (where their authenticated context resides) and a host of other things.

I don't think anyone but a bored hobbyist is implementing or using a JS PDF renderer solely for rendering and the people who are using it for the aforementioned reasons don't care about load times, the document has to be processed somehow by the user agent.

The suggestion that JS is optional here is nonsense, and it is common knowledge that the built-in browser PDF renderer (written in a lower-level language than JS) is faster.

Again, I don't see how this suggestion is pragmatic or productive, but still, maybe we're not trying to be here.


From personal experience browsing websites, especially sites that display catalogues, manuals, or specifications, a whole lot of them produce PDF content and then go out of their way to render it with JavaScript. The worst use horribly performing systems that make consuming the content miserable. Better commercial systems are a step up (Issuu is a whole company that seems to be developed around doing this). Even better ones use PDF.js. But the best ones... drumroll please... just link to the PDF as Content-Disposition: inline. If users want to save it, then they click save. And this is great -- you want your users to have the easiest time possible interacting with your marketing material.

> fill forms in the browser (where their authenticated context resides)

I have never, in my entire life, seen a web form implemented as a PDF where this was anything other than miserable. Use a form, thank you very much. The less SPA magic and the fewer intermediate submit buttons, the better. (Maybe, if the actual goal is for a user to fill out a form and print the result, offer PDF.js to users on Windows who might otherwise be stuck with Acrobat. I distinctly remember Acrobat being the best PDF-form-filling software maybe 20 years ago, but Acrobat has gotten, if anything, worse, and basically every other package out there runs circles around it.)


PDF forms don't require JS.


They do if you want to fill them out in the browser coupled with an authenticated context, which is usually pretty important for forms. Usually you don't just want random users filling out any data within your enterprise.

You also can prepopulate the form with user input from previous webpages/the user account. This is how a lot of HRISes do it.

Usually people want onboarding to be as frictionless as possible. Downloading the PDF and using some editor outside the website then coming back to upload it counts as friction.

And this conversation is still far away from the point: you need JS for a JS-based PDF renderer, and there are valid use cases where one is required.


> They do if you want to fill them out in the browser coupled with an authenticated context, which is usually pretty important for forms. Usually you don't just want random users filling out any data within your enterprise.

> Usually people want onboarding to be as frictionless as possible. Downloading the PDF and using some editor outside the website then coming back to upload it counts as friction.

Wait, are you saying there's a workflow for which the most frictionless solution is to have the user fill out a PDF, on an authenticated website and submit the PDF in the browser? As opposed to, say, <form>? Can you elaborate?


Tax documents. I'm going to discontinue this conversation though.


I once had the displeasure of submitting a Form W-9 on an authenticated website. It was an unmitigated disaster involving one time codes distributed over email via a team of apparent actual people outsourced outside the US, a website, PDFs POSTed to said website, and marked-up copies of said PDFs emailed back by said outsourced team asking inane questions. And, presumably since the whole mess wasn’t actually machine-readable, the form still got entered wrong and incorrect tax forms were issued.

It would have worked much better as an emailed PDF, or a simpler form. Or an ordinary non-PDF form that would generate a filled-in PDF that the user could then sign.

> I'm going to discontinue this conversation though.

Oh, well.


Sometimes the PDF itself has to be signed. No conversion from forms. Also, tons of people would rather not have to download the PDF and email it, especially if they are on mobile. There's almost 0 usability advantage to doing that, because Adobe acrobat or whatever other pdf reader that users have and lets them sign stuff is often worse than even a JavaScript implementation. I'd much rather just sign on the browser than send an email back and then... wait or not have an immediate confirmation, or just having to deal with anything else than just pressing next.

I'm sure a lot of people here will disagree but most websites don't target the minority that prefers an email based workflow.


Or they could just implement full support for PDF. It had forms from the get go in the 90's. There is zero need to implement the functionality with JS.


> There is zero need to implement the functionality with JS.

I tend to agree outside the context of a browser, but the post is about PDF.js.


The advice is sound... mostly. There are ways to relax the different-origin nature of subdomains (so you'd have to ensure that you're not using them), and some web features have a relaxed SOP by default, e.g. cookies, renderer processes, ... The public suffix list exists to try to mitigate these issues.

Frankly I'd just disable script evaluation if you don't specifically need that.


> I'd just disable script evaluation if you don't specifically need that.

This vuln works even with scripting in PDF.js disabled.


>Frankly I'd just disable script evaluation if you don't specifically need that.

And how do you know whether you "specifically need that"? As the answer says, it's not for scripting within the pdf itself, it's for optimizing font rendering. For pdfs that you don't control, it's basically impossible to know whether that'd be needed or not. Even for pdfs that you do control, in a large company it's very likely that the team that's configuring pdf.js isn't talking to the team that generates the pdfs, which means you have a similar problem.


What is wrong with that accepted answer?


For one, every security disaster starts with people listening to a random guy claiming that the probability of something being exploitable is virtually zero :)

People who have been in this game more than six months would never make such a claim.

And only XSS? What does that mean in the context of the page, or an electron app? How can this guy know "just an XSS" is not catastrophic?


I think you're being unnecessarily harsh.

First off, are we not supposed to have "random guys" writing stuff on Stack Overflow and Wikipedia? Because that's kind of how those websites work: they rely on "random guys" to do all of the writing, rather than relying on credentialed experts only. I sure think Stack Overflow and Wikipedia are very useful resources despite having "random guys" do all the writing.

Secondly, you attack the random guy for... correctly identifying that "the worst it can do is an XSS attack". This is very useful and accurate information. Information like this is typically absent from all kinds of vulnerability disclosures. When you read on the news that something something has a vulnerability, they typically don't give you the practically useful bit of information, like what the practical scope is. Is it a 0-click RCE or is it an XSS inside a web app? They don't tell you. Except this random guy, who accurately identifies this information.

> How can this guy know "just an XSS" is not catastrophic?

"Just an XSS" is the correct description of the severity here.


> I think you're being unnecessarily harsh.

More like dogpiling and coattail-riding of the current in-focus topic. Both comments smack of smug know-betterness but are accompanied only by vague remarks and no real claims that might be subjected to scrutiny. It's almost like dogwhistling for karma.


Another sad thing is that no matter the accolades of the "security expert", the truth is that they haven't been "hacked" because they or their business have nothing of value to those skilled enough to take it. "I can" doesn't mean "I want to", or "it's worth our time". Which is to say, everyone is an expert, until they are not.


That poor incorrect post in that thread from 2018.


Like I asked the other person in this thread, what's wrong with that answer?

Not only does it correctly identify the attack vector of this CVE, but I think his advice on how to mitigate it is sound. Is there something I'm missing? The only flaw I see is that it doesn't consider the implications of using PDF.js in Electron.


It's not even incorrect.

The option isn't supposed to allow XSS-by-design (which the original requester was worried about), the possibility of a vulnerability is mentioned, the impact of a vulnerability is correctly described (XSS not RCE or similar), and mitigations that would effectively limit the impact of such a vulnerability are presented (separate origin).


Arbitrary code execution, though only of Javascript, so (as far as the browser use case is concerned) the risk compared to visiting any website (other than the potential for XSS) is that the context that it's running in is slightly elevated (though still much less than having full control of your machine):

> PDF.js runs under the origin resource://pdf.js. This prevents access to local files, but it is slightly more privileged in other aspects. For example, it is possible to invoke a file download (through a dialog), even to “download” arbitrary file:// URLs. Additionally, the real path of the opened PDF file is stored in window.PDFViewerApplication.url, allowing an attacker to spy on people opening a PDF file, learning not just when they open the file and what they’re doing with it, but also where the file is located on their machine.


If you can upload a PDF and have it served on the root domain, eg, gmail.com, then you can do session hijacking and other XSS. It’s actually pretty bad. XSS used to be thought of as “not that bad”, but today it is considered pretty bad.


Yes, I'm not here to downplay the severity of XSS. Rather, I'm trying to be specific about the potential attack vector here.

If you're just viewing a PDF using the built-in pdf.js in Firefox, then (AFAIK) it doesn't matter what site you downloaded it from, because pdf.js isn't running in the context of the website, so it doesn't have access to that site's locally-stored data (including cookies). Instead it's running in the origin mentioned above, with the accompanying concerns.

So the XSS (again, as far as a web browser is concerned) would be if the site itself is shipping pdf.js for viewing PDFs inside the webpage itself. As you suggest, Gmail lets you preview PDFs, so XSS would be a concern there, but only if Gmail is using pdf.js.


How would serving the PDF on a sensitive origin help the attacker? Wouldn't they need to serve the vulnerable PDF viewer on a sensitive origin?


Right, that is correct, it is what I meant to say. I’m not sure that moves the needle much as far as risk goes though.


Wouldn't that take a separate vulnerability? This is why Google serves attachments off something like googleusercontent.com.


The risk is higher in Electron/Tauri applications, which often expose native code paths to the application Javascript, as the script executed is supposed to come exclusively from the application developers.

This too can be hardened against, but it's a significant attack vector in quite a few desktop applications if users don't update.


The impact here is XSS or possible RCE for electron.

> In applications that embed PDF.js, the impact is potentially even worse. If no mitigations are in place (see below), this essentially gives an attacker an XSS primitive on the domain which includes the PDF viewer. Depending on the application this can lead to data leaks, malicious actions being performed in the name of a victim, or even a full account take-over. On Electron apps that do not properly sandbox JavaScript code, this vulnerability even leads to native code execution (!). We found this to be the case for at least one popular Electron app.


I remember MS Teams opening various files within itself instead of launching an appropriate program. I wonder what they're using for rendering PDFs.


At least edge uses a module from word as far as I know and I'd expect the same for teams. They likely are not affected.


That's why I use MS Teams (on desktop) via browser only. Keeps it nicely sandboxed.


And if they had used PDF.js incorrectly in the browser, it would allow XSS attacks on the domain used for MS Teams.


Haven't tested but I'm almost certain the Electron app they're talking about is VS Code. Wouldn't make sense for a code editor to sandbox extensions


I don't believe they are talking about a VS Code extension embedding PDF.js but rather an Electron app that has PDF.js embedded by default. My guess is Slack.


> The PDF format is famously complex. With support for various media types, complicated font rendering and even rudimentary scripting, PDF readers are a common target for vulnerability researchers.

So still no chance in the foreseeable future for this monstrous "paper-based" mockery of docs in a digital age to get phased out?


Paper is still very much a thing in business and office work. PDFs allowing a near perfect translation between computer monitor and paper is an absolutely critical piece of technological infrastructure.


Sure, if you come up with something that covers all PS and PDF use cases.


What about OpenXPS (ECMA-388)?


I opened https://en.wikipedia.org/wiki/Open_XML_Paper_Specification and searched for "form field" and got no hits, so if nothing else the IRS couldn't use it. The Licensing section is filled with all kinds of nonsense, but I guess if it's an ECMA standard ... how bad can it be?

https://ecma-international.org/wp-content/uploads/TC46-XPS-W... and https://ecma-international.org/wp-content/uploads/TC46-XPS-W... are interesting in that they're different packaging of presumably the same data for compare-and-contrast. I will say that exploring .xps files is much easier via $(unzip) than using qpdf or friends


What's the alternative?


PDF/A. All the good bits of PDF (compatibility, standardization, encapsulation), without the worst bits (media extensions, JavaScript).

(Except PDF/A-4, which reintroduces JavaScript for some horrific reason).


The problem here was neither media extensions nor embedded JavaScript though.

It was pdf.js handling of fonts


That said, a smaller spec may help people focus on more solid code. Potentially.


PDF/A requires fonts to be embedded rather than linked, would that have saved the day?


No, /FontMatrix is part of a metadata object which can be present whether the font is embedded or external.


The worst bit of PDF since its inception (the worst since it covers the most common use case before media/JS) is that it's not a real digital document as in: simplistic digital things like selection&copy&paste are broken "by design"


>is that it's not a real digital document as in: simplistic digital things like selection&copy&paste are broken "by design"

Copy paste mostly works fine for me. I only have trouble when it's generated in a weird way (eg. scanned from a paper document then fed through OCR), or has complex formatting (eg. math equations) that have no hope of working correctly in any system. In those cases, I don't see how it's the fault of the PDF format, any more than HTML (or whatever you think is a "real digital document" format) can embed a picture of a scanned document that totally breaks copy-pasting.


How is inserting random line breaks, making it impossible to copy&paste a simple paragraph as a paragraph instead of a bunch of lines "fine"??? This is very common for regular non OCR pdfs, you don't need any math complexity

(but also math equations have plenty of hope even though they're complex indeed, you can copy&paste some kind of "latex" representation that is sometimes used to ... produce those PDFs)

> whatever you think is a "real digital document" format

whatever supports basic digital interaction we've had available to use for many decades in alternative formats, or whatever doesn't have those rigid pre-digital-paper-based layout limitations where you can't use one of your most popular digital devices - your phone - to read a doc since the phone is smaller than a sheet of paper


It's not fine when it happens, but the issue you describe is a property of the PDF viewing application more than the PDF file format (which supports semantic paragraph tags, for example). Adobe Acrobat Reader handles copy & paste well.


Of course Acrobat Reader doesn't handle it well since it's an inherent design flaw of the format despite your trying to deny the obvious. Just tried it - same issue, a paragraph of 3 lines is pasted as 3 lines

> PDF file format (which supports semantic paragraph tags, for example).

These are called newlines and have a pretty widespread support outside of some paper pockets of resistance! You only need some other semantic tags because the format fails at basics


Example PDF? Because I tried it too and it worked. Does your PDF use tags?


Any PDF from a generic google search?

Here is one from Adobe https://www.adobe.com/support/products/enterprise/knowledgec...

Or even better: their annual investor docs a team of professionals has spent time carefully preparing...

like this https://www.adobe.com/pdf-page.html?pdfTarget=aHR0cHM6Ly93d3...

(but don't look at the annual report, that marvel of a public disclosure document not only doesn't copy&paste paragraphs, but has another nice niche use of PDF - you get garbage chars instead of text, rather ironic)

https://www.adobe.com/pdf-page.html?pdfTarget=aHR0cHM6Ly93d3...


I tried a few documents and got the same result (ie. each line being treated as separate paragraphs), but was able to find that the fed FOMC meeting doc[1] actually worked properly, but only on adobe acrobat. It was still screwed up on pdf.js. So I guess the format itself technically supports it, but implementations rarely do it properly.

[1] https://www.federalreserve.gov/mediacenter/files/FOMCprescon...


The first two work just fine in Adobe Acrobat Reader on iOS. The third is garbage, probably because the producer didn't include a ToUnicode map or equivalent.

The format supports a lot that is not commonly implemented by PDF readers (or PDF producers).


How does this help me on Windows?

And a good format wouldn't require any ToUnicode maps for simple text in the first place

And poorly supporting a lot without common implementations isn't a defence against the charge of high complexity and bad design, but a reinforcement thereof

(also, no, the first document doesn't work on iOS, I select title and two paragraphs, copy, paste, and I get a single line instead of 3, so a different manifestation of the same common fail of PDFs)


“Here’s a nickel kid, get yourself a better OS”?

Still, the fact that some PDF processors can make this work shows that the format isn’t broken “by design”.



Eps, but it is not much better.



Strict validation could theoretically have helped here, as /FontMatrix is required by the PDF spec to be an array of six numbers. The exploit string was syntactically valid but semantically invalid.

Unfortunately, applications that produce broken PDFs are rife, and Postel's law sets the expectation that we should consume garbage and be happy.
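
As a rough illustration of what strict validation means here: a minimal sketch, assuming the check is "exactly six finite numbers, otherwise fall back to a default". The function names are hypothetical and this is not the actual PDF.js patch:

```typescript
// Fallback used when the input is not a well-formed matrix. A 0.001 scale is
// the usual default for Type 1 fonts; which fallback to use is an assumption
// of this sketch.
const DEFAULT_FONT_MATRIX: number[] = [0.001, 0, 0, 0.001, 0, 0];

// Returns true only for an array of exactly six finite numbers.
function isValidFontMatrix(value: unknown): value is number[] {
  if (!Array.isArray(value) || value.length !== 6) {
    return false;
  }
  for (const entry of value) {
    if (typeof entry !== "number" || !isFinite(entry)) {
      return false;
    }
  }
  return true;
}

// Hands back a copy of the input if it is valid, otherwise a copy of the default.
function sanitizeFontMatrix(value: unknown): number[] {
  return isValidFontMatrix(value) ? value.slice() : DEFAULT_FONT_MATRIX.slice();
}

// A string smuggled into the last slot is rejected, so the default is used.
console.log(sanitizeFontMatrix([1, 0, 0, 1, 0, "0); alert(1)//"]));
```

Falling back instead of throwing is the lenient choice; a stricter reader could reject the whole font, at the cost of failing on the broken-but-harmless files mentioned above.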


Postel's law should not be applied so broadly, and certainly shouldn't be used as an argument against further validation of inputs.

Garbage inputs are the responsibility of the sender, not the receiver. You can and should accept a small margin of error in inputs where errors may logically appear, but if the receiver accepts too much error then it becomes responsible by creating a complicit norm. If the responsibility of error remains on the sender then introducing further validation is less likely to cause breakage in communication.



A good Content Security Policy [1] could prevent this as well as nullify the impact. If you're embedding a PDF in your app, you really should have one set up.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP
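
A minimal sketch of the idea, assuming the viewer page can run without `eval()` or `new Function()`. The helper names are hypothetical and the directive list is deliberately incomplete; a real viewer page would likely need more (styles, images, workers):

```typescript
// Hypothetical helpers for illustration; not part of any library.
type CspDirectives = Record<string, string[]>;

// Serializes a directive map into a Content-Security-Policy header value.
function buildCsp(directives: CspDirectives): string {
  const parts: string[] = [];
  for (const directive of Object.keys(directives)) {
    parts.push([directive].concat(directives[directive] || []).join(" "));
  }
  return parts.join("; ");
}

// True if the policy would let scripts use eval() / new Function().
// script-src falls back to default-src when it is absent.
function allowsEval(directives: CspDirectives): boolean {
  const sources = directives["script-src"] || directives["default-src"] || [];
  return sources.indexOf("'unsafe-eval'") !== -1;
}

const viewerPolicy: CspDirectives = {
  "default-src": ["'none'"],
  "script-src": ["'self'"], // note: no 'unsafe-eval'
  "object-src": ["'none'"],
  "base-uri": ["'none'"],
};

console.log(buildCsp(viewerPolicy));
console.log(allowsEval(viewerPolicy)); // false
```

Because `script-src` lacks `'unsafe-eval'`, the browser refuses string-to-code evaluation on that page, which is the mechanism a code-injection bug of this kind relies on.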


Does setting pdfjs.enableScripting to false in about:config protect against this? IMO, permitting PDFs to run Javascript violates the Principle of Least Astonishment:

https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...

Common sense suggests PDFs are the digital equivalent of paper documents. Paper documents can't run Javascript, so PDFs shouldn't either.


No. From the article:

> You might be surprised to hear that this bug is not related to the PDF format’s (JavaScript!) scripting functionality. Instead, it is an oversight in a specific part of the font rendering code.

It goes on to explain that pdfjs dynamically constructs and executes javascript functions as an optimization for rendering older fonts. Certain arguments pulled from the PDF were not escaped, validated, or delimited (the values were expected to be numbers), so you could inject arbitrary JS. (At least that's how I read it.)
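
To make that concrete, here is a toy version of the bug class. It is not the PDF.js source; `compileGlyphUnsafe`, `run`, and the recording `ctx` array are all made up for illustration:

```typescript
// Illustrative only: NOT PDF.js's code, just the same class of bug.
// Turns document-supplied values into JavaScript source text.
function compileGlyphUnsafe(fontMatrix: unknown[]): string {
  // Bug: values from the document are pasted into source code unchecked.
  return `ctx.push("transform", ${fontMatrix.join(",")});`;
}

// Evaluates generated source against a recording "context" array and
// returns everything the generated code pushed into it.
function run(source: string): unknown[] {
  const ctx: unknown[] = [];
  new Function("ctx", source)(ctx);
  return ctx;
}

// Alternative without code generation: values are only ever treated as data,
// so a string stays an inert string.
function runGlyphAsData(fontMatrix: unknown[]): unknown[] {
  const ctx: unknown[] = ["transform"];
  for (const entry of fontMatrix) {
    ctx.push(entry);
  }
  return ctx;
}

// The sixth "number" is really a string that closes the call and adds its own.
const malicious: unknown[] = [1, 0, 0, 1, 0, '0); ctx.push("attacker code ran"'];

const unsafeResult = run(compileGlyphUnsafe(malicious));
console.log(unsafeResult[unsafeResult.length - 1]); // attacker code ran

const dataResult = runGlyphAsData(malicious);
console.log(dataResult.length); // 7
```

The generated source in the unsafe case is `ctx.push("transform", 1,0,0,1,0,0); ctx.push("attacker code ran");`, i.e. the attacker's string has become a second statement.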


> JavaScript-based PDF viewer maintained by Mozilla. This bug allows an attacker to execute arbitrary JavaScript code as soon as a malicious PDF file is opened. This affects all Firefox users (<126) because PDF.js is used by Firefox to show PDF files, but also seriously impacts many web- and Electron-based applications that (indirectly) use PDF.js for preview functionality.

I don't use Firefox to view PDFs. But it seems that the new MS Windows ecosystem might be affected.


>But it seems that the new MS Windows ecosystem might be affected.

Why? Does the microsoft pdf viewer use pdf.js internally? Edge at least is based on chromium, and chromium AFAIK uses pdfium rather than pdf.js.


> but also seriously impacts many web- and Electron-based applications that (indirectly) use PDF.js for preview functionality.


Which parts of the "new MS Windows ecosystem" does this apply to?


Is Android Chrome the same Chromium engine as desktop? I don't know much about the mobile browser internals.


I recently got a couple of dodgy emails with an invoice.pdf or equivalent. Never seen that type of email before, I thought it prudent not to open the PDF

Wonder if it was related...


While pdf 0days do exist, they're patched fairly quickly once known and therefore attackers are likely not going to waste them on non-targeted attacks. Assuming you keep your software up to date and aren't a high value target, you probably don't have to worry about pdf 0days. The spam you're getting with pdfs is likely using pdfs because it's easier to evade spam filters, not because there's a pdf 0day embedded inside.


> This affects all Firefox users (<126)

This made me chuckle


I guessed this is a type of XSS but it seems not. The TL;DR is a bit vague on the impact. It says "This bug allows an attacker to execute arbitrary JavaScript code as soon as a malicious PDF file is opened" but PDFs can already execute arbitrary JavaScript as a feature (as noted in the article).

Hidden in some paragraph it does say

> Instead, PDF.js runs under the origin resource://pdf.js. This prevents access to local files, but it is slightly more privileged in other aspects.

Seems like it's not an XSS letting you take over the website origin, but it lets you run JS under this resource://pdf.js origin. Could be an interesting vector when combined with other weaknesses, but not an instant knock out as I expected when I read the title and saw the points :)


Original author here. This is indeed a bit confusing.

You are right for the case where Firefox's PDF.js is used (local or remote file in a tab or iframe). The XSS problem however is with web-applications that themselves use PDF.js. In that case, it does not run in a separate or special origin; that is a Firefox thing.

You are also right that the PDF format supports JavaScript, but that is something unrelated to this, and indeed highly sandboxed in all cases.


Thanks for the explanation! That makes it more clear. Nice research and thanks for the reply.


> This affects all Firefox users (<126)

I suppose/hope this is version number and not a dig at the market share


126 was only released on 2024-05-14, so that covers the majority of FF users, especially those who get FF from a more stable package manager repo.


I'd guess reasonably confidently that the majority of Firefox users are on Windows or Mac, got Firefox from the web, have auto updates enabled, and are already on 126 (they've had a week to update at this point).

With the exception of LTS releases, if you haven't got firefox 126 yet because you're on a "stable" package manager, I'd encourage you to promptly download firefox from mozilla.org (which will come with auto-updates) and uninstall your package manager's insecure version. Given the state of the web and software security, web browsers aren't something you should be delaying updating by a week.


>With the exception of LTS releases, if you haven't got firefox 126 yet because you're on a "stable" package manager, I'd encourage you to promptly download firefox from mozilla.org (which will come with auto-updates) and uninstall your package manager's insecure version.

Which distros have this problem? AFAIK debian-based distros (eg. debian, ubuntu) package firefox ESR which is kept up to date with security patches.


At one point I realized Arch's firefox was greater than a week out of date and I promptly did exactly that. I don't know if it was a regular occurrence or something weird with that release though.


NixOS has this problem a bit. I haven't rebuilt my system in a while and my Firefox is really old at this point. Well, time to update my system.


The poster was joking that it looks like there are less than 126 Firefox users


Hahaha, I (OP) am actually a Firefox user myself! So this was I guess just poor writing on my part :(


Given the abundance of desktop electron apps, it seems maybe irresponsible to publish this blog post (with an advertisement for Codean in the middle (which itself was at least tastefully done)) only 6 days after the fix was released and the CVE was published.

Yes, a fix landed in Firefox, but the vuln is in pdf.js, and now I’m giving the ol side-eye to the four or five electron apps I have running.


https://security-tracker.debian.org/tracker/CVE-2024-4367

It's already fixed in Debian stable (firefox-esr version 115).

It's fixed by default in FF 126+. But, as I understand it, older versions like the one in Debian stable, can be (and are already) patched.


    $ sudo snap refresh firefox
    firefox 126.0-2 from Mozilla refreshed
I'm not always happy about the snap mechanism, but this time I'm glad about a quick release/packaging channel. Kudos to the firefox snap maintainers!


Right. But you also get a recent one if you just use the Mozilla PPA, without snap. Debian sid also ships 126, and I assume the fix should be backported to Firefox LTS, which is used by stable.

I don't see the relevance of snap here.


If you had to manually refresh it and were on a vulnerable version until now, snap failed to get you into a secure state for about a week after the release of the fixed version.


How is that any different from any other package manager? The sandboxing features of snaps have no major role in how easy it is for a publisher to update a package in one of their repositories.



This doesn't make any sense, this vulnerability is in the context of browsers, not server side runtimes.


"Web browser executes javascript" is not exactly shocking. Every single browser in its default configuration does before the proper NoScript (or NoScript-alike) is added. This doesn't seem like a significant vulnerability since it is the default mode of operation of modern web browsers. Use a pdf reader if you want something moderately more secure.


The problem is embedding a browser into an app that may not maintain the same security boundaries as a “proper” web browser.



