Xerox scanners randomly alter numbers in scanned documents (2013) (dkriesel.com)
187 points by gurjeet on June 13, 2023 | 60 comments



Even though this happened a long time ago, whenever I hear or think about it, I am amazed that it didn't put Xerox out of business, or at least hurt it a little more. After all, some big players were already doing digital archiving at the time. :-/ BTW, the CCC had a pretty neat presentation at that time as well: https://www.youtube.com/watch?v=c0O6UXrOZJo


PRISM didn't kill companies, Apple home button scandal didn't touch it, sony infecting its users with a rootkit didn't make it sell less playstations, HP lying cartridges had no impact on its long term business...

Fines and greenwashing are budgeted for and are only a small fraction of the profit they make by misbehaving.

On this very site you can regularly read people telling you we should forget about the MS of the past because now they are the good guys. So they can get away with corruption, sabotage, monopoly abuse, insulting the competition, stealing, lying, patent trolling, etc. No problem. Bill Gates is a wonderful person and a great philanthropist nowadays.

The corporations and billionaires have found the recipe to get away with everything. The humans were always hackable.


Not so sure about HP. I mean yeah, their inkjet ploy still pays for itself I suppose, but at this point I consider home photo printing as basically dead, killed by greed, and those printer divisions as little more than autonomous external marketing divisions for those ubiquitous b/w Brother boxes.


I can see the other things but "Apple home button scandal" either does not belong in the same league or there might be something I do not know about. Can you expand a little bit on your thinking?


When Apple introduced a fingerprint unlock in the home button, it wanted to keep the fingerprint scans secure. The security chip that stores the fingerprint scans needs to verify that the home button's fingerprint scanner is trustworthy, to prevent man in the middle attacks.

However, when an unauthorized or unofficial button is used as a replacement for repair, the phone will permanently brick itself. No warning is given that the fingerprint scanner's trustworthiness can not be verified, no ability to just use the phone with the fingerprint scanner disabled. Just straight to a permanent bricking.


IMO it wasn’t nearly as egregious as the other examples. I only defend them because they didn’t do this when you replaced the screen etc.

You don’t want phones to work if someone swaps out that specific piece of hardware without your knowledge. Bricking the phone forever makes it harder for people to find back doors around that security feature as they would risk large numbers of expensive phones. Presumably people developing replacement fingerprint readers would notice the issue before most customers were harmed. Further, anyone actually harmed would have gotten hardware from a very untrustworthy source.

They reversed course after a backlash, but I can see an argument for them standing their ground on this one.


As a user, that’s what I’d want it to do. If someone is trying to bypass the fingerprint sensor by replacing it because they know that’s where the authorization is stored, that’s exactly what I’d want the phone to do.


The fingerprint scanner is just a scanner, it doesn't handle authorization, that's what the security chip does. The scanner has two ways of communicating with the security chip. It can authenticate itself with the chip, and it can send the chip images of fingerprints.

If a compromised scanner fails to authenticate, then the security chip can just ignore the scanner. Not much it can do if its only avenue of communication is cut off. A warning message telling users to not touch their compromised fingerprint scanner would have been sufficient.


You want your whole phone bricked by an update when it worked before, even though they can just disable the fingerprint scanner instead?


You are assuming it is fine to swap authentication hardware for incompatible parts? I guess this is from the spirit of "right to repair". While I get the idea in principle, I still think going dark is the best option you have if essential hardware was apparently tampered with. Find a back-alley smartphone shop which at least swaps your FP reader with compatible hardware. But if someone gained access to my phone, and put a piece of hardware in which is not recognized by the OS, I want it to stop right there. That doesn't feel like bricking, more like a security feature.


This attack scenario doesn't make any sense. If your phone is out of your sight and unsecured for long enough to take it apart and replace the fingerprint sensor, it's unsecured and out of sight long enough to be entirely replaced by a clone that will steal all your credentials and send everything to whatever bad guy you are imagining


And it won’t work anyway because the phone will detect and reject the sensor and just fall back to PIN authentication which is how it worked before the update


Ah, I see the use case now-where you get it replaced by a 3rd party or buy a stolen phone, do you want it bricked by a software update? I don't know. I don't know that I care much about that use case TBH.

What I don't want is this: someone steals my phone and then replaces the fingerprint sensor and has access to everything, including the ability to reset and resell the phone.


That’s not possible anyway because the phone can detect and reject the replacement sensor. If it couldn’t then how would it know to brick itself? Instead it should just fall back to PIN authentication, which is actually more secure and how it worked before the update


Not just bricked but permanently and securely wiped, would be my preference.


You want your own phone that you paid money for wiped and bricked remotely at random without your permission while you’re using it for no security advantage whatsoever (since it can just fall back to PIN authentication which is actually more secure than a fingerprint) until you give Apple money to “repair” it?


Nice how some people try to justify Apple here.

I think the problem lies in this point:

>No warning is given ... Just straight to a permanent bricking.

There should have been a warning, at least, but there was none.


The home button scandal I'm talking about is older than that.

Circa 2010-15, the iPhone home button was having more and more problems: https://osxdaily.com/2011/12/22/iphone-home-button-not-worki...

For a while, half the users were just using a software button as a workaround: https://osxdaily.com/2012/07/02/broken-iphone-home-button-as...

It was funny to see everyone with a very expensive phone just moving around this fake button everywhere on their screen because their physical button was not working.

But then people stopped finding it funny.

Because Apple said it was a hardware problem, and said the fix was to buy the next iphone generation.

However, a few weeks later, jailbroken iPhones received a patch from the community fixing the home button problem, showing not only that it was a software issue that was easy to fix, but also that Apple had just conned their entire customer base.


I guess part of the problem with boycotts is that it's very hard to know who will not buy your product because of e.g DRM, and who won't buy it because they're not interested in it.

I haven't bought a single Sony product since the rootkit fiasco, but I've never been able to tell them this directly. I don't buy products with DRM in them. I just don't. I'm not going to be appearing in the balance sheet, however, and won't appear in the calculus of lost growth for an MBA.

The only time I think boycotts do work is when they cause a direct negative action to a company's bottom line that is sufficiently obvious that an accountant can see it. Long, slow protests -- such as exist against Nestlé for example -- are worthy but likely for the birds.


Oh they have an entire department dedicated to understand what people think about each product, so they would know.


Market research departments notwithstanding, companies can be quite oblivious about why their products are bought and, especially, not bought.


Well said, I couldn't agree more, unfortunately... I also noticed the "MS is cool now"-cult, felt surreal and almost like deliberate misinformation. Classic counterpart to FUD, Honeypotting. However, more people seem to fall for that because it is less known, and games the "assume good intentions"-virus.


> sony infecting its users with a rootkit didn't make it sell less playstations

They sold at least one less PS3. I caved and let my kid get a 4 and 5 when they came out. But he's gotta keep them at his mother's house.


What's even more amazing is that it wasn't retracted from the PDF standard.

Everyone knew how dangerous it was, changing digits, but it was left in the standard and in many implementations.

Years and years later, someone implements a Turing machine using it, hacks journalists' iPhones, and the question is: why did modern systems implement it?


> implements a turing machine

Just searched for some details, it's amazing, all this inside of a JBIG2 image:

"Using over 70,000 segment commands defining logical bit operations, they define a small computer architecture with features such as registers and a full 64-bit adder and comparator which they use to search memory and perform arithmetic operations."

https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...
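
To make the quoted description a bit more concrete, here is a minimal sketch of how an adder and an equality comparator can be built from nothing but AND, OR and XOR on single bits. This is not the exploit's actual circuit; the function names and the bit-list representation are assumptions made purely for illustration.

```python
def full_adder(a, b, carry_in):
    """Add three single bits using only XOR, AND and OR."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def to_bits(value, width=64):
    """Split an integer into a least-significant-bit-first list of bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an integer from a least-significant-bit-first list."""
    return sum(bit << i for i, bit in enumerate(bits))


def add64(x, y):
    """Ripple-carry addition of two 64-bit values; overflow wraps around."""
    carry = 0
    out = []
    for a, b in zip(to_bits(x), to_bits(y)):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return from_bits(out)


def equal64(x, y):
    """Comparator: OR together the XOR of every bit pair; 0 means equal."""
    diff = 0
    for a, b in zip(to_bits(x), to_bits(y)):
        diff |= a ^ b
    return diff == 0
```

Each call to `full_adder` is a handful of bit operations, which gives a feel for why tens of thousands of such operations are needed for anything useful.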


> changing digits, but it was left in the standard and in many implementations.

The format has a lossless mode that would have saved Xerox a lot of trouble, of course it was just that mode that was used to hack the iPhones.

Also for many texts you don't need every digit to be correct, hell any single typo, smudge or bad font can fuck you over if that is the case, no need for a Xerox. Ideally your numbers either include a checksum or have some other redundancy, like requiring that they are written out as words.
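
As a sketch of the checksum idea: the Luhn algorithm (the one used for credit card numbers) appends a single check digit so that any one wrong digit makes the number fail validation. The function names below are my own; this only illustrates the principle and says nothing about what any particular document format does.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit,
    # starting with the rightmost digit of the payload.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the last digit is the correct check digit for the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]
```

With this scheme a scanner turning an 8 into a 6 somewhere in the number would be caught on validation, at the cost of one extra digit.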


Maybe these – the demise of Xerox not happening and the size of the players involved – are related? As in: nobody wants their archives and databases put in question? (Meaning, in order to claim damages, you'd have to admit that there is damage, indeed.)


But Xerox has admitted that the character substitution was enabled on all compression levels even though they claimed it was only active on one specific level.

So theoretically, everyone who archived during that time using Xerox WorkCentres has to question their documents.


Which might be exactly why nobody wants to talk about this. (This may be especially nasty where archives had been digitized and then the analog originals were disposed of. What can you even do about this?)

PS: In order to have learned anything from this, there should be a rule for the digitization of archives containing core documents to use (at least) two unrelated scanning technologies, with unrelated processors and unrelated software from different sources (much like Airbus does it with flight computers). A sole, single scan should attribute to nothing.


> In order to have learned anything from this, there should be a rule for the digitization of archives containing core documents to use (at least) two unrelated scanning technologies, with unrelated processors and unrelated software from different sources

But what do you do in case of conflict? There's no way of resolving it, so for this type of thing to be more effective, you'd need 3 different scans, and a quorum of 2 to decide the real version.
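
A minimal sketch of that 2-of-3 idea, assuming each scan has already been reduced to a dict of extracted field values (the field names and the data layout are made up for illustration):

```python
from collections import Counter


def vote(readings, quorum=2):
    """Return the value at least `quorum` scans agree on, else None."""
    value, count = Counter(readings).most_common(1)[0]
    return value if count >= quorum else None


def reconcile(scans, quorum=2):
    """Compare scans field by field.

    Returns (resolved, conflicts): the fields a quorum agreed on,
    and the names of fields that need a human to look at them.
    """
    resolved, conflicts = {}, []
    for field in scans[0]:
        winner = vote([scan[field] for scan in scans], quorum)
        if winner is None:
            conflicts.append(field)
        else:
            resolved[field] = winner
    return resolved, conflicts
```

Note that a quorum only helps if the scanners fail independently; three devices sharing the same compression code would happily agree on the same wrong digit.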


Knowing that the data is unreliable is something already. Then, you can apply heuristics. And, if there is a systematic error, you should be able to detect this, eventually. (E.g., what is the unreliable series, what is the bug and when is it triggered?) But, yeah, this is why I wrote "at least".


I'm pretty sure that Kriesel said in his C3 talk that at least one company called and told him that they scan and dispose of the originals. You can't do anything at this point. You're probably right – they rather just want to forget about it because the only correct solution would be to scrap everything.

My original comment was more aimed at the point that damage would need to be admitted. But they did admit it, which makes it even more confusing why nothing really happened after that, at least publicly.

I don't think that TMR or similar overhead would be practical. Just ban compression algorithms that do pattern matching and substitution for archival purposes (like the German BSI did).


There's a scene in the Cryptonomicon, where Goto Dengo's (a lieutenant in the Japanese Imperial Army in WWII) convoy is sunk by a US air attack. This involves a new weapon, which also requires a crucial change in tactics – and he knows that the Imperial Army has lost the war: this is not how the Americans are supposed to attack, how could this happen? Notably, this requires that several high ranking figures observe, comprehend and admit that their tactics are ineffective, that they are wrong – and invent and implement something new and adjust their tactics, instead, involving the entire chain of command. This shouldn't happen. They must have all killed themselves. No warrior of principles would admit to such a fault, commit to the shame, and change tactics in the midst of the war…


Given that science progresses one funeral at a time, the practice of seppuku would seem to allow the Imperial Japanese officer corps to evolve their tactics more rapidly than most.


The problem being that many of our big corporations and institutions are much like an Imperial Army, but without the culture of seppuku. They have overrun considerable territory and established the Greater Welfare Zone, including its glacis and satellites, by adhering to concise, even cruel tactics. This works. This is how it works. If we just stick to the tactics and principles, this will continue to work. This is how we do things… Admitting fault and accepting shame, just to veer off proven and established procedures, isn't only a serious danger to your career, it also endangers the greater good we've all committed to… There's too much at stake. (Compare how we are all just forced to throw even the basics of AI safety out of the window. There's no alternative. Let's suppose that Banzai attacks do work…)


> A sole, single scan should attribute to nothing.

That's not much less than a sole, single repository of paper documents that may not be climate controlled or fire protected (or the fire protection may be sprinklers which will ruin most of the documents anyway).

Document digitalization is a cost saving program, but it doesn't save money if you need to have two vendors, validate the vendors are actually separate and remain actually separate, have a comparison process and an exception process. If the requirements are too high, paper remains and nobody uses the documents because the retrieval costs in labor and time are too high.


The youtube link is David presenting his work that he describes in the originally linked article. One of the legendary presentations in CCC's history.


This was an amazing talk. He did a great job keeping it entertaining.


too big to fail


This can happen in some compression modes of DjVu as well at high compression factors, where the background and foreground are separated and the foreground (text, usually) is split into glyphs that can be shared by different instances. Mess up the recognition and the letters on the page appear literally different in the compressed "output".
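
A toy sketch of the shared-glyph mechanism, to show how a too-generous similarity threshold swaps characters. This is not how DjVu or JBIG2 encoders are actually implemented; the tiny bitmaps, the pixel-difference metric and the `max_diff` parameter are all assumptions for illustration.

```python
def hamming(a, b):
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(glyphs, max_diff):
    """Store each glyph once; reuse a stored glyph if it is 'close enough'."""
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, stored in enumerate(dictionary):
            if hamming(glyph, stored) <= max_diff:
                indices.append(i)  # reuse the stored glyph
                break
        else:
            dictionary.append(glyph)  # no match: store a new glyph
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress(dictionary, indices):
    """Rebuild the page from the glyph dictionary and the index list."""
    return [dictionary[i] for i in indices]


# Two 4x5 pixel digits that differ in exactly one pixel.
EIGHT = (".##.", "#..#", ".##.", "#..#", ".##.")
SIX = (".##.", "#...", ".##.", "#..#", ".##.")
```

With `max_diff=0` the round trip is exact; with `max_diff=1` the 6 matches the stored 8 and comes back out as a crisp, perfectly legible 8, which is what makes this kind of error so hard to spot.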


Using OCRmyPDF, I applied lossless JBIG2 compression to a scanned book, after some consideration.

* the OCRmyPDF docs point to the JBIG2 Wikipedia page, and the Disadvantages section - https://en.wikipedia.org/wiki/JBIG2#Disadvantages - so it's easier to avoid this bug

* I'd hoped OCR would fall out of the process, but nope

* from the Wikipedia page, huh, the Pegasus malware exploited iOS's implementation of JBIG2


This is basically the same thing Samsung and friends are doing in their camera apps.

Except, in Samsung's case, some users love this.


The issue was that, at small font sizes and default print quality settings, the caching mechanism would consider some digits like 0 and 8 to be similar enough to use the cache, which would end up switching numbers around.

This finding did completely invalidate the financial book keeping and backup of tax records of many companies.

I sure hope nobody uses the Samsung camera app to file their taxes and keep their books...


I'm sure many people do. There are tons of apps to scan and parse invoices. Makes sense as the current mid range phone cameras work just as well as mid range scanners but don't take any space and do that job an order of magnitude faster.


> I sure hope nobody uses the Samsung camera app to file their taxes and keep their books...

A lot of people take a photo of receipts for their records.


No, it really isn't. Xerox had a bug in the image compression algorithm. Smartphones use neural networks and multiple exposures to achieve picture quality far beyond what the hardware is capable of. This is very different technology for a very different purpose.


It is the same "let's use image from something else to fill in this gap and hope nobody would notice" mindset.

Xerox uses image snippets from the same document, Samsung uses something their "AI" made up. Both of them are trying to be smart and replace stuff with something "similar".


It's not quite the same, in that at least the false numbers come from elsewhere in the document being scanned. Samsung is pulling data from elsewhere on the internet to "correct" images.


Other recent related submissions/discussions:

https://news.ycombinator.com/item?id=34815391 - "Xerox copier flaw changes numbers in scanned docs (2013)" (theregister.com), 36 points, 3 months ago, 8 comments

https://news.ycombinator.com/item?id=32537073 - "JBIG2 Undetectable Data Corruption: Destroying Our Past, One Character at a Time" (circuitousroot.com), 67 points, 9 months ago, 34 comments


One of the more amusing parts of this blog is that it contains at least two typos: "arrors" and "ancoding" and I can't tell if they were on purpose.


I remember this talk quite well. Also the other talks by David are interesting.

Somehow there were no real consequences from this. So either archiving stuff, at least in the business context, is not really important, or we simply trust these copies. The latter is of course scary.


Xerox invented generative image ML for document scans, off a single training sample 10 years ago!


Is this a Xerox-specific issue or an industry-wide problem? The author suspected that this is not an OCR issue. What about other scanners? HP, Epson, Canon, Ricoh, etc.?


Please add (2913) to the title:

https://hn.algolia.com/?query=Xerox%20scanners%20randomly%20...

Edit: oops, 2013 - typed on my smartphone without reading back :)


Hello time traveler.

I think you mean 2013 :-)


You missed the joke


I doubt any humans will be left in 2913. (Nice joke though)


Or was it 2018?


that joke might be too meta


(2014) or (2015)



