Hacker News
Ask HN: What's the weirdest bug you ever had?
26 points by martinbaun 6 months ago | 41 comments
What's the weirdest bug you have encountered?

For me, it occurred during university on an assignment. We were supposed to create a SOAP C# integration, and I kept receiving the same error. I spent three days trying to resolve it, finding only two pages on the internet that detailed the bug—one in Russian and one in Chinese.

When I finally asked my professor, he simply mentioned, "Oh, you just installed the wrong version."

Yeah... the wrong version. I never figured out why this was a problem.




We were developing an Android multimedia application in the bad old 2.x days, and our QA person was running into a crash recording with the camera in our app.

She walked over to my desk to show me, and the crash wouldn't happen anymore. Went back to her desk and was able to reproduce it again, walked back to my desk and suddenly it wouldn't happen anymore.

After a bit of this, I finally brought my laptop over to her desk and discovered it was an out of memory error issue. She sat by the window, and with more sunlight at her desk she was getting higher frames per second from the camera, whereas in my dark corner the camera was recording way fewer frames and using less memory.


I love these kinds of bugs that are due to physical environment factors that are typically invisible to us when thinking about software/hardware. Reminds me of that issue where someone's car wouldn't start if they got vanilla ice cream, but would with any other flavor. Turned out it was because that flavor took longest to serve (or quickest). https://www.managementpro.com/my-car-wont-start-after-i-buy-...


Here's a fun one that's not really code-related:

I was working on a web application with a maintenance mode feature where it would show a "system offline" message while changes were rolled out etc. We noticed there was a typo (or so we thought) in the message, so it said "system olffine" instead.

Because this message could be overridden by site admins, there was a somewhat convoluted data flow to produce the final message. I grepped the codebase for "olffine" and found nothing. Then I checked my local DB to see what message was stored - it was spelled correctly. Then I stepped through the backend code to look at the message being returned - still correct.

Eventually I was back in my web browser with the page open on the left (spelled incorrectly) and devtools on the right (spelled correctly). What the hell?

It turned out the problem was that the font we were using had made a mistake with the "ffl" ligature so it rendered as "lff". We contacted the font author and they fixed it.


I assume when you copied that text out of the browser it appeared as "offline", unless you had an email client that did rich text.

Would have been funny to get a bug email saying it should be "offline" but it's "offline" instead.


I don't think anyone had tried copying it until after I figured out the problem; the bug report would've just been a screenshot.


Around 10 years ago, my friend and I were working on an assignment for our intro to programming course. The assignment involved controlling an LCD screen with an Arduino Mega 2560 board.

Finally, after many failed attempts and a few too many coffees we managed to complete the assignment. I left a comment in the code “// We did it!!!!”, and we called it a night.

The next day we tried to demo to our TA and suddenly our code wouldn’t upload. Tried multiple PCs, multiple Arduinos, and had multiple TAs look into our code. No idea.

Finally one brilliant TA heard our story and deleted the comment I left. Suddenly the upload worked! Turns out anytime the bootloader saw “!!!” anywhere in the code it would drop into debugging mode and cause the upload to fail. Even if it was in a comment! That bug gave me major trust issues working with that 2560 that semester haha


Once upon a time I had to track down an issue where very rarely, with no discernible pattern, the web app would produce garbled PDFs. We would restart all servers, everything would be fine for a month or two, then random bad output. Turned out this happened when an admin account remotely connected to an app server, which caused a reset of default screen resolution (only for admin accounts, not regular remote connections), which messed up the PDF library that relied on a specific resolution (it was HTML-to-PDF conversion). Lived with it for a couple of years at least before tracking down the root cause (after many many failed attempts). The fact that only the server with the reset resolution caused the problem confounded the issue.


Oh man, PDF generators are like the least deterministic thing ever coded.

I don't know if they somehow depend on external video drivers or printer drivers or font engines or what.

But what you're describing sounds exactly par for the course to me.

There were a couple of years when everything I printed to PDF on my Mac looked right, but if you selected the text and copied and pasted it, it was gibberish, because it inexplicably shuffled all the character codepoints.


> it was gibberish, because it inexplicably shuffled all the character codepoints.

I forget what they called it, but that was a feature touted by Adobe and others a few years ago as a sort of pseudo-DRM. I have encountered many pdfs like that years ago, and think many pdf readers nowadays have features to undo the text scrambling.


Yeah, I'm aware something like that has been a feature.

But this was just printing from Chrome and using the macOS built-in Save to PDF in the printer dialog.

This was a straight-up bug which I filed with Chromium but there was nothing they could do because it was a bug in macOS.

It did finally get fixed after a couple of years by Apple.


We had an instance where a literal semicolon was accidentally added to the top of every page on our e-commerce website. Nobody noticed it for a while because it was hidden underneath the nav bar thanks to z indexing. The weird part is that after we found it and deleted it (someone erroneously terminated a line with a semicolon in a Razor template), it crashed the site entirely.

In the end, I found that the semicolon was in the <head>, and the inclusion of literal text caused document.body to be non-null. A later script in the head relied on the existence of document.body, making that particular semicolon load-bearing. (Angular 1 times, y'all.)


We were having a record-breaking heat wave in the UK, and we joked, "See, even Jenkins can't cope with the heat, all builds are failing." It was not uncommon to get transient failures, so no one looked into it; it was too hot, and in the UK we are not used to dealing with that. Eventually someone did look into it. There was some little-used functionality in our platform which queried a weather API, and someone had written a unit test that used the real live API and set the expected temperature value to be lower than the record high temperature in London, so our pipelines really had been broken by the hot weather.
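For anyone wondering how to avoid a weather-dependent build like the one above, here is a minimal sketch in Python of a test that stubs the API call instead of hitting the live service. The function names and the stubbed reading are hypothetical; nothing here comes from the original codebase.

```python
from typing import Callable


def describe_temperature(fetch_temp_c: Callable[[str], float], city: str) -> str:
    """Format a reading obtained from whatever fetcher is passed in."""
    temp = fetch_temp_c(city)
    return f"{city}: {temp:.1f} C"


def fake_fetch(city: str) -> float:
    # Hypothetical stub standing in for the live weather API.
    # It always returns the same value, whatever the real weather is doing.
    return 41.5


def test_describe_temperature() -> None:
    # Asserts on our own formatting logic, not on today's temperature.
    assert describe_temperature(fake_fetch, "London") == "London: 41.5 C"


test_describe_temperature()
```

Passing the fetcher in as an argument is just one way to do it; patching the client with a mocking library would work as well.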


Our health database was being used in many states but one hospital reported error messages of "Invalid patient postcode" to our Client Support. C.L. couldn't duplicate the problem so I was sent to investigate, after checking that our same version was not giving the same errors. Long story --> Short; the operator inputting the postcode data had been using 'l' instead of '1'. The font in use made it difficult to detect the differences.
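A validator that points at the suspicious character would probably have saved a site visit. Below is a rough sketch in Python, assuming a digits-only postcode format (the post does not say what format was used); the lookalike table and the function name are made up for illustration.

```python
# Letters commonly typed in place of similar-looking digits (illustrative list).
LOOKALIKES = {"l": "1", "I": "1", "O": "0", "o": "0"}


def check_postcode(raw: str) -> tuple[bool, str]:
    """Return (is_valid, message), assuming postcodes are ASCII digits only."""
    code = raw.strip()
    if code.isascii() and code.isdigit():
        return True, "ok"
    # Collect every character that looks like a digit but is not one.
    suspects = [(i, ch) for i, ch in enumerate(code) if ch in LOOKALIKES]
    if suspects:
        hints = ", ".join(
            f"position {i + 1}: {ch!r} (did you mean {LOOKALIKES[ch]!r}?)"
            for i, ch in suspects
        )
        return False, f"Invalid patient postcode; {hints}"
    return False, "Invalid patient postcode"


print(check_postcode("30l0")[1])
```

The message names the position and the likely intended digit, so the operator does not have to tell the two glyphs apart by eye.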


I had an 'l' for '1' error half a century ago in FORTRAN code, keyed in by me on punch cards. The line printer printout of my code distinguished the two symbols, but the ribbon was old and the distinction was nearly--not entirely--obliterated in print.


These are annoying. Reminds me of two bugs I had as well.

1. One guy used tabs instead of spaces for fun and we couldn't find the records.
2. One.... "person" decided to put some values as null, for fun. :D Didn't break the system, but the client thought it was a bug.


An executable from a nightly build kept crashing and I got tasked with fixing it. I was a novice back then and spent most of the day trying to figure out what was happening, and when looking at the disassembly I saw it was crashing at a 'hlt' instruction, which shouldn't have been there.

Next day, after another nightly build, no more crashes. I did a binary diff between the crashing version and the new one, it was a single bit. A bit flip on the build server.
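The binary diff step is easy to reproduce. Here is a small Python sketch that lists every differing bit between two equal-length byte strings. The sample bytes are my own illustration: on x86, 0xF4 is `hlt` and 0x74 is a short `je`, which differ by one bit, but the post does not say which instruction was actually corrupted.

```python
def differing_bits(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        delta = x ^ y  # set bits mark the positions where x and y disagree
        for bit in range(8):
            if delta & (1 << bit):
                diffs.append((offset, bit))
    return diffs


# Hypothetical three-byte excerpts from a good and a bad build.
good = bytes([0x90, 0x74, 0x05])
bad = bytes([0x90, 0xF4, 0x05])
print(differing_bits(good, bad))
```

For real executables you would read both files with `open(path, "rb").read()` and pass the contents in.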


Around 2000 I worked for an ISP. We had some customers reporting that they couldn't get to some websites. Narrowed it down to customers who were:

  * Running Mac OS 9 (not Mac OS X)
  * dialed into an Ascend NAS (Not other vendors)
  * Assigned a Dynamic IP
  * Accessing ASP-based websites
they would get a blank page on the website.

We actually had a computer in the office we could replicate it on but gave up at that point since we didn't have something to debug the network traffic. For the single-digit number of customers we gave them static IPs or something.


> we didn't have something to debug the network traffic

What do you mean? You couldn't mirror traffic to a linux box and use tcpdump there?


I can't remember if the computer had an internal modem or it was connected by a serial port. Either way it would have been harder to just mirror the traffic. We looked into some tracing tool but it cost hundreds of dollars.

Decided the workaround was easier, especially since the Mac OS 9 was a few years old at that point and the users would eventually upgrade.


I have been in IT for like 30 years, I have no idea what you just wrote here :D Though, I never worked for an ISP.

How was it working for an ISP? I could imagine crazy weird edge cases like this one


It was an interesting job. Things were evolving fast and the Internet had only really just gone mainstream. Also lots of small companies rather than just Telcos. Some wild West stuff like the time our CEO came in and demanded we block another company because the other company's CEO had pissed him off.

Back then we were still running things ourselves like Email, Usenet, customer Homepages, FTP servers, Software mirrors. The technologies were also a bit weird, we had Satellite Internet and sold customers a card and a dish to use it. We also used the same tech from broadcast towers in cities.

We even used Satellite to move our own data (one way) because it was cheaper than undersea cables for a while.


the start of all new tech is always super exciting. I have considered moving to more exciting tech like Bitcoin or AI. But at some point you also become too old :))


Earlier this year I was looking at an NPE that started happening within a rules engine used by an insurance quoting application for discount eligibility. Ultimately the nature of error was evident - a rule was written in an unsafe way and necessary data was not present, but it wasn't immediately clear why the data wasn't being mapped properly.

The error had only been reported happening a few times in a development environment. I was able to discern that the first time it happened was the morning after an update to Spring 3. Debugging locally with code written just a few days earlier didn't trigger the error, so I knew the Spring 3 upgrades had to be related. The missing data was supposed to be derived as the result of a library call to another rules library maintained by a different team, used to derive pricing attributes from information on a request.

After a bit of debugging, I could see that the data in question should have been derived by this other rules engine, but no data of any kind was being mapped from it. No errors were logged in the scenario, and debugging was very fiddly. Notably, the error messages at different points in the debugging process differed on subsequent requests after the first request was submitted. This required restarting the application locally after each pass of the error. This made me think that some static structure was at play.

This rules engine made use of the popular Jackson package to parse YAML files containing lists of rules to be executed subject to constraints. I could see that this parsing initially worked, but failed shortly into the execution flow. No rules were being executed even though they were being scoped for execution. After a few hours of incrementally debugging the scenario, I saw the true culprit: a class from the Apache Commons library was missing at runtime. The ClassNotFoundException was silently ignored and allowed processing to continue, only resulting in a NPE for a limited number of scenarios that required this additional rules engine. The class in question should have been provided transitively from our dependency on the other rules engine maintained by another team, but migrating to Spring 3 seemed to cause some incompatibility with that error. Adding Apache Commons to our build config (and fixing the unsafe code) fixed the issue, but I still don't know perfectly why the issue was happening. I'll probably look back at it in the near future.


This happened over 20 years ago, but I was helping a co-worker debug an issue they were having with a Windows application written in Delphi. This was before Google was a thing, and waaay before Stack Overflow, so getting help to solve these kinds of problems was a bit more involved.

As far as the issue, if they ran the offending code in the debugger, it worked flawlessly. But it would fail every time in the production build. Usually, this would point to some kind of race condition, but the code section was innocuous. It was essentially running the Delphi equivalent of strpos on a local variable.

I was comparing the build flags between the debug version and the release version and one thing that caught my eye was the optimization flags for the compiler. Lo and behold if you brought the optimization level down two notches the bug went away.

I don't think I ended up getting into the disassembly to submit a bug report, as the optimizer was almost certainly doing something it shouldn't, but at least we found the source of the issue.

Since we didn't want to actually disable optimizations on our release build, the "permanent fix" was to re-write our own strpos-equivalent in such a way that compiler optimizations didn't break it.


Similarly, found a bug in clang around 2010 that would only happen with max optimization. Actually did manage to track down the root cause; an array access would fail if the index > 255. It went something like this: on ARM (this was building for iPhone) the LDA (Load Accumulator) instruction can store the memory offset (array index) within the instruction itself if it would fit within 1 byte, otherwise the offset would have to get loaded from a memory location pointed to by a given register. One of the two cases was faulty.

Was just about to report this, but my Mac got upgraded to the next version of OS X, which magically solved the problem. What does the OS upgrade have to do with compiling? In the world of Macs, Xcode was also upgraded along with the OS, and in the newer version it was already fixed. Dangit!


I once had a Bug in SQL Server, where a SUBSTRING(field, 1, 255) in a varchar field that was 255 in size improved the query performance about 1.5 times.

We never found out what the problem was, but we traced it down to the SUBSTRING - removing it made the query significantly SLOWER.

That was weird.


Seems likely that the substring gives a hint to the optimizer/executor about the size of the field, otherwise it could be whatever the max varchar size is.

There are likely specialisations for text fields under 256 bytes (and more generally, all common fixed-sized subsets of variable-length fields) which are made available when substr is used. For example, these values could be stack allocated rather than read from the heap.


Did you check the execution plans before and after? Probably caused a different plan that happened to be faster by luck.


It was a long time ago when I was an intern. I had no clue about anything back then... :)


Thanks, I only knew about execution plans after doing a DB course, so understood!


We once got paged because our app was down. But… it wasn’t. Everything was green and API requests were being processed.

After some digging and a lot of luck, it turned out that BT (huge ISP in the UK) had unceremoniously added our hostname to some internal blacklist, meaning it would never resolve, specifically and only for BT ISP users.

Getting it removed was non-trivial, and it was only through complete luck that an employee had a friend who worked at BT and was able to escalate the problem internally. Without that connection we would have been screwed, as there is nothing we could find about this blacklist on the internet or how to contact them about it.

Terrifying.

Sort of a bug, I guess? Maybe with BT?


Windows BSOD on boot due to clear plastic tape on motherboard. Would boot and pass memtest86+ with no errors, but would consistently fail boot to Windows until tape removed from area opposite lower PCI slots. If I remember correctly, I could also successfully boot Linux while tape was present, but I may be mistaken as it's been a few years.

This issue was also present when booting a completely different hard drive with Windows 10, which usually would work fine on other systems.

Once tape was discovered and removed, normal boot was restored without incident.

BOFH's who want a good prank take note.


Reminds me of MIT's magic switch story http://www.catb.org/jargon/html/magic-story.html


I love that story, and I remember it from many moons ago! Thanks for the reminder. A choice quote:

> It had two positions, and scrawled in pencil on the metal switch body were the words ‘magic' and ‘more magic'. The switch was in the ‘more magic' position.

That reminds me of this post I recently saw a friend post on Twitter, but apparently was popularized on Reddit many years ago:

https://twitter.com/lauriewired/status/1801488387272282284

https://www.reddit.com/r/techsupportgore/comments/1xq8h8/eve...

"Everything stopped working after I upgraded the memory"

https://imgur.com/everything-stopped-working-after-i-upgrade...


Weirdest? Some memory trashing which just so happened to replace a branch-if-equal with a branch-if-not-equal, and that code just happened to be in a multiplication subroutine (no hardware multiplication) causing multiplication to always give a negative result, post-trashing. This was showing up in some compiler-generated offset multiplications which were optimized out unless it was a debug build, long after the memory trashing took place.


Our JSP server would crash but only when the moons of Jupiter were aligned and the bakery across the street was serving the blueberry scone (CNR with the ham & cheese)


Reminds me of the recent https://x.com/CupiaBart/status/1793930355617259811

Engineers train an AI model to play an old game. One day their AI performs worse, on multiple hardware. Turns out the game has extra logic when it's full moon. "The player character is luckier, werewolves appear in their animal form, and the dogs howl ominously"


sounds like a delicious bug, Klaus Schwab would be proud :D


Randomly could not send emails through outlook. No amount of screwing with accounts or outlook files would resolve this. Then things would magically start working again for a time. Finally pinned it down to the wireless mouse. Still don’t know why, but if that specific model of Microsoft wireless mouse was plugged in, outlook wouldn’t send emails. Confirmed with another of this model mouse on a separate person’s computer.


At the UNH-IOL in 2004 the departments all had their own /24 subnets. We were warring with one of them over their winning where we got lunch, and so we put a line of javascript in the webmail html client that would open the print page dialogue box 0.5% of the time if the request came from that subnet. Oh, I misread 'had' as 'put'.


Couple years ago on employer's Macbook a JVM profiler kept crashing midway through profiling a local webserver. No error message. After digging deep, on some forum I found the suggestion to unplug all external monitors. I did indeed have an external monitor connected and thought "No way". Yes way.



