A debugging journey - debugging a weird bug in macOS

chrissnell · on April 1, 2018

In twenty-five years with various *NIX-based OSes, I've encountered a few of these. You spend hours and hours, and sometimes days, digging to get to the root of some bug, and it feels so awesome when you find the ultimate cause and fix it. Then you step back and say, what did I just do for the last two days? Two days of your very finite existence on this Earth, spent chasing some stupid software bug! I hope that it makes some difference in the grand scheme of things, but it almost never does. We fix these things because they are there.

maplant · on April 1, 2018

I feel that, but who cares? Nothing makes a difference to anything. The things that are important are the things we think are important.

bollockitis · on April 1, 2018

Exactly. Meaning is subjective. Chasing down a bug can be an exercise in patience, diligence, and an opportunity to improve one's skill. The activity itself is irrelevant.

duncan_bayne · on April 1, 2018

And because it's a joyous experience when it works.

I just introduced my five-year-old son to the passwd utility, when he asked me if I could change the password on his laptop[1]. The look of pride on his face when he mastered it (blind password entry was a challenge) was priceless.

[1] ThinkPad X121e w/ SSD and Xubuntu. X series ThinkPads running Linux make great kids' machines. Cheap, decent keyboards, and hard to break.

keithpeter · on April 1, 2018

X series Thinkpads running Linux make pretty capable machines for adults as well for the same reasons.

Back on thread: As an outsider, it seems to me that tracking down obscure bugs is a similar challenge to sorting out edge cases in obscure mathematics.

duncan_bayne · on April 1, 2018

That they do :) My work machine is a maxed-out X220 w/ FreeBSD:

https://www.instagram.com/p/BXPY61pFkRK/

I keep my setup scripts in GitLab, so I can spin up a new dev machine pretty quickly:

https://gitlab.com/duncan-bayne/freebsd-setup/wikis/home

nikanj · on April 1, 2018

Tangentially related https://xkcd.com/722/

matthewbauer · on April 1, 2018

Re: Apple open source version numbers: You can find the matching macOS release by looking at the versions listed at opensource.apple.com. For instance, copyfile-138 is listed on this page:

https://opensource.apple.com/release/macos-10126.html

Which corresponds to the 10.12.6 release.

nicostouch · on April 1, 2018

Interesting bug! Reading through this made me add a couple of new questions to http://www.debug.coach/

Could there be a bug in a library/3rd party code? Could there be a possible race condition?

I remember recently hitting a bug in a 3rd party library in a JAR provided by Oracle in WebLogic, to do with an incorrect implementation of the W3C DOM API causing XML validation to fail whenever attributes were present. Of course the JUnit tests provide a different JAR and thus a different (correct) implementation and no bug. Took ages to track down.

oneweekwonder · on April 1, 2018

http://www.debug.coach/

You should add an Ace Markdown <textarea> below each question where I can fill in the answers and save a PDF to share with my peers.

jakobegger · on April 1, 2018

Ah.... very interesting. Now I know why [NSFileManager copyItemAtURL:...] failed to return an error when the file already existed, but only with a debugger attached. It presumably checks the errno after calling copyfile, and ignores the return value.
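To make the suspected failure mode concrete, here is a minimal sketch in portable C. It does not call copyfile or NSFileManager; flaky_copy is a hypothetical stand-in for a call that fails but whose errno gets overwritten before it returns, and the two checker functions are equally hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-in for a library call: it fails, but code that
 * runs before it returns overwrites errno. */
int flaky_copy(void) {
    errno = EEXIST;   /* the real cause of the failure */
    errno = 0;        /* simulated: later cleanup clobbers errno */
    return -1;        /* the return value still reports failure */
}

/* Fragile: infers the outcome from errno alone. */
int copy_ok_by_errno(void) {
    errno = 0;
    (void)flaky_copy();
    return errno == 0;
}

/* Robust: trusts the return value; errno is only extra detail. */
int copy_ok_by_return(void) {
    return flaky_copy() == 0;
}
```

Here copy_ok_by_errno reports success even though the call failed, while copy_ok_by_return reports the failure.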

I remember debugging this issue -- it took me a few hours to find out why our tests were using old data, until I found someone else reporting the issue [1], and implemented a workaround.

[1]: https://forums.developer.apple.com/thread/75927

tzahola · on April 1, 2018

errno was one of the worst ideas ever.

emmelaich · on April 1, 2018

Bad implementation*, not so much a wholly bad idea.

Being sometimes a macro, sometimes not, not threadsafe? And more.

tzahola · on April 1, 2018

Well... Yeah, there's some merit to errno: it lets you handle errors.

Otherwise it's a horrible piece of crap:

- It's a global (sorry, thread-local) variable, updated behind the scenes.

- You can't tell whether a function will update errno or not by its signature. You have to consult the documentation.

- You can't tell the range of possible errno values unless it's explicitly documented (and the documentation is kept in sync with the implementation).

- Errno only communicates error metadata; the success/error bit has to be communicated in-band via the return value (e.g. printf returning < 0).

- Except for some functions which return void (or like readdir, which returns NULL when reaching the end of the directory, or on error) and set errno on error.

- In which case you'd better set errno to 0 before calling the function, because while these functions set errno on failure, they do not necessarily set it to 0 on success...
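The readdir case from the last two points can be sketched roughly like this. count_entries is a hypothetical helper name, not part of any library; the only real calls are opendir, readdir and closedir.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Count the entries in a directory.
 * Returns the count, or -1 with errno set if something went wrong. */
long count_entries(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;                 /* opendir has set errno */

    long count = 0;
    for (;;) {
        errno = 0;                 /* clear first: NULL is ambiguous */
        struct dirent *entry = readdir(dir);
        if (entry == NULL)
            break;                 /* end of directory, or an error */
        count++;
    }

    int saved = errno;             /* closedir may overwrite errno */
    closedir(dir);
    if (saved != 0) {
        errno = saved;             /* NULL came from an error */
        return -1;
    }
    return count;                  /* NULL just meant end of directory */
}
```

Without the errno = 0 inside the loop, a stale value left over from some earlier call could make a normal end-of-directory look like a failure.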

gok · on April 1, 2018

Don’t know why people are downvoting this; errno is truly horrible.

oneweekwonder · on April 1, 2018

Because no alternative is suggested. What could it have been instead? Got a URL?

Just saying {x} sucks... well sucks.

asveikau · on April 1, 2018

Errors as return codes are harder to clobber as a side effect. Some examples: on Windows you have HRESULT or NTSTATUS, on Mac you have OSStatus. Some POSIX functions also return errno values directly, e.g. pthread_create, likely due to early perceptions that errno cannot be made thread safe. (Fixed these days by making errno into a macro that resolves to a function that does thread local storage.)
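A small sketch of the return-code convention. It uses posix_memalign rather than pthread_create only because it is easy to run without any threads; both return an error number directly instead of going through errno. alloc_aligned is a hypothetical wrapper name.

```c
#include <errno.h>    /* only for the E* constants */
#include <stdlib.h>

/* Allocate `size` bytes aligned to `alignment`.
 * Returns 0 on success, otherwise an error number such as EINVAL. */
int alloc_aligned(void **out, size_t alignment, size_t size) {
    int err = posix_memalign(out, alignment, size);
    if (err != 0)
        *out = NULL;   /* leave the output in a known state on failure */
    return err;        /* the error travels in the return value */
}
```

The caller gets the error code from the call itself, so nothing that runs in between can overwrite it.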

Probably fans of higher level languages might prefer exceptions, but this would be inapplicable to C.

Some frameworks (GLib, parts of Cocoa) have a concept of "error object" which is an extra pointer to a struct parameter that can receive rich error context.
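Roughly what that pattern looks like in plain C. This is only a sketch of the idea: the Error struct and parse_port are hypothetical names made up for illustration, not the actual GLib or Cocoa types.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error object carrying a code plus a readable message. */
typedef struct {
    int  code;
    char message[128];
} Error;

/* Parse a TCP port number from text.
 * Returns 1 on success, 0 on failure; fills *err if err is not NULL. */
int parse_port(const char *text, int *port, Error *err) {
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE ||
        value < 1 || value > 65535) {
        if (err != NULL) {         /* callers may opt out with NULL */
            err->code = EINVAL;
            snprintf(err->message, sizeof err->message,
                     "invalid port: \"%s\"", text);
        }
        return 0;
    }
    *port = (int)value;
    return 1;
}
```

The success/failure bit stays in the return value, and the details live in an object the caller owns, so no hidden global state is involved.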

mwfunk · on April 1, 2018

OK, put another way, errno is using a global variable to return an error code from a function call. Alternative: don't use a global variable to return an error code for a function call.