Where Are the Sanity Checks? (eclecticlight.co)
42 points by ingve 8 months ago | 56 comments



Setting aside whether this is really a bug, I don't think sanity checks make much sense here. Sanity checks are more common in aviation software, because of three factors.

The first factor is that the software is redundant and safety-critical. If something is wrong with its logic, it is better to shut the system off and let a backup system take over (or let the pilot fly manually) than to accidentally fly the plane into the ground.

The second factor is that the checks are based on the laws of physics. No matter how weird your plane design is, it probably can't travel faster than the speed of light. So we can safely say that if that's the result we're getting from our calculation, the system is malfunctioning. (By contrast, there are a lot of weird ways a count of filesystem space can be wrong. Because filesystems are just weird sometimes.)

The third factor is that testing is insufficient for aviation software. Even if you could test every possible condition, you would still need sanity checks, because damage or shocks to the plane and its sensors can introduce faults that didn't exist during testing. (By contrast, most consumer systems aren't built to keep working after being dropped, snapped in half, or struck by lightning.)


Time Machine backups use filesystem snapshots with reference-counted copy-on-write blocks. The total of all file sizes can easily surpass the actual storage capacity. The same situation exists with modern virtual memory implementations, where the sum of memory allocated across all processes can be larger than the amount of physical RAM in the system (shared memory pages, file-backed pages, etc.).
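To see this concretely, here is a minimal sketch (my own illustration for POSIX systems, not what Finder actually runs) of how a naive per-path sum can exceed device capacity:

    import os

    def naive_total(root):
        # Sum the logical size of every directory entry. A hard-linked or
        # cloned file is counted once per path that reaches it, so this
        # total can exceed what the device can physically hold.
        total = 0
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # permissions, races, etc.
        return total

    st = os.statvfs("/")
    print("naive total:", naive_total("/usr"), "bytes")
    print("device size:", st.f_frsize * st.f_blocks, "bytes")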

This is not about sanity checks. The value is correct. This is about presenting information in a way that makes sense to the user.


That was also my first thought: references, virtual memory, etc. But I insist the value is incorrect. The measurement units got mixed up: you cannot say X TB (logical space) of Y TB (physical space), because these are different units! I can see how Apple might try boasting inflated numbers to impress users, just as they try to sell the 8 GB MacBook Pro by saying it's the same as the old 16 GB.

By the principle of least surprise, the reasonable expectation is that used disk space is shown as less than total capacity. That expectation is reasonable for both technical and non-technical users.


The total real value is the amount that's relevant to the action of clearing space. And if the mapping from logical space to real space were already known, the operation could be effectively instantaneous. The implementation shows the two most useful (to the user) pieces of information available.


Apple claims that 8 GB on M-series chips is used more efficiently than 16 GB on Intel, and therefore less physical capacity is needed to match functionality.

Remember AMD Athlon CPU model numbers? The Athlon XP "Barton" 3200+, running at 2.2 GHz, was positioned as the equivalent of an Intel chip running at 3.2 GHz.


Unless they've figured out how to let you store two bits per bit, it's still obviously a dubious claim. My x86 desktops have also had zswap for a decade, so whatever magic fairy dust they have can't be that simple.


The magic fairy dust is "really fast NVMe" and "really low-latency memory access", which for most users makes 8 GB perform comparably to 16 GB of slower memory (which wouldn't need to read from disk at all).

That's about it. If you're doing anything other than regular web browsing and word processing it's not going to be the same as having 16GB. But I'm certain the average Mac user is not a developer, video editor, or other power user.

Like it or not, tech nerds are not the average consumer of Apple hardware.


I’d be concerned, in the presence of widespread sanity checking of this sort, about the sanity-checking code itself being bad and managing to make things worse. It’s well known that error-checking code is about the buggiest code out there, largely because it’s rarely exercised.

There’s also the problem that it’s not always clear where such sanity checking would be useful, or what should be done if a check fails. A human can recognise that something is awry and apply reason to it: seek confirmation or correction, or ignore the value and tweak the rest to compensate, or whatever seems sensible. Software can’t do that unless it’s been explicitly coded in. And, simplifying far beyond the point of strict accuracy, software doesn’t make mistakes in calculations, so half of your human error-detection heuristics would be a waste of time and you should just fix the obvious bug instead. (This is, as I say, a gross oversimplification.)


I worked on a retail pricing system that had a sanity check rejecting prices over 10,000. Once we moved to Japan, where ordinary items routinely cost more than ¥10,000, that sanity check silently stopped all sorts of items from being processed.


This is a classic localisation issue.

When I was younger and worked mostly for Danish companies, we always knew that a system _could_ be utilised in another locale. As such, we always tracked currencies, timezones, etc.

Since starting to work for American companies, all of that is out the window. An int with the var name "amount" – of course that is USD!


> An int with the var name "amount" – of course that is USD!

Or is it cents? Or fractions of cents?

Enjoy digging through the source code and commit history (which may sometimes look like db.prices.old.2022.bak.php) to try and figure out what the answer is!

Bonus points if at some point, that changed. Extra bonus points if it didn't change everywhere.
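By way of illustration, a minimal sketch (hypothetical names, Python) of making the unit explicit so none of this archaeology is needed:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Money:
        minor_units: int  # cents for USD; whole yen for JPY (JPY has no minor unit)
        currency: str     # ISO 4217 code: "USD", "JPY", ...

    price = Money(minor_units=12_99, currency="USD")  # $12.99, unambiguously

    # A sanity threshold now has to be per-currency instead of a bare 10,000:
    PRICE_LIMITS = {"USD": 10_000_00, "JPY": 10_000_000}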


Ours wasn’t called just amount; it was called “amount_usd” and represented the local currency. Fun times!


What an aggravating article to read.

> was quietly trying to delete some old Time Machine backups, and had selected quite a few of them when the Finder assured me that they came to a total of over 60 TB, and that was on a 2 TB SSD

Multiple links will cause the same file to be counted multiple times if you traverse the filesystem the straightforward way, and filesystem compression and other trickery like CoW (which AFAIK is what Time Machine uses) also make it invalid to assume that the total size of all files on a storage device will be <= its capacity. I'm not even a Mac user, yet I easily knew what was going on at first glance.

> A little sanity checking should have revealed that claiming that amount of space could exist on a 2 TB SSD was impossible.

It really irritates me to see people make such claims with confidence, ignorant of their own lack of knowledge. Disk compression (early 90s). Symbolic links (ever since UNIX existed). CoW backups (Time Machine and probably other preceding systems). Perhaps the missing sanity checking here is in the brain of this article's author...

https://eclecticlight.co/2023/04/28/apfs-hard-links-symlinks...

https://eclecticlight.co/2022/07/23/explainer-the-arithmetic...

https://eclecticlight.co/2020/07/29/time-machine-and-snapsho...

https://eclecticlight.co/2019/06/10/time-machine-past-presen...

...who clearly already knows how this works!?

The only filesystem I know of in widespread use where the assumptions the author makes would be true is the FAT family. One link only. No compression (except as an add-on container). No CoW or other fancy features.


You're making the mistake that you're accusing the article of. Of course the author knows these things. As you just showed. Yet, the Finder GUI still reports bad information. Information that can mislead users into making bad decisions. That's what this article is complaining about. The article is talking about Finder, the GUI. It's showing screenshots. It's not talking about probing for CoW status using CLI filesystem tools. What are regular users supposed to do? A sanity check would at least have the Finder report nothing instead of incorrect information.


Saying that Finder should sum file sizes in a way that accounts for this makes sense.

I'm not sure adding a separate "Sanity check" would help though. If the sanity check is that the reported file sizes shouldn't exceed the disk size, that's still a lie.

If you have 1000 duplicate 10GB files that are sharing storage blocks in the file system, that's either 10TB or 10GB of stored files depending on how you count it. Reporting 2TB because that's the size of the disk is a less accurate answer than either of those.


Sanity check, not min().


What value is sane in this case?


You're making the same mistake if you think that's "incorrect information", because it isn't --- there is a clear and logical reason why the sizes are reported the way they are, and if you'd only take some effort to learn instead of complaining because they don't fit your incorrect assumptions, you'd understand.

> A sanity check would at least have the Finder report nothing instead of incorrect information.

You're saying to just not show any size at all if the filesystem contains more data than its capacity, which is basically going to be 100% of the time after enough backups are made? WTF. That makes even less sense.

I'm not one to trust AI much at all, but I bet even an LLM these days would know and tell you why the total size of all the files on a filesystem can appear to exceed the capacity of the storage device.


> You're making the same mistake if you think that's "incorrect information"

No, that would be a different type of mistake.

> you'd only take some effort to learn instead of complaining because they don't fit your incorrect assumptions

I know exactly why it reports it like this. That wasn't the point of the article, and not the point of my reply.

> You're saying to just not show any size at all if the filesystem contains more data than its capacity, which is basically going to be 100% of the time after enough backups are made? WTF. That makes even less sense.

No, not 100% of the time. But it would be a significant portion of time, yes. The Finder should probably not even try to report the information like this if it's going to be misleading such a high percentage of the time, if Apple is not going to add a better way for users to account for where all of the space on their drives is going.

> I'm not one to trust AI much at all, but I bet even an LLM these days would know and tell you why the total size of all the files on a filesystem can appear to exceed the capacity of the storage device.

I tried to find a way to interpret this other than, "you should just use ChatGPT as a person to talk to, because ChatGPT is smarter than you and could explain to you how filesystems work," but I wasn't able to.


Why did you add

> because ChatGPT is smarter than you

? It's really weird framing.


Because the reply was written like this:

> … but I bet even an LLM these days would know and tell you …

The "even" here says that knowledge of the filesystem is basic.

> I'm not one to trust AI much at all …

This part shows that the person replying does not think highly of AIs.

Together: AIs aren't very smart, but they at least know about filesystems and could explain it to you.


Respect for something's knowledge seems orthogonal to trust, to me. I can respect a serial killer's intelligence without remotely trusting him.


Maybe Finder could display two values, size on disk and total bytes in files, or something like that. At least some hint to the user that the software knows what it's doing.

This idea that "the user is wrong when their understanding of the software/computer doesn't match the particular mental model the software developer had in mind" is one of the reasons why software is so bad. This particular author probably knows why the numbers look weird, but he is writing from the point of view of a layman user (99% of users out there), who would have exactly these questions. If my elderly parents had a 2 TB drive and their computer said it was doing something with 60 TB of files, they would be right to question that number.


It might not be technically incorrect but it certainly isn't very useful information.

What does it matter if there is one backup of a 1 GB file or 100 deduped backups of the same file? Telling me that the backups of that file take up 100 GB is not useful information in any way that I can think of; it's just a convoluted way of telling me how many backups I've made.

It is not actually taking up 100GB of space nor do I need 100GB available to restore the backup.


However, the information is not wrong to begin with. You can extract that amount of data from the disk if you copy the filesystem to another filesystem which does not support these features.


No. There is not 60TB of data on the disk, which is what the GUI is reporting in the article. The GUI doesn't say that 60TB of data would exist if the snapshots were each individually flattened and expanded to another filesystem. It says there is 60TB of data filling the disk.


However, the code traverses that amount of data. It doesn't matter if you pass over the same inode n times; from the OS's point of view, there's no difference.

Of course you can write a more complicated calculation algorithm, but it'll require tons of syscalls to determine the type of the file handle you're interacting with and will probably be very slow.

This is what rustup did inadvertently back in the day, and it caused NTFS to grind to a halt because of stat() overhead [0].

[0]: https://www.youtube.com/watch?v=qbKGw8MQ0i8
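For reference, the deduplicating pass would look roughly like this (a sketch, not Finder's or anyone's actual code; the one lstat() per entry is exactly the overhead described above):

    import os

    def physical_total(root):
        # Count each inode's allocation once by deduplicating on (device, inode).
        seen = set()
        total = 0
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                try:
                    st = os.lstat(os.path.join(dirpath, name))  # one syscall per entry
                except OSError:
                    continue
                if (st.st_dev, st.st_ino) not in seen:
                    seen.add((st.st_dev, st.st_ino))
                    total += st.st_blocks * 512  # allocated blocks, not logical size
        return total

Even this undercounts the sharing on APFS: clones are distinct inodes sharing extents, which no amount of stat()-ing reveals.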


Why does the user care about how many times an iterator passes over an inode? The Finder is reporting incorrect information to them and may lead them to make bad decisions, like buying unnecessary external storage drives. You've missed the point of the article.


I don't think so, because a) macOS provides correct free-space information, and b) it explicitly says "your backup disk is full, and I'll open up some space, so your earliest backup will be at $DATE" before actually deleting any snapshots.

Our disagreement stems from the assumption that the reported space is incorrect to begin with, but there's already a "size on disk" note on every file, which provides the physical size of things.

We can argue that the progress could be calculated from that number, but I still think that has its own performance limitations.


I think that's fair when you consider it all together.


What happens then if you try to copy that 2TB of 'data' to a filesystem that doesn't support links? 60TB would be used in that case.

There are some use cases that are computationally expensive to figure out and don't have a simple/easy answer to solve.


That's not the information that dialog is purporting to display. That's a theoretical thing you've made up for this comment.


It's a real thing that happens all the time when copying. Delete is a different operation, but now you're asking for a separate space calculator for deletes, which will have its own set of bugs and performance issues.

Hard links are a pathological case.


> I'm not even a Mac user, yet I easily knew what was going on at first glance.

The vast majority of computer users wouldn't. 60 TB might be technically correct, but it isn't useful. I don't want my computer to tell me "this data you're deleting adds up to 60 TB if you count it an unspecified number of times." I want it to tell me (at least roughly) how much space on disk will be freed once it's deleted. If it can't do that, I agree with the author: I'd prefer an unlabeled progress bar, or perhaps 0-100%.


Also, it's extra confusing when this comes from a person who can reliably dig deeper into macOS than most people.


I'm Joe Blow, Apple user.

Sorry, you need a master's in disk compression algorithms to interpret our tools.

LOL. You didn't even count how many symbolic links exist? You plebeian.


...I once had an installer refuse to install a program because I had several GiB of free disk space, and obviously disks that large do not exist. Thanks for having this sanity check, I guess.


Yeah, that wasn't a sanity check, it was a terrible assumption by the software developer. I think one of the best ways to proactively find bugs in someone's software is to look for 1. declared constants and 2. comments like // this can never happen

For 1. constants: Are they really CONSTANTS in the sense that they are immutable laws of the universe? Or are they assumptions you are making to make your program look nice? I can understand declaring pi as a constant or the gravitational constant. Is MAX_DB_RECORD really a constant? OK, so let's say you believe it is--now you just gave yourself tons of "checking" work every time a record is added, and deliberately added bugs in the cases where you don't check. Did you catch them all? Is MAX_HD_SIZE really a constant?

For 2. this can never happen: These are the parts of your code that edge cases swarm to, in order to give you all your P3 bugs. Really? It can NEVER happen? If so, why are you logging and doing a stack dump there? A "This can never happen" code block is usually not a sanity check, it's usually a deliberately engineered bug.
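A hypothetical illustration of both patterns at once (every name here is made up):

    # The "constant" is really a point-in-time assumption about the world.
    MAX_HD_SIZE = 2 * 1024**4  # "no disk is bigger than 2 TiB": true once, a bug later

    def report_size(nbytes):
        if nbytes > MAX_HD_SIZE:
            # "This can never happen", so this branch was never tested,
            # and when it finally fires it silently hides real data.
            return "error"
        return "%d bytes" % nbytes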


Well, most code of the second kind that I've seen had a comment near it of the kind "I am almost certain that this condition can't occur at this point, but I may be wrong; as Knuth once said, 'I've only proven this code correct, haven't actually tried it', so let's add some debug logging just in case", which IMHO is an entirely reasonable approach.


Reminds me of a Reddit submission where the "WalkScore" map assumed rent could never be more than $5000/month:

https://imgur.com/2DrlnYH


I keep a few screenshots of such hilarious occurrences. One of my favorites is a list of stock prices on a website where something had gone wrong and the numbers were all in the thousands of trillions of USD. For a split second before my morning coffee I wondered if the world had just experienced hyper-inflation overnight. The bug only lasted a few minutes, but it was "cool".

Now the thing is: sanity checks, and then what? Such checks are basically only useful to developers and, arguably, should be covered by some kind of tests. What can the UI do? Not show any estimate when the numbers would be nonsensical?

Another wonderful screenshot I have is from some old MS software where something failed, and one of the suggested reasons was that "your computer is turned off". I mean: it was literally using that very same computer to tell me, in a pop-up, that my computer might be off.

Fun stuff, but I wouldn't lose sleep (or write a blog post) over it: a comment on HN at most ; )


> What can the UI do? Not show any estimate when the numbers would be nonsensical?

Yes, exactly that. Also, log a warning and, if possible, add the warning to outgoing telemetry.

Sure, in theory such a situation should never happen because everything was already caught in tests. In theory, our software should also be 100% bug-free. However, because that's not how the world works, it's polite to the user (and makes a better impression) to detect such output and correct for it.

You have the same situation with websites showing NaN in the UI: the website didn't catch it because the variable is never supposed to be NaN - yet for some reason it was, and the result caused frustration for the user.
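Concretely, the guard is cheap. A minimal sketch, assuming the app has some logging facility standing in for its telemetry hook:

    import math
    import logging

    def format_estimate(nbytes):
        # Refuse to render values that can't be right; flag them instead.
        if nbytes is None or (isinstance(nbytes, float) and math.isnan(nbytes)) or nbytes < 0:
            logging.warning("implausible size estimate: %r", nbytes)
            return "size unavailable"  # an honest placeholder beats a nonsense number
        return "%.1f GB" % (nbytes / 1e9)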


How many checks do you need for each language/locale that displays its local monetary denomination very differently?


> were all in the thousands of trillions of USD

It was just trying to show Zimbabwe bucks instead of USD.


What is the takeaway here? Heuristic exception raising? Exposing error bars and confidence figures? If it were that trivial, would they even have to leave it to the user to make the actual decision?


I agree. With sanity checks, the program would report the size only when it is believable (yet still wrong), and say something like "error calculating size" when the number is bigger than the disk?

This doesn't help at all, and sacrifices consistency (and availability) for no clear benefit. Software that is wrong consistently (or mislabeled) can be used (or fixed by an errata). Software that aborts for harmless reasons is useless.


Dunno about you, but my takeaway is that asking "how much space does X occupy" on a modern filesystem is a poorly specified question. Which means that there isn't a single answer. Which means that software should not try to report a single answer, unless there is additional context that fully disambiguates the question. And generally speaking, that last part will not be true, but programmers will think things like "well, they're looking at backups, so we are in this context...", which will sometimes be wrong.

Stepping back a bit, this becomes partly a UX problem and partly a user education problem. Software alone is not going to be able to perfectly guess[1] why a human wants to know, so the human needs to learn enough to competently ask. Software can certainly help educate them, and I wish more software tried to level up the machine operator instead of trying to guess the right thing in the face of ignorance.

[1] Don't mention "genai". Even if the robots get to the point that they are better than me at contextualizing this sort of thing, until they assume legal liability for outcomes, humans need to make decisions.


I don't think the issue is with the number, but rather the context. What should the end user think when finder reports that there is 60TB of data on a 2TB drive? People with above average file system knowledge can generate an explanation for it, but what good does that number do?

When programming a feature like this it can be hard to spot the problem. The calculations for the backup's size can be 100% right, but still not appropriate to show.


Wow. Lots of thoughts. First off, if we're talking about the same thing(s) I'm in violent agreement. However I question that congruence.

First observation is that one light isn't a reliable on/off indicator because the light can fail; you need two lights, one for "on" one for "off". Then at least if no lights are showing you know one of them is broken. OTOH a meter which reliably reads 50% of actual or consistently jumps between 30% and 70% violates the principle of least surprise, but I can work with it once I know about it as long as there aren't too many competing distractions. Sometimes a broken indicator is better than nothing; sometimes it's worse.

People ask me to do a lot of tasks applying deterministic logic to nondeterministic systems (a lot of this comes down to lack of research, though the reasons for that lack vary; I don't accept these kinds of jobs without asking some pointed questions first). Sanity checks feel kind of pointless here, because what would they mean? I've surrendered to the philosophy that feeling that I know what is going on is sanity; therefore the sanity check is whether I feel like I know what's going on. The approach I've settled on is "bug parts": measuring the actual ingredients and listing them on a label. In practice this means that I report certain (often reverse-engineered) values, count the number of times we pass through various portions of code, and report those too. In reality, the good ones seem to collect in some distinct buckets or fingerprints (and I keep adding indicators until this occurs!); anything not in one of those buckets could probably use a closer look.

There are different ways of reporting disk (and memory) space. A common example is "copy on write"; a common corollary is "start with zero". If this is being done, large chunks of a randomly written file might not be allocated, and reading them returns zeros. This presents a problem for "space which is used": does it represent the commitment (this is a 30 GB file), or the usage (4 GB is allocated, the rest is all zeros)?
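A quick way to see the commitment/usage split is a sparse file (a sketch; the near-zero usage assumes the filesystem supports sparse files):

    import os
    import tempfile

    # Create a 30 GiB file without writing 30 GiB of data.
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, 30 * 1024**3)  # logical size: 30 GiB of implicit zeros
    st = os.fstat(fd)
    print("commitment:", st.st_size)          # 32212254720 bytes
    print("usage:     ", st.st_blocks * 512)  # near zero on sparse-capable filesystems
    os.close(fd)
    os.remove(path)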

Finally, in a twist on that last point systems overcommit. Your allotment might be a limit, not a contract.


I still don't get Apple's bizarre new snapshot system.

What I know is that a few years ago I never had this problem, but now I often have gigantic snapshot files taking up extreme amounts of space, like 100 gigabytes on my 500 GB MacBook. Probably because I work with large files. And this is not explained to anyone, unless they research in some Apple support forum why deleting files doesn't free up space.

OS X is still my go-to OS, but these magic elements with no explanation are getting completely out of hand now.


My favourite version of this is Arc, the location tracking "guess what you're doing" thing. For whatever reason, I've always had trouble with phone GPSs being occasionally inaccurate (no idea if it's iPhone - but it's happened over 10 different models - or WIFI assist or my network or London) and saying I'm, e.g. instantly 5 miles away, there for 30 seconds, and then back to where I am.

Arc will dutifully report this and I have to delete these segments even though it's not physically possible to make these movements.

Or if I get on a bus or a bike, it'll e.g. think I've gone walking -> bus -> car -> plane -> stationary -> bus in the span of 5 minutes and... just... apply some simple sanity checks, I beg you.

(Example from today where I haven't moved more than about 50ft. Parents [Car 2m34s 188ft] Parents [Car 1m51s 327ft] Parents [Car 2m42s 262ft] Parents [Car 3m 547ft] Parents [Car 1m54s 596ft] Parents [Transport 1m14s 99ft].)
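Even a crude speed filter would catch most of those jumps. A sketch, assuming points arrive as hypothetical (timestamp_seconds, lat, lon) tuples:

    import math

    MAX_PLAUSIBLE_MPS = 90.0  # ~200 mph; generous for anything short of a plane

    def haversine_m(lat1, lon1, lat2, lon2):
        # Great-circle distance in metres.
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = p2 - p1, math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * 6_371_000 * math.asin(math.sqrt(a))

    def drop_teleports(points):
        # Keep only fixes reachable at plausible speed from the last kept fix.
        kept = []
        for t, lat, lon in points:
            if kept:
                t0, lat0, lon0 = kept[-1]
                dt = max(t - t0, 1e-3)
                if haversine_m(lat0, lon0, lat, lon) / dt > MAX_PLAUSIBLE_MPS:
                    continue  # 5 miles in 30 seconds: discard as a GPS glitch
            kept.append((t, lat, lon))
        return kept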


Thing with sanity checks is that reality has a way of getting in the way.

Like when the price of crude went negative. I was lucky enough to work on a trading system that had no check on the price, so it kept trading.


Someone from HR told us we weren't allowed to call them that anymore. No one knew how to ask for something we no longer had a good name for, so we stopped doing them.


Is this satire?


The first half definitely isn't, it was a term on the list of top priority bad things to stop saying. The second half might just be me mistaking correlation for causation.

Other items included:

* blacklist/whitelist - mostly complete, but we picked all of the alternative names, depending on what part of what project you're on

* master (the git branch) - yeah, that didn't happen

* backlog grooming - always seemed like a silly name anyway, refinement is a better name. Our backlog isn't a prize animal being groomed and primped for show, it's more like a crude product being belched out of the bowels of the earth that needs to be refined in order to be useful.


there're no sanity checks because there's no sane "recovery", nor is there much risk. useless sanity checks mostly work towards making software annoying and unusable



