How to reclaim disk space used by Docker volumes, but keep a few important ones (domm.plix.at)
76 points by todsacerdoti on June 20, 2020 | 34 comments



Docker (on a Mac) doesn't just hog used space, it hogs space from the very moment you install it [1] and the Docker team doesn't care.

Between that and Electron apps taking up any smidgen of available RAM I'm surprised my machine is able to run anything else at all.

[1] https://github.com/docker/for-mac/issues/2297


FWIW I skimmed that issue thread and, unless it changes course much later on, Docker is not in fact using that disk space, because Docker.raw is a sparse file.


Well, if the rest of the system treats the sparse file as a regular file [1], then it doesn't matter that it's sparse; the space still can't be used.

[1] For example, someone there mentioned that the Apple Software Update app refuses to update the system, claiming there's not enough space.


I took a brief look at the issue and it's very strange. The people reporting the issue and piling on clearly don't understand sparse files (which is fine in and of itself), and their claims that "the rest of the system treats the sparse file as a regular file" are just not very credible to me.

The thing is, if you want to look at the physical size of a file/directory, you don't use ls, you use du. Similarly, if you want to look at disk usage and free space on a volume, you use df, not the sum of ls-reported sizes of all files. Using df for free space is especially important on APFS, since APFS volumes share the space in a container. And of course, du and df correctly report actual disk usage. If you check Docker.raw in Finder, it says something like "63,999,836,160 bytes (10.25 GB on disk)", too. Btw, even `ls -l` reports the correct total disk usage:

  $ ls -lh ~/Library/Containers/com.docker.docker/Data/vms/0/data
  total 9.6G
  -rw-r--r-- 1 <omitted> staff 60G Jun 17 02:01 Docker.raw
There's just no way "the rest of the system treats the sparse file as a regular file". I suspect people actually have disk space tied up in APFS snapshots (Time Machine takes a lot of those, and I have to clear them constantly with tmutil on machines with meager SSDs), but they're not aware of that; instead they ran some crappy script or something they found online, located this "huge file", and blamed it for the disk usage. Any proper tool, e.g. the built-in Disk Utility or DaisyDisk, won't make this obvious mistake. I thought maybe people were using Finder search to find large files and that was the problem, but I just tried, and Finder search also distinguishes logical size from physical size, so it's not even that.

Edit: In case it's not clear, sparse files don't "reserve" their space. Try this:

  for i in {000..999}; do dd if=/dev/zero of=/tmp/sparse-$i bs=1024 count=0 seek=104857600; done
You just created a thousand 100GiB files totaling zero bytes in disk usage (not counting metadata).
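You can confirm it with the standard tools; a quick sketch (exact output formatting varies by macOS version):

  $ ls -lh /tmp/sparse-000    # logical size: 100G
  $ du -sh /tmp/sparse-000    # blocks actually allocated: ~0
  $ du -shc /tmp/sparse-*     # physical usage of all thousand files combined: still ~0
  $ df -h /tmp                # free space on the volume: unchanged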


I understand what a sparse file is and so do many of the people in that thread. The important point is, do all the apps on the Mac understand that sparse files don’t take up all that space? Some clearly don’t, which means their developers either don’t or didn’t account for them. It’s not the fault of the users in that thread.


They don't need to. If an application needs to know how much free disk space there is, it uses an API like statvfs(3) that queries the filesystem, which obviously knows its own disk usage. No application is gonna sum the logical size of every single file on the disk; why the hell would they even care about this particular Docker.raw file?
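df is the command-line face of exactly that kind of query, and it doesn't budge when a huge sparse file appears; a rough illustration (path and size made up):

  $ df -h /tmp | tail -1    # note the Avail column
  $ dd if=/dev/zero of=/tmp/huge bs=1024 count=0 seek=104857600    # 100GiB logical, nothing written
  $ df -h /tmp | tail -1    # Avail is the same as before
  $ rm /tmp/huge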

As I said, reports like "the macOS installer says there's not enough disk space because of a sparse file somewhere" and "macOS tells me disk space is low every two minutes because of a sparse file somewhere" just aren't credible if you understand how it works. Otherwise, how would those apps behave if you created a thousand "100GB files" like I demonstrated? Would they go all crazy? Would your free disk space be negative?


> just aren't credible if you understand how it works.

Yes, your theoretical knowledge, which several of the people in that thread obviously have, beats actual experience.

Not only is that not credible, it's about as useful and insightful as "it works for me" - and who on Earth is using statvfs on a Mac? Is the installer using statvfs? You've looked, right? And you believe every developer uses the correct APIs and never introduces a bug…

Try reading and not skimming; it's especially important when dealing with bug reports.


The alternative theory is that the installer spends a very, very long time (on my computer I bet it would take at least ten minutes) scanning every file on my disk, looking up each declared size and totaling them, then checking whether the size of the disk minus that number leaves enough space. That premise is so absolutely ridiculous that I don't even know how it is being taken seriously.


That's an alternative theory, among many other possible ones that wouldn't lead to the dismissal of an entire thread of developers' comments based on a quick skim and the strange idea that not one of them knows what a sparse file is.


> Edit: In case it's not clear, sparse files don't "reserve" their space. Try this:

> for i in {000..999}; do dd if=/dev/zero of=/tmp/sparse-$i bs=1024 count=0 seek=104857600; done

On an APFS volume with 18GB of free space, that bombed out with "No space left on device"


I have no trouble creating a 1TiB file this way on a 512GB volume. Trying to create a 2TiB file this way does error out. MacBookPro11,5, macOS 10.15.5.


It is a sparse file: copying it is very fast when it's empty. I don't know if the OS recognizes the space claims, but it would be logical if it did so.


That is simply because they run inside a VM with pre-allocated space. Docker is not native to macOS.


I’m sorry, but am I being downvoted for being factual? Docker sets up a HyperKit VM (which brings with it a fair amount of overhead, since it includes a complete Linux kernel and a sizable userland). overlayfs runs inside that VM image.


I’m not sure why you’re being downvoted, but your comment doesn’t address the issue. Of course Docker will need some pre-allocated space, but setting it so high by default seems excessive and ignores the real problems it’s causing for people in that issue thread.


For docker-compose: don't run and kill a stack with Ctrl+C; that leaves anonymous volumes behind.

Start it as a daemon with: docker-compose up -d

Stop it with -v to remove anonymous volumes:

docker-compose down -v

This will slow down the swelling.
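If a pile of anonymous volumes has already accumulated, they show up as hash-named entries and can be cleared with the standard CLI (it prompts before deleting anything):

  docker volume ls -qf dangling=true   # volumes that no container references any more
  docker volume prune                  # remove all of those after confirmation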


How do I view server logs in real time while developing if it's daemonized (while preserving all the colorization and stuff)? Do I just do some kind of tail on a log stream or something?


I often do:

docker-compose up -d && docker-compose logs -f


A Docker problem? More Docker is the solution!

https://github.com/amir20/dozzle (a container that plugs into the docker API to render a webpage with logs of running containers)
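If I remember right, starting it is a one-liner along these lines (double-check flags and port against the project README):

  docker run -d --name dozzle -p 8080:8080 \
    -v /var/run/docker.sock:/var/run/docker.sock amir20/dozzle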


I think it would just be:

  docker logs -f CONTAINER
If you want to watch logs from a specific file within the container then I would just use 'docker exec' and whatever shell commands you prefer.


docker-compose logs -f <service>

So you don't have to manually hunt down the container you want the logs for.


For local dev, I run all my docker stuff via minikube (`minikube docker-env` sets things up so your shell uses the docker daemon running in minikube, allowing you to run docker containers outside of k8s) and then from time to time I just run `minikube delete`, which deletes the entire VM - docker images and cruft included.

It’s a bit of a scorched earth approach, and it won’t help if you want to preserve volumes, but it’s a good way to ensure no docker cruft is left behind on your local machine.
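In case it helps anyone, the rough shape of that workflow (image name is just a placeholder):

  eval "$(minikube docker-env)"   # point this shell's docker CLI at the daemon inside minikube
  docker build -t example/app .   # the image now lives inside the minikube VM
  minikube delete                 # later: throw away the whole VM, images and cruft included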


On one system, I had two images; I just did docker build once and thereafter docker run.

And then one day my disk was full.

https://lebkowski.name/docker-volumes/

I always added docker run --rm ...

But clearly I didn't internalize all the stuff that gets saved.
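For what it's worth, docker system df shows a breakdown of that saved stuff before you start deleting (output columns vary a bit by Docker version):

  docker system df      # totals for images, containers, local volumes and build cache
  docker system df -v   # per-image / per-container / per-volume detail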


I have had good experience with this script, which is posted at the end of that page:

https://gist.githubusercontent.com/mlebkowski/471d2731176fb1...


This is only necessary because Docker is a steaming pile of hot garbage. It’s the leakiest of leaky abstractions, and a hoarder to boot.


Could you explain why it is so leaky?


The client/server model is one of the leakiest parts of Docker. Try mounting a relative path, running docker inside docker, or inheriting the current user’s security privileges.

To this day I don’t understand what problems the client/server model solved, and why it was worth all the problems it created.
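The relative-path thing, for example: with plain docker run -v, a source that doesn't start with a slash is treated as a named volume rather than a bind mount, so you have to spell out the absolute path yourself (paths here are only an example):

  docker run --rm -v data:/data alpine ls /data            # "data" becomes a named volume, not ./data
  docker run --rm -v "$(pwd)/data:/data" alpine ls /data   # what you actually have to type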


I guess - and I might be very wrong here - that the reason for client / server architecture was to be able to schedule docker containers on several hosts without the need to ssh into them. And I guess something like docker-in-docker or docker containers accessing the Unix socket would be more complicated.


But why the need for Unix sockets or anything like that? Creating a container is a fancy fork(), and executing that through a foreign process (especially when on the same server) makes no sense to me.

Remember, containers are just Linux cgroups; there is nothing “special” about a container that requires a client/server.
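You can poke at those primitives directly on a Linux box with util-linux and the cgroup filesystem, no daemon in sight (a rough sketch: needs root, and it assumes a cgroup v2 layout with the cpu controller enabled):

  # a shell in its own PID namespace: `ps aux` inside it shows only bash and ps
  sudo unshare --fork --pid --mount-proc bash

  # resource limits are just files in a directory tree
  sudo mkdir /sys/fs/cgroup/demo
  echo "50000 100000" | sudo tee /sys/fs/cgroup/demo/cpu.max   # cap the group at half a CPU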


For me and my coworkers, the leakiness is mostly around networking. If you start the wrong VPN, docker doesn’t work. If your iptables rules aren’t set up just so, docker doesn’t work. Yet the documentation would lead you to believe that how docker does networking should be considered an implementation detail.


Is there a better alternative on OS X?


Off topic: I was surprised by how amused I was by this. It looks like normal humor doesn't cut it for me anymore.

> Because I cannot be bother to remember this, and don't want to google how it's done only to end up at my own blog, I did a quick realias to add a new bash alias


On Mac, pruning images, containers and volumes also helps with battery life, because the default sync process for the VM draws a lot of power.
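The blunt version of that cleanup, if nothing stopped or unused needs to survive (note that it also deletes unused volumes, so read the prompt):

  docker system prune --volumes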


Is it really necessary though? I think it doesn't prune recently used volumes.



