I deleted the cache files from apt after installing (/var/lib/apt/lists*)
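In a Dockerfile this cleanup is usually done in the same RUN layer as the install, since files deleted in a later layer still occupy space in the earlier one. A sketch (the package is just an example):

```dockerfile
FROM ubuntu:bionic
# Install and delete the apt lists in one layer, so the cache
# never persists in any intermediate layer of the image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends postgresql \
 && rm -rf /var/lib/apt/lists/*
```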
Looks quite nice, but it seems Ubuntu packages are much bigger than Alpine packages, e.g. Postgres is 159 MB in Ubuntu Bionic and only 32 MB in Alpine (including dependencies). Do the Ubuntu packages have more features than the equivalent Alpine packages?
> Do the Ubuntu packages have more features than the equivalent Alpine packages?
The Ubuntu package comes with more stuff, e.g. it includes all the 'in-tree' PG extensions and their dependencies, whereas Alpine keeps them in a separate package, postgresql-contrib.
Speaking of musl, I ran Void Linux with musl for a few weeks on my laptop but found that a lot of the software packages were prone to crash and many of them wouldn't even run correctly at all. What has the experience of using Alpine with musl on servers been like for those of you that do?
That's very interesting. I would love for someone to dig deeper into this.
What happens if you do a multistage build (to get rid of the build tools once you don't need them) and compile Postgres from source on Alpine and Bionic?
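A multi-stage sketch of the kind of experiment being proposed — the version number, flags, and runtime library names here are illustrative, not a tested build:

```dockerfile
# --- build stage: the toolchain only lives here ---
FROM ubuntu:bionic AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential libreadline-dev zlib1g-dev wget ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN wget https://ftp.postgresql.org/pub/source/v10.4/postgresql-10.4.tar.gz \
    && tar xf postgresql-10.4.tar.gz \
    && cd postgresql-10.4 \
    && ./configure --prefix=/opt/pg \
    && make -j"$(nproc)" && make install

# --- runtime stage: only the compiled artifacts are copied over ---
FROM ubuntu:bionic
COPY --from=build /opt/pg /opt/pg
RUN apt-get update && apt-get install -y --no-install-recommends \
        libreadline7 zlib1g \
    && rm -rf /var/lib/apt/lists/*
```

Comparing the final image sizes of the Alpine and Bionic variants of this would isolate the base-image cost from the packaging cost.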
Consider how dope it'd be to ~"tree-shake" packages based on config files, if doing immutable infrastructure. Basically, only compile / include the features of the services you run based on what's required by the config files. I'm sure that this is a complex problem, but it shouldn't be impossible to mostly / fully automate.
This is brilliant. You'd need to basically define every single compile-time param and then have your build process emit a customized binary (and retinue of supporting stuffs) that does just that and nothing else.
Probably makes your total build time significantly longer; you'd need aggressive caching based on the feature flag set.
This is mostly how gentoo works. You define global settings, like "gui" or "pulseaudio", and you also have per package features to enable or disable. After setting these up, when you install a package, it is installed using only the features you enabled, so you have a lot of control over what and how you make your system work.
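In Portage terms that looks roughly like this (the flag names are illustrative):

```
# /etc/portage/make.conf -- global USE flags
USE="-gui -pulseaudio ssl"

# /etc/portage/package.use -- per-package overrides
dev-db/postgresql ldap -nls xml
```

After editing these, re-emerging a package rebuilds it with only the selected features compiled in.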
I can't imagine that to be one of the more significant factors in this case. It's much more likely that the ubuntu version includes a lot more functionality. Just looking at the configure flags:
Alpine: --with-ldap --with-libedit-preferred --with-libxml --with-openssl --with-perl --with-python --with-tcl --with-uuid=e2fs
Debian:
--with-icu --with-tcl --with-perl --with-python --with-pam --with-openssl --with-libxml --with-libxslt --enable-nls --enable-integer-datetimes --enable-thread-safety --enable-tap-tests --enable-debug --disable-rpath --with-uuid=e2fs --with-gssapi --with-ldap --with-selinux
Specifically the differences in enabling ICU (portable collations) and nls (i.e. translations) alone are probably going to be the majority of difference in installed size.
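The two flag sets can be diffed mechanically. A small shell sketch — the lists below are copied from the comment above, so treat them as illustrative rather than authoritative:

```shell
# Flag lists as quoted above (Alpine APKBUILD vs Debian debian/rules).
alpine='--with-ldap --with-libedit-preferred --with-libxml --with-openssl --with-perl --with-python --with-tcl --with-uuid=e2fs'
debian='--with-icu --with-tcl --with-perl --with-python --with-pam --with-openssl --with-libxml --with-libxslt --enable-nls --enable-integer-datetimes --enable-thread-safety --enable-tap-tests --enable-debug --disable-rpath --with-uuid=e2fs --with-gssapi --with-ldap --with-selinux'

# Word-split each list onto its own lines, sort, and show what Debian
# enables that Alpine does not -- the likely size drivers.
printf '%s\n' $debian | sort > debian.flags
printf '%s\n' $alpine | sort > alpine.flags
comm -23 debian.flags alpine.flags
```

On an installed system you can recover the actual flags with `pg_config --configure` and run the same comparison.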
But musl is linked statically, therefore the image must contain multiple instances of the library. Very impressive that the Alpine image still comes out smaller.
"Do you see any other opportunities for savings? Can you help us crop the Bionic Beaver images any further? Is there something that we've culled, that you see as problematic? We're interested in your feedback"
In an industry where "kill and reap all children" is a valid technical statement, I see no issue with shaving beaver images. Both have valid technical meanings.
I thought you were kidding, but then I read the article and it certainly is either innocence bordering on naïveté or indeed an unfortunate “hahaha this is funny but fortunately no one will realize” type of thing.
“To date, we've shaved the Bionic Beaver minimal images down...”
Breezy Badger was already used (before they moved to the incrementing letter-based system, having done Hoary Hedgehog and Warty Warthog).
It wasn't until the first LTS version, dapper drake, that the letters started matching the release number. The second LTS, or 8th release overall - Hardy Heron, in April 2008, was the second "hh" version.
Shockingly we still have 6 hardy heron boxes on our network in far flung locations.
I think you miss the point, but feel free to provide feedback to the devs: this is an RFC after all.
I have used the minimals for years now and they really do give you a pretty decent starter for 10, with a minimum of hassle and a minimum of bloat. Boot the ISO (PXE, obviously) and off you go.
Even doing the install by hand, you get a fully patched basic server up and running within 10-20 minutes - the install is all off the current packages. Add Samba and a few copy n pastes and you have AD joined. A few more copy n pastes from your docs and you have an app server.
I wrote this lot: https://www.mediawiki.org/wiki/Intranet which simply assumes Ubuntu mini at the moment. I do have screenshots and could put together a pretty noddy guide for that bit, but I'm not sure it's necessary. Actually, now I come to think of it, it probably is. Couple that with my Ref. build and you have a domain-joined, Kerberized etc. app server within about an hour if you do the job by hand and are unfamiliar with the process. I can do it rather quicker.
Yes, the installer is a 30MB image - good. An installer's size is no reflection on the installation size.
EDIT: I am from the sysadmin side of things and not dev ops ...
Sysadmin/devops is a nearly meaningless distinction. When a developer needs to write installation or configuration code, they cross over. When a sysadmin needs to write code to monitor applications, they cross over. Senior sysadmins need to write more code, senior developers need to know more about systems and networks.
Twelve years ago, I hired senior sysadmins. About seven years ago, I hired senior devops. Same people, same skill sets, same approach.
"Sysadmin/devops is a nearly meaningless distinction"
It should be as you say but it isn't really. I too hire and fire. To be honest "dev ops" should not really exist but has become a thing. Many who describe themselves as such do not bother with the nuts and bolts. To be fair to them, though, quite a few sysadmins I've known are a bit slack on the networking side, for example. sigh
The boot partition looks to be sda14 or sda15. But judging from the output of virt-resize, it appears that although these are sda14/15, they appear in front of sda1. (When virt-resize is run on sda1, sda14 becomes sda1, sda15 becomes sda2, and sda1 is now the resized sda3, and grub is confused.)
My 2 cents, and possibly quite wrong: are the ncurses packages really necessary in the minimal Ubuntu image? Curses-based programs seem like obvious candidates for exclusion from a minimal image, as they are not usually meant for automation.
Also, why are there still motd files in /etc/update-motd.d? No sshd but still a motd? Odd.
ncurses-base and ncurses-bin are marked "Essential" in Debian (from which Ubuntu derives, and automatically imports packages), which means that any package is permitted to depend on things in those packages without declaring a dependency - including in pre-installation scripts. This image is defined as being large enough that any package in the Ubuntu archive can be correctly installed.
You could eliminate this by going through and finding all packages with implicit dependencies on ncurses-bin / ncurses-base, but that seems highly unreliable, and also a huge amount of pain, since Debian probably will not want to drop these packages from Essential (so you will end up maintaining a huge Ubuntu diff).
I could also imagine a scheme where packages in Ubuntu main are scrubbed for these sorts of implicit dependencies, but packages in Ubuntu universe (which is where auto-imports from Debian go) aren't, and apt-get automatically pulls in all Debian "Essential" packages as soon as you try to install something outside main. But that's a good bit of dev work and it's not clear that you'd get a meaningful payoff.
(update-motd.d is from base-files, which is also an essential package, and is actually important; again, possibly Ubuntu could carry a patch to split up base-files, but for < 1kb of text files, it's unclear this is worth doing)
If I understand correctly, the intention for the minimal image is to be used in things like Docker containers and other lightweight environments, and not only do they lack an installer, but they also lack a kernel or bootloader; essentially, it would be composed of a minimal chroot.
ncurses isn't required as part of the configuration -- debconf allows command line setting (and reading) of variables, and if it runs in a headless environment things like "note" get emailed to root if they can't be displayed.
Our ubuntu builds are either
1) On real hardware, in which case it's a network boot which takes an IP address etc., and configures the partitions but allows you to override them (you have to press OK)
or
2) On KVM builds, where virt-install is used, network detail is specified on the command line upfront and partitioning is fully automatic.
Thus the majority of debconf settings are either default (a requirement of the package manager is a sensible default), or specified in the preseed file.
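For reference, preseeding looks like this — a few illustrative debconf lines, not our actual file:

```
# preseed.cfg (fed to the installer, or loaded via debconf-set-selections)
d-i netcfg/get_hostname string app01
d-i partman-auto/method string lvm
d-i passwd/user-fullname string Deploy User
d-i pkgsel/include string openssh-server samba
```

Anything not preseeded falls back to the package's default answer, which is why sensible defaults matter so much for unattended installs.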
I've been using the MinimalCD images[0] to install Ubuntu for years. (They are the minimum you need to boot, and they download everything else to install Ubuntu.) I'm guessing that these aren't what's being talked about.
I'd been doing the same thing for a long time (loved how fast it was compared to the normal installer, especially in a VM), but the last install I did was with debootstrap straight onto the target disk from another running machine. A bit of a learning curve the first time, but I think I'll try it again next time.
Not a Linux user but out of curiosity just looked at the "Trusty Tahr" 37MB amd64 minimal image (mini.iso).
Most recent amd64 minimal image is 58MB ("Artful Aardvark").
Trusty Tahr bzImage compressed kernel is 5.5MB.
The initrd.cpio.gz is 20MB.
The uncompressed initrd is 52MB.
Assuming most of the initrd size is modules, can the Linux user reduce the size of the initrd by compiling own kernel and creating own initrd with only the modules she needs?
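Yes — on most distros you can either compile the needed drivers into the kernel, or tell the initramfs generator to include only a named set of modules. A sketch of both common tools (module names are examples):

```
# Debian/Ubuntu: /etc/initramfs-tools/initramfs.conf
MODULES=list        # include only modules listed in /etc/initramfs-tools/modules
# then regenerate with: update-initramfs -u

# Arch: /etc/mkinitcpio.conf
MODULES=(ext4 ahci) # just the storage driver and root filesystem
# then regenerate with: mkinitcpio -P
```

The default (`MODULES=most` on Debian) pulls in drivers for hardware you may never have, which is most of the size difference.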
Or just ditch the kernel and initrd entirely. If you're trying to save a few MB on an Ubuntu image, you're almost certainly working in a container environment where you don't need a kernel inside the fs.
If you really do need a kernel, and the few extra MB required by modules is a problem, you should probably be using Buildroot or Yocto for your bootloader/kernel.
I don't think everyone is working within a container when they boot a Ubuntu minimal, unless your containers happen to have a BIOS or similar.
These things are a full OS installer ie put it on a USB key, CDROM, PXE boot or whatever. These are a minimal installer and not a minimal installation, although that is a side effect - you don't get much out of the box but you can add everything later.
You could, for example, do a minimal install and then do "# apt install libreoffice" and with luck (not too sure) get the whole lot - X etc - to run it. You might have to add a Window Manager and a few other things.
I agree -- there are plenty of reasons to use a minimal Ubuntu install. My point was that if size constraints are so tight that you feel trimming kernel modules out is a reasonable use of effort, then Ubuntu starts to be a more awkward fit.
If you constantly have to trim away bits left by the package manager (man pages, examples, extra kernel modules), your time is probably better spent with a distro that allows you to avoid ever laying those into the rootfs to begin with.
Also worth noting: these images are full minimal root filesystems. "Installer" images refer to the images containing software — the Debian/Ubuntu installer — for bootstrapping a root filesystem onto a mounted volume. The minimal images from the article do not contain this installer, and are standalone root filesystems.
Yeah. To be more clear, you can install a "minimal" system of Ubuntu on bare metal by just installing the "required" packages only, although I think the default if you don't select any tasks in the installer is to install "required" + "standard", which is a small amount more than just "required". Either way, it doesn't include much. I have to install openssh-server on my "nothing-but-standard" systems, before chef comes in and drags along another 1000 packages.
The installs the OP is talking about are images that don't even have a kernel, and don't use the traditional installer.
These things are loaded on every boot, which needs to be fast. That includes both the time to load them into RAM and to decompress them, and it's possible that gzip still wins on that.
Personally, I keep my initrd uncompressed because I've found that on my SSD loading that is faster than loading any compressed version + decompressing it, but that doesn't hold for spinning rust.
Sure, but this is the installer presumably being loaded from CD/USB; in addition to not needing to be compressed the same as the resulting installation and being a one-time ordeal, these are slower media: the less you have to read from disk, the faster your end result may be, even including decompression time.
It’s why lz4 compression of a file system can be faster than no compression, even with an SSD on fast machines; I wonder if that would boost your performance vs the uncompressed image (though I don’t know if the kernel supports lz4 decompression at boot time)?
>Sure, but this is the installer presumably being loaded from CD/USB
Doesn't have to be. You could also load it from HDD, which is probably rather common when we're talking about containers and such (and in other circumstances, the couple of MB you'd save here with xz hardly matter). Also there's this PXE thing that I don't understand.
> I wonder if that would boost your performance vs the uncompressed image (though I don’t know if the kernel supports lz4 decompression at boot time)?
It didn't last time I checked, but it seems like it does now. So I'll give that a go.
Turns out it's a wash. On my arch machine the uncompressed image is 27M, with lz4 it's 14M. Both take around 10s to desktop according to `systemd-analyze` (which includes 3s bootloader timeout).
Though I did figure out that for some reason my system was running "lvm2-monitor.service", which took 1s all by itself for doing absolutely nothing (don't use lvm). So I masked that and gained more time than any compression method could.
Somewhat off topic: Is there an effort similar to this for CentOS 7? The centos minimal image is still rather large and I have to do some really ugly things to prune it down. Even then I can not get it down anywhere close to 80 MB. That would be amazing.
It's been my experience that people are less open to hacking CentOS/RedHat. It's pretty much you get what you get and changing anything makes it unsupported which defeats the purpose of using an enterprise distribution. That's not my opinion, it's what seems to be the community's opinion when you bring up things like using non-stock kernels.
Thanks! I was looking at [1] and wasn't sure if those were binaries.
That's perfect then. Install GCC, compilers, build headers etc via `sudo apt-get install build-essential` when necessary. So this should be the same general approach as on Alpine.
libgcc_s.so.1 is a collection of utility routines used by all sorts of programs. The entry named "gcc" is a directory (that contains only empty directories?). /usr/share/gcc-7/python/libstdcxx/ is from the libstdc++6 package (looks like gdb pretty-printers for C++ standard library types).
I used to build embedded Linux distros for a hobby. The best, least aggravating way to have minimal platforms is to build them from scratch. Not only are they 10-50x smaller, you have more visibility over what's installed, and it's easier to tailor to your use case.
Something like OpenWRT or buildroot can fit into 8 MiB easy. These people have an actual reason to make small "containers" because flash is a big part of the BoM cost and adds up when you are shipping many thousands.
I suspect, though I could be wrong, that the reason would be that some packages support Ubuntu specifically and not Debian, because of a larger consumer user base. An important example of this is the Nvidia CUDA toolkit, which supports Ubuntu and not Debian.
Posting here rather than the blog because I don't have a google account:
What about adding sshd to the minimal install? If the purpose of this is minimal installs of containers and cloud servers and such, that seems like quite an omission.
If you have SSHD in the minimal image then you have to deal with host keys.
- on embedded and some other places where minimal images are used, generating the host key on first run can cause a very significant startup delay.
- on some container environments, environments are so identical that you might not have enough entropy to generate sufficiently unique keys.
- if somebody generates a host key and then creates an image from a running container, then you might end up distributing a host key, making what should be private public.
I've probably got some of these details wrong and am going to be promptly corrected, but there are some very good reasons for excluding sshd.
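The usual mitigation when you do bake sshd into an image is to strip host keys before publishing and regenerate them on first boot — a sketch, assuming a typical OpenSSH layout:

```shell
# Before snapshotting the image: remove any generated host keys
rm -f /etc/ssh/ssh_host_*

# On first boot (e.g. from a one-shot systemd unit or init script):
# -A generates any missing host keys of the default types
ssh-keygen -A
systemctl restart ssh
```

This still leaves the entropy and startup-delay problems above, which is a large part of why the image ships without sshd at all.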
I use containers as lightweight VMs in many places. Generally I see this as a way to get a minimal install that other tools can then configure appropriately, with up to date packages fetched from upstream mirrors directly, instead of installed from CD and then upgraded.
I currently use packer.io to script the creation of a bunch of server images, and for Ubuntu I've missed the "minimal install CD" that other distros have. Instead packer has to download an 800MB CD image in order to install only a few hundred megabytes of uncompressed packages in a bare-bones install, which is then provisioned using some orchestration tool that at its heart uses ssh to log in to the virtual machine.
Not having SSH means you need to add in some sort of serial-attach step to manually install sshd, or hook into the install scripts to download sshd as part of the install or whatever. Either way that's additional custom work that is probably common to a great many use cases.
And a couple of internal packages which have their own dependencies (including lldp, snmpd etc.) which do a variety of things including user management (Active Directory based), automatic registration into our asset database, and monitoring systems.
You're running these containers in an orchestrator, right? That should give you API access to the running container, allowing you to get shell. E.g. with kubernetes, `kubectl exec` will get you into the container.
But the sibling comment about using a Dockerfile to install/start sshd works if you're running these containers on a remote host without any kind of access to the running container.
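A minimal sketch of that Dockerfile approach — paths and package names are typical rather than prescribed:

```dockerfile
FROM ubuntu:bionic
RUN apt-get update && apt-get install -y --no-install-recommends openssh-server \
    && rm -rf /var/lib/apt/lists/* \
    && mkdir -p /run/sshd
# Bake in *your* public key only -- never a password or a shared host key.
COPY authorized_keys /root/.ssh/authorized_keys
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
```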
LXD containers make fantastic replacements for VMs! Just try 'lxc launch ubuntu:'. Then 'lxc list'. And then you can either exec into, or ssh into your machine container!
Not arguing that point for this image, but I use containers as lightweight VMs, not only as stateless-horizontally-scalable-try-to-be-google app servers.
Docker is awesome for making virtual desktop systems. My dev environment + all required IDEs and apps is a docker image running xrdp, x2go, and google remote desktop, and my home directory is mounted as a volume. Works great!
Now when I need to move my dev environment to another computer (travel), I just copy the home dir, docker build, and I have the exact same environment ready to go.
My dev environment is 100% deterministic, backed up to git, and serves as documentation for what I need installed to do my work. If I find I need something else added, I modify the Dockerfile, commit, rebuild and redeploy. If something messes up my environment, destroy and rebuild is only a couple of commands away.
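The pattern described above boils down to something like this (the package list and paths are examples):

```dockerfile
FROM ubuntu:bionic
RUN apt-get update && apt-get install -y --no-install-recommends \
        git vim build-essential xrdp \
    && rm -rf /var/lib/apt/lists/*
# The home directory lives outside the image, so state survives rebuilds.
VOLUME /home/dev
```

Then `docker build -t devenv .` and `docker run -v "$HOME/dev:/home/dev" devenv` reproduce the environment on any machine that has the home directory copied over.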
For the record, LXC containers are much more cooperative as non-emulated VMs than Docker containers. I'm sure this is also the case with Virtuozzo, etc. (though I haven't tried them). These other container systems function nearly identically to an emulated VM, which is what most people actually want out of containers -- thin VMs.
It's stupid that you've been downvoted. It is so tiresome to see people ask such useless questions like "what? you want to use this thing in some way that my tiny brain cannot envision? what is this madness?"
His reply was snarky and created a false dichotomy. The 90% usecase exists in between “lightweight VM’s” and google scale horizontal app deployment. Consider that the former is just as much an outlier as the latter. This post is about a pragmatic minimal Ubuntu base image, which would meet neither case well.
Again, consider that you are creating this dichotomy, not the tools, or vast majority of practitioners in this space. Docker comes with (and encourages) lots of ways to persist state, as do orchestration environments like Kubernetes and ECS and you should choose the approach that suits the problem you are solving. If you want containers as lightweight VM's, there are a ton of ways to do that and they are actively supported.
I'm reluctant to get off-topic here, because the narrative relating to the actual post should be "Does it makes sense to have openssh in a minimal ubuntu base image", to which the answer is "No, obviously".
We've moved at work to mainly Alpine based k8s containers, which is awesome, but you lose a lot of debug ability.
Thinking about it - with Linux KSM the overhead of full-fat containers based on Ubuntu (for example) isn't massive. We may have to look at the metrics I reckon.
I often boot machines connected to the public internet or to a coffeeshop / other public wifi with the Ubuntu live CD, so I wouldn't like the live CD to have an sshd with a well-known password out-of-the-box. So if you're going to have to log into the machine to customize authentication anyway, you already have enough access to `apt-get install openssh-server`.
(It would be nice to have a one-click tool that builds you a customized image with your own SSH public keys baked in. Ubuntu doesn't have to run this tool - actually there's probably a cool project idea for someone in standing up a little website to do this, either by letting you paste in public keys or grabbing them from the GitHub public API.)
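Cloud-init already covers most of this for cloud images — user-data like the following (the user name and key string are placeholders) injects your public key on first boot:

```
#cloud-config
users:
  - name: admin
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... user@example
    sudo: ALL=(ALL) NOPASSWD:ALL
```

A website wrapping this into a downloadable ISO or image would be the one-click version.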
Containerization - taking a deployment that previously was made of lots of apps running on one OS on one physical machine (or a relatively generous fraction of one, as a VM), and turning it into lots of apps each running on their OS on top of the same amount of machine. Containerization significantly increases disk usage for each container and probably increases RAM use a little bit, but to first order does not affect CPU or IOPS or network.
I understand it is about containers. But no disk space of the world is worth your container app crashing because someone wanted to save a few megabytes by ditching glibc or otherwise just pushing more work into each and every individual container. It's Java Enterprise all over, who cares it's always allocating tens of GiB of RAM, you have to get to the point of buying a lot of it before it makes an hour of an engineers time worthwhile.
I prefer less disk usage because they enable faster and more frequent backups, and quicker uploads to remote backup. I'm not really running out of free space but I still prefer reducing disk usage.
Second reason is that I can't fit more than two hard disks in my laptop, and more than 4 in my desktop.
Third reason is that internal or external hard disks are not exactly cheap in my country. My country sees ridiculously high import duties and retail markup on all electronics.
Fewer moving parts means less to consider for risks and bugs.
And all the other reasons people want images small: faster to load, upload, and distribute into container deployments.