Docker: Not Even a Linker (adamierymenko.com)
133 points by nkurz on July 1, 2015 | 69 comments



Fantastic article. We need more deconstruction of "fads" (which I don't mean in a pejorative sense) so that we can quickly understand them without sacrificing tens of thousands of man-hours slowly coming to terms with each one. It would be much better if we could reason about the exact differences and benefits instead of getting bogged down in new terminology, etc.

More examples I've been thinking about:

- Goroutines (fibers are equivalent to threads, but coroutines are different)

- Safety and the unsafe keyword in Rust (not sure but the effective difference seems to be default-allow versus default-deny)


Regarding those two examples, more than having explanations about them, it would help if people cared about IT history and if that history were more accessible.

Coroutines are easily explained in Modula-2 literature.

The safe keyword in Rust goes back at the very least to Ada and Modula-2. It also appears in Oberon and its derivatives, and in Modula-3 (which inspired C#). The literature for those systems also has lots of examples.

Being an old IT dog who started when those technologies were new, it is sometimes hard for me to see how new generations fail to find such information, even though it is available on the web. I guess the main cause is that one needs to know what to search for.


Tech is strangely ahistorical. Not just practitioners not reading the literature, but it seemingly being forgotten entirely. Possibly this is a side effect of so many of us being self-taught.

In another thread I've just been arguing with someone who thought that the DOOM code should have been thread-safe.


I would say cyclical. Every day we read about a similar framework appearing in a popular language when that solution has already existed for a long time.

But this happens in other areas outside computer science. For example, modern medicine rediscovering old medicine "recipes".

The problem in our field is when people talk all day about Docker while leaving LXC out of the discussion entirely.


I think it happens for different reasons though.

With medicine it boils down to a dismissal of folk remedies as placebo.

But with computing it's because the old ways were developed on mainframes and minicomputers, in an environment that the current generation may only have heard stories about.

This is because the micro-computer era was pretty much a mental reboot for computing, as little if any software crossed over (until fairly recently).


I think being a young industry doesn't help in that regard either. Think about how the medical industry was in the Middle Ages! And tech is even younger than that: a mere 70 or so years, tops.


That is why I always look forward to new technologies. We are very far away from reaching any tipping point.

Regarding IT history, in my case it helps to have been part of it since the mid-80s, along with my interest in history, which makes me delve into available documentation from Egyptian, Greek and Roman days and apply the same process to IT.


> it is sometimes hard to me to see how new generations fail to find such information, even though it is available on the web

As someone who has attempted to go back and find old articles and concepts for which I remember only the generalities, I understand this completely. The problem, IMHO, is that search engines put too much emphasis on the publication date of an article - you'll get recent musings on the topic on the first ten pages, and then if you're lucky you'll get the original material on the eleventh.

However, you're frequently unlucky in this regard, since the search engines will devolve into poor-yet-recent matches before looking further back in time.

Ironically, the internet is forgetting, because we've too finely tuned the algorithms to value newness and community guesses over authority.


> I guess the main cause is that one needs to know what to search for.

It is a common misbelief (particularly among young people) that everything in IT changes so fast that any technology created more than 20-30 years ago is outdated and not worth studying. The wheel gets reinvented many times.


I'm a huge fan of Docker, but I've also drawn a comparison to linking in the past: some people are using it to defer (not solve!) dependency management and distro ecosystem complexities. Fossilizing dependencies is not the future; that's just like static linking. And just as with static linking, if you hide in the corner, avoiding your distro's dependency, patch, and package management systems, this ultimately creates more avoidable pain.


So I've recently reimplemented some of our application's service dependencies in Docker, for the main reasons of mitigating dependency issues and allowing for developer-local consistency.

I fully admit this is deferring solving these issues fully (Solr 1.4.1, really?) but it has allowed for minimizing the issues the existing backlog was causing us. Arguably it's also brought more consistency to our environments vs the Chef/Puppet/hand-rolled management tools. We could have used those tools with a different process to achieve similar results, though.


Yeah, I totally get it - I realized I'm a hypocrite when I posted this; just a few weeks ago I put a custom build environment together with an obscure version of gcc because I'm dealing with some code that depends on some of those "bugs".

I just don't think it's healthy to embrace this as an alternative to proper maintenance.


"Fads" are only un-useful when you don't understand them. But once you go over the edge from rejection to application, the fad state desists and you end up with a useful, cutting edge tool.

There are very few fads which persist long unless they are understood and used. So I would say that the "fad"-gadget'ization of our technology world is a subjectively-borne reality.

It doesn't matter if you don't understand it - some do. That's why it's out there in the first place: someone had a use for it.


It's been said several times before that a large incentive for Docker's adoption was to get around the dynamic linking hell that is present in most modern Unix-likes.

It's funny that the author mentions a "world without linkers" on the same day as my posting of an article about the TAOS operating system. Go look there if you want some primers on achieving that.

That said, the author greatly oversells Docker's novelty.


Yeah, I see that a lot nowadays; it is what I get for being an old IT dog.

Get UNIX containers, copy them with a sprinkle of FOSS and suddenly they are all the rage everywhere.


Maybe you old dogs should've barked a bit louder, because the way we used to manage systems and system environments before Docker was plain ridiculous. No one ever told me to use containers; I don't even know of any pre-Docker technology that lets me easily do what Docker does.

We used to write Chef scripts that were fed JSON configuration files that rendered configuration template files that installed loads of software into a completely bloated environment.

Now we just have a simple bash script that provisions the machine, which means it sets up the network and firewall, sets up Docker, some monitoring, a configuration daemon (etcd), and then runs the orchestrator (in a Docker container, of course).

And then from that point on, all operations are in the container world. You build a new production container, push it to the private registry, and tell the orchestrator to redeploy a service.

Was that workflow even possible before docker? Using FOSS? I get angry just thinking about how we used to do it.
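
For the curious, the day-to-day loop looks roughly like this with stock Docker commands (the image and registry names here are hypothetical placeholders):

    # build a new production image from the app's Dockerfile
    docker build -t registry.example.com/myapp:1.2.3 .

    # push it to the private registry
    docker push registry.example.com/myapp:1.2.3

    # on the host (or via the orchestrator), roll the service over to it
    docker pull registry.example.com/myapp:1.2.3
    docker stop myapp && docker rm myapp
    docker run -d --name myapp registry.example.com/myapp:1.2.3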


We old dogs don't mind paying for software. That was the only option when we started working.

HP-UX vaults were already available in 2000.

One of the reasons for Tru64's existence was to provide a high-safety UNIX environment.

AIX has had its containers for ages.

I can list many other UNIX and mainframe examples.


Well if they're all paywalled off, that sort of explains why I wouldn't have heard of them. Still a bit weird that they were apparently such a good idea that many corporations used them, yet none of them had popular Linux/BSD alternatives, even though isolation tech has been in the OSS OSes for ages.


FreeBSD had the first FOSS alternative with jails, then Solaris with zones. In the interim, Linux had OpenVZ, but that was a much different container model and required kernel patching.


So if a typical BSD sysadmin would set up a Rails website, would he create one jail for the application server, one jail for the SQL server, then set up firewall rules so that the app server could reach the SQL server and that traffic to the host would be forwarded to the app server?


There is more than one way to do this, and a jail per application is one of them. Maybe not the most popular, but often used.
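
A rough sketch of what that could look like with modern jail(8) parameters; the names, paths, addresses and ports are made up, and the forwarding rule would live in pf.conf:

    # one jail for the app server, one for the SQL server
    jail -c name=app path=/jails/app ip4.addr=10.0.0.10 host.hostname=app.example.org command=/bin/sh /etc/rc
    jail -c name=db path=/jails/db ip4.addr=10.0.0.20 host.hostname=db.example.org command=/bin/sh /etc/rc

    # pf.conf excerpt: forward incoming web traffic to the app jail
    # rdr pass on $ext_if proto tcp from any to any port 80 -> 10.0.0.10 port 3000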


    mainframe
That may be the real reason. Most people working on servers these days may well only have heard about mainframes as something close to myths, much less actually interacted with one in a production environment.


You are a bit provocative, for a start. Also, what you describe applies to VMs as well.


> "the dynamic linking hell that is present in most modern Unix-likes"

WTF are you talking about...

There has never been a "hell" of dynamic linking problems on Unixes, this used to be a Windows problem. Even the "most modern unix-likes" doesn't make sense, since "most modern unix-likes" do not even use similar linking models.


We have dependency management built into the package managers, which hides that from us these days. Unix and Linux before package managers were kind of a pain. Now, I will totally give you that it was nothing like the "DLL hell" of Windows.


Frankly I think package managers are part of the problem, rather than the solution.

From personal experience using a less elaborate variant of what nixOS/Guix offers (Gobolinux), ld and friends are quite adept at getting multiple lib versions sorted via sonames.

But package managers do not use anything like sonames for resolving dependencies.

This is because foobar-1.0 and foobar-1.1 can't be installed at the same time. Instead you would have to do something like foobar-1.0 and foobar2-1.1, even though foobar2 is actually a minor upgrade to foobar.

That's because the logic of most package managers balks at having two packages with the same name, but different versions, installed at the same time.
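
You can see the soname side of this on any Linux box: the dynamic linker has no trouble with two major versions installed side by side, it's the packaging layer above that objects (library and binary names below are only illustrative):

    # two major versions coexisting, each with its own soname symlink
    $ ls /usr/lib/x86_64-linux-gnu/ | grep libfoo
    libfoo.so.1
    libfoo.so.1.0.4
    libfoo.so.2
    libfoo.so.2.3.1

    # which soname a given binary actually asks for
    $ readelf -d /usr/bin/someapp | grep NEEDED
     0x0000000000000001 (NEEDED)  Shared library: [libfoo.so.1]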


Then how can that be an incentive for Docker's adoption? Honest question; if, as you say, this is a solved problem thanks to package managers.

I can't even remember the last time I had a real dependency problem deploying an application (using Debian; and CentOS before that), other than myself not doing things right (read: installing RPMs I found online and I shouldn't install).


Well I wasn't originally speaking wrt Docker, but Docker doesn't magically lose all the hard work done by package managers. You have total access to them in your containers.


What people mean is Python (and other scripting languages') dependency hell. Unix dynamic linking very rarely has any issues, as static typing makes it relatively easy to reason about stable interfaces. The rise of containers is a consequence of the rise of dynamic scripting languages and a rather un-unixy, "get it ready ASAP" culture, IMO.

You can have 100 daemons written in C/C++ running on one system just fine, and that's how Unix systems used to work, but as soon as you start using software written in Python, you can't get two of them running system-wide without installing every dependency into a separate `virtualenv`. So it's easier to just throw each of them into its own lightweight virtual machine, based on a recipe for how to rebuild the thing.
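
For anyone who hasn't hit this: the standard workaround is one virtualenv per daemon, i.e. exactly the per-application dependency freezing being described (the package names and versions below are made up):

    # each daemon gets its own isolated set of Python libraries
    virtualenv /srv/daemon-a/env
    /srv/daemon-a/env/bin/pip install somelib==1.2

    virtualenv /srv/daemon-b/env
    /srv/daemon-b/env/bin/pip install somelib==2.0

    # each daemon then runs against its own environment
    /srv/daemon-a/env/bin/python /srv/daemon-a/daemon.py
    /srv/daemon-b/env/bin/python /srv/daemon-b/daemon.py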


No, even Unix dynamic linking has problems because of symbol versioning, sonames and package managers adding a whole new policy layer to the shared library context that can lead to transactional and dependency conflicts.

Though, yes, dynamic languages do add even more concerns.


> Had their developers known what they were actually writing, perhaps we'd have a lean and mean solution that did the right thing.

I am surprised nobody mentioned Nix, NixOS and Guix.


Good time to raise awareness for these projects, both of which solve this problem extremely well.


Can you explain what those are and what they do?


Nix and Guix are purely functional package managers, meaning that software builds are treated like a mathematical function: Input the same source code + dependencies and receive the same build as output. They have features such as reproducible (often bit-identical) builds, transactional package upgrades and rollbacks, and unprivileged package management. They solve the dynamic linking problem by allowing each package to refer precisely to the dependencies that it was built with. With this mechanism in place, it becomes very easy to use applications that require different versions of some C library, or a different Ruby/Python interpreter, or whatever else. Furthermore, it can do this without relying on a specific type of file system, and without requiring that applications be run inside containers or virtual machines. This makes it very composable and general-purpose.

https://nixos.org/

http://www.gnu.org/software/guix/
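
A small taste of the command-line side, using `hello`, the stock example package both projects ship:

    # Nix: install into your per-user profile, then roll back if it breaks something
    nix-env -iA nixpkgs.hello
    nix-env --rollback

    # Guix equivalents
    guix package -i hello
    guix package --roll-back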


What it doesn't do: handle CPU quotas on a per-"stack" basis, and there's no built-in security isolation. That said, both can use container technology for that.

By solving the issue at a layer below (instead of adding one like Docker does) it makes things much cleaner and more powerful, making Puppet and the like obsolete. FWIW, describing containers/VMs/OSes in Guix is much easier than using Docker.

Have a look at https://github.com/NixOS/nixops too.


> Instead of building and filing away heaps of immutable (read: security nightmare) containers [...]

Is there a consensus on what is(are) the best method(s) to handle security patches automatically in Docker? For example, the official images at https://registry.hub.docker.com/ are fixed in time and you should apply security patches before using them?


The official images aren't fixed in time, assuming you're pulling using a tag, e.g. redis:3.0. That image may be updated at any point and should be updated with minor patches and security updates. Rather than manually apply patches, just pull the image again to get the updates. If the image hasn't been updated, complain loudly.

If you want your image to be "fixed in time", pull by digest instead.
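
In concrete terms (the digest below is a placeholder, not a real one):

    # tracks whatever the maintainers currently publish under the 3.0 tag
    docker pull redis:3.0

    # pins one exact, immutable image by content digest
    docker pull redis@sha256:<digest>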


Thank you very much


Very interesting. I'm not convinced this captures the core value of containers though. Or at least not the only core value. Calling containers an evolution of configuration management tools seems like an oversimplification just to make a point. This may be one aspect of building a micro-service driven architecture that containers make easier, but there are other very important ones. Portability comes to mind. It's not just that you can build your stack once and save it, but that you can then run that stack anywhere, and it becomes much easier to share/borrow bits and pieces of other people's stacks.


> you can then run that stack anywhere

Anywhere that runs Linux, at least.

> it becomes much easier to share/borrow bits and pieces of other people's stacks.

At the cost of not knowing what's really in them.


To a certain degree I don't want to know what's in them. If I want to add search to my stack - initially I'd rather not have to have an intimate knowledge of Elastic Search, a task queue and whatever other moving parts there are. In many cases a black box that just works would be a fantastic option.

The reason hosted services are popular is for exactly this reason.

A wide understanding of different technologies is a wonderful thing but sometimes you just need to ship.


For any 'early coders' like myself who want to learn more about linkers and loaders based on this write-up, Programming from the Ground Up by Jonathan Bartlett is a good book.


As well as Ian Lance Taylor's 20-part blog series on linkers: https://lwn.net/Articles/276782/


And John Levine's 'Linkers and Loaders'. http://www.iecc.com/linker/


Sorry, no. I don't want a dynamic linker for my software stacks. I want a complete, ready-to-deploy chunk of code, FROZEN IN TIME, that has a known and predictable state that I can trust.

If I need to apply security fixes, I'll rebuild the chunk of code, also frozen in time, and deploy.

Ideally, I want no dependencies between container and host, or container and container. Or at least I want them kept to an absolute minimum.

Even more ideally, I want isolation to be so complete that I'd be able to run my built stack 100 years from now and have it operate exactly the same as it does today. That's a bit hyperbolic, of course.

Docker is not a linker; it is a system from which you build deployable code. In fact, there's no reason why in theory you couldn't add support to deploy Windows or BSD stacks (other than the fact that Windows and BSD kernels haven't been added yet).


Shared libraries were considered a bad idea in Plan 9, and I really wish that point of view had made it into commercial Unix and Linux.


Shared libraries are a fantastic idea. Static linking wastes system resources and makes system-wide library updates problematic. Docker's approach to things is essentially a higher level form of static linking, which is to say that it's not a very good approach. It's papering over the package management problem. We need general-purpose package management systems that allow for different applications to use different versions of shared libraries without interference. Luckily, the Nix and GNU Guix projects solve this problem very well, if only they could get some more "mindshare."


There is also Gobolinux, which does so in a less stringent fashion.

That is, as I understand it, compiling a lib with a feature enabled or disabled will produce different tree branches in NixOS/Guix, while Gobolinux will happily replace one compile with another if it has the same lib version number.

So you can't have lib foo 1.0 with feature X enabled in one compile sitting side by side with the same foo 1.0 with feature X disabled when using Gobolinux.

Then again, Gobolinux has never been meant as a server distro...

Edit: thinking about it, Gobolinux could perhaps be extended with checksum dirs inside the version-subdir to separate different compiles. Not sure if this has been considered or attempted.


Yeah, having to rebuild every app that uses OpenSSL when a new advisory is issued... wow, that would be expensive!


Thousands of mobile app developers feel this pain now, from that particular library.

Not updating these applications is not acceptable to most organizations / device operators.

Just in case anyone thought the parent was sarcasm or theory, some refs:

http://www.digitaltrends.com/mobile/heartbleed-bug-apps-affe...

http://blog.trendmicro.com/trendlabs-security-intelligence/b...


Shared libraries are a great idea, but not when they're fixed in an environment.

Dynamic linking everywhere has been one of the JVM's greatest strengths and best selling points. Every JVM app just names its dependencies and their versions in its build file, and often overrides (upgrades) its transitive dependencies for security/bug-fixes/performance (conflicts can be resolved by the build tool), and no configuration interferes with that of any other app, as there are no environments or well-known locations for the dependencies (and it's only getting even better with Java 9's modules).


What's with the light-grey-text-on-white-background styling? It may look good, but it's a pain to read.


It looking "prettier" is pretty arguable.

The body text is as close to black as it is to white. #444 is borderline acceptable; #888 is absurd.


Yes, I opened it and didn't even bother trying to read before hitting the Firefox Reader View button.


What looks good about unreadable text on a blinding white page in the middle of the night (PST)? ;)

The black on light gray of old was just the right amount of contrast... TBL got it right.


That's what I came here to say. Anymore, I won't even bother to try to read an article when the site shows such contempt for its readers.


It's very much a display-contrast thing, I think. I've seen those sorts of pages on a MacBook and they are more easily readable there than they are on my machine, for instance. I have to use Stylebot to 'fix' quite a few pages that commit this sin (to varying degrees). This is quite an extreme example, though.


You answered your own question there. The thought stopped when they got it looking good.


This is an interesting take, but it doesn't entirely make sense. Ierymenko's 'save your work' metaphor is a little misleading, since (I certainly hope) nobody is creating docker images manually. But I like his idea that dockerfile creation, by which you set up a stack in a way that's automatically reproducible, is equivalent to the role of a linker in a compiled program.

Where he loses me is when he suggests that Puppet et al are closer to a 'pure' linker. Configuration management systems are doing the same thing as a Dockerfile: instead of setting up your XYZ stack by hand, you write a Puppet manifest that calls the modules for XYZ and sets them up the way you need. Your final result isn't a server with the XYZ stack: it's an abstracted process that will reproduce your XYZ stack. The main difference is the implementation; Docker reproduces your stack in an isolated environment, and configuration management tools reproduce your stack on an arbitrary platform.

But nobody thinks of Docker as a configuration management tool, and for the most part I don't think people even think of Docker as a competitor to configuration management. Hell, Docker is a core component of many Puppet CI workflows.

So there's something else going on here. What's the secret sauce? Is Docker just two great things (config management + virtualization) glued together so cohesively that it becomes greater than the sum of its parts?


That's a very clever metaphor for that aspect of Docker. I hadn't considered looking at it that way before.


The author writes:

"Sometimes (unless I am writing in Go) I don't want to bundle all my code together into one giant hulking binary."

I am unfamiliar with Go - can someone please explain why this technique might be especially desirable/feasible with Go?


Go only supports static linking. No dynamic linking means no linking issues when deploying the same binary across a billion machines in the Googleplex.
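
In practice that makes deployment look something like this; a quick sketch with placeholder names (CGO_ENABLED=0 is there because cgo can otherwise pull in a dynamic libc dependency):

    # build a self-contained binary, check it, copy it, done
    CGO_ENABLED=0 go build -o myapp .
    ldd ./myapp    # "not a dynamic executable"
    scp ./myapp server:/usr/local/bin/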


Go is one of the few gc'ed languages that supports static linking[0]. As a result it can solve the same problems as Java / Python / etc but makes distribution significantly easier.

[0] There's a whole bunch (Haskell, OCaml, nim) but none of them have the same popularity and corporate backing that Go has.


Is "gerschnorvels" really a word in any language?


My first thought was that it was some sort of compound word.


According to this article, Docker is a way to save your work after configuring your server. Can't I do that with

    rsync -a /etc /whatever backupserver:/backups/server1

?


First off,

    rsync -a / backupserver:/backups/server1
would be a better comparison; full server state never properly stays in /etc.

Do you actually do that though? Multiple times a day? How easy is it to roll back to a previous state?

Given Dockerfiles, a better comparison would be rsnapshot, since intermediate steps are important, and maybe that last "yum upgrade/apt-get update/whatever" broke something (on dev, of course) and you want to roll back.

How do you compare two related file system images? Is there something more advanced than "diff -u"? How does that handle binaries? Will that map backwards and say what command resulted in changed binaries? Can I submit a code review for the changes between the two states like I could for a Dockerfile which is plain text?

Docker isn't quite a configuration management system like Chef or Puppet, but there's a lot of overlap.
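
On the image-comparison question specifically, Docker ships tooling for that out of the box (the container and image names here are placeholders):

    # filesystem changes a container has made relative to its image
    docker diff mycontainer

    # layer-by-layer record of which build step produced which change
    docker history myimage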


That won't get your application and its dependencies installed and running.


> perhaps some quantum superposition of those that has yielded a New Thing.

Ugh. That's not what quantum superposition means.



