Images may also be stored in one of the available private registries. <shameless plug> As the Co-Founder, I am partial to Quay.io [https://quay.io], which in my not-so-humble opinion has the best features, reliability, and support for businesses and organizations, but there are other options if for some reason Quay.io doesn't meet your needs. For those who prefer to self-host, we've also got an enterprise option, which brings all of the index and registry goodness behind your firewall. </shameless plug>
That said, we love the Docker ecosystem and way of doing things. A sibling comment mentioned how complicated Docker is, but I think once you realize that they are trying to offer DVCS-like features and paradigms, you'll see that it is complicated for a reason. We all thought git was complicated at first as well.
Regarding the shameless product plug: Quay looks like a very cool product. Love the history and diff views. Glad to see the pricing mimics the GitHub model: pay for private, but public is free and unlimited. Awesome!
Regarding complexity in Docker: So here's the thing, people wanted npm, but they got git. How can we bridge the gap between an easy-to-use, out-of-your-way package manager and a fully featured DVCS experience? I love the idea of merging them, but IMO we need to make the semantic model more accessible. Specifically, we need to ensure concepts are properly orthogonal, not overloaded, and unambiguously defined. Might be too late to scrub this aspect though.
Some other general problems are things like checksums, fingerprints, image signing, etc. How to verify the validity of an image?
I will speak to the issues with which I am familiar.
Checksums are currently uploaded by the client and verified by the registry. Signing is on the roadmap[1]. I'm not sure what you mean by a fingerprint; would this be analogous to an SSH host key? What function would it serve if you already had a signature that only you could reproduce?
A fingerprint is just a small, easy-to-recognize string that identifies the pub key of a trusted individual. It's helpful for recognizing the trustworthiness of a release. More important than the fingerprint, though, is the pub key of the release engineer, and a web of trust to verify that key.
The process that is the gold standard for this, IMO, is what's used over at Apache Software Foundation.
For those who aren't familiar with the topic, I'll illustrate with a release I made a few years ago. Here are the release artifacts for Lucene.Net 2.9.2:
You'll find a .zip, .asc, .md5, and .sha1 file. The .zip is the release artifact. The MD5 and SHA1 are just two different hashes to prove that the package you got is not corrupt and is what it should be, similar to a checksum (note: these hashes should also be signed, IMO). The .asc is a signature for the release.
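Checking those digest files is a one-liner with the coreutils tools. This is a sketch: "release.zip" is a stand-in for the real artifact, and in practice you would download the published .md5/.sha1 files rather than generate them yourself.

```shell
# Stand-in for the real release artifact (illustrative only)
echo "pretend release contents" > release.zip

# The publisher's side: produce digest files alongside the artifact
md5sum  release.zip > release.zip.md5
sha1sum release.zip > release.zip.sha1

# The downloader's side: recompute the hashes and compare.
# Both commands print "release.zip: OK" and exit 0 on a match.
md5sum  -c release.zip.md5
sha1sum -c release.zip.sha1
```

Note that `-c` expects the `<hash>  <filename>` format shown above; if the published digest file is a bare hash with no filename, just compare the recomputed hash against it by eye or with `test`.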
A signature is made from the release engineer's key pair and the release artifact. gpg can take the .asc and the .zip as inputs and tell you what pub key made the signature (and it reports it as a short fingerprint). If you've imported a trusted key into gpg, it will tell you that it's a verified and trusted key, and tell you who it was.
If you pull all these files together and verify them, this should be your result:
$ curl -sSL http://people.apache.org/~thoward/F1AADDE6.asc | gpg --import
gpg: key F1AADDE6: public key "Troy Howard (CODE SIGNING KEY) <thoward@apache.org>" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
$ gpg --verify ~/Downloads/Apache-Lucene.Net-2.9.2-incubating.src.zip.asc ~/Downloads/Apache-Lucene.Net-2.9.2-incubating.src.zip
gpg: Signature made Fri Feb 25 09:33:40 2011 PST using RSA key ID F1AADDE6
gpg: Good signature from "Troy Howard (CODE SIGNING KEY) <thoward@apache.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 062B 4DAF 06F8 61CD 2E71 E40B 8EAA A8A8 F1AA DDE6
Anything else, and you should not use the release.
A good package and release system, like Docker Index/Registry should build these verifications in automatically. A tool like Quay can host pub keys, and can automatically sign images. The Docker Index API can be extended slightly to support fetching the signature. Docker itself could be extended to support "verified" mode, where it refuses to run images that don't have a signature, or fail key verification from a trusted set of keys.
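A "verified" mode could be sketched as a thin wrapper today: refuse to load an image tarball unless gpg verifies its detached signature against a keyring of trusted keys. Everything below is hypothetical (the function name, the keyring path, the .asc-next-to-the-tarball convention); no such mode exists in Docker itself.

```shell
# Hypothetical "verified mode" wrapper; nothing here is a real Docker flag.
# Refuses to load an image tarball unless its detached .asc signature
# verifies against a keyring of trusted release keys.
verify_and_load() {
    image_tar="$1"
    sig="$image_tar.asc"
    keyring="${TRUSTED_KEYRING:-$HOME/.docker/trusted.gpg}"

    if gpg --no-default-keyring --keyring "$keyring" \
           --verify "$sig" "$image_tar" 2>/dev/null; then
        echo "signature OK, loading $image_tar"
        docker load < "$image_tar"
    else
        echo "refusing unsigned or untrusted image: $image_tar" >&2
        return 1
    fi
}
```

The important property is the default: an image with no signature, or one signed by a key outside the trusted keyring, is rejected rather than run.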
(1) Who pays the bills for the public registry (docker.io) and why?
(2) Could this end up like the many Ruby projects that fall down when GitHub goes down, due to not 'pit-of-success-ing'[1] a local copy of everything?
Part of HN's convention, especially since there is often more than one link, is to use end-noted links rather than inline ones. This makes the prose easier to read, while still making it easy to identify and annotate the importance of each link.
I suppose it's to make it clear where links go, and discourage trolling... but browsers failing to show you where links go is a failing of browser chrome, and people who are concerned about that can install extensions that make link destinations more obvious.
1. A Docker registry is like (or maybe just is) an S3 bucket: a dumb, private object store.
2. A Docker index is a database-backed web service with a REST API that clients talk to.
3. The web service can generate temporary tokens that let you GET things from, and PUT things in, the bucket.
4. The web service's database has a model of an image "project" similar to a Git repository: version history, branches, and other metadata.
5. The bucket contains the image repository's "object pool." Just like git, when you pull a branch, the client downloads all the "objects" required to check out that branch.
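The token handshake in steps 2-3 can be mimicked in a few lines of shell. This is a toy model, not the real Docker wire protocol: a stand-in "index" authenticates and mints a temporary token, and a stand-in "registry" is dumb, only checking that the token grants access before handing over a layer.

```shell
# Toy model of the index/registry split (not the real Docker API).
# The index is the smart part: it would authenticate the caller,
# then mint a short-lived token scoped to one repository.
index_issue_token() {
    repo="$1"
    echo "repository=$repo,access=read,signature=deadbeef"
}

# The registry is the dumb part: it only checks the token's access
# grant before serving bytes out of the object store.
registry_get_layer() {
    token="$1"; layer_id="$2"
    case "$token" in
        *access=read*) echo "layer bytes for $layer_id" ;;
        *) echo "401 Unauthorized" >&2; return 1 ;;
    esac
}

TOKEN=$(index_issue_token myorg/myimage)
registry_get_layer "$TOKEN" abc123   # prints "layer bytes for abc123"
```

The design point is that all policy (who may read or write which repository) lives in the index; the registry never needs a user database, only a way to check tokens.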
Remember, the registry deals with the actual data and delegates auth and other concerns to an index. The docker-registry ships with a dummy index implementation that has no notion of authentication or authorization, so anything you push to your private registry is effectively public unless you secure it by some other means.
For me, I wanted a real private registry with access control limited to my team. It turned out not to be too difficult to build our own registry+index implementation that is private by default, with a basic web interface. I've open-sourced it at https://github.com/jimrhoskins/stevedore . It's still really rough, especially the web interface, but for push/pull operations with required authentication, it does the job now.
You can fork the registry code [1]. We work in a regulated industry (healthcare) and need excellent access controls and auditing/logging. Forking the registry and rolling our own is possible, but not something we want to spend time on. So we looked around for private registries. We went with Quay [2] and have been really happy. They're responsive, performant, and on top of enterprise-level requirements.
I've started using quay.io to store my docker images. I have not used the service heavily (because I'm still building out my Dockerfiles), but what I have used has been great.
Docker.io is great for storing images, unless you want to keep them private. I have proprietary apps loaded on my images, so making them public is not an option.
So far Docker looks set to solve the scalability problem I've been chasing for the past year. Since VMs are not ideal, I start with a farm of bare Ubuntu servers and scale out to VMs in the cloud if needed. With Docker I can configure once and deploy to all of these nodes no matter how they were built. I stopped using Chef when I realized I could accomplish my goals with a fraction of the complexity and effort.
Great writeup. I always appreciate somebody else doing a down-to-earth overview of something complicated. This post makes me wonder if some sort of flowchart or UML diagram of the docker system components wouldn't be a useful thing.