How to own your own Docker Registry address (httptoolkit.com)
205 points by thunderbong on March 18, 2023 | 65 comments



Hosting a forwarding / redirect server instead of actually hosting images is probably a decent idea.

The K8s registry proxy is redirecting away from hosting everything on GCR and towards community-owned registries - https://kubernetes.io/blog/2023/03/10/image-registry-redirec...

You can view the code here - https://github.com/kubernetes/registry.k8s.io

But because everyone is already pointing at gcr.io (just like many openfaas users point at docker.io/) - they're having to do a huge campaign to announce the new URL - the same would apply with the author's solution here.

I wrote some automation for hosting (not redirects) in arkade with the OSS registry - Get a TLS-enabled Docker registry in 5 minutes - https://blog.alexellis.io/get-a-tls-enabled-docker-registry-...

The registry is also something you can run on a VM if you so wish, and have act as a pull through cache.
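
To make the redirect idea concrete, here's a rough Go sketch of the kind of forwarding server I mean - the upstream hostname is a placeholder, and you'd want to check which endpoints your clients actually follow redirects on:

  // Minimal sketch: answer every Registry V2 API request on your own
  // domain with a redirect to whatever registry currently hosts the
  // images. Swap the upstream constant when you migrate providers.
  package main

  import (
    "log"
    "net/http"
  )

  const upstream = "https://registry-1.docker.io" // placeholder backend

  func main() {
    http.HandleFunc("/v2/", func(w http.ResponseWriter, r *http.Request) {
      // Keep the full path and query so manifest and blob requests
      // land on the equivalent upstream URL.
      target := upstream + r.URL.RequestURI()
      http.Redirect(w, r, target, http.StatusTemporaryRedirect)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
  }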

Reliability aside, GitHub's container registry is the next best option right now - but we have to ask ourselves what happens when they start charging, or when the outages last longer or come more often than the 1-2 times per week we've seen in Q1 2023.


> But because everyone is already pointing at gcr.io

Did people forget about Google Code repositories already?


I'm the annoying kind that always has answers to people's questions. Want me to argue for everything liberal and against everything conservative? Easy. For everything conservative, and against everything liberal? Also easy.

I cannot for the life of me understand why people use google products like google isn't going to shut them down whenever they want to. Stadia was the most amazing example.

"Surely, google wouldn't do that. Stadia customers are paying customers."

3 years from launch to shutdown announcement. 3 months from shutdown announcement to actually shutting down and losing all your investment in Stadia.

I don't understand how google can keep getting away with this for so long.


People didn't lose all their investment in Stadia. They lost save files, but not monetary investment - Google refunded all game/software purchases and all Google Store hardware purchases.


People may just be avoiding google stuff where possible, and not really forgetting. A lot of bad stigma there, at least on hackernews from what I've read.


Thanks for the write up. I was planning to look into this on the weekend and now I don’t have to!

A slightly fancier version of this concept is the Kubernetes official registry, registry.k8s.io.

https://github.com/kubernetes/registry.k8s.io

Their registry forwards to a container registry by default. But if it detects that the request is coming from inside AWS or Google Cloud, it forwards requests for blobs to an S3 or GCS bucket near the requester. This saves money on cloud egress charges.
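
Roughly, the trick looks like this - a simplified Go sketch where the CIDR check, backend, and bucket URL are all placeholders; the real logic is in the repo above:

  // Simplified version of the registry.k8s.io idea: blob fetches from
  // clients that look like they're inside AWS get redirected to an S3
  // mirror; everything else goes to the default backend registry.
  package main

  import (
    "net"
    "net/http"
    "strings"
  )

  var awsRange = func() *net.IPNet { // placeholder, not the real AWS IP list
    _, n, _ := net.ParseCIDR("3.0.0.0/8")
    return n
  }()

  func fromAWS(r *http.Request) bool {
    host, _, err := net.SplitHostPort(r.RemoteAddr)
    if err != nil {
      return false
    }
    ip := net.ParseIP(host)
    return ip != nil && awsRange.Contains(ip)
  }

  func main() {
    const backend = "https://registry.example.dev"           // placeholder
    const s3Mirror = "https://example-blobs.s3.amazonaws.com" // placeholder

    http.HandleFunc("/v2/", func(w http.ResponseWriter, r *http.Request) {
      if strings.Contains(r.URL.Path, "/blobs/sha256:") && fromAWS(r) {
        http.Redirect(w, r, s3Mirror+r.URL.Path, http.StatusTemporaryRedirect)
        return
      }
      http.Redirect(w, r, backend+r.URL.RequestURI(), http.StatusTemporaryRedirect)
    })
    http.ListenAndServe(":8080", nil)
  }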


I find it a bit sad that the only two options we have are either using central services or having everyone manage their own infrastructure (at the very least their own domain name).

I would've hoped that at this point we would have a truly decentralized solution for this sort of thing. Despite all the blockchain/dapp/web3 hype for many years, they have no practical solution for anything.

We have all the pieces, it seems: torrents and dht/magnet links work, ipfs works, web of trust works. And yet we don't seem to manage to work collectively on truly decentralized solutions to the centralization of critical internet infrastructure. Why can't we all work together and share resources so we aren't dependent on the whims of some shaky businesses? We are all constantly at risk of them turning on us for profit.


The concept that escapes you has a name: tragedy of the commons [1]. Yes, it is frustrating and depressing, in the long run a feeling of hopelessness settles in, that we are unable to share resources: we have failed to do so with the bodies of dead dinosaurs, creating authoritarian glorified gas stations as a side-effect, and we have failed to do so with more ethereal substances such as compute.

All the cutesy technowords-of-the-day, blockchain/dapp/web3/torrent/magnet links, are just a bandage over a greater point: ever since the atomic bomb, once we became able to destroy the planet, we needed to become a new species, evolve our cone of care. We were unable to do so and hopefully we will be extinct before we destroy the planet, let some other species have their try in a few million years, before the sun runs out.

[1] https://en.wikipedia.org/wiki/Tragedy_of_the_commons


> We were unable to do so and hopefully we will be extinct before we destroy the planet, let some other species have their try in a few million years, before the sun runs out.

Eh, we can take solace in the fact we still don’t have the capability to destroy the planet. Vastly alter the current environment, cause mass extinctions, and irradiate the planet's surface, sure, we can do those things. But the planet won’t care, and life, well, life finds a way.


I find it a bit ironic that you link Wikipedia to make that point. The greatest encyclopedia in the world, a great demonstration of the potential of collectivist projects. I still remember when it was launched and everyone thought it was crazy and impossible and laughed at it.

I would also draw a distinction between web3 and torrent technologies. Torrents work great, and they don't even give users a monetary incentive to seed; people do it anyway. But web3 makes everything transactional and builds everything around individualist monetary incentives, and yet no useful application has ever (so far) come of it. So perhaps torrents and Wikipedia (and similar projects) work because they don't build everything around the free-market libertarian fever dream.


Sure, Wikipedia is great, like finding moissanites [1] in the mud: great, but you are still in mud.

Perhaps I am too doomy, but every day - and now, with the GPT advances, almost every hour - we see a bridge being built between the information space, the decision-making space, and the 3D space of the physical world, and that bridge being restricted to only certain entities. It makes one wonder: would a Wikipedia even be possible today?

[1] Diamonds are a De Beers invention and a monopolistic violent endeavour, moissanites are cheaper, no artificial scarcity, and better looking https://en.wikipedia.org/wiki/Moissanite


I'm not sure I follow your reasoning.

Wikipedia works thanks to donations.

Voluntary donations are the free market libertarian equivalent of involuntary taxation.

Web3 doesn't work because there's not enough value being provided, partly because paying via microtransactions is too unfriendly. That's a hard problem, and its lack of success is a clear demonstration of the market working as intended and not rewarding something useless.

The socialist equivalent is a government-owned web3 which we all have to pay for with taxes, and we're increasingly close to getting this.


People volunteer on Wikipedia: they write and edit articles, they do content and user moderation, etc. Everyone can edit Wikipedia, everyone can make a Wikipedia account, there are 42 million registered accounts, and nobody is spending their time on Wikipedia for monetary gain. How does any of that possibly work, and work so incredibly well? According to everything libertarians believe, this should be completely impossible, and yet it works, because libertarian, free-market and neoliberal ideology is WRONG.



What's the incentive for people to run a node in a decentralised network, though? It will always end up being abused and misused, and dealing with that becomes a drain on one's time.


If it is serving just the images you use, the harm is minimal versus the benefit of resilience.

But yeah, any system where you store other people's blobs that you yourself don't want is a potential liability.


That's indeed sad, because docker images are content-addressed.

An image registry over ipfs would solve the problem by decoupling who actually stores the image from the process of discovering its location.



Because frankly, in a corporate network I want to add a rule to the proxy that allows access to this and that container registry and nothing else, for security reasons.

"Just" docker registry proxy that had torrent support would be fine enough solution for the distribution. But good luck convincing anyone in ivory tower of security that opening some random ports to entire of the internet is a good idea


The problem here is everyone relying on Docker to foot an expensive bill for free forever. If it was always for a few, incentives would be more aligned and this blowup wouldn't happen. But as it is, it's a bit inevitable.


> The problem here is everyone relying on Docker to foot an expensive bill for free forever.

Tragedy of the commons at its finest.


Yeah, I kinda hoped someone would start building something that does automatic payment with a blockchain. Like when you upload a docker image, it would automatically pay a few cents or dollars for that storage, which stays active as long as the money does not dry up.

Sadly that ship has sailed, at least until a dollar/government-backed blockchain that allows such things pops up. Which won't happen, I think.


> In their updated policy, it appears they now won't remove any existing images, but projects who don't pay up will not be able to publish any new images

This is not correct. It's the "organization" features which are going away. That is the feature which lets you create teams, add other users to those teams, and grant teams access to push images and access private repositories. Multiple maintainers can still collaborate on publishing new images through use of access tokens which grant access to publish those images. It's kind of a hack, but it works. You would typically use these access tokens with automated CI tools anyway. This will require converting the organization account to a personal user (non-org) account. (Interesting note/disclosure: I was the engineer who first implemented the feature of converting a personal user account into an organization account some time around 2014/2015, but I no longer work there.)

For open source projects which are not part of the Docker Official Images (the "library" images [1]), they announced that such projects can apply to the Docker-Sponsored Open Source Program [2].

I would also heed the warning from the author of this article:

> Self-hosting a registry is not free, and it's more work than it sounds: it's a proper piece of infrastructure, and comes with all the obligations that implies, from monitoring to promptly applying security updates to load & disk-space management. Nobody (let alone tiny projects like these) wants this job.

Having most container images hosted by a handful of centralized registries has its problems, as noted, but so does an alternative scenario where multiple projects which decided to go self-hosted eventually lack the resources to continue doing so for their legacy users. Though, I suppose the nice thing about container images is that you can always pull and push them somewhere else to keep around indefinitely.

[1] https://hub.docker.com/u/library [2] https://www.docker.com/community/open-source/application/


The move does show, though, that Docker isn't shy about changing existing terms. So there's merit for some projects to take control of their namespace. It doesn't seem inconceivable that at some point in the future, Docker will say that exchanging tokens in a personal user account to enable "hacky" org type features is a ToS violation.


> This will require converting the organization account to a personal user (non-org) account.

I don't think this is possible.


Good to see you around, hope you are well


I've always found it un-Unixy that Docker merely caches layers/images as files named after opaque hashes in the file system. It's also uneconomical in terms of local disk space (the image dir keeps building up) and network usage (images are pulled to every individual host). Why not use clear names and use e.g. BitTorrent? Why tie this to a registry service over IP in the first place?


Docker layers are content-addressable. So the hashes are not entirely opaque. They are a direct result of what’s inside. Two images (mostly) share the same layers? No disk space wasted whatsoever.

Sure you could implement a finer-grained deduplication or transfer mechanism, but I doubt this would scale as well. Many large image layers consist of lots and lots of small files. The overhead would be tremendous.
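
Content addressing really just means "the storage key is the hash of the bytes", so layer-level dedup falls out for free. A toy Go illustration (real registries digest the compressed layer tarball; this only shows the principle):

  package main

  import (
    "crypto/sha256"
    "fmt"
  )

  // digest returns the content address: identical bytes always map to the
  // same key, so a layer shared between images is only ever stored once.
  func digest(layer []byte) string {
    return fmt.Sprintf("sha256:%x", sha256.Sum256(layer))
  }

  func main() {
    a := []byte("pretend ubuntu base layer")
    b := []byte("pretend ubuntu base layer") // same bytes, pulled via another image
    fmt.Println(digest(a) == digest(b))      // true: one blob on disk
  }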


The local storage is mostly a solved problem with hard links. Any modern file system (i.e. not NTFS) can have arbitrarily many file paths that refer to the same underlying file, with no more overhead than a normal file system.
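
This is easy to check for yourself - a quick Go sketch with made-up file names, using a temp dir:

  package main

  import (
    "fmt"
    "log"
    "os"
    "path/filepath"
  )

  func main() {
    dir, err := os.MkdirTemp("", "dedup")
    if err != nil {
      log.Fatal(err)
    }
    defer os.RemoveAll(dir)

    orig := filepath.Join(dir, "layer-a-libc.so")
    link := filepath.Join(dir, "layer-b-libc.so")
    if err := os.WriteFile(orig, []byte("shared file contents"), 0o644); err != nil {
      log.Fatal(err)
    }
    // The second "image" gets another path to the same file: no extra copy.
    if err := os.Link(orig, link); err != nil {
      log.Fatal(err)
    }

    a, _ := os.Stat(orig)
    b, _ := os.Stat(link)
    fmt.Println(os.SameFile(a, b)) // true: one file on disk, two names
  }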


I was referring to the network transfer process concerning the overhead of single-file transfers.


The comment to which you were replying mentioned both the excessive local disk usage and the excessive network transfer, and so your comment appeared to apply to both portions. This is why I started my comment by explicitly restricting it to the case of local disk usage.


For hard links to work you still need to know that the brand new layer you just downloaded is the same as something you already have, i.e. you need to run a deduplication step.

How? Well, the simplest way is to compute the digest of the content and look it up... oh wait :thinking:


I’m not sure what point you’re trying to make. Are you assuming that a layer would be transferred in its entirety, even in cases where the majority of the contents are already available locally? The purpose of bringing up hard links was to state that when de-duplication is done at a per-file granularity rather than a per-layer granularity, it doesn’t introduce a runtime overhead.


AppFS [0] does deduplication at the file level and it works well for me.

[0] https://AppFS.net


About time someone realized that re-centralizing back to GitHub’s services isn’t a good idea in the long run as I said before [0] during the recent ‘Dockerpocalypse’ thread.

Perhaps there is some glimmer of hope for self-hosting after all.

[0] https://news.ycombinator.com/item?id=35174819


It's been a while since I've looked into self-hosted alternatives. Do you happen to have any concrete suggestions applicable to solo developers or small teams with limited time and budgets (i.e. not self-hosted GitLab)?

The last time I looked at self-hosted CI/CD, Concourse stood out as one of the more promising options.

As for code and container registry - Gitea? It seems like it has an integrated container registry now, so that's a plus.


If you're just looking for docker, the official docker registry[0] is quite literally just a docker container. The only problem is that authentication is very much geared around being totally private (as in, they recommend you just throw nginx in front of it). Couldn't find much on a "read-only" version of that.
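
For the read-only part, one lightweight option (a sketch, not something I've battle-tested) is a few lines of Go in front of the registry container instead of nginx, assuming the registry listens on localhost:5000:

  // Reject anything that isn't a read; pushes use POST/PUT/PATCH/DELETE,
  // so only GET and HEAD are proxied through to the stock registry.
  package main

  import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
  )

  func main() {
    backend, err := url.Parse("http://127.0.0.1:5000") // local registry container
    if err != nil {
      log.Fatal(err)
    }
    proxy := httputil.NewSingleHostReverseProxy(backend)

    http.HandleFunc("/v2/", func(w http.ResponseWriter, r *http.Request) {
      if r.Method != http.MethodGet && r.Method != http.MethodHead {
        http.Error(w, "registry is read-only", http.StatusMethodNotAllowed)
        return
      }
      proxy.ServeHTTP(w, r)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
  }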

GitLab is an overbloated mess that you can't really justify unless you have organization-style funding/tax write-offs for the server (at which point it's easily the best choice). It expects CI/CD to exist for all projects; it can run without it, but you'll be missing quite a number of features (the main example being that GitLab demands release builds be generated through CI unless you want to manipulate the API with curl on your dev machine to upload things). It needs a somewhat beefy server unless you go out of your way to downtune the entire thing (which requires quite a bit of configuration); a $5 VPS will not suffice.

Gitea mostly rocks and in my experience runs on even that 5$ VPS, but it does not ship with any CI/CD by design. They do have a list of external services[1] that can provide CI that can integrate with their software, so you can have CI/CD. Personally I'd recommend Gitea if you're looking to selfhost.

[0]: https://docs.docker.com/registry/

[1]: https://gitea.com/gitea/awesome-gitea#user-content-devops


> Gitea mostly rocks and in my experience runs on even that 5$ VPS, but it does not ship with any CI/CD by design. They do have a list of external services[1] that can provide CI that can integrate with their software, so you can have CI/CD. Personally I'd recommend Gitea if you're looking to selfhost.

Currently using Gitea + Sonatype Nexus + Drone CI which has worked nicely for my own needs, after previously running self-hosted GitLab for a few years, but finding the updates to be a bit problematic: https://blog.kronis.dev/articles/goodbye-gitlab-hello-gitea-...

That said, Woodpecker might be a more open CI offering licensing-wise, and it works similarly to Drone.

But personally I wouldn't judge others for picking whatever else they are familiar with and what works for them, even if that choice would be Jenkins or something like that.


JFrog's container registry is a good self-hostable option. There are others too, but self-hosting is never easy.


It’s just a game of hot potato with platforms that are willing to foot the bill for the bandwidth and storage for some other business gain.

Self-hosting your own private registry is easy and cheap, but for a public registry you're essentially writing a blank check, letting randos across the internet pull your image a million times from their CI pipelines and run up your egress fees.


> To dig into this traffic, the easiest option is to use an HTTP-debugging tool (such as HTTP Toolkit) to see the raw interactions, and configure Docker to use this as your HTTP proxy (docs here) and trust the CA certificate (here).

Probably not surprising given that this is the blog of HTTP Toolkit, but instead of debugging the HTTP requests, they could have gotten much of the same information by reading the introduction of the API docs (https://docs.docker.com/registry/spec/api/#overview).
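
For anyone curious, the request shape the overview describes is easy to poke at by hand. A rough Go example against a hypothetical local registry (Docker Hub itself would first answer 401 with a token-service challenge you'd have to follow):

  package main

  import (
    "fmt"
    "io"
    "log"
    "net/http"
  )

  func main() {
    // Manifests live at /v2/<name>/manifests/<reference>; the image name
    // and registry address here are made up.
    req, err := http.NewRequest("GET", "http://localhost:5000/v2/myimage/manifests/latest", nil)
    if err != nil {
      log.Fatal(err)
    }
    req.Header.Set("Accept", "application/vnd.docker.distribution.manifest.v2+json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
      log.Fatal(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status)
    fmt.Println(string(body)) // JSON manifest listing the config and layer digests
  }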


Well the implication is that:

  - there are such docs
  - you can find them
  - they won't lie to you
  - they'll be enough to give you a picture of how everything works
It's great that Docker fits these criteria, but in general, tools that let you cut out a few of those steps and inspect how the actual thing is running live are also nice.

Ideally, consider using both: trust the docs, but validate regardless to have more confidence.


Excellent writeup, timely and pragmatic solution to a real problem faced by many; bravo!


Nice! I do something similar with my podcast's RSS feed. I just put BunnyCDN in front of my podcast host's feed, create a custom domain for the CDN, and then distribute my domain instead of the vendor-locked RSS feed the podcast host provides. I've switched podcast hosts, and there's zero impact on listeners.


I want to see a native service discovery protocol for OCI registries.

HashiCorp uses a service discovery protocol for Terraform: https://developer.hashicorp.com/terraform/internals/remote-s...

It allows domain owners to use a "pretty" name for their Terraform services (e.g. terraform.ycombinator.com) while pointing to a different host and path (e.g. an S3 bucket).
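
If I remember the protocol right, the discovery step is just a well-known JSON document served from the pretty hostname, which a client fetches and then follows - roughly like this Go sketch (hostname made up):

  package main

  import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
  )

  func main() {
    resp, err := http.Get("https://terraform.example.com/.well-known/terraform.json")
    if err != nil {
      log.Fatal(err)
    }
    defer resp.Body.Close()

    // Expected shape: service IDs mapped to real base URLs, e.g.
    // {"modules.v1": "https://modules.example.com/v1/", ...}
    services := map[string]any{}
    if err := json.NewDecoder(resp.Body).Decode(&services); err != nil {
      log.Fatal(err)
    }
    fmt.Println(services)
  }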

Decoupling the "root" of a registry from where its API is implemented is a great layer of indirection I wish existed in the container image registry ecosystem without speciaised server software to perform API-aware redirects.


Been a fan of CapRover for its ability to create a self-hosted private registry (HTTPS) at the click of a button, besides swarm/cluster support, and multiple deployment methods.

https://caprover.com


Which One-Click App are you using? I looked over their list, but couldn't find the Docker Registry. Thanks!


You don't need a separate one-click app for a Docker Registry. CapRover can create a managed registry (see under Cluster menu option).

https://caprover.com/docs/app-scaling-and-cluster.html#setup...


Fantastic, thanks!


> Which One-Click App are you using? I looked over their list, but couldn't find the Docker Registry. Thanks!

I'm not sure about them, but Nexus might fit the bill from that list: https://github.com/caprover/one-click-apps/blob/master/publi...

It's what I'm using for myself (though with just Docker Swarm + Apache2, without Caprover) and has worked well for years.


Thank you!


I recently set up a private registry for some custom GUI apps - it really is quite a bit of work just to set up and debug some less obvious issues (e.g. pushing large images).

I'm using the official registry, docker-registry-ui for auth and a web-based repository browser (which essentially proxies the docker API requests to the registry while providing web access to list the images), and nginx-proxy-manager in front of all this (needed as the NAS I'm running this on hosts some other stuff as well).


The possible problem I see with this is:

Fact: Docker wants to recover the cost of hosting all those images. There is a cost to storing each image. (is that right?)

Fact: Docker can easily change their API and how they handle redirects to ensure this scheme does not work now and does not work in the future.

If the goal is to survive the next Dockerpocalypse, this seems unlikely to do it. Or perhaps I misunderstand.


They can't; if they do, it's going to be a very big change that may break compatibility with other OCI runners like podman and K8s.


This change will certainly impact many, but Open Source projects are still free.

Is Docker doing such a poor job that this is still misunderstood?


> Open Source projects are still free

Wholly non-commercial open source projects. Meaning, for example, even if you sell consulting services for your project on the side, you don't qualify. And it's re-evaluated every year.


I think that some people don't like bureaucracy and would rather move than apply for "big enough open source enough project yada-yada".


This is nice, but having a way to just publish blobs to raw storage would likely be cheaper.


Some offerings allow you to set a custom domain.


Free offerings?


Not that I'm aware of, but you still gain name ownership and decoupling from the service provider.


Did you just discover how registries work or something? Docker Hub is just a public registry; anyone can have their own private registry, and that is what everybody usually does when you want your own Kubernetes. Harbor is the usual free/open-source one you use, or you can use the simple one that Docker provides and run it locally. There are also other public registries like quay.io or the GitHub one.


I think you misunderstood the article. This is not about running a private registry. Running a public registry (e.g. for an open-source project) exposes you to bandwidth costs and such.

The solution in the article allows such projects to use their own domain without incurring any of the maintenance burden of running a public registry.

One of the problems for open-source projects currently is that any move (even to another provider) causes a lot of disruption, because all users of the project need to update the address they pull from. This solves that.


>Did you just discover how registries works or something?

That feels a bit snarky. The article opens with that path, but concludes it's heavier than they want. Then suggests an option that's lighter/thinner and meets their needs.



