EDIT: incredibly, I'm seeing people use this benchmark to argue both 1) that Docker is bad for performance and 2) that Docker is magically faster (and probably does something stupid to be faster). Talk about an illustration of what's wrong with benchmarks! Neither of these statements is correct. It's possible to configure Docker for negligible performance overhead. Docker definitely does not do magic tricks to make things faster. Mostly it sets up the storage backend of your choice and then gets out of the way. Whatever you're seeing is probably a result of your particular host configuration combined with your particular storage backend configuration.
My initial comment below is a detailed response to the "Docker is terrible for performance" camp.
TLDR: "read the Docker manual before trying to benchmark it".
It looks like the guy didn't bother to mark /var/lib/postgresql as a volume, which is the recommended way of running a database in Docker. It can be fixed with the following line in his Dockerfile:
VOLUME /var/lib/postgresql
This will make sure his persistent (and IO-critical) data is bind-mounted from the underlying host filesystem, instead of depending on Docker's copy-on-write backend (which could be btrfs, aufs or lvm/devmapper depending on his configuration).
Read the docs, fix your configuration, and try again. Hint: the IO overhead will be negligible. Also: it will end up being a benchmark of your underlying host storage, not of Docker.
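If you want to verify which filesystem actually backs the data directory before you benchmark, you can look it up in /proc/mounts from inside the container. A minimal Python sketch, assuming a Linux system; `parse_mounts` and `backing_fs` are names made up for this example, and mount points containing spaces (escaped as \040 in /proc/mounts) are not handled:

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Field order: device, mount point, filesystem type, options, ...
            mounts.append((fields[1], fields[2]))
    return mounts


def backing_fs(path, mounts):
    """Return (mount_point, fs_type) for the deepest mount containing path."""
    path = os.path.abspath(path).rstrip("/") + "/"
    best = None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point win, since later
        # entries in /proc/mounts shadow earlier ones.
        if path.startswith(prefix) and (
            best is None or len(mount_point) >= len(best[0])
        ):
            best = (mount_point, fs_type)
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print(backing_fs("/var/lib/postgresql", parse_mounts(f.read())))
```

If the reported filesystem type is the copy-on-write backend rather than the host filesystem, the volume is probably not set up the way you think it is.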
It's worth mentioning that when you benchmark Docker, you are never really benchmarking Docker, unless what you are measuring is the setup/teardown of containers.
What you -are- benchmarking is namespaces, cgroups, CoW filesystems and your underlying hardware.
As Solomon has already pointed out, as soon as you use a bind-mounted volume (in Docker parlance), you are simply benchmarking your host Linux kernel + host filesystem + host hardware, nothing more. I am unsure whether Docker is configuring the cgroups cfq or blkio controllers yet, but that could also play a part.
The TLDR is this: Docker doesn't have an impact on performance because it's just glue code; none of Docker itself sits in any of the hot paths. You are better off benchmarking the individual components in the actual hot path. Also worth noting that compute benchmarks will be absolutely worthless, as they -will- be 1:1 native performance (because there are actually no additional moving parts).
This comment deserves its own post - and shouting. This is a rather profound point which is lost on almost everyone and makes 99% of docker performance discussions a waste of time for people who should know better.
Blog posts like this one don't start a discussion, don't help anyone, they just get used in random FUD discussions from any and all points of view.
Well, Docker is big news and people are interested in what the performance impact of "using docker" is.
Whether it's accurate or not, people associate Docker with the functionality it facilitates (i.e., cgroups, namespaces, etc.), so I think it's valid to show the performance impact of running an application from within Docker.
Based on my data, they're not 1:1 native performance. I suggest you consider the data before dismissing it.
Thanks, I have to admit I didn't think of that (saw VOLUME commented out in your Dockerfile, but didn't think that you might have moved it to a run script).
My point remains that Docker benchmarks are especially easy to spin any way you want, since usually they are really benchmarks of the underlying system configuration (with some parts glued together by Docker, but most parts outside of its control).
The big thing to take away from this is not "docker causes a performance change", it's that "there is some way to configure docker that affects performance". Obviously the expected behavior is that docker should have marginal performance impact, and any deviation from that is undesirable.
The numbers showing here are about what I would expect if something in the storage stack is doing more caching than normal. This likely has hidden costs, either in memory usage or robustness. It's worth digging down to find out what is happening, and how to validate docker configurations.
@shykes - what is the recommended way to manage logs? In this case, postgres would generate logs that need to be rotated (with a sighup to parent process). I'm not sure if I can use the host's logrotate to do that.
Currently I build containers with supervisord as pid 1 and logrotate running inside the container, but with logs being saved to a bind-mounted volume.
Is this correct?
P.S. there are no docs, blog posts or articles on this topic. I'm a little puzzled if people are living with ephemeral logs.
There are actually a lot of blog posts and docs on this subject. Previous posters already explained some of the solutions, but I was just searching on this topic this morning so I thought I would share what I found:
It seems that the currently recommended setup is to create a container specifically to host volumes for both logs and other persistent data (i.e. database files). You then connect each container that needs to write to those volumes using the volumes-from directive. This is explained in blog posts and included in the documentation.
You can run your docker container supervised with tools like systemd, runit etc - and have stderr/stdout (and signals) forwarded on from the container process to the supervisor - from then on you do what you would normally do if it wasn't running in docker.
Another approach is to have the app(s) in the container log to a VOLUME that is bind-mounted in from the underlying host (where you can access the logs directly). Yet another approach is to bind mount syslog or other tools into the container and allow the process(es) inside the container to log to it. All work well.
Docker captures stderr; you can configure Postgres to simply use stderr logging, and then capture it in the host system.
Another option is syslog, which Postgres also supports, with which you would log to the host's syslog daemon. (You can mount the host's /dev/log if you don't want to deal with setting up the networking.)
If you're sending logs out to the host's syslog, then the log creator is your syslog daemon.
Also, processes in containers exist as processes on the host (they have different PIDs inside and outside of the container due to PID namespacing); so logrotate should be able to send signals to containerized processes.
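To make the PID mapping concrete: on kernels that expose it (the `NSpid` line in /proc/<pid>/status, which I believe needs Linux 4.1 or newer), the host can see both PIDs of a containerized process. A rough Python sketch under that assumption; `parse_nspid`, `read_nspid` and `send_hup` are hypothetical helper names, not part of Docker or logrotate:

```python
import os
import signal


def parse_nspid(status_text):
    """Return the PIDs from the NSpid line, outermost namespace first.

    For a containerized process viewed from the host this is typically
    [host_pid, pid_inside_container]. Returns [] if the line is absent.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def read_nspid(host_pid, proc="/proc"):
    """Read and parse NSpid for a process, as seen from the host."""
    try:
        with open(os.path.join(proc, str(host_pid), "status")) as f:
            return parse_nspid(f.read())
    except OSError:
        return []  # no such process, not Linux, or no permission


def send_hup(host_pid):
    """Send SIGHUP from the host, addressing the process by its host PID."""
    os.kill(host_pid, signal.SIGHUP)
```

The point is that a host-side tool only ever needs the host PID; the PID the process sees inside the container does not matter for signal delivery.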
Hey, thanks for that comment. Could you talk about it in slightly more detail?
I have rsyslog on the host and I'm sending logs to the host syslog by bind mounting /dev/log. Is this what you meant by sending logs out to the host? I'm having a lot of trouble figuring out how to do it any other way.
Could you also confirm that rsyslog on the host will be able to send appropriate signals to the container's syslog, which forwards them to the containerized process? I'm unable to find documentation for rsyslog (or syslog-ng) that talks about this behavior, so I'm not sure.
SIGHUP is needed when processes (such as Postgres) write directly to log files. These files are held open, so when you want to rotate a file, you have to tell the process to close it and start writing to a new one.
If you tell Postgres to log to rsyslog (or any other syslog daemon), the log data will be sent to rsyslog via UDP, TCP or a Unix socket. Postgres itself will not have any log files open, so there is no need to SIGHUP it.
You will have to SIGHUP rsyslog in the host system, though.
If you bind mount /dev/log of the host into your container, you only run rsyslogd on the host. There's no need to send any SIGHUP to anything in the container.
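For an application you control, the same idea looks roughly like this in Python: log to the syslog socket, so the process never holds a log file open and never needs a SIGHUP. This is only a sketch; `make_logger` is a made-up helper, and the fallback to stderr when the socket is missing is my own choice, not something rsyslog or Docker requires:

```python
import logging
import logging.handlers
import os
import sys


def make_logger(name, address="/dev/log"):
    """Log to the syslog socket at `address`, or to stderr if unavailable."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = None
    if os.path.exists(address):
        try:
            # Inside a container this works when the host's /dev/log
            # has been bind mounted in.
            handler = logging.handlers.SysLogHandler(address=address)
        except OSError:
            handler = None  # path exists but nothing is listening
    if handler is None:
        # Docker captures stderr, so this is still collectable on the host.
        handler = logging.StreamHandler(sys.stderr)
    logger.addHandler(handler)
    return logger
```

Either way, rotation becomes the host's problem: the syslog daemon (or whatever collects stderr) owns the files.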
Even ignoring performance, putting persistent data (like databases) on volumes is a best practice since the normal filesystem inside the container is ephemeral.
This kind of benchmarking should really be run on physical hardware where you know that the underlying resources are not being switched out. The repeated running somewhat mitigates this, but when you don't control the hardware, decisions are being made that are beyond your visibility.
The almost 50% more context switches for normal Postgres is very telling. If those had any disk implication, which is quite possible with the 512 MB of RAM, it could easily explain the discrepancy.
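For anyone who wants to look at context switches on their own workload, the kernel reports voluntary and involuntary switches per process. A small Python sketch using the standard `resource` module (Unix only); it measures the current process, not a running Postgres, for which you would read `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` from /proc/<pid>/status instead. `count_context_switches` is a name invented for this example:

```python
import resource


def count_context_switches(fn):
    """Run fn() and return (result, voluntary, involuntary) switch counts.

    Voluntary switches usually mean the process blocked (for example on
    I/O); involuntary ones mean the scheduler preempted it.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = fn()
    after = resource.getrusage(resource.RUSAGE_SELF)
    voluntary = after.ru_nvcsw - before.ru_nvcsw
    involuntary = after.ru_nivcsw - before.ru_nivcsw
    return result, voluntary, involuntary
```

A large gap in voluntary switches between two setups is a hint that one of them is blocking on disk more often.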
You are running a benchmark in a virtualized environment with no guarantees about access to the underlying hardware from one moment to the next.
It's also the smallest instance type. At least if you had chosen the largest you'd possibly have the entire box. With the smallest instance type however anyone with a larger instance than you is going to steal cycles and I/O away the second they have any load.
What makes people think benchmarking on virtualized hardware is at all worthwhile? (Serious question) It's like writing a blog post about which way the wind was blowing for the last five minutes. This is all nonsense.
It is not nonsense, because people really do run their applications (e.g. PostgreSQL) in virtualized environments these days (e.g. AWS, DO, Linode, Google Cloud etc.). People still need to estimate the performance they will have under these conditions, and these benchmarks are relevant to them.
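One way to make such estimates more honest on noisy hardware is to report the spread across repeated runs, not just an average. A minimal Python sketch; `summarize` is a hypothetical helper, and the input would be whatever per-run figure your benchmark produces (for example transactions per second):

```python
import statistics


def summarize(samples):
    """Summarize repeated benchmark runs with mean, spread and range."""
    mean = statistics.mean(samples)
    # Sample standard deviation needs at least two runs.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "runs": len(samples),
        "mean": mean,
        "stdev": stdev,
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean if mean else float("nan"),
        "min": min(samples),
        "max": max(samples),
    }
```

If the coefficient of variation within one setup is in the same ballpark as the difference between the two setups, the comparison is not telling you much.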
If this is the Dockerfile [1] and he's not mounting a volume in, then he's running the Postgres data dir on a btrfs snapshot vs. probably ext4 w/o Docker
I pray to god nobody uses docker and postgres or any other database until this question is resolved. The most rational explanation for why postgres would be 'faster' under docker virtualization than it is raw-on-the-operating-system would be that libcontainer/docker are skipping over some fundamental atomicity guarantee.
Docker is just glue code. That's all it is. It sets up your bind mounts, cgroups and namespaces. It then execs the process and gtfos. The way the kernel is dealing with the namespace'd postgres is the difference.
I'd say it's much more likely that the pre-configured Docker that DigitalOcean makes available is on hardware that isn't as heavily used as a regular old Postgres configuration.
That is, there's more load in general on the Postgres machine than on the Docker machine; you don't get perfect isolation in those environments.
First of all, Docker is -not- virtualisation. It's -containerization-. Effectively namespacing and little else.
In Linux, namespaces are hierarchical: I can arrange them in a tree where, at the top, namespace A can have two sub-namespaces B and C. A can see the process tables of B and C, but B and C can only see their respective process tables.
This is called PID namespacing.
Full containerisation on Linux actually requires a bunch of different namespaces: first the PID namespaces above, but also uids, network (yes, in Docker your container has a fully namespaced network stack, including its own (multiple!) route tables and adapters), and also device/filesystem etc.
The key takeaway here is that NOTHING is being virtualised; we are simply showing only a section of the system to each process, depending on its position in the namespace hierarchy. What this means is that Docker (well, really Linux; Docker is just glue that calls this functionality up) is just... Linux. Nothing fancy, and definitely nothing KVM/Xen/VMware would be able to influence.
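You can see this for yourself: every process has a set of namespace identifiers under /proc/<pid>/ns, and a containerized process simply has different identifiers from a host process for some of them. A short Python sketch, assuming Linux; `namespaces` and `shared_namespaces` are illustrative names, and reading another user's process may require root:

```python
import os


def namespaces(pid="self", proc="/proc"):
    """Map namespace name to its identifier, e.g. 'pid' -> 'pid:[4026531836]'."""
    ns_dir = os.path.join(proc, str(pid), "ns")
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, no such PID, or no permission
    for name in names:
        try:
            # Each entry is a symlink whose target encodes the namespace ID.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            pass  # individual entries can be unreadable
    return result


def shared_namespaces(a, b):
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a if name in b and a[name] == b[name])
```

Comparing `namespaces()` for a host shell with `namespaces(pid)` for a process started by Docker shows which parts of the system are being hidden from it; everything else is the same kernel.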
Until shown otherwise, I am partial to the idea that this is just a difference between PG on Ubuntu vs PG on CentOS. It's possible docker is simply a red herring, and that there are too many confounding variables for any of these benchmarks to be meaningful. I would be interested to see more rigorous benchmarks of virtualized environments though.
Just as an uninformed guess, wouldn't docker effectively be just moving the postgres process to another cgroup task group, where it might be scheduled more fairly when the benchmark process is hogging the system?
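To check a guess like this, you could at least confirm which cgroup each process is in by reading /proc/<pid>/cgroup. A small Python sketch that parses that file's `hierarchy-ID:controller-list:path` lines; `parse_cgroup` is a made-up name, and whether the grouping actually explains the benchmark is still just a guess:

```python
def parse_cgroup(text):
    """Parse /proc/<pid>/cgroup text into (controllers, path) pairs.

    Each line has the form 'hierarchy-ID:controller-list:cgroup-path'.
    The controller list is empty for the cgroup v2 unified hierarchy.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _hierarchy_id, controllers, path = parts
        entries.append((controllers.split(",") if controllers else [], path))
    return entries


def read_cgroup(pid="self"):
    """Read and parse the cgroup membership of a process (Linux only)."""
    try:
        with open("/proc/{}/cgroup".format(pid)) as f:
            return parse_cgroup(f.read())
    except OSError:
        return []
```

If the Postgres process and the benchmark client show different paths for the cpu controller, they are being scheduled as separate groups.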