
Recent versions (1.11+) have had various race conditions around stdio handling. 1.12.4 should (fingers crossed) resolve all the issues we've run into so far; hopefully that's all of them.

The rest of the lockup issues have been related to kernel bugs, which would also be observed with other tools. One major issue we've had (really multiple kernel issues with multiple causes) is netlink (a kernel interface) not responding, so the container's mutex is held forever while we wait on netlink. The lockup itself should also be resolved in 1.12.4, which uses a timeout on the netlink socket. You'll still see errors from this, since if netlink isn't responding there's something else weird happening, but at least the container mutex can be released and you can interact with the daemon.
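To make the timeout idea concrete, here is a minimal Python sketch. It is not the daemon's code (Docker is written in Go), and every name in it (`Container`, `query_with_timeout`, `demo`) is hypothetical. A `socketpair()` whose peer never writes stands in for an unresponsive netlink socket. The point is only that a timeout on the read lets the lock holder give up and release the mutex instead of waiting forever.

```python
import socket
import threading


class Container:
    """Hypothetical stand-in for the daemon's per-container state."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


def query_with_timeout(container, sock, timeout_s):
    """Wait for a reply on sock while holding the container lock.

    Returns the reply bytes, or None if nothing arrived within timeout_s.
    Either way the lock is released when the with-block exits.
    """
    with container.lock:
        sock.settimeout(timeout_s)  # without this, recv() could block forever
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None


def demo():
    # socketpair() stands in for the netlink socket; the peer never writes,
    # which imitates a kernel interface that has stopped responding.
    ours, silent_peer = socket.socketpair()
    container = Container("web")
    try:
        reply = query_with_timeout(container, ours, timeout_s=0.2)
        # Check whether the container lock is still held after the timeout.
        still_locked = container.lock.locked()
        return reply, still_locked
    finally:
        ours.close()
        silent_peer.close()
```

Calling `demo()` returns the reply (or `None` on timeout) together with whether the lock is still held. The caller still has to surface the timeout as an error, which matches the point above that errors remain even though the lockup goes away.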

In 1.14 we should have a lock-free container object, so that any new issues that come up in this regard are isolated to a particular thread. This makes it harder to detect that there is an issue, but it's something we could potentially track and even report on.
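The comment doesn't say how the 1.14 container object is implemented, so the following is only one possible reading, sketched in Python with hypothetical names (`ContainerView`, `list_containers`). Writers publish an immutable snapshot and readers look at that snapshot without taking any lock, so a writer that gets stuck cannot block a listing.

```python
import threading
from types import MappingProxyType


class ContainerView:
    """Hypothetical container record whose readers never take the lock."""

    def __init__(self, name, state):
        self._write_lock = threading.Lock()  # serializes writers only
        self._snapshot = MappingProxyType({"name": name, "state": state})

    def snapshot(self):
        # Readers just pick up the current reference; the mapping it
        # points to is never mutated after it has been published.
        return self._snapshot

    def set_state(self, state):
        with self._write_lock:
            updated = dict(self._snapshot)
            updated["state"] = state
            # Publish a fresh read-only snapshot in a single assignment.
            self._snapshot = MappingProxyType(updated)


def list_containers(containers):
    """Rough analogue of a listing command: reads snapshots only."""
    snapshots = [c.snapshot() for c in containers]
    return [(s["name"], s["state"]) for s in snapshots]
```

In this sketch a stuck writer leaves the last published state visible to readers, which is also why a problem becomes harder to notice: nothing obviously hangs.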

Note that the daemon isn't really locking up fully; rather, any command that tries to lock the container's mutex will get stuck, which includes listing it in `docker ps` and start/stop/restart/etc. on that particular container.




The netlink not responding issue sounds like the one we've been battling. Now we just have to wait for a CoreOS release with 1.12.4.



