Containers Lie 🤫 | A Deep Dive into Docker-Shim and a Real On-Call Fix
In this blog, I'll share an incident that taught me how containers really work under the hood.
Production Down –
I once received a production-down alert for one of my customers' websites.
When I checked, the website was returning HTTP 502 (Bad Gateway).

Initial Checks –
I immediately logged in to the production host to investigate.
The first thing I checked was the container that was running the production WordPress website.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7f3a9c1b2d91 wordpress:6.4-apache "docker-entrypoint.s…" 3 days ago Up 3 days 0.0.0.0:80->80/tcp prod-wordpress
At first glance, everything looked healthy: the container was up, there were no restarts, and the ports were mapped correctly.
I reloaded the website again, only to find it was still down.
Next Move –
Next, I checked the container's resource utilisation:
$ docker stats prod-wordpress
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
7f3a9c1b2d91 prod-wordpress 0.00% 110MiB / 2GiB 5.37% 0B / 0B 0B / 0B 1
Red flags:
- 0% CPU
- No network traffic (0B / 0B)
- Only 1 PID (an Apache-based WordPress container normally runs several Apache worker processes)
The website was down, but the container showed “Up”.
This clearly meant the container was running but not serving any traffic.
Quick Fix –
My first attempt at a quick fix (every DevOps engineer's first instinct):
I tried restarting the container.
$ docker restart prod-wordpress
Guess what? It hung.
Next, I tried to stop the container:
$ docker stop prod-wordpress
That hung too. Now it was a panic moment for me.
Even a force kill didn’t work:
$ docker kill prod-wordpress
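When even docker kill hangs, the host's own view of the container's processes can help narrow things down. The sketch below is a minimal example, assuming a Linux host with /proc mounted; proc_state is a hypothetical helper, not a Docker command. It prints the kernel state letter from /proc/<pid>/status. A process stuck in D (uninterruptible sleep) generally won't act on any signal, SIGKILL included, until the kernel call it is blocked in returns.

```shell
#!/bin/sh
# Sketch (Linux only): print the kernel state letter of a process.
# proc_state is a hypothetical helper, not part of Docker.
# Common letters: R running, S sleeping, D uninterruptible sleep,
# T stopped, Z zombie.
proc_state() {
  local pid=$1
  # /proc/<pid>/status contains a line such as "State:  S (sleeping)"
  awk '/^State:/ {print $2}' "/proc/${pid}/status"
}

# Example on a Docker host (commented out so the sketch runs anywhere):
# main_pid=$(docker inspect --format '{{.State.Pid}}' prod-wordpress)
# echo "PID ${main_pid} is in state $(proc_state "$main_pid")"
```

The letter alone does not explain a hang, but it tells you whether the problem sits inside the process (D or T) or somewhere in the chain that delivers Docker's commands to it.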
I considered restarting the Docker daemon (systemctl restart docker), which might have been the easiest fix. But this was production: every container on the host would be affected, causing unplanned downtime.
At this point I was pretty clueless, so I turned to the internet for help.
Docker-Shim
While researching similar incidents, I learned about docker-shim, a helper process used by older Docker versions.
Older Docker versions used a helper process called docker-shim, which acted as an intermediary between Docker and the container runtime.
Each container had its own docker-shim process.
So I checked:
$ ps aux | grep docker-shim | grep 7f3a9c1b2d91
root 24791 0.0 0.1 123456 3456 ? Sl Feb08 0:01 docker-shim -namespace moby -id 7f3a9c1b2d91 -address /run/docker/libcontainerd/docker-containerd.sock
Findings –
docker-shim PID: 24791
It was the parent process of the container’s main PID (24873)
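The parent/child link in these findings can also be confirmed directly, without relying on grep matching the right command line. A minimal sketch, assuming a Linux host with /proc mounted; parent_of is a hypothetical helper that reads the PPid: field from /proc/<pid>/status. Given the container's main PID (from docker inspect), it should return the shim's PID.

```shell
#!/bin/sh
# Sketch (Linux only): print the parent PID of a process via procfs.
# parent_of is a hypothetical helper, not a Docker command.
parent_of() {
  local pid=$1
  # /proc/<pid>/status contains a line such as "PPid:  24791"
  awk '/^PPid:/ {print $2}' "/proc/${pid}/status"
}

# Example on a Docker host (commented out so the sketch runs anywhere):
# main_pid=$(docker inspect --format '{{.State.Pid}}' prod-wordpress)
# shim_pid=$(parent_of "$main_pid")
# ps -o pid,stat,cmd -p "$shim_pid"   # shows the parent's command line
```

Walking up from the container's PID avoids guessing which of many shim processes belongs to which container.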
Since docker-shim is just a helper process, I decided to kill it directly.
$ kill -9 24791
I immediately checked Docker again:
$ docker ps -a
CONTAINER ID IMAGE STATUS NAMES
7f3a9c1b2d91 wordpress:6.4-apache Exited (137) 2 seconds ago prod-wordpress
Bingo! The stuck process was gone and the container was now in the Exited state.
Then I started the container, and it came up instantly:
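The 137 in that status follows the usual shell convention: a process terminated by a signal is reported as 128 plus the signal number, so 137 = 128 + 9, meaning the container's main process ended on SIGKILL. A small sketch that decodes such codes; explain_exit is a hypothetical helper, and it relies only on the standard kill -l, which maps a signal number to its name.

```shell
#!/bin/sh
# Sketch: turn a container exit code into a human-readable reason.
# explain_exit is a hypothetical helper. Codes from 129 to 192 are
# read as "128 + signal number" (signals 1 to 64 on Linux).
explain_exit() {
  local code=$1
  if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
    local sig=$((code - 128))
    # "kill -l N" prints the signal name without the SIG prefix, e.g. KILL
    echo "killed by signal ${sig} (SIG$(kill -l "$sig"))"
  else
    echo "exited with status ${code}"
  fi
}

explain_exit 137   # killed by signal 9 (SIGKILL)
explain_exit 143   # killed by signal 15 (SIGTERM)
```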
$ docker start prod-wordpress
prod-wordpress
Verified traffic:
$ docker stats prod-wordpress
CONTAINER ID NAME CPU % MEM USAGE / LIMIT NET I/O
7f3a9c1b2d91 prod-wordpress 1.23% 145MiB / 2GiB 8.4MB / 6.9MB
The website was back online.

What I Learned from This Incident
Containers Can Lie
A container showing “Up” doesn’t mean it’s healthy or serving traffic.
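Because "Up" only says the main process exists, a more honest signal is whether the site actually answers. A minimal sketch, assuming curl is installed; probe_http is a hypothetical helper, and the URL is whatever the site is served on (port 80 on this host). It treats any 2xx or 3xx status within the timeout as healthy and everything else as down, including refused connections and timeouts, which curl reports as 000.

```shell
#!/bin/sh
# Sketch: check whether a URL is really serving, not just whether a
# container is "Up". probe_http is a hypothetical helper; needs curl.
probe_http() {
  local url=$1 code
  # -s silent, -o /dev/null discards the body, -w prints only the status.
  # On connection failure or timeout curl prints 000 and exits non-zero.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || true
  [ -n "$code" ] || code=000   # e.g. curl itself is missing
  echo "$code"
  case $code in
    2??|3??) return 0 ;;   # the site answered normally
    *)       return 1 ;;   # 000, 4xx or 5xx (such as the 502 seen here)
  esac
}

# Example: probe_http http://localhost/ || echo "site is down"
```

A check like this could also be attached to the container itself through Docker's health checks (docker run --health-cmd), provided curl exists inside the image; docker ps then shows the health state next to "Up".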
docker-shim Was a Critical Link
docker-shim acted as the parent process of the container.
If it hung, the container's lifecycle was broken: restart, stop, and even kill could not complete.
