The Case of the Missing Entropy

Tags
#selfhosting

In computing, entropy is the randomness collected by an operating system or application for use in cryptography or other uses that require random data. (Source: Wikipedia)

Docker, are you still there?

It all started when I got myself a new VPS for serving web content. I have a more-than-capable server at home, but I'd rather not use it for "uptime-sensitive" use cases: the odd crash still takes it down from time to time. I know, a CDN…

Sticking with what I'm comfortable with, I decided to go with a docker setup with only a few containers:

As per usual, I wrote my docker-compose.yaml and expected to be all set. But not this time. Sometimes, when I changed something in the YAML file and ran docker-compose up -d, the command executed immediately, as I would expect from all the times I've run it on my homelab. But other times, it would wait a minute or longer before doing anything.

I accepted this behavior a few times, but at some point, it had to be dealt with.

Investigating

I noticed a few things. First, it did not seem to be due to a lack of computing resources. My Grafana dashboard (with InfluxDB as backend and Telegraf as agent) clearly showed that CPU usage was about 1% and RAM was about 30% full, with no excessive disk or network I/O. So we are not overwhelming the system!

Additionally, while it was waiting to execute, I could open a new SSH connection and do other stuff. With one exception: any docker-related command would not execute.

Final clue: I could not Ctrl-C my way out of a pending docker command, and even if I closed the terminal, opened a new one, connected via SSH and ran a new docker command, it would still wait.

Final final clue: a minute later, I could run docker commands left, right and center without a single problem. Another minute later, it might do the whole waiting again. It was very… "Random". Wink, wink…

Have you figured it out yet? I hadn't.

Researching

With this information, I was confident enough to start searching online, and I came across this GitHub issue fairly quickly:

"docker-compose often takes a long time to do anything"

That sounds about right!

A few comments in, it was suggested to run the following command: cat /proc/sys/kernel/random/entropy_avail.

On my VPS, this returned 52. Whoopsie…
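
The same check can be scripted. What follows is a minimal Python sketch, not something taken from the GitHub issue: the helper names read_entropy and is_starved and the threshold of 200 are assumptions made for illustration. Kernels also differ in how they report this value, so the threshold may not transfer to your system.

```python
from pathlib import Path

# Kernel file that reports the estimated entropy (in bits) in the pool.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")

# Hypothetical threshold: an assumption for illustration, not a kernel constant.
LOW_ENTROPY_THRESHOLD = 200


def read_entropy(path=ENTROPY_PATH):
    """Return the available entropy reported in the given file, as an int."""
    return int(Path(path).read_text().strip())


def is_starved(bits, threshold=LOW_ENTROPY_THRESHOLD):
    """Return True when the reported entropy is below the threshold."""
    return bits < threshold


if __name__ == "__main__":
    if ENTROPY_PATH.exists():  # the file only exists on Linux
        bits = read_entropy()
        status = "low" if is_starved(bits) else "ok"
        print(f"entropy_avail = {bits} ({status})")
    else:
        print("entropy_avail is not available on this system")
```

With the value from my VPS, is_starved(52) returns True, while a reading of around 4000 returns False.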

Entropy

For those of you who don't know what (computing) entropy is, here's the Wikipedia article for it. In short: computers are terrible at coming up with random numbers (just like humans! Topic for another day), yet many applications need them to work properly.

Our operating systems have a clever way to solve this: take input that the computer cannot predict and use it as "randomness". For example, a computer doesn't know in advance how you are going to move the cursor or which key you will press. The operating system takes these inputs, processes them to "extract the randomness" and stores the result in the entropy pool.

Any application needing some randomness can request random data from the entropy pool. Maintaining sufficient entropy is therefore a challenge in itself: the system has to gather enough unpredictable input to keep up with the demand.
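
To make "requesting random data" concrete, here is a minimal Python sketch of an application asking the operating system for random bytes. The function names make_token and raw_random_bytes are hypothetical; os.urandom and secrets.token_hex are standard-library calls that draw from the operating system's random source. This is not meant to show what Docker does internally.

```python
import os
import secrets


def make_token(n_bytes=16):
    """Return a hex string built from n_bytes of OS-provided random bytes."""
    return secrets.token_hex(n_bytes)


def raw_random_bytes(n_bytes=16):
    """Ask the operating system directly for n_bytes of random data."""
    return os.urandom(n_bytes)


if __name__ == "__main__":
    print(make_token())              # 32 hex characters for 16 bytes
    print(raw_random_bytes().hex())  # same length, different value
```

According to the Python documentation, on Linux os.urandom can block until the kernel's random pool has been initialized, which is the kind of waiting described in this post.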

Apparently, Docker is an application that requires randomness. But how have I never encountered this issue before?

VPS and entropy

On both my desktop computer and homelab, the entropy available is around 4000, which is perfect. They are able to maintain this entropy because of all the sources of randomness available to them: mouse and keyboard inputs, processes running in the background, etc.

Now, let's take the VPS as a counterexample. These things are made to be fully reproducible: every time you boot one up, it is expected to run in the same way. They are also very sealed off from the host system for security reasons: I cannot read core temperature values for my VPS. They don't have "true hardware"; they get portions of hardware, shared with other VPS instances. Except for my SSH connection, the VPS has no mouse or keyboard inputs.

In other words, VPSs are severely lacking in sources of entropy. That is why the entropy available was only 52 and why docker stalled: it had to wait for sufficient randomness to accumulate.

More information on VPS and entropy from DigitalOcean.

The remedy: haveged

There is a way to remedy the situation: haveged. Having only discovered it last night, I do not fully understand it yet, but from what I have read, it is a pseudorandom number generator (PRNG) that fills the entropy pool with "pseudorandomness". Installing haveged immediately solved my issue: all docker commands were running instantly again.

[Figure: available entropy suddenly increases after installing haveged]
Can you tell when I installed haveged?

Caveat: pseudorandomness

There is a downside to this: PRNGs are NOT random (see the Wikipedia article on PRNGs). PRNGs generate numbers that appear random but are fully deterministic: run the exact same algorithm twice from the same starting state (the seed), and you'll get the same "random" numbers. Therefore, a VPS may not be the ideal place to perform entropy-heavy tasks such as cryptography: a cryptographic key generated from predictable pseudorandom numbers is far less secure than one generated from truly random numbers.
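
The determinism is easy to see in a minimal Python sketch. It uses the standard-library random.Random generator purely as an example of a PRNG; it is not the algorithm haveged uses, and the function name pseudo_random_sequence and the seed value 42 are arbitrary choices made for illustration.

```python
import random


def pseudo_random_sequence(seed, count=5):
    """Return `count` integers from a PRNG initialised with `seed`."""
    rng = random.Random(seed)  # independent generator, leaves global state alone
    return [rng.randint(0, 999) for _ in range(count)]


if __name__ == "__main__":
    first = pseudo_random_sequence(seed=42)
    second = pseudo_random_sequence(seed=42)
    # Same seed, same algorithm: the two "random" sequences are identical.
    print(first == second)
    print(first)
```

Anyone who knows the seed and the algorithm can reproduce the whole sequence, which is why keys should come from a source an attacker cannot predict.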