Using apt-get update in a Dockerfile

Dockerfiles are the source code of Docker images. While they are amazingly simple, they hide some traps that the Dockerfile authors should be aware of.

Now, how does Docker build an image from a Dockerfile? It runs its statements sequentially and synchronously. For each statement, it starts a new container from the existing image, it runs the statement in the container, and creates a new image from the container.

You may think that this process is quite slow, and you are right. That’s why Docker has a cache – that means, doesn’t rebuild images that are already present. If the first statements in the file are cached, Docker skips them, until it find a statement that is not cached (because it was added or changed after the last build). From that point, it won’t use the old cache anymore.

That’s why statements adding metadata that are subject to changes (such as most LABELs) should be at the end of the Dockerfile: we don’t want Docker to rebuild all intermediary images just because we added a label.

But there are other consequences. Think about the following lines:

RUN apt-get update
RUN apt-get install apache2

Now consider the following line:

RUN apt-get update && apt-get install apache2

Apparently, they have the same effects. But what happens if you change the package list, adding an Apache module or replacing Apache with Nginx? In the first example, apt-get update is not executed again, in the second case it is. Usually we prefer the second option: periodically updating the packages is up to the user. Image maintainers should not rebuild their images each time a package version is released.

But there’s another case. Suppose that we don’t want to change the package list, but we need apt-get update to be executed again to fix some package security bug. We could build the Dockerfile with the --no-cache option. Well, it works, but it completely ignores the existing cache. That option is better used occasionally to address cache problems, not periodically to keep the system updated.

Another way is adding a cheap statement before apt-get update. We also want that the purpose of the new statements to be self-documenting. The almost-standard way is:

ENV UPDATED_ADD 2016-02-23
RUN apt-get update && apt-get install apache2

This sets an environment variable that can also be used to find out when the Dockerfile was last modified up to that line. The rest of the file could be modified months later, so you can set that variable again if necessary. Remember that it only represents the Dockerfile last changes: an image could still be built with --no-cache.