Docker security (Part 2)

November 16, 2022

If you haven't read our previous part, check it out on our blog.

3. Docker registries (layer 2)

Docker registries store images and give us a central repository from which to download them. However, the simplicity and convenience of registries can become a security risk if we fail to evaluate the security context of the registry.

The Docker Trusted Registry (DTR) is a registry that can be installed behind our firewall. It gives us the ability to verify both the integrity and the publisher of all the data received from a registry over any channel. Even if the registry is accessible only from behind the firewall, we must still resist the temptation to let anyone upload or download images.

Docker Trusted Registry.
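The integrity half of that verification can be illustrated with a content digest check. The following is a minimal sketch, not how DTR itself is implemented: it assumes content is identified by a "sha256:<hex>" digest string, and the function names content_digest and matches_digest are hypothetical helpers introduced only for this example. Verifying the publisher additionally requires signatures, which this sketch does not cover.

```python
import hashlib
import hmac


def content_digest(data: bytes) -> str:
    """Return a "sha256:<hex>" digest string for the given bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def matches_digest(data: bytes, expected: str) -> bool:
    """Check that the received bytes hash to the digest we expected.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(content_digest(data), expected)
```

If a single byte of the received data differs from what the publisher pushed, the digest no longer matches and the download can be rejected.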

4. Docker images (layer 3)

Docker images are not secure by default and can contain vulnerabilities. These vulnerabilities may come from the packages installed in the image, the libraries added by the user, or even the base image. Even Docker Official Images still have vulnerabilities, and unvetted base images from Docker Hub may contain backdoors or dangerous malware.

The ease of pushing and pulling images, while making developers' lives easier, has also made it easy for malicious actors to spread malware. An analysis by Prevasio in December 2020 of around 4 million Docker Hub images showed that 51% of them had exploitable vulnerabilities.

Half of 4 million public Docker Hub images found to have critical vulnerabilities (12/2020).

Although Docker has some mechanisms (which I will list later) to restrict access back to the host, a malicious image can still use up all CPU cycles, exhaust memory, fill the disk, and send network traffic out from our machine. Vulnerable images that make their way into production environments pose significant threats that can be costly to remediate and can damage our reputation.

On the other hand, if you write a Docker image badly, it may leak your sensitive information or private files, which could become a dangerous security problem later.

5. Docker containers (layer 3)

Securing a Docker container requires an end-to-end approach that provides protection everywhere from the host to the network and everything in between. Because containers are so easily accessible, securing them poses many difficulties.

Docker container technology strengthens default security by creating isolation layers between applications, and between applications and the host. Isolation is a powerful mechanism for controlling what containers can see or access and what resources they can use.

Isolation in Docker container.

When you start a container with docker run, behind the scenes Docker creates a set of namespaces and cgroups to provide basic isolation across containers. More advanced isolation can be achieved with capabilities, seccomp, AppArmor, SELinux, etc.

Linux kernel features used in Docker containers.

5.1. Kernel namespaces

Namespaces are a feature of the Linux kernel that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set. The feature works by having the same namespace for a set of resources and processes, but those namespaces refer to distinct resources (a resource can exist in multiple namespaces).

Isolation between processes within different namespaces.

Docker takes advantage of Linux namespaces to provide the isolated workspace called a container. A set of namespaces is created when we deploy a new container, isolating it from all the other containers. This isolates what a process can see inside a container.

With namespaces, a Docker container:

  • Is provided process isolation: Processes running within a container cannot see, and even less affect, processes running in another container, or in the host system.
  • Has its own network stack: Each container by default gets its own network stack and does not get privileged access to the sockets or interfaces of another container.

Namespaces created for a Docker container.

There are 5 namespaces created for a Docker container:

1/ PID Namespace: Processes in the container get their own process ID numbering, separate from that of the host system. Each container has its own PID namespace for its processes.

2/ MNT Namespace: Each container is provided its own namespace for mount directory paths.

3/ NET Namespace: Each container is provided its own view of the network stack, avoiding privileged access to the sockets or interfaces of another container.

4/ UTS Namespace: This provides isolation of the system identifiers: the hostname and the NIS domain name.

5/ IPC Namespace: The inter-process communication (IPC) namespace creates a grouping where containers can only see and communicate with other processes in the same IPC namespace.
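The five namespaces above can be inspected from user space. The following is a minimal sketch, assuming a Linux system where each process exposes its namespaces as symbolic links under /proc/<pid>/ns; the function names read_namespaces and isolated_kinds are hypothetical helpers introduced for this example, not part of Docker.

```python
import os

# The five namespace kinds listed above, by their /proc/<pid>/ns entry names.
NAMESPACE_KINDS = ("pid", "mnt", "net", "uts", "ipc")


def read_namespaces(pid="self", proc_root="/proc"):
    """Return {kind: identifier} for a process, e.g. {"pid": "pid:[4026531836]"}.

    Each entry under <proc_root>/<pid>/ns is a symbolic link whose target
    names the namespace the process belongs to.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for kind in NAMESPACE_KINDS:
        try:
            result[kind] = os.readlink(os.path.join(ns_dir, kind))
        except OSError:
            # Entry missing or not readable (e.g. insufficient permissions).
            continue
    return result


def isolated_kinds(ns_a, ns_b):
    """List the namespace kinds in which two processes differ."""
    return sorted(k for k in ns_a if k in ns_b and ns_a[k] != ns_b[k])
```

Calling read_namespaces() on the host and again for a process running in a container, then passing both results to isolated_kinds, should show which namespaces separate the two. Two processes are in the same namespace exactly when their identifiers are equal.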

5.2. Control groups

Control groups (or cgroups) are a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

cgroups in Linux.

cgroups allow Docker to control which resources each container can access, providing good container multi-tenancy. They allow Docker to share available hardware resources and to set up limits and constraints for containers. This helps us ensure that the whole system will not go down just because one container has exhausted its resources.

Compared to namespaces, which isolate what the processes can see, cgroups isolate what the processes can consume inside a Docker container. Namespaces limit what you can see (and therefore use), while cgroups limit how much you can use.

Namespaces and cgroups in Docker.

Docker Engine uses the following cgroups:

  1. Memory cgroup for managing accounting, limits and notifications.
  2. HugeTLB cgroup for accounting usage of huge pages by process group.
  3. CPU cgroup for managing user / system CPU time and usage.
  4. CPUSet cgroup for binding a group to specific CPUs.
  5. BlkIO cgroup for measuring & limiting the amount of block I/O by group.
  6. net_cls and net_prio cgroups for tagging network traffic for traffic control.
  7. Devices cgroup for controlling read / write access to devices.
  8. Freezer cgroup for freezing a group.

cgroups are essential to prevent denial-of-service attacks and guarantee a consistent uptime and performance in multi-tenancy platforms, even when some applications start to misbehave.
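In practice, these limits are usually set through docker run options. The following is a minimal sketch that only assembles the command line and does not run Docker; the helper names resource_limit_args and docker_run_command, and the image name used in the usage note, are assumptions made for illustration. The flags themselves (--memory, --cpus, --cpuset-cpus, --pids-limit) are standard docker run options. Note that --pids-limit relies on the pids cgroup, which is not in the list above.

```python
def resource_limit_args(memory=None, cpus=None, cpuset_cpus=None, pids_limit=None):
    """Translate resource limits into docker run flags.

    memory      -> --memory       (memory cgroup), e.g. "256m"
    cpus        -> --cpus         (CPU cgroup), e.g. 0.5
    cpuset_cpus -> --cpuset-cpus  (CPUSet cgroup), e.g. "0,1"
    pids_limit  -> --pids-limit   (pids cgroup), e.g. 100
    """
    args = []
    if memory is not None:
        args += ["--memory", str(memory)]
    if cpus is not None:
        args += ["--cpus", str(cpus)]
    if cpuset_cpus is not None:
        args += ["--cpuset-cpus", str(cpuset_cpus)]
    if pids_limit is not None:
        args += ["--pids-limit", str(pids_limit)]
    return args


def docker_run_command(image, command=(), **limits):
    """Build the full argument list for a resource-limited container."""
    return ["docker", "run", "--rm", *resource_limit_args(**limits), image, *command]
```

For example, docker_run_command("alpine", ["sleep", "60"], memory="256m", cpus=0.5, pids_limit=100) yields an argument list that could be passed to subprocess.run on a host where Docker is installed. A container started this way cannot take more than 256 MB of memory or half a CPU, however badly the application inside misbehaves.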

5.3. Seccomp

seccomp (secure computing mode) is a computer security facility in the Linux kernel. We can think of it as a firewall for syscalls that restricts the system calls a process may make. From this point of view, seccomp isolates the process from the system's resources at the system call level.

Secure computing mode diagram.

Docker Engine supports the use of seccomp, which allows us to limit the actions within the container down to the level of individual system calls. However, this feature is available only if Docker has been built with seccomp and the kernel is configured with CONFIG_SECCOMP enabled.

With seccomp we can restrict our application's access to the host system. While Docker has a default seccomp profile, we can also customize it ourselves to improve security and adapt it to our purpose.

Seccomp is used in container security.

// Add a system call to the whitelist (one entry in the profile's "syscalls" list)
{
    "name": "mkdir",            // system call's name
    "action": "SCMP_ACT_ALLOW", // action taken when the system call's name matches
    "args": []                  // additional argument filters
}
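A complete profile wraps such entries in a "syscalls" list next to a default action. The following is a minimal sketch that generates one, assuming the profile layout documented for Docker in which each entry carries a "names" list; build_seccomp_profile and write_profile are hypothetical helper names. The resulting profile denies every system call that is not listed, so a real container would need a far longer allow list than the one in the usage note.

```python
import json


def build_seccomp_profile(allowed_syscalls, default_action="SCMP_ACT_ERRNO"):
    """Build a whitelist-style seccomp profile as a plain dictionary.

    Any system call not in allowed_syscalls falls through to default_action
    (SCMP_ACT_ERRNO makes the call fail with an error).
    """
    return {
        "defaultAction": default_action,
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),  # deduplicated, stable order
                "action": "SCMP_ACT_ALLOW",
                "args": [],
            }
        ],
    }


def write_profile(profile, path):
    """Serialize the profile to a JSON file that Docker can load."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(profile, fh, indent=2)
```

After write_profile(build_seccomp_profile(["mkdir", "read", "write"]), "profile.json"), the file can be applied with docker run --security-opt seccomp=profile.json. Starting from Docker's default profile and removing calls is usually more practical than building an allow list from scratch.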

5.4. Linux kernel capabilities

Before capabilities, Linux considered OS security in terms of root privileges versus user privileges. With capabilities, Linux now has a more nuanced privilege model. Restricting both access and capabilities reduces the amount of surface area potentially vulnerable to attack.

Although capabilities allow granular specification of user access, users still have the option to elevate their access to root level through the use of sudo or setuid binaries. Doing this may constitute a security risk. Docker’s default settings are designed to limit Linux capabilities and reduce this risk.

Linux kernel features used in Docker containers.

In most cases, containers don't need true root privileges at all. Therefore, containers can work with a limited set of capabilities. The Docker default bounding set of capabilities is less than half of the total capabilities assigned to a Linux process. This reduces the possibility that application-level vulnerabilities could be exploited to allow escalation to a fully privileged root user.

Linux capabilities for host and container.
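The difference between the host and container capability sets can be made visible by decoding the capability bitmask that the kernel reports in /proc/<pid>/status (the CapBnd and CapEff fields). The following is a minimal sketch: the bit numbering follows the Linux capability constants, while the function names are hypothetical helpers. The mask 00000000a80425fb used in the usage note is the value commonly seen inside a default Docker container; treat it as an illustrative assumption rather than a guarantee for every Docker version.

```python
# Capability names indexed by bit number, following linux/capability.h.
CAPABILITY_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_capabilities(mask_hex):
    """Turn a hexadecimal capability mask into a list of capability names."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAPABILITY_NAMES) if mask & (1 << bit)]


def read_capability_mask(field="CapBnd", status_path="/proc/self/status"):
    """Read one capability field (e.g. CapBnd, CapEff) for the current process."""
    with open(status_path, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith(field + ":"):
                return line.split(":", 1)[1].strip()
    return None
```

Decoding 00000000a80425fb with decode_capabilities gives 14 capabilities, including CAP_NET_BIND_SERVICE and CAP_CHOWN but not CAP_SYS_ADMIN, out of the 41 names in the table. Capabilities can be narrowed further at start time with the docker run options --cap-drop and --cap-add.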

5.5. Other Linux kernel security features

Modern Linux kernels offer many security constructs beyond capabilities, namespaces and cgroups. Docker can leverage existing systems like TOMOYO, SELinux and GRSEC, some of which come with security model templates that are available out of the box for Docker containers. We can further define custom policies using any of these access control mechanisms.

Linux hosts can be hardened in many other ways, and while deploying Docker enhances host security, it does not preclude the use of additional security tools. Specifically, Docker recommends that users run Linux kernels with GRSEC and PAX. These patch sets add several kernel-level safety checks, both at compile time and at run time, that attempt to defeat some common exploitation techniques or make them more difficult. While not Docker-specific, these configurations can provide system-wide benefits without conflicting with Docker.
