November 16, 2022
If you haven't read our previous part, check it out on our blog.
Docker registries store images and make it easy to set up a central repository from which we can download images. However, the simplicity and convenience of registries can become a security risk if we fail to evaluate the security context of the registry we are using.
The Docker Trusted Registry (DTR) is a registry that can be installed behind our firewall. It gives us the ability to verify both the integrity and the publisher of all data received from the registry over any channel. But even when the registry is only reachable from behind a firewall, we must still resist the temptation to let anyone upload or download images freely.
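On the client side, one related mechanism for this kind of verification is Docker Content Trust: when it is enabled, the Docker CLI refuses to pull image tags that are not signed. A minimal sketch, assuming a shell with Docker installed (the image name is just an example):
# Enable content trust for this shell session
export DOCKER_CONTENT_TRUST=1
# Pulls now succeed only for signed tags; unsigned images are rejected
docker pull alpine:latest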
Docker images can contain vulnerabilities and are not secure by default. The vulnerabilities might come from the packages installed in the image, from libraries used by the application, or even from the base image itself. Even Docker Official Images still have vulnerabilities, and unvetted base images from Docker Hub may contain backdoors or dangerous malware.
The ease of pushing and pulling images, while making developers' lives easier, has also made it easy for malicious actors to spread malware. A December 2020 analysis by Prevasio of around 4 million Docker Hub images showed that 51% of them had exploitable vulnerabilities.
Although Docker has some mechanisms (which I will cover later) to restrict access back to the host, a malicious image can still burn all available CPU cycles, exhaust memory, fill the drive, and send network traffic out from our machine. Vulnerable images that make their way into production environments pose significant threats that can be costly to remediate and can damage our reputation.
On the other hand, a badly written Docker image may leak sensitive information or private files, which can turn into a dangerous security problem later.
Securing a Docker container requires an end-to-end approach that provides protection everywhere, from the host to the network and everything in between. Because containers touch so many layers of the stack, there are many places where security can break down.
Docker container technology increases default security by creating isolation layers between applications, and between applications and the host.
When you start a container with docker run, behind the scenes Docker creates a set of namespaces and cgroups to provide basic isolation across containers. More advanced isolation can be layered on top with capabilities, seccomp, AppArmor, SELinux, etc.
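To make this concrete, here is a small sketch (assuming a local Docker install and the public alpine image; the container name is arbitrary) that starts a container and lists the namespaces attached to its main process:
# Start a long-running container
docker run -d --name ns-demo alpine sleep 300
# Find the container's main process on the host and list its namespaces
sudo ls -l /proc/$(docker inspect -f '{{.State.Pid}}' ns-demo)/ns
# Clean up
docker rm -f ns-demo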
Namespaces are a feature of the Linux kernel that partitions kernel resources so that each set of processes sees a different set of resources. The feature works by giving a set of resources and processes the same namespace, but those namespaces refer to distinct resources (a resource can exist in multiple namespaces).
Docker takes advantage of Linux namespaces to provide the isolated workspace we call a container. A set of namespaces is created when we deploy a new container, isolating it from all the other containers. This is isolation of what a process can see inside a container.
With namespaces, a Docker container gets its own isolated view of the system. There are five namespaces created for a Docker container (a quick demonstration follows the list):
1/ PID Namespace: Processes inside the container are assigned their own unique ID numbers, different from those on the host system. Each container has its own PID namespace for its processes.
2/ MNT Namespace: Each container is provided its own namespace for mounted directory paths.
3/ NET Namespace: Each container is provided its own view of the network stack, avoiding privileged access to the sockets or interfaces of another container.
4/ UTS Namespace: This provides isolation between the system identifiers; the hostname and the NIS domain name.
5/ IPC Namespace: The inter-process communication (IPC) namespace creates a grouping where containers can only see and communicate with other processes in the same IPC namespace.
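A quick way to see two of these namespaces in action (assuming the alpine image is available locally or can be pulled):
# PID namespace: inside the container, our process tree starts at PID 1
docker run --rm alpine ps
# UTS namespace: the container gets its own hostname, separate from the host's
docker run --rm --hostname inside-the-box alpine hostname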
Control groups (or cgroups) are a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
cgroups allow Docker to control which resources each container can access, providing good container multi-tenancy. They let Docker share the available hardware resources and set up limits and constraints for containers. This helps ensure that the whole system does not go down just because one container exhausts a resource.
Compared to namespaces, which isolate what processes can see, cgroups isolate what processes can access inside a Docker container: namespaces limit what you can see (and therefore use), while cgroups limit how much you can use.
Docker Engine uses cgroup controllers for resources such as CPU, memory, and disk I/O to enforce these limits. cgroups are essential to prevent denial-of-service attacks and to guarantee consistent uptime and performance in multi-tenant platforms, even when some applications start to misbehave.
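For example, docker run exposes flags that translate directly into cgroup limits; a minimal sketch, with values chosen purely for illustration:
# Cap the container at half a CPU, 256 MB of RAM, and at most 100 processes
docker run --rm --cpus=0.5 --memory=256m --pids-limit=100 alpine sh -c "echo constrained"
# The effective limits can be checked afterwards with docker inspect or docker stats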
seccomp (secure computing mode) is a computer security facility in the Linux kernel. We can think of it as a firewall for syscalls: it restricts the system calls a process may make. Taken to its extreme, seccomp can isolate a process from the system's resources almost entirely.
Docker Engine supports seccomp, allowing us to limit the actions available within a container down to the level of individual system calls. However, this feature is available only if Docker has been built with seccomp support and the kernel is configured with CONFIG_SECCOMP enabled.
With seccomp we can restrict our application's access to the host system. Docker ships with a default seccomp profile, but we can also customize it ourselves to improve security and adapt it to our purposes.
// Add a system call to the whitelist (fragment of a custom seccomp profile)
...
{
    "name": "mkdir",            // system call's name
    "action": "SCMP_ACT_ALLOW", // action to take when the name matches
    "args": []                  // additional argument filters
},
...
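Assuming the customized profile is saved as my-seccomp.json (a hypothetical filename), it can be applied per container with --security-opt; this is just a sketch of the mechanism:
# Run a container under the customized profile; mkdir succeeds because it is whitelisted above
docker run --rm --security-opt seccomp=my-seccomp.json alpine mkdir /tmp/demo
# With a profile that blocks the call, the same command would fail with "Operation not permitted"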
Before capabilities, Linux considered OS security in terms of root privileges versus user privileges. With capabilities, Linux now has a more nuanced privilege model. Restricting both access and capabilities reduces the amount of surface area potentially vulnerable to attack.
Although capabilities allow granular specification of user access, users still have the option to elevate their access to root level through sudo or setuid binaries, which may constitute a security risk. Docker's default settings are designed to limit Linux capabilities and reduce this risk.
In most cases, containers don't need true root privileges at all. Therefore, containers can work with a limited set of capabilities. The Docker default bounding set of capabilities is less than half of the total capabilities assigned to a Linux process. This reduces the possibility that application level vulnerabilities could be exploited to allow escalation to a fully-privileged root user.
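In practice we can go further than the default bounding set and drop everything a workload does not need. A hedged sketch using CAP_CHOWN as the example capability:
# With the default capability set, chown succeeds (CAP_CHOWN is included)
docker run --rm alpine chown nobody /tmp
# With every capability dropped, the same command fails with "Operation not permitted"
docker run --rm --cap-drop=ALL alpine chown nobody /tmp
# Add back only what the workload truly needs
docker run --rm --cap-drop=ALL --cap-add=CHOWN alpine chown nobody /tmp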
Modern Linux kernels have many additional security constructs in addition to the concepts of capabilities, namespaces and cgroups. Docker can leverage existing systems like TOMOYO, SELinux and GRSEC, some of which come with security model templates which are available out of the box for Docker containers. We can further define custom policies using any of these access control mechanisms.
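As an illustration of that last point, docker run can attach a custom policy from these systems per container. The profile and label names below are hypothetical and must already be defined and loaded on the host:
# AppArmor: run the container under a custom profile
docker run --rm --security-opt apparmor=my-docker-profile alpine sh
# SELinux: adjust the label type the container runs with
docker run --rm --security-opt label=type:my_container_t alpine sh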
Linux hosts can be hardened in many other ways, and while deploying Docker enhances host security, it does not preclude the use of additional security tools. Specifically, Docker recommends that users run Linux kernels with GRSEC and PAX. These patch sets add several kernel-level safety checks, both at compile time and at run time, that attempt to defeat or complicate some common exploitation techniques. While not Docker-specific, these configurations can provide system-wide benefits without conflicting with Docker.