Docker [01]: How do things work from the inside?

If you landed on this page, you are probably curious about what Docker is and how it works. I’m going to explain how it works internally with some examples, so we don’t get lost in too much theory.

Non-isolated processes:

Every program running on a machine is composed of one or more processes. Imagine that one of these processes is malware. It can badly affect the other, normal processes. For example, it can:

Overuse the available resources of the machine (for example, memory, network bandwidth, or CPU).
Corrupt the files that other processes use, or even change their code files.

For these security reasons, we need to isolate the resources of each process running on the machine. But is isolation only a security requirement?

The answer is NO. The running processes are usually ordinary applications, say a Java or a Python application. Each application has a set of requirements that must be available in the process environment before the application can run. You may even have two apps that need different versions of the same requirement. So the question pops up: how can we isolate processes so that each one runs in its own environment, separate from the others? Should we use virtual machines? No, running a full VM for each process would be too heavy. The solution is what Docker calls CONTAINERS, which are built on an underlying Linux kernel feature called Namespaces.

What are Containers?

A container is a lightweight, isolated execution environment that packages software and its dependencies together. Containers provide a consistent and portable way to run applications across different computing environments, such as development machines, servers, or cloud platforms. Containers are built on a Linux kernel feature called Namespaces, which means a Linux kernel is required to get this type of isolation.

What are Namespaces?

According to Wikipedia:

Namespaces are a feature of the Linux kernel that partitions kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources. The feature works by having the same namespace for a set of resources and processes, but those namespaces refer to distinct resources. Resources may exist in multiple spaces. Examples of such resources are process IDs, hostnames, user IDs, file names, some names associated with network access, and Inter-process communication.

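Imagine that you have some memory, CPU, and network bandwidth, and you want to partition all of them between the running processes so that none of them can interfere with the resources of the others, or in some cases not even know that another process exists ;) Strictly speaking, namespaces control what a process can see, while limiting how much memory or CPU a process may use is the job of a related kernel feature called control groups (cgroups).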
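To make this less abstract, here is a minimal Python sketch that prints the namespaces the current process belongs to. It assumes a Linux machine, where the kernel exposes them as symbolic links under /proc/<pid>/ns; the helper names parse_ns_link and list_namespaces are made up for this illustration. On a system without /proc it simply returns an empty result.

```python
import os
import re

# A namespace link target looks like "pid:[4026531836]":
# the namespace type, then an inode number that identifies the namespace.
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target into (kind, inode), or return None if it does not match."""
    match = NS_LINK.match(target)
    if match is None:
        return None
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid="self", proc_root="/proc"):
    """Return {entry name: (kind, inode)} for one process.

    Returns an empty dict when <proc_root>/<pid>/ns is missing or unreadable,
    for example on a non-Linux system.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in names:
        try:
            target = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry vanished or is not readable; skip it
        parsed = parse_ns_link(target)
        if parsed is not None:
            result[name] = parsed
    return result


if __name__ == "__main__":
    for name, (kind, inode) in list_namespaces().items():
        print(f"{name:20} {kind}:[{inode}]")
```

Two processes that show the same inode number for an entry are in the same namespace of that type. A process started inside a container would typically show different numbers than a process on the host.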

Linux Namespaces:

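pid (isolate process IDs)
cgroup (isolate the view of the control group hierarchy)
mnt (isolate mount points, that is, the view of the filesystem)
net (isolate network interfaces)
ipc (isolate inter-process communication resources)
time (isolate the boot and monotonic clocks)
user (isolate user and group IDs)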

These are the most common and best-known namespaces available on Linux systems whose kernel supports them. We will go through each one of them (with examples) to explain how they work.
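Before going through them one by one, here is a rough Python sketch of how you could check which of these namespaces two processes share. It assumes Linux and reads the links under /proc/<pid>/ns; namespace_id and shared_namespaces are hypothetical helper names, not part of Docker or any library. A value of None means the link could not be read (missing permissions, a kernel without that namespace type, or a non-Linux system).

```python
import os

# The namespace types from the list above, as they are named under /proc/<pid>/ns.
KINDS = ("pid", "cgroup", "mnt", "net", "ipc", "time", "user")


def namespace_id(pid, kind, proc_root="/proc"):
    """Return the link target for one namespace of a process, or None if unreadable."""
    try:
        return os.readlink(os.path.join(proc_root, str(pid), "ns", kind))
    except OSError:
        return None


def shared_namespaces(pid_a, pid_b, kinds=KINDS, proc_root="/proc"):
    """Map each namespace type to True (shared), False (separate), or None (unknown)."""
    report = {}
    for kind in kinds:
        a = namespace_id(pid_a, kind, proc_root)
        b = namespace_id(pid_b, kind, proc_root)
        report[kind] = None if a is None or b is None else a == b
    return report


if __name__ == "__main__":
    # Compare this process with its parent (usually the shell that started it).
    for kind, same in shared_namespaces(os.getpid(), os.getppid()).items():
        print(f"{kind:8} shared={same}")
```

Run from an ordinary shell, every readable entry would normally come out as shared, because a child process inherits its parent's namespaces by default. Comparing a host process with one running inside a container is where you would expect to see separate namespaces.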