Cgroups and Namespaces On Ubuntu
Hello folks. Before diving into the concepts of cgroups and namespaces on Ubuntu, there are a few things one must be clear about.
1) Virtualization :
It's a method or technique used to run an operating system on top of another operating system. The hardware resources are shared by each of the operating systems running on top of the base operating system, so they are utilized more fully.
A hypervisor, also known as a virtual machine monitor (VMM), sits in between the guest operating system and the real physical hardware. It controls the resource allocation to the guest operating systems and works at the virtualization layer.
Now there are basically two types of virtualization methods:
i) Hosted Virtualization
Examples : VMware Workstation, Microsoft’s Virtual PC.
ii) Bare metal Virtualization
Examples : VMware ESX and Citrix XenServer.
The key difference between the two methods is the location of the virtualization layer. In the case of hosted virtualization, the hypervisor sits above the host operating system, whereas in the case of bare metal virtualization, as the name suggests, the hypervisor sits directly above the hardware.
Recently the world has been moving towards lighter virtualization technologies, one of them being container virtualization.
Container virtualization is done at the operating system level, rather than the hardware level. The main things that need to be understood about container virtualization are:
Each container (we'll call it a guest operating system) shares the kernel of the base system.
Limiting resource usage is done by cgroups, and isolating what each set of processes can see is done by namespaces.
Introduction to Cgroups:
Control groups (cgroups) are a kernel mechanism for grouping, tracking, and limiting the resource usage of tasks/processes…
There are a few key terms one must understand to implement cgroups. I will try not to bore you folks with bookish language!!
Subsystem: A subsystem represents a single resource, such as CPU time or memory. (The butter block !)
Some common subsystems include:
-cpuset : facilitates assigning a set of CPUs and memory nodes to cgroups. Tasks in a cpuset cgroup may only be scheduled on CPUs assigned to that cpuset.
-blkio : limits per-cgroup block I/O.
-cpuacct : provides per-cgroup CPU usage accounting.
-devices : controls the ability of tasks to create or use device nodes using either a blacklist or whitelist.
-freezer : provides a way to ‘freeze’ and ‘thaw’ whole cgroups. Tasks in the cgroup will not be scheduled while they are frozen.
-memory : allows memory and swap usage to be tracked and limited.
-net_cls : provides an interface for tagging packets based on the sender cgroup. These tags can then be used by tc (traffic control) to assign priorities.
-net_prio : allows setting network traffic priority on a per-cgroup basis.
-cpu : enables setting of scheduling preferences on a per-cgroup basis.
-perf_event : enables per-cpu mode to monitor only threads in certain cgroups.
If I want to limit memory usage and restrict which CPUs can be used, I would use the memory and cpuset subsystems. (We’ll get to the exact implementation later.)
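To see which of these subsystems your own kernel offers, you can read /proc/cgroups. Here is a minimal sketch, assuming the usual four-column layout of that file (name, hierarchy ID, number of cgroups, enabled flag). The function name list_enabled_subsystems is my own and not part of any tool.

```shell
# Print the name of every enabled subsystem, one per line.
# Input on stdin: text in the /proc/cgroups layout
# (subsys_name, hierarchy, num_cgroups, enabled).
list_enabled_subsystems() {
    # Skip the "#subsys_name ..." header and keep rows whose 4th column is 1.
    awk '!/^#/ && $4 == 1 { print $1 }'
}

# Show the subsystems of the running kernel when the file is readable.
if [ -r /proc/cgroups ]; then
    list_enabled_subsystems < /proc/cgroups
fi
```

If a subsystem you need (say memory) is missing from the output, it is probably disabled in the kernel or on the boot command line, so it is worth checking this before trying to mount it.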
Hierarchy: A set of subsystems mounted together forms a hierarchy.
Tasks : The system processes are called tasks in cgroups terminology.
Cgroups : A cgroup associates a set of tasks with a set of parameters for one or more subsystems.
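These terms come together in /proc/<pid>/cgroup, where every line has the form hierarchy-ID:subsystem-list:cgroup-path. The sketch below pulls out the cgroup path of a task for one subsystem. The function name cgroup_path_for and the sample paths in the usage note are only for illustration.

```shell
# Print the cgroup path recorded for one subsystem.
# $1 = subsystem name (for example cpuset)
# Input on stdin: lines in the /proc/<pid>/cgroup layout, "ID:subsystems:path".
cgroup_path_for() {
    awk -F: -v want="$1" '{
        # The second field may list several subsystems mounted together.
        n = split($2, names, ",")
        for (i = 1; i <= n; i++)
            if (names[i] == want) {
                sub(/^[^:]*:[^:]*:/, "")   # drop "ID:subsystems:" and keep the path
                print
                exit
            }
    }'
}
```

For the current shell you would call it as cgroup_path_for cpuset < /proc/$$/cgroup. A line such as 3:cpu,cpuacct:/ means the cpu and cpuacct subsystems are mounted together in one hierarchy and the task sits in its root cgroup.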
For the sake of the demo, I am working on an EC2 instance. Keep in mind that when one is asked to mount a device, what actually gets mounted is the filesystem on that device.
By default the subsystems are mounted under /sys/fs/cgroup.
I mounted the cpuset subsystem at /sys/fs/cgroup/cpuset, and now I will attach a process to it.
First you will need to create a child group inside the parent cgroup; the limiting of resources will occur on this child group.
The child group cg is inside the parent cgroup /sys/fs/cgroup/cpuset, and on it I have limited the resources available to the process sh (use only the 0th core for CPU usage). The sh process is now running inside the defined cgroup.
To summarize this short demo, what we just did was:
mount -t tmpfs cgroup_root /sys/fs/cgroup
mkdir /sys/fs/cgroup/cpuset
mount -t cgroup cpuset -o cpuset /sys/fs/cgroup/cpuset
cd /sys/fs/cgroup/cpuset
mkdir cg
cd cg
/bin/echo 0 > cpuset.cpus
/bin/echo 0 > cpuset.mems
/bin/echo $$ > tasks
sh
# The subshell 'sh' is now running in cgroup cg
# The next line should display '/cg'
cat /proc/self/cpuset
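One way to double-check the restriction from inside the new shell is the Cpus_allowed_list field of /proc/<pid>/status, which lists the CPUs a task may be scheduled on. A small sketch follows; the function name cpus_allowed_list is my own.

```shell
# Print the list of CPUs a task is allowed to run on.
# Input on stdin: text in the /proc/<pid>/status layout.
cpus_allowed_list() {
    # The line looks like "Cpus_allowed_list:<tab>0-3"; print the value part.
    awk '$1 == "Cpus_allowed_list:" { print $2 }'
}

# Show the value for the current shell when /proc is available.
if [ -r /proc/self/status ]; then
    cpus_allowed_list < /proc/self/status
fi
```

Run inside a cpuset limited to core 0, this should report just that core, while a shell in the root cgroup typically reports every CPU of the machine.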
You follow the same steps to mount other subsystems, or several subsystems together.
However there are a few key points to keep in mind when implementing cgroups.
1) Rule 1 :
2) Rule 2 :
3) Rule 3 :
4) Rule 4 :
Every task is always a member of exactly one cgroup in each mounted hierarchy. So to remove a task from its current cgroup you must move it into a new cgroup (possibly the root cgroup) by writing its PID to the new cgroup’s tasks file.
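Moving a task is therefore just a write to a file. Here is a minimal sketch, assuming a cgroup v1 hierarchy where each cgroup directory has a tasks file. The helper name move_task is made up for this post.

```shell
# Move one task into a cgroup by writing its PID to that cgroup's tasks file.
# $1 = PID, $2 = cgroup directory (for example /sys/fs/cgroup/cpuset/cg)
move_task() {
    if [ ! -d "$2" ]; then
        echo "no such cgroup directory: $2" >&2
        return 1
    fi
    # One PID per write; the kernel removes the task from its old cgroup.
    echo "$1" > "$2/tasks"
}
```

As root, move_task $$ /sys/fs/cgroup/cpuset would put the current shell back into the root cgroup of the cpuset hierarchy, which is how you "remove" it from the child group.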
So far we have covered cgroups, now let's move on to Namespaces!!
Let us return to that butter example. As explained in the earlier analogy, we can see the different butter cubes as the different namespaces. This is just to show that each namespace is isolated from the others.
Example: A process might run with PID 6661 in one PID namespace, while another namespace has its own independent set of process IDs, so the same number could belong to a completely different process there.
-A namespace (abbreviated as NS) provides an isolated instance of a global resource.
-Used for the implementation of containers.
-A lightweight virtualization that gives processes the illusion that they are the only processes running on the system.
Each NS has a stack of resources that are used by the processes. For example, a network namespace would have its own network stack consisting of firewall rules, routing tables, etc. Thus a process that needs only the network resources would be put in the network namespace.
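At the time of writing, Linux implements six different namespaces (newer kernels add more, such as the cgroup namespace):
1) Mount Namespace : (CLONE_NEWNS)
-mount points, filesystems
2) UTS Namespace : (CLONE_NEWUTS)
-hostname, domain name
3) IPC Namespace : (CLONE_NEWIPC)
-System V IPC objects, POSIX message queues
4) PID Namespace : (CLONE_NEWPID)
-process IDs
5) Network namespace : (CLONE_NEWNET)
-network devices, routing tables, firewall rules
6) User namespace : (CLONE_NEWUSER)
-user and group IDs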
For example, while implementing network namespaces, the flag CLONE_NEWNET is the identifier passed in the code where the network namespace is created. Namespaces are implemented through API calls. The API consists of the following:
i) clone()
-creates a new process and a new namespace.
-the process is attached to the new namespace.
-the process creation and process termination methods, fork() and exit(), handle the new namespace CLONE_NEW* flags.
ii) unshare()
-does not create a new process.
-creates a new namespace and attaches the current process to it.
iii) setns()
-a system call that was added for joining an existing process to an existing namespace, that is, no new namespaces or processes are created.
iv) /proc files
-Each process has a /proc/PID/ns directory that contains one file for each type of namespace.
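Each file in /proc/PID/ns is a symbolic link whose target looks like net:[4026531992], and two processes are in the same namespace when their links point to the same target. The sketch below compares two processes this way. The function names ns_inode and same_namespace are my own, not part of any tool.

```shell
# Extract the inode number from a namespace link target such as "net:[4026531992]".
ns_inode() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9]*\)]$/\1/p'
}

# Succeed when two processes share the namespace of the given type.
# $1, $2 = PIDs; $3 = file name under /proc/<pid>/ns (mnt, uts, ipc, pid, net or user)
# Returns 2 when one of the links cannot be read.
same_namespace() {
    ns_a=$(readlink "/proc/$1/ns/$3") || return 2
    ns_b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$ns_a" = "$ns_b" ]
}
```

For example, same_namespace $$ 1 net tells you whether your shell shares a network namespace with PID 1 (reading the links of another user's process usually needs root). As root you can also watch unshare at work: unshare --uts readlink /proc/self/ns/uts should print a different identifier than readlink /proc/$$/ns/uts.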
Let us look at a working example of one of the namespaces (say network namespaces), to understand how it works.
What is a network namespace?
A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices.
To create a network namespace: On the terminal window run:
ip netns add <new namespace name>
For example, ip netns add test creates a network namespace by the name of test.
The next step is to assign interfaces to the network NS.
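Note: virtual Ethernet (veth) interfaces are what you would normally assign to a network namespace, and they always come in pairs!! (Other kinds of interfaces, including physical ones, can also be moved into a namespace.)
A veth pair works like a cable: whatever goes in one end comes out the other. So one of the virtual Ethernet interfaces stays in the global namespace, where it can be connected to the physical Ethernet, and the other is at the namespace end.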
Now, let’s say that you want to connect the global namespace to the test namespace.
To do that, you’ll need to move one of the veth interfaces to the test namespace using this command at the terminal: ip link set veth1 netns test
As veth1 has been attached to the test network NS, it no longer shows up in the list of interfaces in the global namespace.
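The move above assumes that the veth pair already exists. Here is a minimal sketch of creating a pair and moving one end in one go. The name veth0 for the end that stays in the global namespace is my assumption (only veth1 appears in this post), and attach_veth_pair is a made-up helper name. It needs root and an existing namespace.

```shell
# Create a veth pair and move one end into an existing network namespace.
# $1 = namespace name
# $2 = end that stays in the global namespace
# $3 = end that is moved into the namespace
attach_veth_pair() {
    ip link add "$2" type veth peer name "$3" &&
        ip link set "$3" netns "$1"
}
```

With the names used in this post the call would be attach_veth_pair test veth0 veth1. Afterwards ip link list in the global namespace should still show veth0 but not veth1.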
How to execute commands in the namespace??
On the terminal, run
ip netns exec test ip link list
Now let's break this command down and see what's happening.
ip netns exec : executes a command in a different network namespace.
test : name of the network namespace
ip link list : the command you want to run within the namespace
When you run that command, you should see a loopback interface (lo) and the veth1 interface you moved over earlier.
And that is exactly what we see!!
Remember veth1 was the virtual ethernet attached to the test namespace.
Now that we have assigned the interfaces, let's configure them.
To configure the assigned interfaces:
We have to configure the veth1 interface in the test namespace.
For that purpose, on the terminal, run
ip netns exec test ifconfig veth1 10.1.1.1/24 up
The same configuration can also be done with the ip addr, ip link and ip route commands instead of ifconfig.
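For those who prefer to stay with the ip tool, here is a sketch of the equivalent of the ifconfig line above: one command to assign the address and one to bring the interface up. The helper name configure_ns_interface is made up, and it needs root.

```shell
# Assign an address to an interface inside a namespace and bring it up, using ip only.
# $1 = namespace name, $2 = interface name, $3 = address in CIDR form
configure_ns_interface() {
    ip netns exec "$1" ip addr add "$3" dev "$2" &&
        ip netns exec "$1" ip link set dev "$2" up
}
```

For this demo the call would be configure_ns_interface test veth1 10.1.1.1/24.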
Now if you run ip addr in the global namespace and in the test namespace, you will get different interfaces.
You can see that when I ran ip addr in the global namespace and in the test network namespace, I got different inet addresses.
So here we see that the same resource is treated differently in different namespaces.
Thus we saw a brief implementation of Cgroups & Namespaces on Ubuntu.
Credits : Gunjan Sawhney (AWS)
References: Images taken from www.google.com
Well, it would be nice if someone put down what the allowable values are for each cgroup subsystem and how this works with LXC, as cgmanager has been deprecated and LXC relies on cgroup namespaces now.