systemd_resource_control

Overview

systemd is a Linux initialization system and service manager that includes features like on-demand starting of daemons, mount and automount point maintenance, snapshot support, and process tracking using Linux control groups. systemd also provides a logging daemon and other tools and utilities to help with common system administration tasks.

resource control

All userspace processes running on the system are descendants of the systemd init process. systemd provides three unit types that are used for resource control:

  • Service
    A process or a group of processes that systemd starts based on a unit configuration file. Services encapsulate the specified processes so that they can be started and stopped as one set. Services are named in the following way: name.service

  • Scope
    A group of externally created processes. Scopes encapsulate processes that are started and stopped by arbitrary processes through the fork() function and then registered with systemd at runtime. For instance, user sessions, containers, and virtual machines are treated as scopes. Scopes are named as follows: name.scope

  • Slice
    A group of hierarchically organized units. Slices do not contain processes directly; they organize a hierarchy in which scopes and services are placed, and the actual processes are contained in those scopes and services. In this hierarchical tree, the name of every slice unit corresponds to the path to its location in the hierarchy, with the dash ("-") character separating the path components. For example, a slice named parent-name.slice is a sub-slice of parent.slice.

NOTE

  • Service, scope, and slice units map directly to objects in the cgroup tree. When these units are activated, their cgroup paths are built from the unit names. For example, ex.service residing in test-waldo.slice is mapped to the cgroup test.slice/test-waldo.slice/ex.service/

  • The cgroup directory for a slice is created when either

    • a service that uses the slice starts, or
    • the slice is started explicitly with systemctl start test.slice, even if nothing uses it
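
The naming rule for slices can be sketched as a small shell function. This is only an illustration: slice_to_cgroup_path is a hypothetical helper, not a systemd tool, and it assumes a plain slice name with no escaped characters (such as \x2d) and does not handle the root slice -.slice.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): expand a slice unit name into
# the cgroup path, relative to the cgroup root, implied by its "-" separators.
# The body runs in a subshell so the IFS change does not leak to the caller.
slice_to_cgroup_path() (
    name=${1%.slice}
    path=""
    prefix=""
    IFS=-
    for part in $name; do
        # Each path component repeats the full prefix accumulated so far.
        prefix=${prefix:+$prefix-}$part
        path=${path:+$path/}$prefix.slice
    done
    printf '%s\n' "$path"
)

slice_to_cgroup_path test-waldo.slice   # prints test.slice/test-waldo.slice
```

Appending /ex.service to the printed path gives the cgroup of the service from the note above.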

troubleshooting

systemd cgroups

# get cgroup tree 
$systemd-cgls
systemd-cgls
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
├─docker
│ └─febf819341b7bee63374f6f666077b04fbc53bce4cb091db0ecd2327db6d8546
│ └─13204 /bin/bash
├─machine.slice
│ └─machine-qemu\x2d1\x2dvm100.scope
│ └─12841 /usr/libexec/qemu-kvm -name guest=vm100,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-vm100/master-key.aes -machine pc-i440fx-
├─user.slice
│ └─user-0.slice
│ ├─session-121.scope
│ │ ├─12400 sshd: root@pts/2
│ │ └─12403 -bash
│ ├─session-80.scope

$systemd-cgls cpu
$systemd-cgls memory
...

# get cpu/memory/io usage of each cgroup
$systemd-cgtop
Path Tasks %CPU Memory Input/s Output/s

/ 210 16.7 12.1G - -
/system.slice 1 11.1 2.2G - -
/system.slice/systemd-journald.service 1 8.3 112.1M - -
/system.slice/rsyslog.service 1 2.3 9.8M - -
/machine.slice - 1.1 2.0G - -
/machine.slice/machine-qemu\x2d1\x2dvm100.scope 3 1.1 2.0G - -
/machine.slice/machine-qemu\x2d1\x2dvm100.scope/vcpu2 1 0.6 - - -
/machine.slice/machine-qemu\x2d1\x2dvm100.scope/vcpu3 1 0.5 - - -
/system.slice/mysqld.service 1 0.2 - - -
/system.slice/ovs-vswitchd.service 1 0.2 - - -
/system.slice/ovsdb-server.service 1 0.1 - - -
...

############# creating a transient service with the systemd-run command #############
# this creates a transient service unit under /run and runs the command in the given slice
$systemd-run --unit=<name> --slice=<name>.slice <command>

# the unit files under /run are removed when the command exits successfully, when the unit is stopped (systemctl stop sleep), or when the OS reboots
$systemd-run --unit=sleep --slice=system.slice sleep 10000
Running as unit sleep.service.

# these files are created
$ls /run/systemd/system/sleep.service
$ls /run/systemd/system/sleep.service.d/
50-Description.conf 50-ExecStart.conf 50-Slice.conf

# with --scope, this creates a transient scope unit (instead of a service unit) under /run and runs the command in the given slice
$systemd-run --unit=<name> --scope --slice=<name>.slice <command>

service and unit management

# list all units by type
$systemctl -t service
$systemctl -t slice

# show runtime info of given service
$systemctl show libvirtd.service
# show runtime info of slice
$systemctl show test.slice
Slice=-.slice
ControlGroup=/test.slice
MemoryCurrent=56729600
TasksCurrent=23
Delegate=no
CPUAccounting=no
CPUShares=18446744073709551615
StartupCPUShares=18446744073709551615
...

############# set a property of a unit (slice, service, scope, etc.) from the command line #############
# set-property does not change /usr/lib/systemd/system/test.slice
# instead it creates a drop-in file that overrides it at /etc/systemd/system/test.slice.d/50-MemoryAccounting.conf
$systemctl set-property test.slice MemoryAccounting=no
$systemctl set-property <service name> <unit file option>=<value>
# check the new setting
$systemctl show --property <unit file option> <service name>

########### set a property of a unit (slice, service, scope, etc.) by editing the unit file ###########
$vi /usr/lib/systemd/system/test.slice
[Slice]
MemoryAccounting=no

$vi xx.service
[Service]
MemoryLimit=16G
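
Because systemctl show prints plain Key=Value lines, a single property is easy to pull out in a script. The function below is a minimal sketch; unit_property is a hypothetical name, and the sample input is copied from the test.slice output above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one property from "Key=Value"
# lines on stdin, such as the output of `systemctl show <unit>`.
unit_property() {
    # Split on "=" to match the key, then print everything after the first
    # "=" so that values which themselves contain "=" stay intact.
    awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Demo on sample lines taken from the output above; prints /test.slice
unit_property ControlGroup <<'EOF'
Slice=-.slice
ControlGroup=/test.slice
MemoryCurrent=56729600
EOF
```

On a live system the same function would be fed from the command itself, for example systemctl show test.slice | unit_property MemoryCurrent.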

FAQ

without systemd, run a program in a set of cgroups

$cgexec -g controllers:path_to_cgroup command arguments 

# example
$mkdir /sys/fs/cgroup/memory/test
$mkdir /sys/fs/cgroup/cpu/test
$cgexec -g memory:test -g cpu:test sleep 10
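
If cgexec is not installed, much the same effect can be had on a cgroup v1 layout like the one above by writing a PID into each cgroup's cgroup.procs file before exec'ing the command. The sketch below assumes one directory per controller under a common root (for example /sys/fs/cgroup) and enough privileges to write there; run_in_cgroups and the CG_* variables are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: run a command inside <root>/<controller>/<group> for
# each listed controller, assuming a cgroup v1 layout and write permission.
run_in_cgroups() {
    cg_root=$1 cg_ctrls=$2 cg_group=$3
    shift 3
    CG_ROOT=$cg_root CG_CTRLS=$cg_ctrls CG_GROUP=$cg_group sh -c '
        for c in $CG_CTRLS; do
            # $$ is the PID of this inner shell. Writing it moves the shell
            # into the cgroup, and exec below keeps that PID for the command.
            echo "$$" > "$CG_ROOT/$c/$CG_GROUP/cgroup.procs" || exit 1
        done
        exec "$@"
    ' sh "$@"
}

# usage (as root, after creating the cgroup directories as shown above):
#   run_in_cgroups /sys/fs/cgroup "memory cpu" test sleep 10
```

The root directory is a parameter so the function can be tried against a scratch directory first; note that the CG_* variables remain in the environment of the launched command.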

without systemd, move a process into a set of cgroups

# check the cgroups of the given process
$cat /proc/12546/cgroup
11:blkio:/
10:cpuset:/
9:devices:/
8:cpuacct,cpu:/test2
7:net_prio,net_cls:/
6:memory:/test
5:pids:/
4:freezer:/
3:hugetlb:/
2:perf_event:/
1:name=systemd:/user.slice/user-0.slice/session-80.scope

# move the process and its child to the given cgroups
$cgclassify -g cpu:test2 12546
# move it into test2 under every mounted controller (test2 must already exist in each one);
# the quotes keep the shell from expanding *
$cgclassify -g '*:test2' 12546
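
Each line of /proc/<pid>/cgroup has the form id:controllers:path, so a script can check where a process ended up for one controller. The function below is a sketch under that assumption; cgroup_path_for is a hypothetical name, and the sample lines are taken from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the cgroup path of one controller from
# /proc/<pid>/cgroup-style lines ("id:controllers:path") read on stdin.
cgroup_path_for() {
    awk -F: -v want="$1" '{
        # The second field may list several controllers, e.g. "cpuacct,cpu".
        n = split($2, ctrls, ",")
        for (i = 1; i <= n; i++)
            if (ctrls[i] == want) {
                # Print everything after the second ":" as the path.
                print substr($0, length($1) + length($2) + 3)
                exit
            }
    }'
}

# Demo on sample lines from the listing above; prints /test2
cgroup_path_for cpu <<'EOF'
8:cpuacct,cpu:/test2
6:memory:/test
1:name=systemd:/user.slice/user-0.slice/session-80.scope
EOF
```

For a running process the input would come from the kernel file, for example cgroup_path_for memory < /proc/12546/cgroup.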

how to set a limit for a given service with systemd

A limit can be set from the command line with the set-property command or by editing the service file directly.

# vi xx.service
[Service]
MemoryLimit=16G

# if set from the command line with 'set-property', an override file is created at
# /etc/systemd/system/xx.service.d/50-MemoryLimit.conf
[Service]
MemoryLimit=16G
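
An equivalent drop-in can also be written by hand or from a provisioning script. The following is a minimal sketch, not what systemctl does internally: write_dropin is a hypothetical helper, the 50-<option>.conf file name simply mirrors the one shown above, and the base directory is a parameter so the function can be tried outside /etc first.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in file shaped like the one that
# `systemctl set-property` creates for a single option.
write_dropin() {
    unit=$1 key=$2 value=$3 base=${4:-/etc/systemd/system}
    # Pick the unit file section from the unit name suffix.
    case $unit in
        *.service) section=Service ;;
        *.slice)   section=Slice ;;
        *.scope)   section=Scope ;;
        *) echo "unsupported unit type: $unit" >&2; return 1 ;;
    esac
    mkdir -p "$base/$unit.d" || return 1
    printf '[%s]\n%s=%s\n' "$section" "$key" "$value" > "$base/$unit.d/50-$key.conf"
}

# usage (as root):
#   write_dropin xx.service MemoryLimit 16G
#   systemctl daemon-reload
```

Unlike set-property, a hand-written drop-in only takes effect after systemctl daemon-reload, and the service may also need a restart.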

how to set a limit for a group of services

Create a slice that defines the limits, then assign each service to that slice.

$vi /usr/lib/systemd/system/test.slice
[Unit]
Description=Test Slice
Before=slices.target

[Slice]
MemoryAccounting=true
MemoryLimit=2048M
CPUAccounting=true
CPUQuota=25%
TasksMax=4096
...

# edit each service unit file so that it runs in the slice defined above
$vi /etc/systemd/system/libvirtd.service
...
[Service]
Slice=test.slice
...

$systemctl daemon-reload
$systemctl stop libvirtd
$systemctl start libvirtd
$systemctl status libvirtd
● libvirtd.service - Virtualization daemon
Loaded: loaded (/etc/systemd/system/libvirtd.service; disabled; vendor preset: enabled)
Active: active (running) since Wed 2022-09-21 14:38:46 CST; 1h 0min ago
Docs: man:libvirtd(8)
https://libvirt.org
Main PID: 13573 (libvirtd)
Tasks: 19 (limit: 32768)
CGroup: /test.slice/libvirtd.service
├─12657 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
├─12659 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
└─13573 /usr/sbin/libvirtd --listen

Ref