There are two ways to capture packets for a given container
way1: capture inside the given container
Go to that container and use tcpdump (install it first if it is not available).
Required (either):
the container already has tcpdump
the container has no tcpdump but you can install it from a local source or the internet (this changes the container)
way2: capture in another container that shares the same network with the given container
In some cases you can't change the given container, or installing tcpdump on it is complex. Instead, create a helper container that has tcpdump installed and shares the same network namespace with the given container.
# create an image called tcpdump with tcpdump installed,
# then create a container from this image that shares its network with another container
# --rm means the container will be destroyed on exit
# 99c1bd6b342c is the target container's ID
$ docker run --rm -it --net=container:99c1bd6b342c tcpdump
# this creates an unnamed container from the image tcpdump that shares the network
# with the container whose packets you want to capture
how to deploy a docker env on lots of machines?
Use Docker Machine: it provides a client to provision a docker environment on many machines (Windows, Mac, Linux). Note that Docker Machine has since been deprecated.
expose a host device to a container
$ docker run --device=/dev/sdb:/dev/xvda -it ubuntu /bin/bash
# /dev/sdb is the host device
# /dev/xvda is the device name inside the container

# default permission: rwm (read, write, mknod)
# with an explicit permission (read-only here):
$ docker run --device=/dev/sdb:/dev/xvda:r -it ubuntu /bin/bash
how to limit IO for a container
Use cgroups (blkio) to accomplish this.
Refer to docker IO throttling; it only supports direct IO due to a blkio cgroup limitation.
# these two kinds of policies can work together; note they only support direct IO, not buffered IO
--blkio-weight=0              Block IO weight (relative weight); accepts a value between 10 and 1000, the default weight for each device
--blkio-weight-device=""      Block IO weight (relative device weight, format: DEVICE_NAME:WEIGHT); overrides the default weight, example: --blkio-weight-device "/dev/sda:100"
--device-read-bps=[]          Limit read rate (bytes per second) from a device
--device-read-iops=[]         Limit read rate (IO operations per second) from a device
--device-write-bps=[]         Limit write rate (bytes per second) to a device
--device-write-iops=[]        Limit write rate (IO operations per second) to a device
$ docker run -it --rm --device-write-bps /dev/sda:1mb ubuntu /bin/bash
# /dev/sda is the host device; the container's write rate is limited when it accesses /dev/sda
$ docker run -it --rm --device-write-iops /dev/sda:10 ubuntu /bin/bash
# blkio-weight is a relative weight: if only one process A (a docker process or any other) with weight 100 accesses /dev/sda, it gets 100% of the IO bandwidth;
# if another process B from a different blkio group with weight 200 accesses /dev/sda at the same time, process A gets 1/3 and process B gets 2/3 of the IO bandwidth
# test inside each container with direct IO
(container)# time dd if=/dev/zero of=test.out bs=1M count=1024 oflag=direct
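The relative-weight split described above is just a group's weight divided by the sum of competing weights; a quick sketch in plain shell arithmetic (100 and 200 are the example weights from the comment, not special values):

```shell
# each blkio group's share of IO bandwidth = its weight / sum of competing weights
weight_a=100
weight_b=200
total=$((weight_a + weight_b))
# integer percentages (shell arithmetic truncates)
share_a=$((100 * weight_a / total))
share_b=$((100 * weight_b / total))
echo "processA: ${share_a}%  processB: ${share_b}%"
```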
--cpu-shares, -c
Set this flag to a value greater or less than the default of 1024 to increase or reduce the container's weight, and give it access to a greater or lesser proportion of the host machine's CPU cycles. This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need. In that way, this is a soft limit. It prioritizes container CPU resources for the available CPU cycles.
# !!!It does not guarantee or reserve any specific CPU access.!!!
# say there are four CPUs on the host and all three containers below run CPU-intensive workloads:
# the first container takes one CPU
# the second container takes one CPU
# the third container takes two CPUs
# but if the third container exits or sleeps, the other two each take two CPUs!!!
$ docker run -it --rm --cpu-shares 1024 ubuntu /bin/bash
$ docker run -it --rm --cpu-shares 1024 ubuntu /bin/bash
$ docker run -it --rm --cpu-shares 2048 ubuntu /bin/bash
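Under contention the split is proportional: each container gets shares/total of the host CPUs. A sketch of the four-CPU example, using the three share values from the commands above:

```shell
# three containers with --cpu-shares 1024, 1024 and 2048 on a 4-CPU host
s1=1024; s2=1024; s3=2048
total=$((s1 + s2 + s3))
cpus=4
# CPUs each container gets while all three are busy (integer math is exact here)
c1=$((cpus * s1 / total))
c2=$((cpus * s2 / total))
c3=$((cpus * s3 / total))
echo "container1=$c1 container2=$c2 container3=$c3"
# prints container1=1 container2=1 container3=2
```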
--cpu-period & --cpu-quota
CPU quota (cpu_quota) is a feature of Linux control groups (cgroups). CPU quota controls `how much CPU time a container can use`: `cpu_quota is the number of microseconds of CPU time` a container can use `per cpu_period`. For example, configuring:
cpu_quota to 50,000
cpu_period to 100,000
The container will be allocated 50,000 microseconds per 100,000 microsecond period. `A bit like (see below) the use of 0.5 CPUs`. Quota can be greater than the period. For example:
cpu_quota to 200,000
cpu_period to 100,000
Now the container can use `200,000 microseconds of CPU time every 100,000 microseconds`. To use the CPU time there will either need to be multiple processes in the container, or a multi-threaded process. `This configuration is a bit like (see below) having 2 CPUs`.
cpu_quota allows setting an `upper bound on the amount of CPU time a container gets`. Linux enforces the limit even if CPU time is available. Quotas can hinder utilization while `providing a predictable upper bound on CPU time`.
--cpuset-cpus Limit the specific CPUs or cores a container can use. A comma-separated list or hyphen-separated range of CPUs a container can use, if you have more than one CPU. The first CPU is numbered 0. A valid value might be 0-3 (to use the first, second, third, and fourth CPU) or 1,3 (to use the second and fourth CPU).
--cpus=<value> Specify how much of the available CPU resources a container can use. For instance, if the host machine has two CPUs and you set --cpus="1.5", the container is guaranteed at most one and a half of the CPUs. This is the equivalent of setting --cpu-period="100000" and --cpu-quota="150000"; it is a short way to set cpu-period and cpu-quota together.
# in order to set cpu_quota correctly, you need to know how many CPUs the container can use, then set cpu_quota and cpu_period accordingly
$ docker run -it --rm --cpuset-cpus 0-1 --cpus=1.5 ubuntu /bin/bash
# OR
$ docker run -it --rm --cpuset-cpus 0-1 --cpu-period=100000 --cpu-quota=150000 ubuntu /bin/bash
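Since --cpus=N is just shorthand for quota = N * period, the conversion can be sketched as a small helper (cpus_to_quota is a hypothetical name; the default 100000-microsecond period is assumed):

```shell
period=100000  # default cpu_period in microseconds

# convert a --cpus value (possibly fractional) to the equivalent --cpu-quota
cpus_to_quota() {
    # awk handles the fractional multiplication; print the result as an integer
    awk -v c="$1" -v p="$period" 'BEGIN { printf "%d\n", c * p }'
}

cpus_to_quota 1.5   # prints 150000, i.e. --cpu-period=100000 --cpu-quota=150000
```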
# only allow cpu0 and cpu1 to run this container
$ docker run -it --rm --cpuset-cpus 0-1 ubuntu /bin/bash
expose Nvidia GPU to a container
# the nvidia container runtime must be installed on the host first
$ apt-get install nvidia-container-runtime
# run the ubuntu image with nvidia-smi as the command
$ docker run -it --rm --gpus device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a ubuntu nvidia-smi
limit memory used by a container
Use cgroups (the memory controller) to accomplish this.
# these options take a suffix of b, k, m, g to indicate bytes, kilobytes, megabytes, or gigabytes
-m or --memory=    The maximum amount of memory the container can use. If you set this option, the minimum allowed value is 6m (6 megabytes).
# !!!it is just a cgroup limit; it does not reserve that much memory for the container!!!
$ docker run -it --rm -m 100M ubuntu /bin/bash
$ docker run -it --rm -m 10G ubuntu /bin/bash
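The b/k/m/g suffixes are powers of 1024; a hypothetical helper to_bytes illustrating the conversion (lowercase suffixes only, for the sake of the sketch):

```shell
# convert a docker-style size such as 100m or 10g to bytes
to_bytes() {
    local num=${1%[bkmg]}       # strip a trailing suffix, if any
    local suffix=${1##*[0-9]}   # everything after the last digit
    case "$suffix" in
        b|"") echo "$num" ;;
        k)    echo $((num * 1024)) ;;
        m)    echo $((num * 1024 * 1024)) ;;
        g)    echo $((num * 1024 * 1024 * 1024)) ;;
    esac
}

to_bytes 6m    # prints 6291456, the minimum value allowed by -m
```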
# update memory or cpu limits while the container is running
$ docker update --memory 123289600 --memory-swap 123289600 ubuntu
$ docker update --cpus 1 ubuntu
check stats for a container
# shows the container's memory limit and current usage
$ docker stats $container_id
CONTAINER   CPU %    MEM USAGE / LIMIT      MEM %    NET I/O             BLOCK I/O
5ed         12.44%   104.8 MB / 104.9 MB    99.92%   4.861 kB / 648 B    9.138 GB / 10.16 GB
run a container automatically when dockerd starts
# run your container with --restart=always
$ docker run --restart=always -it ubuntu /bin/bash
expose a port from container
An iptables rule accomplishes this.
# this way, when you access the host port, traffic is redirected to the container port
# host port: 6000
# container port: 22
$ docker run --restart=always -p 6000:22 -it ubuntu /bin/bash
start a container with given ip and hostname
# check the subnet of the bridge the container uses
$ docker network ls
$ docker network inspect bridge
# start with a given hostname and ip
$ docker run --restart=always -p 6000:22 --hostname test --ip 172.16.0.2 -it ubuntu /bin/bash
move docker data to a different directory
1. Modify /lib/systemd/system/docker.service to tell docker to use our own directory instead of the default /var/lib/docker. In this example I am using /p/var/lib/docker. Apply the patch below.
$ diff -uP -N /lib/systemd/system/docker.service.orig /lib/systemd/system/docker.service
--- /lib/systemd/system/docker.service.orig	2018-12-05 21:24:20.544852391 -0800
+++ /lib/systemd/system/docker.service	2018-12-05 21:25:57.909455275 -0800
@@ -10,7 +10,7 @@
 # the default is not to use systemd for cgroups because the delegate issues still
 # exists and systemd currently does not support the cgroup feature set required
 # for containers run by docker
-ExecStart=/usr/bin/dockerd -H unix://
+ExecStart=/usr/bin/dockerd -g /p/var/lib/docker -H unix://
 ExecReload=/bin/kill -s HUP $MAINPID
 TimeoutSec=0
 RestartSec=2
2. Stop the docker service
$ systemctl stop docker
3. Do daemon-reload, as we changed the docker.service file
$ systemctl daemon-reload
4. rsync the existing docker data to the new location
$ rsync -aqxP /var/lib/docker/ /p/var/lib/docker/
5. Start the docker service
$ systemctl start docker
permission denied when running some commands inside docker
# run docker with --privileged=true and with /usr/sbin/init as the command
$ docker run --restart=always -p 6000:22 --hostname test --ip 172.16.0.2 --privileged=true -it ubuntu /usr/sbin/init
Since there is no namespace for memory or CPU information, running free or cat /proc/cpuinfo inside a container shows information about the host!!! The numbers are true for the host, but they are not the container's limits and not what the container can actually use.
However, df shows mount information, and the container has its own mnt namespace, so it sees its own devices, not the host's.
# to check the disk size usable by a container, see https://cyun.tech/docker-persist-data
# all limits for containers using something like the overlay2 filesystem are inherited from the parent filesystem.
# since docker does everything under /var/lib/docker, the available disk space on that filesystem is the same as the limit you'll see inside a container.
# check which CPUs and memory nodes this container can use with docker inspect
$ docker inspect centos | grep Cpuset
89:    "CpusetCpus": "0-1",
90:    "CpusetMems": "",
By default, docker sees the same disk size as the host (where the RW layer sits) and can use all of it, but in some cases we want to limit the storage size used by a container. --storage-opt size=xxx can do this, but only with certain storage drivers: devicemapper, btrfs, overlay2, windowsfilter and zfs. For the devicemapper, btrfs, windowsfilter and zfs drivers, the user cannot pass a size less than the Default BaseFS Size. For the overlay2 storage driver, the size option is only available if the backing fs is xfs and mounted with the pquota mount option. Under these conditions, the user can pass any size less than the backing fs size.
XFS supports disk quotas by user, by group, and by project. Project disk quotas allow you to limit the amount of disk space on individual directory hierarchies. You can configure both hard and soft limits on the number of disk blocks (disk space) and on the number of inodes, which limits the number of files a user can create. Quotas do not apply to the root user.
You must first enable quotas for users, groups, and/or projects via a mount option when mounting the XFS file system. After enabling quotas, use the xfs_quota command to set limits and view quota information.
# enable project disk quotas on /var
$ cat /etc/fstab
...
/dev/mapper/centos_dev-var /var xfs rw,pquota 0 0
...
# report the overall quota state information
$ xfs_quota -x -c state
User quota state on /var (/dev/mapper/centos_dev-var)
  Accounting: OFF
  Enforcement: OFF
  Inode: #20328 (5 blocks, 5 extents)
Group quota state on /var (/dev/mapper/centos_dev-var)
  Accounting: OFF
  Enforcement: OFF
  Inode: #227342 (1 blocks, 1 extents)
Project quota state on /var (/dev/mapper/centos_dev-var)
  Accounting: ON
  Enforcement: ON
  Inode: #227342 (1 blocks, 1 extents)
Blocks grace time: [7 days]
Inodes grace time: [7 days]
Realtime Blocks grace time: [7 days]
# show the quota report
$ xfs_quota -x -c 'report -h' /var
Project quota on /var (/dev/mapper/centos_dev-var)
                         Blocks
Project ID    Used  Soft  Hard Warn/Grace
---------- ---------------------------------
#0            2.2G     0     0  00 [------]
$ docker run -it --storage-opt size=10G fedora /bin/bash
$ xfs_quota -x -c 'report -h' /var
Project quota on /var (/dev/mapper/centos_dev-var)
                         Blocks
Project ID    Used  Soft  Hard Warn/Grace
---------- ---------------------------------
#0            2.2G     0     0  00 [------]
#2              8K   10G   10G  00 [------]
#3              8K   10G   10G  00 [------]
# what the quota does
# initialize a project with ID 100
$ mkdir -p /data/volumes/xfs32m/5m
$ xfs_quota -x -c 'project -s -p /data/volumes/xfs32m/5m 100' /data/volumes/xfs32m
# set a 5M quota on project, id=100 $ xfs_quota -x -c 'limit -p bsoft=5m bhard=5m 100' /data/volumes/xfs32m
# the Pid we see from the host; inside the container it is Pid 1!!!
$ docker inspect --format {{.State.Pid}} $container_id
$ docker inspect --format {{.State.Pid}} $container_name
16755
enter a container's namespaces without docker exec
The first process of a container has containerd-shim-runc-v2 as its parent; if you run docker exec -it $container bash, the exec'd process's parent is containerd-shim-runc-v2 as well.
/proc/pid/ns/mnt    the mount namespace
/proc/pid/ns/uts    the UTS namespace
/proc/pid/ns/ipc    the IPC namespace
/proc/pid/ns/net    the network namespace
/proc/pid/ns/pid    the PID namespace
/proc/pid/ns/user   the user namespace
/proc/pid/root      the root directory
# enter the mount, uts, ipc, net, pid and user namespaces of pid 16755
$ nsenter -t 16755 --mount --net --uts --ipc --pid --user --root /bin/bash
# same as
$ docker exec -it $container_name /bin/bash
dockerd setting
keep containers running while the docker service restarts
# build-dev is the container's name
$ docker inspect --format="{{.Id}}" build-dev
6b667579c2a963767bb97b5ed35e4d56ca9a428da8b9fd067fac14b25712048a
# stop the docker service first, otherwise edits to config.v2.json will be lost:
# docker rewrites this file if it is edited while the daemon is running!!!
$ service docker stop
$ vim /var/lib/docker/containers/6b667579c2a963767bb97b5ed35e4d56ca9a428da8b9fd067fac14b25712048a/config.v2.json
$ service docker start
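Note: dockerd also supports this directly through the documented live-restore daemon option, which keeps containers running while the daemon itself is down; a sketch of that alternative (the default config path is assumed):

```shell
# /etc/docker/daemon.json -- keep containers alive during daemon downtime
$ cat /etc/docker/daemon.json
{
    "live-restore": true
}
# reload the daemon configuration (SIGHUP) without restarting containers
$ kill -SIGHUP $(pidof dockerd)
```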
show disk usage of container
# NOTE: this lists the virtual size and the real size
# virtual size = real size + image size
# real size is the RW layer size only
$ docker ps --size
updating the configuration of an existing container
how to set a pre-existing docker container’s restart policy
# docker update supports only a limited set of options (mem, cpu, restart policy)
$ docker update --restart=always jason-dev