
Docker network

Overview

Docker’s networking subsystem is pluggable, using drivers. Several drivers exist by default and provide core networking functionality:

  • bridge: The default network driver. If you don’t specify a driver, this is the type of network you are creating. Bridge networks are usually used when your applications run in standalone containers that need to communicate. A bridge network gives each container a separate network namespace and a veth pair, with one end attached to the bridge and the other inside the container.

  • host: For standalone containers, removes network isolation between the container and the Docker host: the container shares the host’s (root) network namespace and uses the host’s networking directly.

  • overlay and macvlan: drivers for multi-host networking and for attaching containers directly to the physical network (not covered here).

  • none: Disables all networking for the container. Usually used in conjunction with a custom network driver. The container gets a separate network namespace with only a loopback interface.

For more details about docker network, refer to the Docker documentation.

Bridge mode

The host and none drivers are simple: host shares the host’s network namespace, while none uses a separate namespace with only a loopback interface. The host driver gives high performance but less isolation; none is for users defining their own networking.

So let’s look at the bridge driver in more detail and trace how traffic leaves a container and comes back, using the ping command in docker.

ping from docker

For simplicity, ignore ARP and DNS; run $ ping baidu from the container.

Request packet out:

  1. First, the container checks its routing table. It sees that for baidu (ARP and DNS ignored) the packet must be sent to the gateway 172.17.0.2, i.e. docker0 (each bridge has such a virtual device), which lives in the root network namespace. Since eth0 (in the container) and docker0 are in the same subnet, the container sends the request packet through the veth pair (eth0 in the container, the other end attached to the bridge) with the info below.
src mac: eth0(in container) mac
dst mac: docker0 mac
src ip:  eth0(container) ip, 172.17.0.1
dst ip:  baidu's ip
  2. The packet goes directly to vethX, as it is the peer of the container’s eth0.
  3. When the packet reaches vethX, the bridge searches its CAM table by dst MAC to find which port to send it out of. As the dst MAC is docker0’s MAC, the packet is delivered to docker0 unchanged (a bridge is a switch).
  4. When docker0 receives this packet addressed to its own MAC, the packet goes up the stack (leaving the bridge) because it is local. The PREROUTING hook is invoked at the IP layer, consulting each table in priority order (conntrack -> nat). There is no conntrack entry for this request yet, so conntrack does not match, and no rule in the nat table matches either; hence the routing table is looked up after PREROUTING.

The raw and mangle tables also hook into PREROUTING; they are empty here, so for simplicity we only check the nat table in PREROUTING.
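The bridge’s forwarding decision in the steps above (CAM lookup by dst MAC, flooding for broadcast/multicast or unknown unicast) can be sketched in Python. The port names and MACs mirror the examples later on this page; this is a toy model, not the kernel’s actual bridge code:

```python
# Toy model of a bridge's forwarding decision; not real kernel code.
# The CAM table maps a learned MAC address to the port it was seen on.

BROADCAST = "ff:ff:ff:ff:ff:ff"

def forward(cam, ports, in_port, dst_mac):
    """Return the list of ports the frame is sent out of."""
    # Broadcast/multicast (group bit set in the first octet) is flooded
    # to every port except the one the frame arrived on.
    if dst_mac == BROADCAST or int(dst_mac.split(":")[0], 16) & 1:
        return [p for p in ports if p != in_port]
    # Known unicast: forward only to the port the CAM table points at.
    if dst_mac in cam:
        return [cam[dst_mac]]
    # Unknown unicast is flooded as well.
    return [p for p in ports if p != in_port]

# docker0's own MAC sits on the bridge's "local" (management) port, so a
# frame addressed to it leaves the bridge and goes up the host stack.
cam = {"02:42:32:26:c0:ce": "docker0-local", "02:42:ac:11:00:01": "vethd0a3d97"}
ports = ["docker0-local", "vethd0a3d97"]
print(forward(cam, ports, "vethd0a3d97", "02:42:32:26:c0:ce"))  # ['docker0-local']
```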

  5. Looking up the routing table with the dst IP (baidu’s IP), the default route matches: gateway 10.117.7.253, out iface eth0 (host). The packet then enters the FORWARD hook (a received packet that is not for this host gets forwarded), where the rules at that hook point are checked.

  6. The first rule in FORWARD (filter table) matches; jump to DOCKER-ISOLATION-STAGE-1.

  7. The first rule in DOCKER-ISOLATION-STAGE-1 also matches; jump to DOCKER-ISOLATION-STAGE-2.

  8. The second rule in DOCKER-ISOLATION-STAGE-2 matches with target RETURN, so control goes back to DOCKER-ISOLATION-STAGE-1; there the second rule also matches with target RETURN, returning to the FORWARD chain, where the fourth rule matches with target ACCEPT. The FORWARD hook is done.

In the FORWARD hook we only check the filter table; the mangle table is attached there as well, but it is empty, so we ignore it for simplicity.
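The chain traversal above (jumping into the DOCKER-ISOLATION chains, RETURNing back, and finishing with ACCEPT) follows simple call-and-return semantics, sketched below. The match conditions are simplified stand-ins, not Docker’s exact rule set:

```python
# Toy model of iptables chain traversal: a jump to a user chain pushes,
# RETURN (or falling off the end) pops, ACCEPT/DROP are final verdicts.

def run_chain(chains, name, pkt):
    for match, target in chains[name]:
        if not match(pkt):
            continue
        if target in ("ACCEPT", "DROP"):
            return target                 # terminating verdict
        if target == "RETURN":
            return None                   # back to the calling chain
        verdict = run_chain(chains, target, pkt)  # jump to a user chain
        if verdict is not None:
            return verdict
    return None                           # implicit RETURN at chain end

any_ = lambda pkt: True
chains = {
    "FORWARD": [
        (any_, "DOCKER-ISOLATION-STAGE-1"),
        (lambda p: p["out"] == "eth0", "ACCEPT"),       # leaving the host
    ],
    "DOCKER-ISOLATION-STAGE-1": [
        (lambda p: p["in"] == "docker0", "DOCKER-ISOLATION-STAGE-2"),
        (any_, "RETURN"),
    ],
    "DOCKER-ISOLATION-STAGE-2": [
        (lambda p: p["out"].startswith("br-"), "DROP"), # into another bridge
        (any_, "RETURN"),
    ],
}
print(run_chain(chains, "FORWARD", {"in": "docker0", "out": "eth0"}))  # ACCEPT
```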

  9. The packet now reaches the POSTROUTING hook at the IP layer. The first rule matches with target MASQUERADE (a special SNAT that uses the outgoing interface’s IP); a connection is created in the conntrack table after the SNAT, and the packet goes out on eth0 (host) with the info below.

In the POSTROUTING hook we likewise only check the nat table; the mangle table exists there too, but it is empty, so we ignore it for simplicity.

src mac: eth0(in host) mac
dst mac: gateway mac (10.117.7.253's mac)
src ip:  eth0(in host) ip  <- because of SNAT (MASQUERADE)
dst ip:  baidu's ip

In short: eth0(container) -> docker0(host) -> routing -> POSTROUTING SNAT (outgoing interface's ip) -> physical eth0
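The routing decisions on this path are longest-prefix matches. A minimal sketch with Python’s stdlib ipaddress module, using routes shaped like this host’s table (the public address is just an example):

```python
# Longest-prefix routing lookup with the stdlib ipaddress module.
import ipaddress

routes = [
    ("0.0.0.0/0",     "10.117.7.253", "eth0"),     # default via host gateway
    ("10.117.7.0/24", None,           "eth0"),     # host LAN, on-link
    ("172.17.0.0/16", None,           "docker0"),  # docker0 bridge subnet
]

def lookup(dst):
    """Return the matching route with the longest (most specific) prefix."""
    dst = ipaddress.ip_address(dst)
    return max((r for r in routes if dst in ipaddress.ip_network(r[0])),
               key=lambda r: ipaddress.ip_network(r[0]).prefixlen)

print(lookup("39.156.66.10"))  # public address -> default route, out eth0
print(lookup("172.17.0.1"))    # container ip -> on-link via docker0
```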

Reply packet IN:
When the reply packet comes back, it carries the info below.

src mac: gateway mac (10.117.7.253's mac)
dst mac: eth0(in host) mac
src ip:  baidu's ip
dst ip:  eth0(host) ip

A conntrack entry is keyed only by IPs and ports; MAC addresses are not part of it:

$ conntrack -L
tcp 6 431982 ESTABLISHED src=10.226.143.201 dst=172.17.0.2 sport=2806 dport=8080 src=172.17.0.2 dst=10.226.143.201 sport=8080 dport=2806 [ASSURED] mark=0 use=1
tcp 6 196299 ESTABLISHED src=127.0.0.1 dst=127.0.0.1 sport=49816 dport=46457 src=127.0.0.1 dst=127.0.0.1 sport=46457 dport=49816 [ASSURED] mark=0 use=1
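Each conntrack line holds two tuples: the original direction and the expected reply (which differs from the mirrored original whenever NAT applied). A small sketch that parses the first line above into those tuples (real conntrack output varies by protocol and version):

```python
# Parse one conntrack line into its original and expected-reply tuples.
import re

line = ("tcp 6 431982 ESTABLISHED src=10.226.143.201 dst=172.17.0.2 "
        "sport=2806 dport=8080 src=172.17.0.2 dst=10.226.143.201 "
        "sport=8080 dport=2806 [ASSURED] mark=0 use=1")

pairs = re.findall(r"(src|dst|sport|dport)=(\S+)", line)
orig, reply = dict(pairs[:4]), dict(pairs[4:8])   # first tuple, second tuple
print(orig)   # direction the connection was opened in
print(reply)  # how the reply is expected to look (reversed)
```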

As the dst MAC is eth0’s MAC, you can imagine the packet again bypassing the bridge (the dst MAC is local); the next step is the PREROUTING hook. Since a connection was already created in the conntrack table on the way out, that entry is found before the rules in the nat table (PREROUTING) are checked, so the nat rules there (mostly for DNAT) are skipped. Based on the conntrack entry, the reverse NAT is applied: the dst IP is rewritten back to the container’s IP. After that, the packet carries the info below; the MACs are actually no longer important, as the packet has reached the IP layer (PREROUTING).

src mac: gateway mac
dst mac: eth0(host) mac
src ip:  baidu's ip
dst ip:  eth0(in container) ip, 172.17.0.1
  1. After PREROUTING (reverse NAT), the routing table is looked up with the info above. The third route matches, giving out iface docker0; the packet then enters the FORWARD hook (the dst IP belongs to none of the host’s interfaces, so it is forwarded).

  2. At the FORWARD hook, the first rule matches; jump to DOCKER-ISOLATION-STAGE-1.

  3. At DOCKER-ISOLATION-STAGE-1, the second rule matches with target RETURN, so control goes back to FORWARD, where the second rule matches with target ACCEPT (traversal terminates here).

  4. The POSTROUTING rules are checked; none matches.

  5. The packet goes down to the neighbor subsystem: neigh_xmit() calls dev_hard_header() to fill in the skb’s MAC addresses with the info below, then dev_queue_xmit() is called, which invokes dev->ndo_start_xmit() (here dev is docker0).

src mac: docker0 mac
dst mac: eth0(container, 172.17.0.1) mac
src ip:  baidu's ip
dst ip:  eth0(container) ip, 172.17.0.1
  6. Now the packet enters the bridge via docker0’s ndo_start_xmit, which is br_dev_xmit (docker0 is the bridge’s management port). br_dev_xmit either floods to all ports (for multicast/broadcast, identified by the dst MAC) or forwards the packet to the specific port behind which the dst MAC was learned. Here the packet is forwarded to vethXX (skb->dev is updated from docker0 to vethXX), and its dev->ndo_start_xmit(), which is veth_xmit(), sends the packet to the other end of the pair, so eth0 in the container receives it. All right, it’s done.
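The whole MASQUERADE round trip can be summarized as a toy two-way NAT: record the connection on the way out while rewriting the source, then use the recorded entry to rewrite the reply’s destination back. The host IP below is made up, and real MASQUERADE also handles source-port collisions, which this sketch ignores:

```python
# Toy two-way NAT: MASQUERADE on the way out, reverse rewrite on replies.
CONTAINER_IP, HOST_IP = "172.17.0.1", "10.117.7.1"   # host ip is made up

conntrack = {}  # maps the expected reply tuple -> original source address

def snat_out(pkt):
    """POSTROUTING MASQUERADE: rewrite src to the outgoing interface's ip."""
    reply_key = (pkt["dst"], pkt["dport"], pkt["sport"])
    conntrack[reply_key] = pkt["src"]
    return {**pkt, "src": HOST_IP}

def denat_in(pkt):
    """PREROUTING on a tracked reply: rewrite dst back to the container."""
    key = (pkt["src"], pkt["sport"], pkt["dport"])
    return {**pkt, "dst": conntrack[key]} if key in conntrack else pkt

out = snat_out({"src": CONTAINER_IP, "dst": "39.156.66.10",
                "sport": 40000, "dport": 80})
reply = denat_in({"src": "39.156.66.10", "dst": HOST_IP,
                  "sport": 80, "dport": 40000})
print(out["src"], reply["dst"])  # 10.117.7.1 172.17.0.1
```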

command to set docker network

This is the docker run parameter that sets the network for a container:

--network="bridge" : Connect a container to a network

  • ‘bridge’: create a network stack on the default Docker bridge

  • ‘none’: no networking

  • ‘container:<name|id>’: reuse another container’s network stack

  • ‘host’: use the Docker host network stack

  • ‘<network-name>|<network-id>’: connect to a user-defined network

As you can see, a container can use another container’s network, which means they share the same network namespace: no new veth pair is created for this container, since it shares the one the other container already has.

use default bridge network

By default, docker creates three networks. If you do not set a network when you run a container, bridge is used. You can check the details of each network:

$ docker network ls
NETWORK ID NAME DRIVER SCOPE
d40edaa25285 bridge bridge local
22a73a59ecee host host local
d812e787ee49 none null local

$ docker network inspect bridge
[
{
"Name": "bridge",
"Id": "d40edaa25285c66d856252d9396eaa03167b108e2a00c80a41ec73ddf90bfbae",
"Created": "2019-10-18T17:30:49.803348286+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.2" # gateway for this bridge: docker0 in the root namespace
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
# containers using this bridge, with their IPs and MACs
"Containers": {
"b10936e9e3f7a8703e8360ffb994e1736455d54ef77bd72f32c99d8be3573550": {
"Name": "nervous_lamport",
"EndpointID": "04dead82abb38c42c55c547c77ac10cf65701528c5ac5b05576b2d910fcc2e2f",
"MacAddress": "02:42:ac:11:00:01",
"IPv4Address": "172.17.0.1/16",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1500"
},
"Labels": {}
}
]

$ ifconfig docker0
docker0 Link encap:Ethernet HWaddr 02:42:32:26:c0:ce
inet addr:172.17.0.2 Bcast:172.17.255.255 Mask:255.255.0.0
inet6 addr: fe80::42:32ff:fe26:c0ce/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21131 errors:0 dropped:0 overruns:0 frame:0
TX packets:21649 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1212359 (1.2 MB) TX bytes:66638333 (66.6 MB)

# show bridge info with brctl command
$ brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02423226c0ce no vethd0a3d97

# check veth pair one in host, the other is in container
# vethd0a3d97 is the veth pair which is in the host bridge
# the peer is @if44(index is 44) which is in container

$ ip link
45: vethd0a3d97@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether 62:75:74:03:97:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0

$ docker exec -it b10936e9e3f7 ip link
44: eth0@if45: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:ac:11:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0

# check the default route for this container, as you can see gateway is 172.17.0.2(docker0)
$ docker exec -it b10936e9e3f7 netstat -nrl
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.17.0.2 0.0.0.0 UG 0 0 0 eth0

Note:

  • a docker network is separate from containers; you can create and remove it independently
  • veth pairs are created and removed automatically when a container starts or stops

Let’s create another container which still uses default bridge.

$ brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02423226c0ce no veth7e4c2a0
vethd0a3d97
# check veth pairs
$ ip link
45: vethd0a3d97@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether 62:75:74:03:97:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
51: veth7e4c2a0@if50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether 6e:8a:8e:f8:ac:8c brd ff:ff:ff:ff:ff:ff link-netnsid 1

$ docker network inspect bridge
[
{
"Name": "bridge",
"Id": "d40edaa25285c66d856252d9396eaa03167b108e2a00c80a41ec73ddf90bfbae",
"Created": "2019-10-18T17:30:49.803348286+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.2"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"1128a3a6f4d0c3fe7f910253b368ec33f49e931765231f89a31bfa27174f2a19": {
"Name": "laughing_sanderson",
"EndpointID": "2fa47920e37e3b5956327a5a3d500b55d51743003f9a895a850f170d0bfc033f",
"MacAddress": "02:42:ac:11:00:03",
"IPv4Address": "172.17.0.3/16",
"IPv6Address": ""
},
"b10936e9e3f7a8703e8360ffb994e1736455d54ef77bd72f32c99d8be3573550": {
"Name": "nervous_lamport",
"EndpointID": "04dead82abb38c42c55c547c77ac10cf65701528c5ac5b05576b2d910fcc2e2f",
"MacAddress": "02:42:ac:11:00:01",
"IPv4Address": "172.17.0.1/16",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1500"
},
"Labels": {}
}
]

use custom bridge network

# create a bridge network named my_bridge from the docker cli
$ docker network create --driver bridge my_bridge
35e2bf6f7cfce89c07c5ba6493c47fb1561be53c4e5d5f1364907a3b549c281a

# show docker network
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
d40edaa25285 bridge bridge local
22a73a59ecee host host local
35e2bf6f7cfc my_bridge bridge local
d812e787ee49 none null local

# show system bridge
$ brctl show
bridge name bridge id STP enabled interfaces
br-35e2bf6f7cfc 8000.0242e911394e no

# run a container with my_bridge
$ docker run -it -d --net my_bridge ubuntu:tool
983f5af3b0e39093e599d8cc1f169be8fd94e76b4c4dbbee5abadc41634ef6af

$ brctl show
bridge name bridge id STP enabled interfaces
br-35e2bf6f7cfc 8000.0242e911394e no veth83832ea

# inspect my_bridge
$ docker network inspect my_bridge
[
{
"Name": "my_bridge",
"Id": "35e2bf6f7cfce89c07c5ba6493c47fb1561be53c4e5d5f1364907a3b549c281a",
"Created": "2019-10-22T10:57:28.647218559+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "172.18.0.0/16",
"Gateway": "172.18.0.1" # gateway differs from docker0's (a separate bridge)
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"983f5af3b0e39093e599d8cc1f169be8fd94e76b4c4dbbee5abadc41634ef6af": {
"Name": "frosty_murdock",
"EndpointID": "9f6c861df34fa5901ba460d411dd0cb38df545c58126d920d8479dc72911d472",
"MacAddress": "02:42:ac:12:00:02",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {}
}
]

# each bridge has a virtual device named after the bridge (the system name, not the name docker uses)
$ ifconfig
br-35e2bf6f7cfc Link encap:Ethernet HWaddr 02:42:e9:11:39:4e
inet addr:172.18.0.1 Bcast:172.18.255.255 Mask:255.255.0.0
inet6 addr: fe80::42:e9ff:fe11:394e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:1186 (1.1 KB)

# check gw for the container
$ docker exec -it 983f5af3b0e3 netstat -nrl
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.18.0.1 0.0.0.0 UG 0 0 0 eth0