Docker’s networking subsystem is pluggable, using drivers. Several drivers exist by default and provide core networking functionality:
bridge: The default network driver. If you don’t specify a driver, this is the type of network you are creating. Bridge networks are usually used when your applications run in standalone containers that need to communicate. Docker creates a separate network namespace for the container and adds a veth pair: one end in the bridge, the other in the container.
host: For standalone containers, removes network isolation between the container and the Docker host and uses the host’s networking directly; the container shares the host’s (root) network namespace.
overlay and macvlan: Other built-in drivers. overlay connects multiple Docker daemons across hosts; macvlan assigns a MAC address to the container so it appears as a physical device on the network.
none: Disables all networking for this container. Usually used in conjunction with a custom network driver. The container gets a separate network namespace containing only the loopback interface.
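As a minimal sketch (the image name is just an example), these drivers map onto the --network flag of docker run:

```
# bridge (default): new namespace + veth pair attached to docker0
$ docker run -d --network bridge nginx

# host: no isolation, shares the host's (root) network namespace
$ docker run -d --network host nginx

# none: new namespace with only the loopback interface
$ docker run -d --network none nginx
```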
For more details about Docker networking, refer to the official Docker documentation.
Bridge mode
The host and none drivers are simple: host shares the host’s network namespace, while none uses a separate namespace with only the loopback interface. The host driver offers high performance but less isolation; none is for users defining their own networking.
So let’s explain the bridge driver in more detail and see how traffic goes out of a container and comes back, using the ping command in Docker.
For simplicity, ignore ARP and DNS, and run $ ping baidu from the container.
Request packet out:
First the container checks its routing table. For baidu’s IP (ARP and DNS ignored), it should send the packet to the gateway 172.17.0.2, which is docker0 (each bridge has such a virtual device) in the root network namespace. Since eth0 (in the container) and docker0 are in the same subnet, the container sends the request packet with the info below through the veth pair (eth0 in the container, the other end in the bridge).
```
src mac: eth0 (in container) mac
dst mac: docker0 mac
src ip : eth0 (container) ip: 172.17.0.1
dst ip : baidu's ip
```
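You can confirm this route selection from inside the container with ip route get; 39.156.66.10 below is just a placeholder for baidu's ip, and the container id is taken from the examples later in this post:

```
$ docker exec -it b10936e9e3f7 ip route get 39.156.66.10
39.156.66.10 via 172.17.0.2 dev eth0 src 172.17.0.1
```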
The packet goes directly to vethx, as it is the peer of eth0 in the container.
When the packet reaches vethx, the bridge CAM table is searched by dst mac to decide which port it should be sent to. As the dst mac is docker0’s mac, the packet is sent to docker0 without any change (because a bridge is a switch).
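You can inspect the bridge's CAM (forwarding) table with brctl showmacs; the entries below are illustrative, using the MACs that appear later in this post:

```
$ brctl showmacs docker0
port no   mac addr              is local?   ageing timer
  1       02:42:ac:11:00:01     no             0.42
  1       62:75:74:03:97:c1     yes            0.00
```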
When docker0 receives this packet whose dst mac is its own, the packet goes up the stack (leaving bridge processing) as it is local. The PREROUTING hook is called at the IP layer for each table in priority order (conntrack -> nat). As there is no conntrack entry for this request yet, the conntrack table does not match; then the nat table is checked for matching rules, and there is still no match. Hence the routing table is looked up after PREROUTING.
There are also raw and mangle tables in the PREROUTING hook; we skip them here because they are empty, and only check the nat table at PREROUTING.
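On a typical Docker host, the nat table's PREROUTING chain contains only Docker's DNAT entry point, which matches packets addressed to local addresses; our outbound ping does not hit it:

```
$ iptables -t nat -nL PREROUTING
Chain PREROUTING (policy ACCEPT)
target  prot opt source     destination
DOCKER  all  --  0.0.0.0/0  0.0.0.0/0   ADDRTYPE match dst-type LOCAL
```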
Looking up the routing table with the dst ip (baidu’s ip), the default route matches, with gateway 10.117.7.253 and out iface eth0 (host). The packet then enters the FORWARD phase (a received packet that is not for me gets forwarded out), where the rules at that hook point are checked.
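For reference, the host routing table behind this decision would look roughly like this on the setup in this post (abridged; addresses come from the example):

```
$ ip route
default via 10.117.7.253 dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.2
```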
The first rule in FORWARD (filter table) matches: jump to DOCKER-ISOLATION-STAGE-1.
The first rule in DOCKER-ISOLATION-STAGE-1 also matches: jump to DOCKER-ISOLATION-STAGE-2.
The second rule in DOCKER-ISOLATION-STAGE-2 matches with target RETURN, so control goes back to DOCKER-ISOLATION-STAGE-1; the second rule in DOCKER-ISOLATION-STAGE-1 matches with target RETURN as well, and control returns to the FORWARD chain. The fourth rule in FORWARD matches with target ACCEPT, and the FORWARD hook is done.
In the FORWARD hook we only check the filter table; there is also a mangle table, but as it is empty we skip it for simplicity.
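For reference, here is an abridged filter-table listing consistent with this walkthrough (column layout simplified; exact rules vary by Docker version). The outbound request traverses rule 1 (the isolation chains) and is accepted by rule 4; the reply later matches rule 2:

```
$ iptables -nvL FORWARD
Chain FORWARD (policy DROP)
target                    in       out       source     destination
DOCKER-ISOLATION-STAGE-1  *        *         0.0.0.0/0  0.0.0.0/0
ACCEPT                    *        docker0   0.0.0.0/0  0.0.0.0/0   ctstate RELATED,ESTABLISHED
DOCKER                    *        docker0   0.0.0.0/0  0.0.0.0/0
ACCEPT                    docker0  !docker0  0.0.0.0/0  0.0.0.0/0
ACCEPT                    docker0  docker0   0.0.0.0/0  0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-1
target                    in       out       source     destination
DOCKER-ISOLATION-STAGE-2  docker0  !docker0  0.0.0.0/0  0.0.0.0/0
RETURN                    *        *         0.0.0.0/0  0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-2
target                    in       out       source     destination
DROP                      *        docker0   0.0.0.0/0  0.0.0.0/0
RETURN                    *        *         0.0.0.0/0  0.0.0.0/0
```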
Now the packet reaches the POSTROUTING hook at the IP layer. The first rule matches with target MASQUERADE (a special SNAT using the output interface’s ip); a connection is created in the conntrack table after SNAT, then the packet goes out on eth0 (host) with the info below.
In the POSTROUTING hook we likewise only check the nat table; there is also a mangle table, but as it is empty we skip it for simplicity.
```
src mac: eth0 (in host) mac
dst mac: gateway mac (10.117.7.253 mac)
src ip : eth0 (in host) ip  ---> because of SNAT (MASQUERADE)
dst ip : baidu's ip
```
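The MASQUERADE rule behind this step is the one Docker installs for the bridge subnet; a typical (abridged) listing looks like this, with !docker0 as the out interface so that bridge-internal traffic is not translated:

```
$ iptables -t nat -nvL POSTROUTING
Chain POSTROUTING (policy ACCEPT)
target      in  out       source         destination
MASQUERADE  *   !docker0  172.17.0.0/16  0.0.0.0/0
```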
In short: eth0 (container) -> docker0 (host) -> routing -> POSTROUTING SNAT (ip of outgoing interface) -> physical eth0.
Reply packet in: the reply packet comes back with the info below.
```
src mac: gateway mac (10.117.7.253)
dst mac: eth0 (in host) mac
src ip : baidu's ip
dst ip : eth0 (host) ip
```
A conntrack entry records only ip and port, not mac addresses.
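If the conntrack tool is installed on the host, the entry for this ping looks roughly like the line below; all addresses and the icmp id here are placeholders, and the second tuple is the reply direction the kernel uses to reverse the NAT:

```
$ conntrack -L -p icmp
icmp  1 29 src=172.17.0.1 dst=39.156.66.10 type=8 code=0 id=23 src=39.156.66.10 dst=10.117.7.10 type=0 code=0 id=23 mark=0 use=1
```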
As the dst mac is eth0’s mac, you can imagine the packet bypassing the bridge (the dst mac is local). The next step is to check the hooks in PREROUTING. Since we already created a connection in the conntrack table, that entry is found before the rules in the nat table (PREROUTING) are checked, so the nat table rules there (mostly for DNAT) are skipped. Based on the conntrack entry, the reverse translation is applied (DNAT on the reply packet); after it, the packet has the info below. Actually the mac is not important now, as the packet has reached the IP layer (PREROUTING).
```
src mac: gateway mac
dst mac: eth0 (host) mac
src ip : baidu's ip
dst ip : eth0 (in container) ip (172.17.0.1)
```
After PREROUTING (DNAT), the routing table is looked up with the info above; the third route matches, giving out iface docker0. Then the packet goes to the FORWARD hook (the dst ip is not one of the host’s addresses, so it is forwarded).
At the FORWARD hook, the first rule matches: jump to DOCKER-ISOLATION-STAGE-1.
At DOCKER-ISOLATION-STAGE-1, the second rule matches with target RETURN, so control goes back to FORWARD; the second rule in FORWARD matches with target ACCEPT (the traversal terminates here).
Then the POSTROUTING rules are checked; none matches.
The packet then goes down to the neighbour subsystem: neigh_xmit() calls dev_hard_header() to fill in the skb’s mac addresses with the info below, then dev_queue_xmit() is called, which calls dev->ndo_start_xmit() (here dev is docker0).
```
src mac: docker0 mac
dst mac: eth0 (container, 172.17.0.1) mac
src ip : baidu's ip
dst ip : eth0 (container) ip (172.17.0.1)
```
Now the packet goes into the bridge (through docker0->ndo_start_xmit == br_dev_xmit). For docker0 (the management port), ndo_start_xmit is br_dev_xmit, which either floods to all ports (for multicast/broadcast, identified by dst mac) or forwards the packet to the specific port behind which the dst mac was seen. So here the packet is forwarded to vethxx (skb->dev is updated from docker0 to vethxx), whose dev->ndo_start_xmit() is veth_xmit(), which sends the packet to the other end (its peer). So eth0 (in the container) receives it. All right, it’s done.
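To see both legs of this trip yourself, ping from the container while capturing on the host’s interfaces (the container id is from the examples below; your addresses will differ):

```
# inside the container: send one echo request
$ docker exec -it b10936e9e3f7 ping -c 1 baidu.com

# on docker0 the packet still carries the container's ip (172.17.0.1) ...
$ tcpdump -ni docker0 icmp

# ... while on eth0 the src ip has already been SNATed to the host's ip
$ tcpdump -ni eth0 icmp
```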
Command to set docker network
This is the docker run parameter that sets the network for a container:
```
--network="bridge" : Connect a container to a network
    'bridge': create a network stack on the default Docker bridge
    'none': no networking
    'container:<name|id>': reuse another container's network stack
    'host': use the Docker host network stack
    '<network-name>|<network-id>': connect to a user-defined network
```
As you can see, a container can use the network of another container, which means they share the same network (same network namespace); no new veth pair is created for this container, as it shares the one the other container already has!
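A quick sketch of the container:<name|id> mode (the names and images here are just examples):

```
# start a container that owns the network namespace
$ docker run -d --name app nginx

# reuse app's network stack: same eth0, same ip, no new veth pair
$ docker run --rm --network container:app busybox ip addr show eth0
```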
Use default bridge network
By default, Docker creates three networks (bridge, host, and none). If you do not set a network when you run a container, bridge is used. You can check the details of each network:
```
# show bridge info with the brctl command
$ brctl show
bridge name     bridge id           STP enabled     interfaces
docker0         8000.02423226c0ce   no              vethd0a3d97
```
```
# check the veth pair: one end in the host, the other in the container
# vethd0a3d97 is the end attached to the host bridge
# its peer is @if44 (index 44), which is in the container
$ ip link
45: vethd0a3d97@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 62:75:74:03:97:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0

$ docker exec -it b10936e9e3f7 ip link
44: eth0@if45: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:ac:11:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0
```
```
# check the default route for this container: the gateway is 172.17.0.2 (docker0)
$ docker exec -it b10936e9e3f7 netstat -nrl
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         172.17.0.2      0.0.0.0         UG        0 0          0 eth0
```
Note:
a docker network is separate from containers; you can create and remove it independently
veth pairs are created and removed automatically when a container starts or stops
Let’s create another container which still uses the default bridge.
```
$ brctl show
bridge name     bridge id           STP enabled     interfaces
docker0         8000.02423226c0ce   no              veth7e4c2a0
                                                    vethd0a3d97

# check veth pairs
$ ip link
45: vethd0a3d97@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 62:75:74:03:97:c1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
51: veth7e4c2a0@if50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
    link/ether 6e:8a:8e:f8:ac:8c brd ff:ff:ff:ff:ff:ff link-netnsid 1
```
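Both containers now hang off docker0, so they can reach each other by ip; note the default bridge provides no name resolution between containers. 172.17.0.3 below is an assumed address for the second container:

```
$ docker exec -it b10936e9e3f7 ping -c 1 172.17.0.3
```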
```
# create a bridge network named my_bridge from the docker cli
$ docker network create --driver bridge my_bridge
35e2bf6f7cfce89c07c5ba6493c47fb1561be53c4e5d5f1364907a3b549c281a
```
```
# show docker networks
$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
d40edaa25285        bridge              bridge              local
22a73a59ecee        host                host                local
35e2bf6f7cfc        my_bridge           bridge              local
d812e787ee49        none                null                local
```
```
# show system bridges
$ brctl show
bridge name         bridge id           STP enabled     interfaces
br-35e2bf6f7cfc     8000.0242e911394e   no
```
```
# run a container with my_bridge
$ docker run -it -d --net my_bridge ubuntu:tool
983f5af3b0e39093e599d8cc1f169be8fd94e76b4c4dbbee5abadc41634ef6af
```
```
$ brctl show
bridge name         bridge id           STP enabled     interfaces
br-35e2bf6f7cfc     8000.0242e911394e   no              veth83832ea
```
```
# each bridge has a virtual device with the bridge's name
# (the system name, not the network name used by docker)
$ ifconfig br-35e2bf6f7cfc
br-35e2bf6f7cfc Link encap:Ethernet  HWaddr 02:42:e9:11:39:4e
          inet addr:172.18.0.1  Bcast:172.18.255.255  Mask:255.255.0.0
          inet6 addr: fe80::42:e9ff:fe11:394e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:1186 (1.1 KB)
```
```
# check the gateway for the container
$ docker exec -it 983f5af3b0e3 netstat -nrl
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         172.18.0.1      0.0.0.0         UG        0 0          0 eth0
```
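One practical difference from the default bridge: containers on a user-defined bridge can reach each other by container name, since Docker runs an embedded DNS server for such networks. A small sketch, with the name web and the sleep command as assumptions:

```
# start a second container on my_bridge with a known name
$ docker run -d --net my_bridge --name web ubuntu:tool sleep infinity

# the first container resolves and pings it by name
$ docker exec -it 983f5af3b0e3 ping -c 1 web
```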