Docker network
Overview
Docker’s networking subsystem is pluggable using drivers. Several drivers exist by default and provide core networking functionality:
bridge: The default network driver. If you don’t specify a driver, this is the type of network you are creating. Bridge networks are usually used when your applications run in standalone containers that need to communicate. Each container gets its own network namespace, and Docker adds a veth pair: one end is attached to the bridge, the other end is inside the container.
host: For standalone containers, removes network isolation between the container and the Docker host and uses the host’s networking directly. The container shares the host's (root) network namespace.
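overlay and macvlan: additional built-in drivers. overlay connects containers across multiple Docker hosts, and macvlan assigns a MAC address to a container so that it appears as a physical device on the network.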
none: Disables all networking for this container. It is usually used in conjunction with a custom network driver. The container still gets a separate network namespace, but that namespace contains only the loopback interface.
For more details about Docker networking, refer to
Bridge mode
The host and none drivers are simple. host shares the host's network namespace, while none uses a separate namespace that contains only a loopback interface. The host driver gives high performance but less isolation; none lets users define their own network.
So let’s look at the bridge driver in more detail and follow how traffic leaves a container and comes back in when we run ping inside it.
For simplicity, ignore ARP and DNS, and run $ ping baidu from the container.
Request packet out:
- First, the container checks its routing table and finds that packets for baidu should be sent to the gateway 172.17.0.2. The gateway is docker0, the virtual device that every bridge has, and it lives in the root network namespace. Because eth0 (in the container) and docker0 are in the same subnet, the container sends the request packet with the info below through the veth pair (eth0 in the container, the other end attached to the bridge).
src mac: eth0(in container) mac
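The routing decision in this step can be sketched as a longest-prefix-match lookup. This is a minimal Python model under stated assumptions, not the kernel's FIB code. The container routing table below is hypothetical: a default route via the gateway 172.17.0.2 from the text, plus the on-link 172.17.0.0/16 subnet. The address 203.0.113.10 is a documentation address standing in for baidu's IP. The next hop is the IP whose MAC ends up as the frame's dst mac.

```python
import ipaddress
from typing import List, NamedTuple, Optional, Tuple


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    gateway: Optional[ipaddress.IPv4Address]  # None means the destination is on-link
    iface: str


def lookup(routes: List[Route], dst: str) -> Route:
    """Return the matching route with the longest prefix (most specific wins)."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r.prefix]
    if not matches:
        raise LookupError(f"no route to {addr}")
    return max(matches, key=lambda r: r.prefix.prefixlen)


def next_hop(routes: List[Route], dst: str) -> Tuple[ipaddress.IPv4Address, str]:
    """The IP whose MAC goes into the frame's dst mac, plus the out interface."""
    route = lookup(routes, dst)
    hop = route.gateway if route.gateway is not None else ipaddress.ip_address(dst)
    return hop, route.iface


# Hypothetical container routing table (gateway address taken from the text).
container_routes = [
    Route(ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("172.17.0.2"), "eth0"),
    Route(ipaddress.ip_network("172.17.0.0/16"), None, "eth0"),
]

if __name__ == "__main__":
    print(next_hop(container_routes, "203.0.113.10"))  # off-subnet: via the gateway
    print(next_hop(container_routes, "172.17.0.3"))    # same subnet: sent directly
```

Off-subnet destinations only match the default route. So the frame is addressed to the gateway's MAC even though the IP destination is baidu.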
- The packet goes directly to vethx, because vethx is the peer of eth0 in the container.
- When the packet reaches vethx, the bridge looks up its CAM table by dst mac to decide which port to send it to. Because the dst mac is docker0's mac, the packet is delivered to docker0 unchanged (the bridge works like a switch, so it does not modify the frame).
- When docker0 receives a packet whose dst mac is its own, the packet is treated as local and passed up the stack instead of being bridged. At the IP layer the PREROUTING hook runs and calls each registered table in priority order (conntrack, then nat). There is no existing conntrack entry for this request, so conntrack treats it as a new connection and the rules in the nat table are checked. None match here, so the routing table is looked up after PREROUTING.
The PREROUTING hook also has raw and mangle tables. They are empty here, so for simplicity we only check the nat table at PREROUTING.
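The order "conntrack, then nat" comes from hook priorities: at each hook, netfilter calls the registered callbacks in ascending priority order. The sketch below uses the IPv4 values from the kernel's enum nf_ip_hook_priorities (linux/netfilter_ipv4.h). The set of callbacks listed per hook is a simplified assumption that covers only the tables mentioned in these notes.

```python
from typing import Dict, List, Tuple

# IPv4 hook priorities from enum nf_ip_hook_priorities (linux/netfilter_ipv4.h).
NF_IP_PRI_RAW = -300
NF_IP_PRI_CONNTRACK = -200
NF_IP_PRI_MANGLE = -150
NF_IP_PRI_NAT_DST = -100
NF_IP_PRI_FILTER = 0
NF_IP_PRI_NAT_SRC = 100
NF_IP_PRI_CONNTRACK_CONFIRM = 2**31 - 1  # INT_MAX: always runs last

# Simplified registrations per hook as (name, priority); the real kernel has more.
HOOKS: Dict[str, List[Tuple[str, int]]] = {
    "PREROUTING": [
        ("nat (DNAT)", NF_IP_PRI_NAT_DST),
        ("raw", NF_IP_PRI_RAW),
        ("mangle", NF_IP_PRI_MANGLE),
        ("conntrack", NF_IP_PRI_CONNTRACK),
    ],
    "FORWARD": [("filter", NF_IP_PRI_FILTER), ("mangle", NF_IP_PRI_MANGLE)],
    "POSTROUTING": [
        ("conntrack confirm", NF_IP_PRI_CONNTRACK_CONFIRM),
        ("nat (SNAT)", NF_IP_PRI_NAT_SRC),
        ("mangle", NF_IP_PRI_MANGLE),
    ],
}


def call_order(hook: str) -> List[str]:
    """Names of the callbacks at `hook`, lowest priority value first."""
    return [name for name, _ in sorted(HOOKS[hook], key=lambda entry: entry[1])]


if __name__ == "__main__":
    for hook in HOOKS:
        print(hook, "->", call_order(hook))
```

The POSTROUTING order also shows why the connection is recorded after SNAT: conntrack confirmation is registered at INT_MAX, so it runs after the nat table.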
Looking up the routing table with the dst IP (baidu’s IP) matches the default route: gateway 10.117.7.253, out iface eth0 (host). Because the packet is not addressed to the host itself, it goes to the FORWARD hook, and the rules at that hook point are checked.
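The host-side decision after PREROUTING can be modeled as follows: deliver locally if the dst IP is one of the host's addresses, otherwise look up a route and forward. In this hypothetical Python sketch, docker0's address is the one given in the text. The host eth0 address 10.117.7.10 and the 10.117.7.0/24 subnet are made up. The sketch also assumes IP forwarding is enabled (net.ipv4.ip_forward=1), which Docker normally turns on.

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical host addresses: docker0 (from the text) and eth0 (made up).
LOCAL_ADDRS = {ipaddress.ip_address("172.17.0.2"), ipaddress.ip_address("10.117.7.10")}

# Hypothetical host routing table: (prefix, gateway or None, out iface).
HOST_ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "10.117.7.253", "eth0"),
    (ipaddress.ip_network("10.117.7.0/24"), None, "eth0"),
    (ipaddress.ip_network("172.17.0.0/16"), None, "docker0"),
]


def input_route(dst: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (next hook, gateway, out iface) for a packet arriving at the host."""
    addr = ipaddress.ip_address(dst)
    if addr in LOCAL_ADDRS:
        return ("INPUT", None, None)  # addressed to the host itself
    matches = [route for route in HOST_ROUTES if addr in route[0]]
    _, gateway, iface = max(matches, key=lambda route: route[0].prefixlen)
    return ("FORWARD", gateway, iface)  # not for me: forward it out


if __name__ == "__main__":
    print(input_route("203.0.113.10"))  # request to baidu (documentation address)
    print(input_route("172.17.0.3"))    # reply after DNAT, toward a container
```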
The first rule in FORWARD (filter table) matches and jumps to DOCKER-ISOLATION-STAGE-1.
The first rule in DOCKER-ISOLATION-STAGE-1 also matches and jumps to DOCKER-ISOLATION-STAGE-2.
The second rule in DOCKER-ISOLATION-STAGE-2 matches with target RETURN, so processing goes back to DOCKER-ISOLATION-STAGE-1. There the second rule matches with target RETURN, so processing goes back to the FORWARD chain. The fourth rule in FORWARD matches with target ACCEPT, and the FORWARD hook is done.
At the FORWARD hook we only check the filter table. There is also a mangle table, but it is empty, so we ignore it for simplicity.
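The jump/RETURN walk above can be replayed with a small interpreter. The rule set is an assumed, simplified reconstruction of a typical Docker filter table: only the in/out interface and the conntrack state are modeled. Compare it with iptables -S on your own host, because Docker's rules differ between versions. Rule numbers in the trace count from 1, as in the text.

```python
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, str]
Rule = Tuple[Callable[[Packet], bool], str]  # (match predicate, target)

# Assumed, simplified Docker filter-table rules (check `iptables -S` on a real host).
CHAINS: Dict[str, List[Rule]] = {
    "FORWARD": [
        (lambda p: True, "DOCKER-ISOLATION-STAGE-1"),
        (lambda p: p["out"] == "docker0" and p["ctstate"] in ("RELATED", "ESTABLISHED"), "ACCEPT"),
        (lambda p: p["out"] == "docker0", "DOCKER"),
        (lambda p: p["in"] == "docker0" and p["out"] != "docker0", "ACCEPT"),
        (lambda p: p["in"] == "docker0" and p["out"] == "docker0", "ACCEPT"),
    ],
    "DOCKER-ISOLATION-STAGE-1": [
        (lambda p: p["in"] == "docker0" and p["out"] != "docker0", "DOCKER-ISOLATION-STAGE-2"),
        (lambda p: True, "RETURN"),
    ],
    "DOCKER-ISOLATION-STAGE-2": [
        (lambda p: p["out"] == "docker0", "DROP"),
        (lambda p: True, "RETURN"),
    ],
    "DOCKER": [],  # published-port rules would live here
}
POLICY = "DROP"  # assumed built-in policy of the FORWARD chain


def traverse(chain: str, packet: Packet, trace: List[str]) -> Optional[str]:
    """Walk `chain`; return ACCEPT/DROP, or None if it RETURNs or falls through."""
    for index, (match, target) in enumerate(CHAINS[chain], start=1):
        if not match(packet):
            continue
        trace.append(f"{chain} rule {index} -> {target}")
        if target in ("ACCEPT", "DROP"):
            return target
        if target == "RETURN":
            return None
        verdict = traverse(target, packet, trace)  # jump to a user-defined chain
        if verdict is not None:
            return verdict
        # the sub-chain returned: continue with the next rule in this chain
    return None


def forward_verdict(packet: Packet) -> Tuple[str, List[str]]:
    trace: List[str] = []
    verdict = traverse("FORWARD", packet, trace)
    return (verdict or POLICY), trace


if __name__ == "__main__":
    verdict, trace = forward_verdict({"in": "docker0", "out": "eth0", "ctstate": "NEW"})
    print(verdict)
    print("\n".join(trace))
```

Feeding it the reply packet (in eth0, out docker0, ESTABLISHED) reproduces the shorter path the text describes for the reply: STAGE-1 rule 2 returns, and FORWARD rule 2 accepts.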
- Now the packet reaches the POSTROUTING hook at the IP layer. The first rule in the nat table matches with target MASQUERADE (a special SNAT that uses the IP of the outgoing interface). After SNAT, the connection is recorded in the conntrack table, and the packet goes out on eth0 (host) with the info below.
src mac: eth0(in host) mac
At the POSTROUTING hook we again only check the nat table. There is also a mangle table, but it is empty, so we ignore it for simplicity.
In short: eth0 (container) -> docker0 (host) -> routing -> POSTROUTING SNAT (IP of the outgoing interface) -> physical eth0
Reply packet IN:
When the reply packet comes back, it carries the info below:
src mac: gateway mac(10.117.7.253)
A conntrack entry holds only IP-level information: addresses plus ports, or, for ICMP, the type, code and id. It holds no MAC addresses:
$ conntrack -L
As the dst mac is eth0’s mac, the frame is accepted as local (no bridge is involved) and the packet goes up to the PREROUTING hook. A connection was already created in the conntrack table, so conntrack finds that entry before any rule in the nat table is checked, and the nat rules in PREROUTING (mostly used for DNAT) are skipped. Instead, the packet is DNATed according to the conntrack entry. This reverses the earlier SNAT, so the dst IP becomes the container's IP again. After DNAT the packet has the info below. The MAC addresses no longer matter at this point, because the packet has already reached the IP layer (PREROUTING).
src mac: gateway mac
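The SNAT on the way out and the reverse translation on the way back can be sketched with a toy connection table. This Python model is a simplification of conntrack under several assumptions, not its real data structures:

- A tuple holds only IPs and an ICMP echo id, with no MAC addresses.
- The id is assumed not to clash with another flow, so MASQUERADE keeps it unchanged.
- All addresses are hypothetical: container 172.17.0.3, host eth0 10.117.7.10, and 203.0.113.10 standing in for baidu.

```python
from typing import Dict, NamedTuple, Optional


class Tuple4(NamedTuple):
    """Simplified conntrack tuple: IPs plus an identifier, no MAC addresses."""
    proto: str
    src: str
    dst: str
    ident: int  # ICMP echo id here; TCP/UDP would use source/destination ports


class Entry(NamedTuple):
    original: Tuple4  # the first packet's direction, before NAT
    reply: Tuple4     # what the reply is expected to look like, after NAT


class Conntrack:
    def __init__(self) -> None:
        self.by_tuple: Dict[Tuple4, Entry] = {}

    def masquerade(self, pkt: Tuple4, out_ip: str) -> Tuple4:
        """POSTROUTING: SNAT to the outgoing interface's IP and record the entry."""
        natted = pkt._replace(src=out_ip)
        # Assumes no id clash, so the identifier is kept unchanged.
        reply = Tuple4(pkt.proto, pkt.dst, out_ip, pkt.ident)
        entry = Entry(original=pkt, reply=reply)
        self.by_tuple[pkt] = entry
        self.by_tuple[reply] = entry  # indexed both ways so replies are found
        return natted

    def prerouting(self, pkt: Tuple4) -> Optional[Tuple4]:
        """PREROUTING: if pkt is a known reply, undo the SNAT by rewriting dst."""
        entry = self.by_tuple.get(pkt)
        if entry is None or pkt != entry.reply:
            return None  # only the reply direction is modeled; nat rules would run
        return pkt._replace(dst=entry.original.src)


if __name__ == "__main__":
    ct = Conntrack()
    out = ct.masquerade(Tuple4("icmp", "172.17.0.3", "203.0.113.10", 42), "10.117.7.10")
    print("leaves eth0 as:", out)
    back = ct.prerouting(Tuple4("icmp", "203.0.113.10", "10.117.7.10", 42))
    print("after PREROUTING:", back)
```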
After PREROUTING (DNAT), the routing table is looked up with the info above. The third route matches; result: out iface docker0. The dst IP is not one of the host's own addresses, so the packet then goes to the FORWARD hook. There the first rule matches and jumps to DOCKER-ISOLATION-STAGE-1.
At DOCKER-ISOLATION-STAGE-1, the second rule matches with target RETURN, so processing goes back to FORWARD. There the second rule matches with target ACCEPT, and traversal ends.
POSTROUTING rules are then checked; none match.
The packet then goes down to the neighbor subsystem. neigh_xmit() calls dev_hard_header() to fill in the skb's MAC header with the info below, then calls dev_queue_xmit(), which calls dev->ndo_start_xmit() (here dev is docker0).
src mac: docker0 mac
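docker0 is an Ethernet-type device. For such devices, dev_hard_header() calls the device's header_ops->create, which is eth_header(), and that prepends a 14-byte Ethernet II header. The Python sketch below only builds those 14 bytes to show the layout. The MAC addresses in it are made-up placeholders.

```python
import struct

ETH_P_IP = 0x0800  # EtherType for IPv4 (linux/if_ether.h)


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def eth_header(dst_mac: str, src_mac: str, ethertype: int = ETH_P_IP) -> bytes:
    """14-byte Ethernet II header: dst mac, src mac, EtherType (network byte order)."""
    return struct.pack("!6s6sH", mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)


if __name__ == "__main__":
    # Placeholder MACs: dst = container eth0, src = docker0.
    print(eth_header("02:42:ac:11:00:03", "02:42:8f:00:00:01").hex(":"))
```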
- Now the packet goes into the bridge through docker0->ndo_start_xmit. For docker0 (the bridge's own management port), ndo_start_xmit is br_dev_xmit. It floods the packet to all ports for multicast or broadcast (identified by the dst mac) or an unknown dst mac. Otherwise it forwards the packet to the specific port on which the dst mac was seen. Here the packet is forwarded to vethxx: skb->dev is updated from docker0 to vethxx, and vethxx's dev->ndo_start_xmit(), which is veth_xmit(), is called. veth_xmit() hands the packet to the other end (its peer), so eth0 (in the container) receives it. All right, that's it.
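Both bridge decisions in these notes can be sketched as a toy model: the CAM (forwarding database) lookup when the request enters from vethx, and br_dev_xmit's flood-or-forward choice for the reply. This is plain Python, not the kernel's br_* code. The port names vethA and vethB and all MAC addresses are hypothetical, and details such as FDB ageing and STP are omitted.

```python
from typing import Dict, List


class Bridge:
    """Toy model of Linux bridge forwarding decisions."""

    def __init__(self, own_mac: str, ports: List[str]) -> None:
        self.own_mac = own_mac.lower()
        self.ports = ports
        self.fdb: Dict[str, str] = {}  # learned mac -> port (the "CAM table")

    @staticmethod
    def is_multicast(mac: str) -> bool:
        # Group bit = lowest bit of the first octet; broadcast ff:ff:... has it set.
        return int(mac.split(":")[0], 16) & 1 == 1

    def ingress(self, src_mac: str, dst_mac: str, in_port: str) -> List[str]:
        """Frame received on a port (e.g. vethx): where does it go?"""
        self.fdb[src_mac.lower()] = in_port  # learn where the sender lives
        dst = dst_mac.lower()
        if dst == self.own_mac:
            return ["local"]  # addressed to docker0 itself: pass up to the IP stack
        others = [p for p in self.ports if p != in_port]
        if self.is_multicast(dst):
            return ["local"] + others  # flooded and also delivered locally
        if dst not in self.fdb:
            return others  # unknown unicast: flood
        out = self.fdb[dst]
        return [] if out == in_port else [out]

    def dev_xmit(self, dst_mac: str) -> List[str]:
        """Frame sent by the host through docker0 (the role of br_dev_xmit)."""
        dst = dst_mac.lower()
        if self.is_multicast(dst) or dst not in self.fdb:
            return list(self.ports)  # flood to all ports
        return [self.fdb[dst]]  # forward to the port where dst was learned


if __name__ == "__main__":
    br = Bridge("02:42:00:00:00:01", ["vethA", "vethB"])
    print(br.ingress("02:42:ac:11:00:03", "02:42:00:00:00:01", "vethA"))  # request
    print(br.dev_xmit("02:42:ac:11:00:03"))                              # reply
```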
Command to set the Docker network
This is the docker run parameter that sets the network for a container:
--network="bridge": Connect a container to a network
‘bridge’: create a network stack on the default Docker bridge
‘none’: no networking
‘container:<name|id>’: reuse another container’s network stack
‘host’: use the Docker host network stack
‘<network-name>|<network-id>’: connect to a user-defined network
As you can see, a container can use the network of another container. They then share the same network, that is, the same network namespace. No new veth pair is created for the second container, because it shares the one the other container already has.
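The consequences of each --network value listed above can be summarized in a small lookup function. This is a hypothetical Python helper for illustration only; it is not part of Docker. It assumes a user-defined network is bridge-type, since overlay or macvlan networks would behave differently.

```python
from typing import NamedTuple


class NetSetup(NamedTuple):
    new_netns: bool   # does the container get its own network namespace?
    veth_pair: bool   # does Docker create a veth pair for it?
    shares_with: str  # whose namespace is reused ("" if none)


def describe_network(value: str) -> NetSetup:
    """Summarize what `docker run --network=<value>` implies (toy helper)."""
    if value == "host":
        return NetSetup(False, False, "host")  # uses the host's root namespace
    if value == "none":
        return NetSetup(True, False, "")  # own namespace, loopback only
    if value.startswith("container:"):
        return NetSetup(False, False, value.split(":", 1)[1])  # reuse its namespace
    # "bridge" or a user-defined network (assumed to be bridge-type here)
    return NetSetup(True, True, "")


if __name__ == "__main__":
    for option in ("bridge", "none", "host", "container:web", "my_bridge"):
        print(option, describe_network(option))
```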
Use the default bridge network
By default, Docker creates three networks: bridge, host and none. If you do not set a network when you run a container, bridge is used. You can list the networks and check the details of each one:
$ docker network ls
Note:
- A Docker network exists independently of containers; you can create it and remove it (docker network create / docker network rm).
- veth pairs are created and removed automatically when a container starts or stops.
Let’s create another container that also uses the default bridge.
$ brctl show
Use a customized bridge network
# let's create a bridge network named my_bridge from the docker cli