TPROXY compatibility with Docker

I'm trying to understand how TPROXY works in an effort to build a transparent proxy for Docker containers.

After a lot of research I managed to create a network namespace, inject a veth interface into it, and add TPROXY rules. The following script worked on a clean Ubuntu 18.04.3:

# Create a namespace and a bridge, and connect them with a veth pair
ip netns add ns0
ip link add br1 type bridge
ip link add veth0 type veth peer name veth1
ip link set veth0 master br1
ip link set veth1 netns ns0
# Assign addresses, bring the links up, and route the namespace through the bridge
ip addr add 192.168.3.1/24 dev br1
ip link set br1 up
ip link set veth0 up
ip netns exec ns0 ip addr add 192.168.3.2/24 dev veth1
ip netns exec ns0 ip link set veth1 up
ip netns exec ns0 ip route add default via 192.168.3.1
# Mark TCP traffic arriving from the bridge for TPROXY and deliver marked packets to the local stack
iptables -t mangle -A PREROUTING -i br1 -p tcp -j TPROXY --on-ip 127.0.0.1 --on-port 1234 --tproxy-mark 0x1/0x1
ip rule add fwmark 0x1 tab 30
ip route add local default dev lo tab 30
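
The rule and routes can be sanity-checked with read-only commands (exact output varies by distribution and kernel):

iptables -t mangle -L PREROUTING -v -n   # the TPROXY rule and its packet counters
ip rule show                             # should list: fwmark 0x1 lookup 30
ip route show table 30                   # should list: local default dev lo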

After that I launched the toy Python server from a Cloudflare blog post:

import socket

IP_TRANSPARENT = 19  # socket option number from <linux/in.h>

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Setting IP_TRANSPARENT requires CAP_NET_ADMIN, i.e. run as root
s.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)

s.bind(('127.0.0.1', 1234))
s.listen(32)
print("[+] Bound to tcp://127.0.0.1:1234")
while True:
    c, (r_ip, r_port) = s.accept()
    l_ip, l_port = c.getsockname()  # original destination, preserved by TPROXY
    print("[ ] Connection from tcp://%s:%d to tcp://%s:%d" % (r_ip, r_port, l_ip, l_port))
    c.send(b"hello world\n")
    c.close()

Finally, by running ip netns exec ns0 curl 1.2.4.8, I was able to observe a connection from 192.168.3.2 to 1.2.4.8 and receive the "hello world" message.

The problem is that this setup seems to have compatibility issues with Docker. Everything worked well in a clean environment, but once I started Docker things went wrong: the TPROXY rule appeared to stop working. Running ip netns exec ns0 curl 192.168.3.1 gave "Connection reset" and running ip netns exec ns0 curl 1.2.4.8 timed out (both should have produced the "hello world" message). I tried restoring all iptables rules, deleting the IP routes and rules generated by Docker, and shutting down Docker, but nothing helped, even though I had not configured any networks or containers.

What is happening behind the scenes and how can I get TPROXY working normally?

Toting answered 29/8, 2019 at 12:25 Comment(0)

I traced all processes created by Docker using strace -f dockerd and looked for lines containing exec. Most of the commands were iptables commands, which I had already excluded, but the lines running modprobe looked interesting. I loaded those modules one by one and found that the module causing the trouble is br_netfilter.

The module enables filtering of bridged packets through iptables, ip6tables and arptables. The iptables part can be disabled by running echo "0" | sudo tee /proc/sys/net/bridge/bridge-nf-call-iptables. After running that command, the script worked again without affecting the Docker containers.
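
For completeness, the same toggle can be inspected and flipped via sysctl; a short sketch (the sysctl key only exists once br_netfilter is loaded, and the file name under /etc/sysctl.d is just an example):

lsmod | grep br_netfilter                        # is the module loaded?
sysctl net.bridge.bridge-nf-call-iptables        # current value
sysctl -w net.bridge.bridge-nf-call-iptables=0   # same effect as the echo above
# to keep the setting across reboots:
echo "net.bridge.bridge-nf-call-iptables = 0" > /etc/sysctl.d/99-bridge-nf.conf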

I am still confused, though, because I don't fully understand the consequences of this setting. I enabled packet tracing, and the packets matched the exact same set of rules before and after enabling bridge-nf-call-iptables, yet with it disabled the first TCP SYN was delivered to the Python server, while with it enabled the packet was dropped for reasons I could not determine.
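
Packet tracing can be enabled with, for example, the iptables TRACE target (a sketch, not necessarily exactly what was used here; with the legacy iptables backend the trace lines appear in the kernel log, with the nft backend they can be read with xtables-monitor --trace):

iptables -t raw -A PREROUTING -i br1 -p tcp -j TRACE   # trace TCP packets arriving on br1
dmesg -w | grep TRACE                                  # legacy backend: watch the kernel log
iptables -t raw -D PREROUTING -i br1 -p tcp -j TRACE   # remove the rule afterwards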

Toting answered 30/8, 2019 at 11:43 Comment(0)

I encountered the same problem and finally figured out why TPROXY is incompatible with Docker.

By default, Docker creates a bridge network for containers. Since a bridge is a layer-two device, packets exchanged between containers are switched rather than routed, so they are normally outside the scope of iptables. Docker therefore relies on bridge-netfilter to enforce iptables rules between bridged interfaces. Two excellent posts on Stack Exchange give a detailed summary of the history of bridge-netfilter.

Just as Netfilter places chains on the IP forwarding path, it also adds hooks on the bridge's data path. bridge-netfilter calls the iptables hooks, which are ordinarily invoked by the IP layer, from those bridge hooks. A blog post (http://devel.aanet.ru/linux-bridge/) explains how bridge hooks and IP hooks can be mixed and how bridge-netfilter ensures that each hook is called only once.

The case relevant to our problem is a packet sent from a bridged interface to a local process, i.e., from a Docker container to the proxy program running on the host. The packet traverses the NF_BR_PRE_ROUTING hook (which calls the NF_INET_PRE_ROUTING hooks) inside the bridge and then enters the host IP layer. From the perspective of the host IP layer it is an ingress packet from the bridge interface, so the NF_INET_PRE_ROUTING hooks would be called again on it. To avoid that, bridge-netfilter inserts a special hook that skips all NF_INET_PRE_ROUTING hooks when it detects that the packet was switched in from the bridge, since those hooks have already been called there.

TPROXY rules on the PREROUTING chain are therefore evaluated inside the bridge code, before the packet enters the IP layer. But the ip_rcv_core function in the IP layer assumes that the Netfilter hooks run after it, so it clears the socket that the TPROXY rule attached to the sk_buff (the skb_orphan call commented "Must drop socket now because of tproxy").

This is a known issue, and a patch has been sent to solve this problem.

But the patch was not welcomed, since bridge-netfilter is not a beloved feature and is expected to be removed once its functionality is replaced by tools like nftables. This paper introduces some progress on that topic:

In summary, we sadly cannot combine Docker with TPROXY unless one of the following happens:

  • Docker stops relying on bridge-netfilter.
  • The kernel accepts patches fixing TPROXY compatibility with bridge-netfilter (not very likely).

But one seemingly promising alternative is the eBPF TPROXY support contributed by the Cilium community:

Since eBPF assigns the socket at the device layer (TC ingress, more specifically), it runs into the same problem as bridge-netfilter: the selected sock is cleared by ip_rcv_core. But that problem has been fixed upstream. Although I have not verified it, I think we could build a TPROXY-based transparent proxy for Docker containers using an eBPF program.
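
For illustration only, attaching such a program would look roughly like this (a sketch; tproxy_bpf.o is a hypothetical object whose classifier looks up the proxy socket and calls bpf_sk_assign(), which requires kernel 5.7 or newer):

tc qdisc add dev br1 clsact
tc filter add dev br1 ingress bpf direct-action obj tproxy_bpf.o sec classifier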

Another choice is a REDIRECT (DNAT) based transparent proxy, which is carefully handled and well supported by bridge-netfilter.
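
Adapted to the setup from the question, the NAT-based variant would look something like this (a sketch; with REDIRECT the proxy recovers the original destination via the SO_ORIGINAL_DST socket option instead of getsockname on a transparent socket):

iptables -t nat -A PREROUTING -i br1 -p tcp -j REDIRECT --to-ports 1234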

Lungwort answered 1/3, 2023 at 17:45 Comment(1)
Great and detailed answer! Was scratching my head to figure out why TPROXY was only partially working (the container only receives packets sent from localhost but not from the LAN) with Docker when the network_mode of the container is set to host... The problem goes away after disabling bridge-nf-call-iptables, which is enabled by default on Arch Linux. - Plaided

Try running docker with -p 1234:
"By default, when you create a container, it does not publish any of its ports to the outside world. To make a port available to services outside of Docker, or to Docker containers which are not connected to the container’s network, use the --publish or -p flag."

https://docs.docker.com/config/containers/container-networking/
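
For example (a sketch with a placeholder image name):

docker run -d -p 1234:1234 your-proxy-image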

Peipeiffer answered 29/8, 2019 at 13:32 Comment(2)
Thanks for the information, but everything was configured outside Docker. Docker was not used at all, and merely starting the Docker daemon broke my TPROXY configuration. - Toting
Ah, I see the problem. So you don't want Docker; this was for running it in Docker. - Peipeiffer
