<2022-11-07 Mon> networking k8s

How sidecars intercept traffic

If you know of service meshes in Kubernetes, such as Istio, you have probably heard of the sidecars that proxy traffic into the mesh. Envoy is the best-known sidecar proxy, used by Istio and others, but there are other contenders such as Linkerd or Pipy.

Now, something has to be done to route traffic into the sidecar. Until now, I had filed whatever happens there under the word "proxy" without bothering to dive deeper, but it is more complicated than that.

1. HTTP Proxy

Let us start with HTTP proxies, since they are the most widespread. They are very easy to use: everyone has probably used them through browser settings or the HTTP_PROXY environment variable.

This is what a simple wget GET request looks like:

wget localhost:8001
Data: GET / HTTP/1.1
User-Agent: Wget/1.19.5 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: localhost:8001
Connection: Keep-Alive

Now, the same request sent through a proxy:

http_proxy=localhost:8001 wget localhost:8002
Data: GET http://localhost:8002/ HTTP/1.1
User-Agent: Wget/1.19.5 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: localhost:8002
Connection: Keep-Alive
Proxy-Connection: Keep-Alive

Essentially, the absolute URL in the request line, repeated in the Host header, is enough for a proxy to identify the destination to forward the request to.
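
To make this concrete, here is a minimal sketch in Python of how a forward proxy might pull the destination out of the request head shown above. The function name proxy_destination is made up for illustration, and the sketch assumes plain HTTP/1.1 with no IPv6 literals; it is not how any particular proxy implements this.

```python
from urllib.parse import urlsplit


def proxy_destination(request_head: bytes) -> tuple[str, int]:
    """Return the (host, port) a forward proxy should connect to."""
    lines = request_head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)

    url = urlsplit(target)
    if url.scheme and url.hostname:
        # Proxied form: the request line carries an absolute URL.
        default_port = 443 if url.scheme == "https" else 80
        return url.hostname, url.port or default_port

    # Direct form: only a path in the request line, so use the Host header.
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host, _, port = value.strip().partition(":")
            return host, int(port) if port else 80

    raise ValueError("no destination found in request")


proxied = b"GET http://localhost:8002/ HTTP/1.1\r\nHost: localhost:8002\r\n\r\n"
print(proxy_destination(proxied))
```

Either way, the request itself names its destination, which is exactly what a plain TCP connection lacks.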

2. TCP Proxy

Now, not all workloads are HTTP (I include HTTPS in this bucket). Apart from other L7 protocols such as RPC protocols, there can be plain TCP connections for various use cases.

How do we proxy a TCP connection?

The problem is obvious: a TCP connection carries no metadata saying where it should go.

If you search the web for TCP proxies, you will find that most of them are designed to proxy connections to a single destination, that is, one fixed IP-port combination.

So, if you want to proxy connections to multiple destinations, you need to assign or configure a relevant port on the TCP proxy for each destination.
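
As a rough illustration, the configuration of such a proxy boils down to a table like the one below. The addresses, ports, and the names LISTENERS and destination_for are invented for this sketch.

```python
# Hypothetical listener table: one local proxy port per upstream destination.
LISTENERS = {
    9001: ("10.0.0.5", 5432),
    9002: ("10.0.0.6", 6379),
    9003: ("10.0.0.7", 9092),
}


def destination_for(local_port: int) -> tuple[str, int]:
    """The only routing signal this kind of proxy has is which port was hit."""
    return LISTENERS[local_port]


print(destination_for(9002))
```

Every new destination needs a new entry, and every client has to be told which local port stands for which destination.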

Clearly, this is problematic:

  1. This doesn't scale.
  2. You now need to correctly route source requests to the right port.

All in all, this simple approach to TCP proxying can't work for us. We should be able to listen on a single port and forward to different destinations. But how do we do it?

3. Getting traffic to the TCP proxy

First, let's see how traffic is routed to the proxy. Obviously, there is no environment variable that can do this. This Istio blog post shows that the secret is actually iptables.

Reproducing the most important rules relevant to outgoing connections from that blog post:

-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001

We can see that the logic is quite simple: all outgoing TCP connections are redirected to a single port, 15001, which is managed by Envoy.

As an aside, there are similar rules for incoming connections too:

-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp -m tcp --dport 80 -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15001

These rules select connections by the local service port number (80 in this example).

So, now we see that all incoming and outgoing connections are being redirected to a single port on the Envoy proxy.

By the way, the story is not quite that simple. There is an alternative to the iptables REDIRECT target: a different target called TPROXY, which does something similar. The difference is that TPROXY does not change the packet's destination address. However, the proxy must open its listening socket with a particular option set, (SOL_IP, IP_TRANSPARENT), to be able to process these packets. (More on this later.)

But, what next? How does the proxy route these connections?

4. Understanding traffic at the TCP proxy

Istio's docs on traffic routing detail how information from various layers, such as HTTP headers and TLS headers, is used. However, we are interested in plain TCP traffic. The docs say that only the original destination address and port are known to the proxy, but how?

The answer lies in this Stack Overflow Q&A: SO_ORIGINAL_DST from getsockopt.

The general structure of the getsockopt syscall is as follows:

getsockopt(fd, level, option, *data, *len)

You specify an option using a combination of two parameters: level and option. There is a corresponding setsockopt that may be more familiar to readers.

The basic socket level options are set using level SOL_SOCKET. For example, the option (SOL_SOCKET, SO_REUSEPORT) can be used to allow multiple listeners at the same port.
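
As a small illustration of the (level, option) pair, here is a sketch in Python that sets and then reads back (SOL_SOCKET, SO_REUSEPORT). The helper name reuseport_listener is made up, and the sketch assumes Linux, where this option lets a second socket bind to a port that is already in use.

```python
import socket


def reuseport_listener(port: int) -> socket.socket:
    """Open a TCP listener with SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # setsockopt(level, option, value)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen()
    return s


first = reuseport_listener(0)        # port 0: let the kernel pick a free port
port = first.getsockname()[1]
second = reuseport_listener(port)    # works only because both sockets set the option

# getsockopt(level, option) reads the value back; nonzero means enabled.
enabled = first.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT)
print(port, bool(enabled))

first.close()
second.close()
```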

So, according to the Stack Overflow answer, we need to call getsockopt(SOL_IP, SO_ORIGINAL_DST) to read the original destination of the connection. (Note: if we used TPROXY instead of REDIRECT, this should still work, according to the Envoy docs.)

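Consider this skeleton of the proxy server, written here as Python:

import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 15001))  # the port the REDIRECT rule points at
listener.listen()

while True:
    conn, peer = listener.accept()
    # do something with this conn, usually in a different thread

For a normal server, these connections would be homogeneous: different clients connecting to the same service. Here, for a proxy, the destination may be different for each incoming connection.

The new code becomes:

SO_ORIGINAL_DST = 80  # from <linux/netfilter_ipv4.h>; Python's socket module does not export it

while True:
    conn, peer = listener.accept()
    # returns a packed struct sockaddr_in (16 bytes) holding the original ip and port
    destaddr = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    # process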
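
On Linux, the option does not hand back a ready-made address but a packed struct sockaddr_in. Below is a sketch in Python of how it could be decoded. The helper names parse_sockaddr_in and original_dst are my own, the value 80 for SO_ORIGINAL_DST comes from linux/netfilter_ipv4.h, and the sketch covers IPv4 only.

```python
import socket
import struct

SO_ORIGINAL_DST = 80      # from <linux/netfilter_ipv4.h>
SOCKADDR_IN_LEN = 16      # sizeof(struct sockaddr_in)


def parse_sockaddr_in(raw: bytes) -> tuple[str, int]:
    """Decode struct sockaddr_in: family(2) | port(2, network order) | addr(4) | pad(8)."""
    port = struct.unpack_from("!H", raw, 2)[0]
    ip = socket.inet_ntoa(raw[4:8])
    return ip, port


def original_dst(conn: socket.socket) -> tuple[str, int]:
    """Ask the kernel where an accepted, REDIRECTed connection was originally going."""
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, SOCKADDR_IN_LEN)
    return parse_sockaddr_in(raw)


# Hand-built sockaddr_in for 10.1.2.3:8002, to exercise the decoder without iptables.
sample = (
    struct.pack("=H", socket.AF_INET)
    + struct.pack("!H", 8002)
    + socket.inet_aton("10.1.2.3")
    + bytes(8)
)
print(parse_sockaddr_in(sample))
```

Calling original_dst on a connection that never went through a REDIRECT rule is expected to fail with an OSError, so a real proxy would need to handle that case.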

As an aside, what are the available socket option levels? What options are available at each level? How do we know that such an option even exists? Different resources online give different values for both, with very sparse explanations. Not good. The only source I have found that I can trust is the man pages on my own machine. I discuss this in the next section.

But, we are not done yet. You might have wondered, if the REDIRECT works, why would the packets still contain the original destination? Didn't the packet get to the new destination? The answer is discussed in this issue: this information is obtained from elsewhere in the kernel, not from the packet or the socket.

Let us take a moment to appreciate this. Firstly, I don't have a better citation or proof for this point, except the comment in the issue. Nevertheless, I am going to accept this as truth.

This means that getsockopt(SOL_IP, SO_ORIGINAL_DST) gets this information out-of-band. It follows, as discussed in the linked issue, that the actual REDIRECT has to happen in the same place the proxy runs. As shown in the issue, the proxy cannot be in a different container (actually, network namespace) than the REDIRECT rules.
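
A toy model may help to picture this. The sketch below is not how the kernel is implemented; it only mimics the behaviour described in the issue, on the assumption that each network namespace keeps its own table of rewritten connections. The class and method names are invented, and the redirect target is simplified to 127.0.0.1.

```python
class Namespace:
    """Toy stand-in for one network namespace and its NAT bookkeeping."""

    def __init__(self, proxy_port: int):
        self.proxy_port = proxy_port
        self.nat = {}  # (client ip, client port) -> original (ip, port)

    def redirect(self, client, dst):
        """Mimic REDIRECT: remember dst, then steer the flow to the local proxy."""
        self.nat[client] = dst
        return ("127.0.0.1", self.proxy_port)

    def original_dst(self, client):
        """Mimic SO_ORIGINAL_DST: a lookup in this namespace's table only."""
        return self.nat.get(client)


app_ns = Namespace(15001)     # namespace where the REDIRECT rule fired
other_ns = Namespace(15001)   # a proxy running in some other namespace
client = ("10.0.0.9", 40000)

new_dst = app_ns.redirect(client, ("93.184.216.34", 443))
print(new_dst)
print(app_ns.original_dst(client))    # found: same namespace as the rule
print(other_ns.original_dst(client))  # None: nothing was recorded here
```

The packet itself only carries the rewritten destination; the original one lives in the table, which is why the lookup has to happen where the rewrite happened.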

Given that this is so ugly, what is the clean way? From hints in various places, I surmise that with TPROXY, if you use getsockname, it returns the original destination (of the first SYN packet). While this is not proven to me, at least we know that with TPROXY, the flow has the original destination ip/port preserved, anyway.

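So, with the TPROXY option, the code could possibly look like this:

import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# IP_TRANSPARENT needs CAP_NET_ADMIN; 19 is its value in <linux/in.h>, used if Python lacks the constant
listener.setsockopt(socket.SOL_IP, getattr(socket, "IP_TRANSPARENT", 19), 1)
listener.bind(("0.0.0.0", 15001))
listener.listen()

while True:
    conn, peer = listener.accept()
    destaddr = conn.getsockname()  # expected to be the original (ip, port) under TPROXY
    # process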

5. Socket Options

Looking at man pages of getsockopt and setsockopt is not exactly useful.

  • Looking at man 7 socket gives us a detailed listing of the socket-level options, i.e., those under SOL_SOCKET.
    • One of the options here allows attaching packet filters, both the classic BPF and the newer eBPF.
  • Looking at man 7 ip tells us that the IP level is selected using IPPROTO_IP. What then of SOL_IP? Where is that defined? Not clear. At least this man page gives us a list of IP-related options, including the previously mentioned IP_TRANSPARENT, though I can't see SO_ORIGINAL_DST anywhere. What then?
  • Looking at man 7 tcp gives us a number of options under IPPROTO_TCP.
    • For example, there is an interesting option, TCP_CONGESTION, which allows the selection of a congestion control algorithm.
  • There is a section under man 7 udp about IPPROTO_UDP, as expected.
  • Interestingly, man 7 packet talks about sending L2 packets on sockets. We have some options here under SOL_PACKET.
  • Also look at man 7 raw for raw L3 packet processing.

So, we have a nice listing of levels and options. But the expected SOL_IP does not appear anywhere. What to do? Is it an alias for IPPROTO_IP, or superseded by it? In the Linux headers both are defined as 0, so the two can be used interchangeably there.
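
One quick way to see how these names relate on a given machine is to print the values that Python's socket module exposes for them. This is only a sketch: the helper name level_values is made up, and the values shown are whatever the local platform defines.

```python
import socket

LEVEL_NAMES = (
    "SOL_SOCKET",
    "SOL_IP", "IPPROTO_IP",
    "SOL_TCP", "IPPROTO_TCP",
    "SOL_UDP", "IPPROTO_UDP",
)


def level_values() -> dict:
    """Map each level name to its numeric value, or None if this platform lacks it."""
    return {name: getattr(socket, name, None) for name in LEVEL_NAMES}


for name, value in level_values().items():
    print(f"{name:12} = {value}")
```

If the SOL_ and IPPROTO_ names print the same number for a protocol, either one selects the same level in getsockopt and setsockopt.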

6. Moral of the story

  1. TCP proxying is not as easy as HTTP proxying.
  2. There are at least two different ways to do TCP proxying:
    1. REDIRECT + getsockopt(SOL_IP, SO_ORIGINAL_DST).
    2. TPROXY + setsockopt(SOL_IP, IP_TRANSPARENT) + getsockname.
  3. You should use the second option if possible since it is the cleaner choice.
  4. Envoy so far uses the first option. There have been attempts to add the second option, but they have not been accepted. See here.
  5. Socket options allow for a great level of control - but are poorly documented.