BPF/XDP test

purpose

We try to ingest 4 x 40 Gbps on 4 dual-port NICs, clone the traffic, write one stream to disk, and transparently forward the traffic to 4 x 40 Gbps egress ports. We use Mellanox cards because the entire network infrastructure is based on Mellanox.

setting up an environment

The code and ideas for this test are based on the BPF/XDP tutorial at https://github.com/xdp-project/xdp-tutorial.git. BPF acts on traffic entering the kernel at the XDP level, where we have access to the Ethernet header along with the packet content. Based on the header information we can decide how to redirect the traffic. To learn how BPF works, we set up test environments. A first simple test works entirely with virtual NICs and network namespaces: we would like to redirect traffic coming in on one virtual NIC to another virtual NIC.
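To illustrate deciding on a packet from its Ethernet header, here is a minimal sketch written as plain user-space C so that it can be compiled and tried outside the kernel. The function name classify_frame and the policy (redirect IPv4, pass everything else) are assumptions for illustration only. In a real XDP program, data and data_end would come from ctx->data and ctx->data_end, bpf_htons() would replace htons(), and XDP_REDIRECT would only be returned as the result of bpf_redirect().

```c
#include <assert.h>
#include <stddef.h>
#include <arpa/inet.h>       /* htons(); a BPF program would use bpf_htons() */
#include <linux/bpf.h>       /* enum xdp_action */
#include <linux/if_ether.h>  /* struct ethhdr, ETH_P_IP */

/* Hypothetical helper: choose an XDP verdict from the Ethernet header.
 * data/data_end mirror ctx->data and ctx->data_end of an XDP program. */
static int classify_frame(const void *data, const void *data_end)
{
        const struct ethhdr *eth = data;

        /* User-space sanity check only; not available inside BPF. */
        assert(data != NULL && data_end != NULL);

        /* The BPF verifier demands this bounds check before any header access. */
        if ((const void *)(eth + 1) > data_end)
                return XDP_DROP;

        /* Assumed policy: redirect IPv4, hand everything else to the stack. */
        if (eth->h_proto == htons(ETH_P_IP))
                return XDP_REDIRECT;

        return XDP_PASS;
}
```

The bounds check is the important part: without it the in-kernel verifier rejects the program at load time.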

We set up a left and a right virtual environment. The following commands must be run as root or with sudo:

ip netns add left
ip netns add right
ip link add eth0 type veth peer name veth0
ip link add eth1 type veth peer name veth1

# put one end of each veth pair into its network namespace

ip link set veth0 netns left
ip link set veth1 netns right

ip link set eth0 up
ip link set eth1 up
ip netns exec left ip link set dev lo up
ip netns exec left ip link set dev veth0 up
ip netns exec right ip link set dev lo up
ip netns exec right ip link set dev veth1 up

ip netns exec left ip addr add 10.0.0.1/24 dev veth0
ip netns exec right ip addr add 10.0.0.2/24 dev veth1
ip addr add 10.0.0.1/24 dev eth1

A minimal version of the redirection code, which does no further packet manipulation, reads:
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

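/* Recent libbpf versions already define bpf_printk in bpf_helpers.h;
 * only define it here if it is missing. */
#ifndef bpf_printk
#define bpf_printk(fmt, ...)                                    \
({                                                              \
               char ____fmt[] = fmt;                            \
               bpf_trace_printk(____fmt, sizeof(____fmt),       \
                                ##__VA_ARGS__);                 \
})
#endif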

SEC("xdp_redirect")
int xdp_redirect_func(struct xdp_md *ctx)
{
        /* Egress interface index; patched by the sed command below. */
        unsigned ifindex = 9;
        /* bpf_redirect() returns XDP_REDIRECT on success, XDP_ABORTED on error. */
        int action = bpf_redirect(ifindex, 0);

        bpf_printk("redirecting: ingress %d queue index: %d dest: %d\n", ctx->ingress_ifindex, ctx->rx_queue_index, ifindex);
        return action;
}

SEC("xdp_pass")
int xdp_pass_func(struct xdp_md *ctx)
{

        bpf_printk("passing ingress %d\n", ctx->ingress_ifindex);
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

The Makefile is (recipe lines must be indented with a tab character):
LLC ?= llc
CLANG ?= clang

XDP_TARGETS  := xdp_prog_kern
XDP_C = ${XDP_TARGETS:=.c}
XDP_OBJ = ${XDP_C:.c=.o}

all: llvm-check $(XDP_OBJ)

llvm-check: $(CLANG) $(LLC)
        @for TOOL in $^ ; do \
                if [ ! $$(command -v $${TOOL} 2>/dev/null) ]; then \
                        echo "*** ERROR: Cannot find tool $${TOOL}" ;\
                        exit 1; \
                else true; fi; \
        done

$(XDP_OBJ): %.o: %.c  Makefile $(COMMON_MK) $(KERN_USER_H) $(EXTRA_DEPS)
        $(CLANG) -S \
            -target bpf \
            -D __BPF_TRACING__ \
            $(BPF_CFLAGS) \
            -Wall \
            -Wno-unused-value \
            -Wno-pointer-sign \
            -Wno-compare-distinct-pointer-types \
            -Werror \
            -O2 -emit-llvm -c -g -o ${@:.o=.ll} $<
        $(LLC) -march=bpf -filetype=obj -o $@ ${@:.o=.ll}

.PHONY: clean $(CLANG) $(LLC)

clean:
        rm -f *.ll
        rm -f $(XDP_OBJ)

To make this code run, we need to adapt the interface index, which should be the index of eth0:
id=$( ip address show dev eth0 |gawk -F ":" '/^[0-9]+:/ {print $1}' ) ; sed -i "s/ifindex = [0-9]*;/ifindex = $id;/" xdp_prog_kern.c
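As an alternative to parsing the output of ip, the interface index can also be looked up from C with the standard if_nametoindex() call. The following is a minimal user-space sketch; the wrapper name lookup_ifindex is hypothetical, and the sketch only resolves the index, it does not patch or load the BPF program.

```c
#include <assert.h>
#include <stdio.h>
#include <net/if.h>   /* if_nametoindex() */

/* Hypothetical helper: return the ifindex of ifname, or 0 if no such
 * interface exists (if_nametoindex() itself returns 0 on failure). */
static unsigned int lookup_ifindex(const char *ifname)
{
        unsigned int idx;

        assert(ifname != NULL);
        idx = if_nametoindex(ifname);
        if (idx == 0)
                fprintf(stderr, "no such interface: %s\n", ifname);
        return idx;
}
```

Calling lookup_ifindex("eth0") in the root namespace should yield the same number that the sed command writes into xdp_prog_kern.c.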

Compile the code with make and we are ready to go.

setting up BPF

We need to load the BPF objects. At the XDP level, both eth0 and veth0 (the latter inside the left network namespace) need the xdp_pass section:
sudo ip link set dev eth0 xdp  obj ./xdp_prog_kern.o sec xdp_pass
sudo ip netns exec left ip link set dev veth0 xdp obj ./xdp_prog_kern.o sec xdp_pass

The right local NIC eth1 can load either the redirect section or the pass section:
sudo ip --force link set dev eth1 xdp obj ./xdp_prog_kern.o sec xdp_redirect
or
sudo ip --force link set dev eth1 xdp obj ./xdp_prog_kern.o sec xdp_pass

which triggers either redirecting or passing of the packets. The last two commands can be applied alternately.

the experiment

The NIC veth1 inside the right network namespace needs to continuously ping its peer eth1 in the default namespace:
sudo ip netns exec right ping 10.0.0.1

Listen to what arrives on veth0, the NIC inside the left network namespace, with:
sudo ip netns exec left tcpdump -l -i veth0

The bpf_printk function allows some communication from kernel land to user space:
sudo cat /sys/kernel/debug/tracing/trace_pipe

and we see what is going on.

Alternate between redirection and passing.

cloning

For cloning we need more: the helper function bpf_redirect_map along with AF_XDP sockets. See the advanced03-AF_XDP section in the tutorial.

See the AF_XDP kernel documentation for details. It seems that a NIC writes into a user-space UMEM buffer, bypassing the kernel network stack. An AF_XDP socket (XSK) is created in user space and accesses the buffer. Two processes with two individual sockets can also access the same buffer, but only one NIC can. The cloning process therefore has to happen in user space.

Possible ways are:
  • We create two UMEM buffers connected to different NICs. One process accesses both buffers, reading from one and writing to the other. A second process only reads from one buffer and writes the content to disk. See also io_uring.
  • We use bpf_clone_redirect at the TC level.
  • We use only bpf_redirect or bpf_redirect_map at the XDP level and either redirect the traffic to a NIC or write the stream to disk. In this case the FBFUSE won't get the stream for the hour of recording.
  • Redirection alone can also be done easily in user space by other tools.
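The user-space cloning idea from the first option can be sketched as follows. This is only an illustration of the copy step under strong assumptions: ordinary file descriptors stand in for the disk file and the egress path, no AF_XDP or io_uring API is used, and the names write_all and clone_frame are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
        const char *p = buf;

        while (len > 0) {
                ssize_t n = write(fd, p, len);

                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return -1;
                }
                p += n;
                len -= (size_t)n;
        }
        return 0;
}

/* Hypothetical clone step: one received frame is written to both sinks.
 * Returns 0 on success, -1 if either write fails. */
static int clone_frame(const void *frame, size_t len, int disk_fd, int egress_fd)
{
        assert(frame != NULL);

        if (write_all(disk_fd, frame, len) < 0)
                return -1;
        return write_all(egress_fd, frame, len);
}
```

In a real setup the frame would be read from the UMEM of the ingress socket and the egress copy placed on the TX ring of the second socket; whether blocking writes like these can keep up with 40 Gbps is not addressed by this sketch.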

The kernel documentation lists a number of people working on BPF. We could possibly ask one of them for hints.

-- HenningFehrmann - 21 Dec 2021
Topic revision: r4 - 05 Jan 2022, HenningFehrmann