An introduction to channel bonding can be found here.

Description

Why?

Round-robin is the only bonding mode that can deliver more than the speed of a single interface to a single TCP connection. However, it does not work as intended out of the box. The problem lies with the switches, which implement a different transmission policy for a trunk/EtherChannel: to prevent out-of-order delivery, a MAC-based policy is used. As a result, traffic from one node is always received on the same single port/interface of the other node.

One can work around this by using two or more (depending on the number of interfaces bonded together) physically separated networks. This way the switch does not need to know about the channel bonding and has only one port assigned per MAC address.

With more than two interfaces per node in a bond, this setup becomes very complex.

A possible solution: use VLANs instead of physically separated networks.

Design

Read the VLAN description if you are not familiar with VLANs. %IMAGE{"BondingVLANs2.jpg|thumb|300px|Diagram: Mixed Environment 1 and 2 nics"}% %IMAGE{"BondingVLANs4.jpg|thumb|300px|Diagram: Mixed Environment 1,2 and 4 nics"}%

n interfaces/node

For n interfaces per node the layout is quite simple: n VLANs are created on the switch. On each interface a virtual interface is created for a unique VLAN (the same VLANs as on the switch). Then these virtual interfaces are bonded together. Finally, the switch is configured to deliver tagged packets of a single VLAN only to the single network port of the node that has a VLAN interface for that VLAN.
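
A minimal node-side sketch for the case n = 2 (the VLAN IDs 10 and 20, the interface names and the bond0 address are only placeholders; the MAC handling described in the Notes and Setup sections below is omitted here):

  modprobe bonding mode=0 miimon=100       # mode 0 = round-robin
  modprobe 8021q                           # 802.1Q VLAN tagging support
  vconfig add eth0 10                      # creates eth0.10 (VLAN 10 on eth0)
  vconfig add eth1 20                      # creates eth1.20 (VLAN 20 on eth1)
  ifconfig bond0 192.168.10.1 netmask 255.255.255.0 up
  ifenslave bond0 eth0.10 eth1.20          # enslave the VLAN interfaces, not eth0/eth1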

This way a packet sent over interface 'b' is always received at interface 'b' of the target node! As packets are spread over all available interfaces on transmission, they also arrive on all interfaces on the receiving side. Throughput should therefore scale linearly with the number of interfaces. For each group of 'b' interfaces a kind of direct 'pipe' is created.

Mixed Environments

Physically separated networks would not be flexible enough to accommodate nodes with different numbers of interfaces. Using VLANs, different interface configurations are not a problem (as long as the number of interfaces is even, or 1).

If n is the maximum number of interfaces per node on the network, n VLANs are created on the switch.
  • For a node with n interfaces the configuration is the same as above.
  • On a node with fewer interfaces, the n VLANs of the switch are distributed evenly across the physical interfaces, and the resulting virtual interfaces are bonded together afterwards. More than one VLAN interface then shares a single physical link, but since traffic is delivered equally to all of them, performance does not suffer. Benefit: the server can use its maximum bandwidth to serve multiple nodes at the same time, as traffic is spread equally over all available pipes.
  • On a node with only a single link, all VLANs are created on top of the same physical interface. This causes some overhead, but the overall network performance is improved by equal usage of all pipes.

The design is illustrated in the two graphics on the right; a command-level sketch of the two smaller configurations follows below.
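
A hedged sketch of the node-side commands for the smaller configurations (the VLAN IDs 10-40 and the interface names are placeholders; the bonding module, bond0 and the MAC handling are assumed to be set up as described in the Setup section below):

  # Node with 2 interfaces in a network whose maximum is 4 interfaces per node:
  vconfig add eth0 10
  vconfig add eth0 20                      # two VLANs share the first physical link
  vconfig add eth1 30
  vconfig add eth1 40                      # two VLANs share the second physical link
  ifenslave bond0 eth0.10 eth0.20 eth1.30 eth1.40

  # Node with a single interface: all four VLANs on top of the same device.
  vconfig add eth0 10
  vconfig add eth0 20
  vconfig add eth0 30
  vconfig add eth0 40
  ifenslave bond0 eth0.10 eth0.20 eth0.30 eth0.40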

Notes

  1. All interfaces in the bond share the same MAC address, so all VLAN interfaces have the same MAC address. Since the VLAN interfaces sit on top of the physical interfaces, the MAC addresses of the physical devices remain unchanged. To receive packets for the VLAN interfaces (which all carry the MAC address of one physical interface), the other physical interfaces would have to switch to promiscuous mode, in which packets addressed to a MAC other than the interface's own are no longer dropped by the driver; the filtering task moves to the kernel, largely increasing the overhead. To prevent this behaviour, all physical interfaces have to inherit the MAC address of the first physical device.
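
A minimal sketch of that workaround, assuming eth0 and eth1 are the physical interfaces (many drivers require the interface to be down while its address is changed):

  MAC=$(cat /sys/class/net/eth0/address)   # MAC of the first physical device
  ifconfig eth1 down
  ifconfig eth1 hw ether "$MAC"            # eth1 now inherits eth0's MAC
  ifconfig eth1 up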

Setup

Linux

Setting up a working bond device using virtual interfaces is not trivial: many configuration steps have to be performed in the right order.

Calling sequence

  1. Clean up!
  2. Load the bonding module: #modprobe bonding mode=0 miimon=100
  3. Check whether eth0 is configured via DHCP
  4. Get the IP and MAC address of eth0 ->
    1. Change eth1's MAC address to eth0's
    2. Change bond0's MAC address to eth0's
    3. Configure bond0's IP and MAC address according to eth0's and bring it up
  5. Create the VLANs: #vconfig add eth0 10 through #vconfig add eth1 40
  6. Add the virtual interfaces to the bond: #ifenslave bond0 eth0.10 through eth1.40

The system should now be working.

Script

The script's tasks:
  1. Get the IP address of eth0; it will be used to generate the bond's IP.
  2. Get the MAC address of eth0.
  3. Configure bond0 and eth1.
  4. Create the VLANs.
  5. Attach the VLAN interfaces to the bond.
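
A sketch of such a script is shown below. It is only a sketch, not a tested implementation: the interface names (eth0, eth1), the VLAN IDs (10/20 on eth0, 30/40 on eth1), the netmask and the reuse of eth0's address for bond0 are assumptions based on the calling sequence above, the IP parsing assumes the classic net-tools ifconfig output, and the DHCP check is omitted.

  #!/bin/bash
  # Sketch only: interface names, VLAN IDs and the netmask are assumptions.

  VLANS_ETH0="10 20"        # VLANs carried on eth0
  VLANS_ETH1="30 40"        # VLANs carried on eth1

  # 1./2. Get the IP and MAC address of eth0 (bond0 will reuse both).
  IP=$(ifconfig eth0 | awk '/inet addr/ { sub("addr:", "", $2); print $2 }')
  MAC=$(cat /sys/class/net/eth0/address)

  # Load the bonding driver (mode 0 = round-robin) and 802.1Q VLAN support.
  modprobe bonding mode=0 miimon=100
  modprobe 8021q

  # 3. Configure eth1 and bond0: same MAC as eth0, bond0 takes over eth0's IP.
  ifconfig eth0 0.0.0.0 up                 # remove the address from eth0
  ifconfig eth1 down
  ifconfig eth1 hw ether "$MAC"            # avoid promiscuous mode (see Notes)
  ifconfig eth1 0.0.0.0 up
  ifconfig bond0 hw ether "$MAC"
  ifconfig bond0 "$IP" netmask 255.255.255.0 up

  # 4. Create the VLAN interfaces on top of the physical devices.
  SLAVES=""
  for v in $VLANS_ETH0; do vconfig add eth0 "$v"; SLAVES="$SLAVES eth0.$v"; done
  for v in $VLANS_ETH1; do vconfig add eth1 "$v"; SLAVES="$SLAVES eth1.$v"; done

  # 5. Attach the VLAN interfaces (not eth0/eth1 themselves) to the bond.
  ifenslave bond0 $SLAVES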

Solaris

As we were not able to get either VLANs or round-robin working on the Thumper, no setup information is available right now.

Problems

  • Out-of-order delivery (possible solutions: a bigger TCP window, better reordering in the TCP stack)

TO DO

Optimization

TO DO

Performance

Visit the NFS page for a first performance analysis

The following values have to be checked for a performance analysis:
  • Interrupts generated by out-of-order delivery
  • Kernel CPU usage (VLAN tagging + bonding module)
  • Memory usage (increased TCP buffers for better performance)
  • Network throughput
  • RTT / latency

First Overview

Performance compared to a Single Link:

| Configuration | Application | CPU usage (sys) | Memory usage | Interrupts/s | Throughput | RTT |
| 1 link | netperf (receive side) | ~3% | | ~30k | 940 Mbit/s | 100 µs |
| 1 link | NFS mount of a RAM disk (client) | | | | 120 MB/s | |
| 2 links (bond + VLANs) | netperf (receive side) | ~10% | | ~45k | 1.90 Gbit/s | 130 µs |
| 2 links (bond + VLANs) | NFS mount of a RAM disk (client) | | | | 230 to 245 MB/s | |

There seems to be some additional kernel load when using channel bonding and VLANs. Throughput scales linearly with the number of links. Interrupts seem to be a minor problem. The CPU percentage is measured for the whole system; 100% means that all CPUs are busy.

Measurement tools

  • top (kernel CPU usage: the 'sys' value)
  • vmstat 1 (interrupts and context switches, as well as memory load)
  • ifstat (traffic)
  • netperf (synthetic TCP performance)
  • dd reading from an NFS-mounted RAM disk (real TCP performance)
  • ping for a first RTT impression
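
Possible invocations could look like this (the host name "server" and the NFS mount path are placeholders):

  netperf -H server -t TCP_STREAM                      # synthetic TCP throughput
  vmstat 1                                             # interrupts, context switches, memory
  ifstat -i bond0 1                                    # traffic on the bond device, 1 s interval
  ping -c 10 server                                    # first RTT impression
  dd if=/mnt/nfs-ramdisk/testfile of=/dev/null bs=1M   # real TCP performance via NFS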

In general it is better not to use tools such as vmstat, but to read the information directly from the system statistics in the /proc filesystem (especially /proc/stat, /proc/interrupts and /proc/meminfo).
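
A minimal sketch of sampling those files directly (the field positions follow the standard /proc/stat layout; the 1-second interval is arbitrary):

  # First line of /proc/stat: cpu user nice system idle ...  (values in jiffies)
  read cpu user1 nice1 sys1 rest < /proc/stat
  intr1=$(awk '/^intr/ { print $2 }' /proc/stat)       # total interrupts since boot
  sleep 1
  read cpu user2 nice2 sys2 rest < /proc/stat
  intr2=$(awk '/^intr/ { print $2 }' /proc/stat)
  echo "interrupts/s:         $((intr2 - intr1))"
  echo "sys jiffies (last s): $((sys2 - sys1))"
  grep MemFree /proc/meminfo                           # memory usage
  grep eth /proc/interrupts                            # per-interface interrupt counts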