Category:Kernel
Category:Network
%IMAGE{"Channelbonding-Example.jpg|thumb|300px|Channelbonding (using HP-Switches) "}%
The Channelbonding documentation has finally been rewritten. I hope its new structure helps to answer the questions! Some illustrations and performance measurements will be added soon.
Matthias Linden 05:13, 3 October 2007 (CEST)
Concept
Channelbonding, trunking, etherchannel and port-aggregation are all terms for a setup in which network traffic is split over multiple physical links. For example, multiple links can be used to connect two switches to achieve a higher backbone bandwidth or redundancy.
Implementations
- Linux: The bonding module makes it possible to use multiple physical interfaces of a node as one single virtual interface.
- Sun Solaris: Another name and some differences, but very much the same as Linux bonding.
- Switches (Cisco, HP): Possibility to aggregate inter-switch links into a fatter pipe, or to configure the switch to communicate with nodes that use channelbonding.
Protocols
In general the configuration has to be done manually on all sides of a connection (nodes and switches). Apart from this, every vendor uses its own proprietary implementation to automate channelbonding.
LACP
LACP (Link Aggregation Control Protocol) or IEEE 802.3ad is used to configure a trunking setup automatically. The switch listens for special LACP packets sent out by the interfaces of a node. Based on them the switch knows which ports can be merged into one virtual interface.
As a result of its basic design, LACP prevents out-of-order delivery of packets.
Both Linux and Solaris support LACP, as do HP, Cisco and many others. Nevertheless, every vendor adds its own extensions to the standard (for example Cisco or Sun).
hash-policy
The hash-policy decides which interface or port is used to transfer a packet. It has significant influence on the throughput, reliability and functionality of the Bond/Trunk/LinkAggregation.
- round-robin is not a "real" hash-policy, as packets are just striped over the available interfaces.
- fail-over isn't a hash-policy either, as packets are transmitted over one single link as long as it is up and traffic migrates to another link if the first one fails.
- Layer2 or L2: A hash is generated over the src/dest MAC-Addresses of the ethernet frame. In general: (srcMAC XOR destMAC)%(# of interfaces)
- Layer3 or L3: The hash is generated over src/dest IP-Addresses.
- Layer4 or L4: The hash is generated over the TCP-header. This way src/dest IP and src/dest port have influence on the interface/port used.
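The Layer2 formula above can be tried out with a minimal Python sketch. It assumes the hash is taken over the full 48-bit MAC addresses exactly as written in the formula; real implementations may use only part of the address or mix in further fields. The function names and the MAC addresses are made up for illustration.

```python
def mac_to_int(mac: str) -> int:
    """Convert a MAC address such as '00:11:22:33:44:55' to an integer."""
    return int(mac.replace(":", ""), 16)


def layer2_hash(src_mac: str, dst_mac: str, n_interfaces: int) -> int:
    """Index of the interface chosen by (srcMAC XOR destMAC) % (# of interfaces)."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % n_interfaces


# Made-up example addresses, not taken from any real setup.
node_a = "00:11:22:33:44:55"
node_b = "00:11:22:33:44:66"

# Every frame between these two nodes yields the same index,
# so all of their traffic uses one and the same link.
print(layer2_hash(node_a, node_b, 2))
```

Because XOR is symmetric, swapping source and destination gives the same index in this sketch.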
important Notes
In every mode apart from round-robin, the packets of a single TCP or UDP stream always pass through the same port/interface, because the hash value stays the same! That is why, even with multiple physical links in use, the throughput of NFS, rsync or similar applications using one data stream will never be greater than the speed of one single interface. Round-robin or using multiple streams could increase the speed.
Layer3 and Layer4 are not fully LACP-compliant, because out-of-order delivery may occur in some cases.
Even if two communicating nodes use round-robin on their interfaces, it is not guaranteed that the bandwidth increases, because it depends on the trunking hash-policy of the switch in between.
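The effect of one stream versus several streams can be illustrated with a small Python sketch of a Layer3+Layer4 style hash. The exact formula is an assumption, loosely modelled on the one given in older Linux bonding documentation; the IP addresses, ports and function names are hypothetical.

```python
import ipaddress


def layer34_hash(src_ip, dst_ip, src_port, dst_port, n_interfaces):
    """Assumed formula: ((sport XOR dport) XOR ((sip XOR dip) AND 0xffff)) % n."""
    ip_bits = (int(ipaddress.IPv4Address(src_ip))
               ^ int(ipaddress.IPv4Address(dst_ip))) & 0xFFFF
    return ((src_port ^ dst_port) ^ ip_bits) % n_interfaces


def interfaces_used(streams, n_interfaces):
    """Set of interface indices that a collection of streams is hashed onto."""
    return {layer34_hash(*stream, n_interfaces) for stream in streams}


# One stream: the same 4-tuple for every packet, hence a single interface.
single = [("192.168.0.1", "192.168.0.2", 40000, 2049)]
# Four streams that differ only in their source port.
multi = [("192.168.0.1", "192.168.0.2", 40000 + i, 2049) for i in range(4)]

print(interfaces_used(single, 2))
print(interfaces_used(multi, 2))
```

In this sketch the single stream stays on one interface, while the four streams end up on both.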
round-robin
Round-robin or striping is a concept for splitting traffic over multiple interfaces. The stream's packets are sent out sequentially to the interfaces: the first packet to the first interface, the second packet to the second one and so on.
This way the load is spread equally over all available interfaces.
As the quality of the paths the traffic travels over may vary, there might be problems with out-of-order delivery of packets, especially for TCP traffic.
Increasing the interfaces' buffer size and the number of out-of-order packets tolerated before a retransmit helps to reduce the effects of out-of-order delivery, which increases the throughput and reduces the CPU load.
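A toy Python model shows how striping can lead to out-of-order arrival. It assumes packet i is sent at time i and that each path adds a fixed delay. Both are simplifications chosen for illustration, and the delay values are made up.

```python
def round_robin_interface(packet_index: int, n_interfaces: int) -> int:
    """Interface used for a packet when striping sequentially."""
    return packet_index % n_interfaces


def arrival_order(n_packets, path_delays):
    """Order in which packets arrive if path k adds path_delays[k] time units."""
    n_interfaces = len(path_delays)
    arrivals = []
    for i in range(n_packets):
        iface = round_robin_interface(i, n_interfaces)
        arrivals.append((i + path_delays[iface], i))  # (arrival time, packet no.)
    return [packet for _, packet in sorted(arrivals)]


# Equal paths keep the order, a slower second path shuffles it.
print(arrival_order(6, [0.0, 0.0]))
print(arrival_order(6, [0.0, 2.5]))
```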
Switches
Switches work by directing packets to a port depending on the destination MAC address. They therefore have a problem with nodes connected over multiple interfaces configured as a bond, because these interfaces all share the same MAC address. This problem is solved by trunking the ports together, which tells the switch that a single MAC address can be found behind multiple ports. Which port is used for transmission depends on the hash-policy for the trunk.
As trunking only takes effect on the receive side of the connection, it increases performance only for transfers to multiple nodes. The speed to one node is at most the speed of one line, but sending to multiple nodes increases the outgoing traffic to the theoretical maximum.
HP
Channelbonding is called Trunking. Every port can be a member of a trunk, a group of physical links forming one virtual link. HP switches use a Level2 or Level3 (only in routing mode) hash_policy in conjunction with some load-balancing magic. Ports can also be configured "LACP sensitive" to automatically detect ports that can be used for a trunk.
It works fine for the 10Gbit links of the ProCurve2900, resulting in a 20Gbit/s "fat pipe".
As HP uses a MAC-address based hash policy, a stream from one node to another will always flow over the same single outgoing port of the switch, regardless of the configuration of the nodes.
To achieve higher transfer-rates than the speed of one single line, one has to use a workaround using VLANs.
Cisco
The tested Cisco switch supports LACP and etherchannel (= trunking) per port. Level2 to Level3 can be selected as hash-policies.
other Vendors
Some older Switches, from the time before LACP was created, support round-robin-like algorithms for their trunking.
Linux
The linux bonding module enables channelbonding on a node.
layout
The virtual interface on top of the bonded interfaces is provided via bond0. The member interfaces have to be added to this bond using the provided script ifenslave (compiled from the kernel source or installed via apt-get). The bond device inherits the MAC address of the first interface that is attached to the bond.
In ifconfig the bonded interfaces have a SLAVE flag, whereas bond0 carries a MASTER flag.
configuration
# modprobe bonding mode=0 miimon=100 xmit_hash_policy=1
The mode configures the transmission strategy / hash_policy. mode can be:
- 0: round-robin
- 1: fail-over, one link is used, all other links are backup
- 2: XOR-hash-policy
- 4: LACP, configure xmit_hash_policy to 0 (Layer2) or 1 (Layer3+Layer4)
- 5: balance-tlb: Adaptive transmit load balancing without the need for special switch support. (I didn't get this working properly)
- 6: balance-alb: Same as 5 but adds receive load balancing by doing some ARP magic.
miimon is the interval in milliseconds between MII checks of whether the bonded interfaces are still up. One can also use ARP checks against a target host, which is better for fail-over configurations.
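For scripting the setup on many nodes, the options above can be assembled programmatically. The following Python sketch only builds the command string and does not run it. The helper name modprobe_command is hypothetical, and the table lists only the modes covered in this article, using the kernel's mode names.

```python
# Mode numbers discussed in this article and their kernel names.
# Mode 3 (broadcast) exists as well but is not covered here.
BONDING_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}


def modprobe_command(mode, miimon=100, xmit_hash_policy=None):
    """Build (but do not execute) the modprobe call for the bonding module."""
    if mode not in BONDING_MODES:
        raise ValueError(f"mode {mode} is not covered by this sketch")
    parts = ["modprobe", "bonding", f"mode={mode}", f"miimon={miimon}"]
    if xmit_hash_policy is not None:
        parts.append(f"xmit_hash_policy={xmit_hash_policy}")
    return " ".join(parts)


print(BONDING_MODES[4], "->", modprobe_command(4, miimon=100, xmit_hash_policy=1))
```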
ifenslave
Ifenslave is the script to attach/detach interfaces to a bond. It can be obtained via apt-get install ifenslave on Debian/Ubuntu or built directly from the kernel sources.
To attach interfaces, first bring them down:
# ifconfig eth0 down
# ifconfig eth1 down
Then configure the bond by hand and bring it up:
# ifconfig bond0 192.168.x.y netmask 255.255.0.0
and add eth0 and eth1 to it
# ifenslave bond0 eth0
# ifenslave bond0 eth1
Now the bond is configured and consists of the two physical interfaces eth0 and eth1.
Notes (Except for mode 5 or 6):
- The bond0 interface inherits its MAC-address from the first interface that is added to it. All attached interfaces share the bond's MAC-address.
- If a special MAC-address is required for the bonding interface, for example for DHCP or the VLAN workaround, it can be set via ifconfig bond0 hw ether .
To detach interfaces from the bond use:
# ifenslave -d bond0 eth0
Try
# cat /proc/net/bonding/bond0
to get further information on the current configuration.
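The status file is plain "key: value" text, so it is easy to check from a script. The Python sketch below parses a shortened, illustrative sample. Real output contains more fields and differs between driver versions, so treat the sample text and the helper name parse_bonding_status as assumptions.

```python
# Shortened, illustrative sample; not captured from a real machine.
SAMPLE = """\
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_bonding_status(text):
    """Return (bonding mode, bond MII status, {slave name: MII status})."""
    mode, bond_status, slaves, current = None, None, {}, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status":
            if current is None:
                bond_status = value      # status of the bond itself
            else:
                slaves[current] = value  # status of the current slave
    return mode, bond_status, slaves


print(parse_bonding_status(SAMPLE))
```

On a node with a configured bond, the same function could be fed the contents of /proc/net/bonding/bond0 instead of the sample text.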
If the nodes have two interfaces each, traffic over a bond is sent out at approximately 2GBit/s, provided the traffic goes to two or more destination hosts.
limitations
The throughput of one stream from one node to another over a switch is limited by the hash-policy of the switch, which in general is Layer2, to the maximum speed of a single link. A performance increase therefore only occurs when transferring to multiple nodes.
workarounds
To achieve more than the speed of a single line for a node to node transfer over a switch some workarounds have to be applied:
The main article on that subject can be found here. For further information visit the VLAN-article.
Every network interface of a single node is configured to send out/receive VLAN-tagged packets: the first NIC sends out to VLAN 1, the second to VLAN 2 and so on. The switch is configured to deliver VLAN-tagged packets to only one port per node per VLAN. This way a MAC address is present only once per VLAN and the switch doesn't have the problem described above. The nodes are configured to use round-robin to transfer the packets. The switch doesn't know anything about trunks, and its particular hash_policy is worked around.
As the traffic is spread evenly over the outgoing links and the assignment of a packet to a lane (VLAN) is not broken by the switch, the traffic will be received over all available incoming links of the destination node.
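The lane idea can be sketched in a few lines of Python. The model assumes that NIC k (counted from 0) is tagged with VLAN k+1, as described above, and that the switch delivers each VLAN to exactly one NIC of the destination node. The function names are hypothetical.

```python
from collections import Counter


def vlan_lane(packet_index, n_nics):
    """Sending NIC (0-based) and VLAN tag for a round-robin striped packet."""
    nic = packet_index % n_nics
    vlan = nic + 1  # first NIC -> VLAN 1, second NIC -> VLAN 2, ...
    return nic, vlan


def receive_distribution(n_packets, n_nics):
    """Packets per VLAN, i.e. per incoming link of the destination node."""
    return Counter(vlan_lane(i, n_nics)[1] for i in range(n_packets))


# With two NICs, ten packets are split evenly over the two lanes.
print(receive_distribution(10, 2))
```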
known problems
mode 0 round-robin
As packets are striped, there is no guarantee that they are received in order.
mode 5,6 balance-tlb and balance-alb
The general layout of tlb and alb is the same. The interfaces keep their MAC-addresses, such that no switch configuration is required.
- On some nodes the ifenslave script causes the kernel to crash. On a newer Kernel (>2.6.20) it works fine.
- If the first link (natively even MAC address) fails the second inherits the (even) MAC address. If the first nic comes back, the addresses are not swapped again. Unplugging the second nic causes a reswap. As the MAC address is set on the hardware-layer, the IPMI-card (attached to the first nic) is no longer reachable for two reasons:
- For access from the network the hardware address of the first nic has changed.
- As the IPMI-configuration requires a MAC-address set to work properly, no connection to the outside world can be established.
- As alb is done by ARP-negotiation a lot of arp-packets are required to maintain the receive-balance.
- In general alb requires additional system-memory.
Solaris
Solaris fully supports LACP. Hash-policies can be set to L2, L3 or L4, referring to the communication layer the hash is generated from. That means traffic from one node to another is always limited to the maximum speed of one link for a single TCP connection. Round-robin doesn't seem to be available.
Visit our Solaris-page for a description of how to set up a bond on a SUN box.
Sun Trunking
The Sun Trunking software allows round-robin as hash-policy.
Configuration seems to be done via nettr. Haven't tested it yet.
#nettr ... policy=2 ...
links
Linux-Bonding detailed description