This article explains the link aggregation feature for Data Plane Development Kit (DPDK) ports on Open vSwitch* (OVS) and shows how to configure it. Link aggregation combines multiple network connections in parallel to increase throughput beyond what a single connection could sustain and to provide redundancy should one of the links fail. It can therefore be used for high availability, traffic load balancing, and extending link capacity across multiple links/ports. Link aggregation support for OVS-DPDK is available in OVS 2.4 and later.
Figure 1: OVS-DPDK link aggregation test setup
The test setup uses two hypervisors (physical host machines), both running OVS 2.6 with DPDK 16.07 and QEMU 2.6. The VMs (VM1 and VM2, respectively) running on each hypervisor are connected to a bridge named br0. The two hypervisors are connected to each other by an aggregated link consisting of two physical interfaces, dpdk0 and dpdk1. The member ports (dpdk0, dpdk1) on each host must have the same link properties, such as speed and bandwidth, to form an aggregated link; the port names, however, do not need to match on both hosts. The VMs on each hypervisor can reach each other via the aggregated link between the host machines.
At the time of writing, OVS considers each member port in an aggregated port as an independent OpenFlow* port. When a user issues the following command to see the available OpenFlow ports in OVS-DPDK, the member ports are displayed separately, without any bond interface information.
ovs-ofctl show br0
This makes it impossible to program OpenFlow rules on bond ports and limits OVS to operating only with the NORMAL action. With the NORMAL action, OVS behaves like a traditional MAC-learning switch.
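Whether a bridge is operating with the NORMAL action can be verified by dumping its flow table; if the only flow installed is actions=NORMAL, forwarding is driven entirely by MAC learning:

ovs-ofctl dump-flows br0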
The following link aggregation modes are supported in OVS with DPDK:
active-backup: An active/standby failover mode in which one port of the aggregated link is active and all others are in standby. The MAC address of the active link is used as the MAC address of the aggregated link.
Note: No traffic load balancing is offered in this mode.
balance-slb: Load balances traffic based on source MAC address and VLAN. This mode uses a simple hash of the source MAC and VLAN to choose which port of the aggregated link forwards a given packet. It is a simple static link aggregation, similar to mode-2 bonds in the Linux* bonding driver [1].
balance-tcp: The preferred load-balancing mode. It uses a 5-tuple (source and destination IP addresses, source and destination ports, and protocol) to balance traffic across the ports of an aggregated link. This mode is similar to mode-4 bonds in the Linux bonding driver [1]. It uses the Link Aggregation Control Protocol (LACP) [2] for signaling/controlling link aggregation between switches. LACP offers resilient link-failure detection and additional diagnostic information about the bond. Note that balance-tcp is somewhat less performant than balance-slb because it hashes on more header fields.
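The mode currently configured on a bond (dpdkbond1 is the bond port name used throughout this article) can be queried at any time with the following commands:

ovs-vsctl get port dpdkbond1 bond_mode
ovs-appctl bond/show dpdkbond1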
The test setup uses two identical host machines with the following configuration:
Hardware: Intel® Xeon® processor E5-2695 v3, Intel® Server Board S2600WT2, and an Intel® 82599ES 10-Gigabit SFI/SFP+ (rev 01) NIC.
Software: Ubuntu* 16.04, kernel version 4.2.0-42-generic, OVS 2.6, DPDK 16.07, and QEMU 2.6.
To test the configuration, make sure iPerf* is installed on both VMs. iPerf can be run in client mode or server mode.
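On Ubuntu 16.04, iPerf can typically be installed from the distribution repositories, for example:

apt-get install iperf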
To set up the link aggregation, run the following commands on each hypervisor (physical host 1 and physical host 2):
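These commands assume that OVS was built with DPDK support and that DPDK has already been initialized. With OVS 2.6 this is configured through the OVS database; a minimal example (the socket memory and PMD CPU mask values below are only illustrative and should be adapted to your system) is:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=1024,0
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC

With DPDK initialized, create the bridge and add the bonded DPDK ports: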
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-bond br0 dpdkbond1 dpdk0 dpdk1 \
    -- set Interface dpdk0 type=dpdk \
    -- set Interface dpdk1 type=dpdk
Add a vhost-user port to the bridge for the VM:
ovs-vsctl add-port br0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
Clear any existing flows and program the bridge to use the NORMAL action (MAC-learning mode):
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 actions=NORMAL
Configure the bond in active-backup mode:
ovs-vsctl set port dpdkbond1 bond_mode=active-backup
Check the bond status:
ovs-appctl bond/show
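Each VM must then be attached to the vhost-user port created above. The exact QEMU command line depends on the installation; the following is only a sketch, in which the disk image, memory size, MAC address, and vhost-user socket path (here the default run directory of a source-built OVS) are placeholders:

qemu-system-x86_64 -name vm1 -cpu host -enable-kvm -m 2048 \
    -object memory-backend-file,id=mem,size=2048M,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem -mem-prealloc \
    -drive file=vm1.qcow2 \
    -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user1 \
    -netdev type=vhost-user,id=net1,chardev=char1,vhostforce \
    -device virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01

Inside the guest, the virtio-net device is assumed to appear as eth0, which is the interface configured in the following steps.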
Inside each VM, configure an IP address on eth0:
ip addr flush eth0
ip addr add <ip-addr> dev eth0
In this example, 10.0.0.1/24 and 10.0.0.5/24 are the <ip-addr> for VM1 and VM2, respectively.
ip link set dev eth0 up
Start the iPerf server on VM1:
iperf -s -p 8080
Then run the iPerf client on VM2:
iperf -c 10.0.0.1 -p 8080
After about 10 seconds the client displays results for the traffic between VM1 and VM2 similar to those shown in Figure 2; the exact numbers may vary.
Figure 2: Screenshot of iPerf client on VM2 in Active-Backup mode
Only the active port of the bond interface is used for traffic forwarding. The OpenFlow port numbers assigned to dpdk0 and dpdk1 are port 1 and port 2, respectively. In this example, the statistics for dpdk1 (port 2, the active port) show that all traffic is carried on dpdk1; the small number of packets on dpdk0 (port 1) relates to link negotiation.
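The per-port statistics shown in Figure 3 (and in the later figures) can be retrieved on the host with:

ovs-ofctl dump-ports br0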
Figure 3: OpenFlow port statistics on physical host-1 in Active-Backup mode
Configure the bond in balance-slb mode:
ovs-vsctl set port dpdkbond1 bond_mode=balance-slb
Check the bond status:
ovs-appctl bond/show
Because balance-slb distributes traffic based on source MAC and VLAN, create two VLAN interfaces inside each VM so that the two test streams can be hashed to different members of the bond:
ip link add link eth0 name eth0.10 type vlan id 10
ip link add link eth0 name eth0.20 type vlan id 20
ip addr flush eth0
ip addr flush eth0.10
ip addr add <ip-addr1> dev eth0.10
10.0.0.1/24 and 10.0.0.5/24 are the <ip-addr1> for VM1 and VM2, respectively, for the logical interface eth0.10.
ip addr flush eth0.20
ip addr add <ip-addr2> dev eth0.20
20.0.0.1/24 and 20.0.0.5/24 are the <ip-addr2> for VM1 and VM2, respectively, for the logical interface eth0.20.
ip link set dev eth0.10 up
ip link set dev eth0.20 up
Start the iPerf server in UDP mode on VM2:
iperf -s -u -p 8080
From VM1, send two UDP streams, one over each VLAN interface:
iperf -c 10.0.0.5 -u -p 8080 -b 1G
iperf -c 20.0.0.5 -u -p 8080 -b 1G
In this example each stream uses a separate member port of the bond interface, and the port statistics confirm this.
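The per-member distribution can also be checked directly on the bond; in the load-balancing modes, ovs-appctl bond/show lists the hashes assigned to each member link:

ovs-appctl bond/show dpdkbond1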
Figure 4: OpenFlow port statistics at physical host-1 in balance-slb mode
Configure the bond in balance-tcp mode and enable LACP:
ovs-vsctl set port dpdkbond1 bond_mode=balance-tcp
ovs-vsctl set port dpdkbond1 lacp=active
Note: Disabling LACP causes the balance-tcp bond interface to fall back to the default active-backup mode. To disable LACP on the bond interface:
ovs-vsctl set port dpdkbond1 lacp=off
Check the bond status:
ovs-appctl bond/show
Inside each VM, configure an IP address on eth0:
ip addr flush eth0
ip addr add <ip-addr> dev eth0
In this example, 10.0.0.1/24 and 10.0.0.5/24 are the <ip-addr> for VM1 and VM2, respectively.
ip link set dev eth0 up
Start the iPerf server on VM2:
iperf -s -p 9000
From VM1, run the iPerf client twice to create two independent TCP streams:
iperf -c 10.0.0.5 -p 9000
iperf -c 10.0.0.5 -p 9000
The two independent TCP streams are load balanced between two ports in the bond interface as the iPerf client uses different source ports for each stream.
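Alternatively, a single iPerf client invocation can generate several parallel TCP streams, each with its own source port and therefore its own 5-tuple hash, by using the -P option:

iperf -c 10.0.0.5 -p 9000 -P 2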
Figure 5: Screenshot of iPerf server on VM2 in balance-tcp mode.
The statistics of bond member ports (highlighted in Figure 6) show that the streams are balanced between the ports.
Figure 6: OpenFlow port statistics on physical host-1 in balance-tcp mode
Other useful commands for configuring bond interfaces:

Set LACP to passive mode (respond to LACP negotiation initiated by the peer, but do not initiate it):
ovs-vsctl set port dpdkbond1 lacp=passive
Disable LACP:
ovs-vsctl set port dpdkbond1 lacp=off
Allow a balance-tcp bond to fall back to active-backup mode if LACP negotiation fails:
ovs-vsctl set port dpdkbond1 other_config:lacp-fallback-ab=true
Set the LACP negotiation interval to fast (every second) or slow (every 30 seconds, the default):
ovs-vsctl set port dpdkbond1 other_config:lacp-time=fast
ovs-vsctl set port dpdkbond1 other_config:lacp-time=slow
Set the time, in milliseconds, that a link must stay up before it is activated, to avoid flapping:
ovs-vsctl set port dpdkbond1 bond_updelay=1000
Set the interval, in milliseconds, at which traffic is rebalanced across the bond members in the load-balancing modes:
ovs-vsctl set port dpdkbond1 other_config:bond-rebalance-interval=10000
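The complete bond configuration stored in the database can be inspected at any time with:

ovs-vsctl list port dpdkbond1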
To display bond information for all configured bonds, or for a specific bond:
ovs-appctl bond/show
ovs-appctl bond/show dpdkbond1
The following bond interface information is displayed for the given test setup in balance-tcp mode.
Figure 7: ‘bond show’ on physical host in balance-tcp mode
Link aggregation is a useful method for combining multiple links into a single (aggregated) link. Its main features are high availability through failover, traffic load balancing, and increased link capacity beyond that of a single physical link.
OVS-DPDK offers three modes of link aggregation: active-backup, balance-slb, and balance-tcp.
Sugesh Chandran is a network software engineer with Intel. His work is primarily focused on accelerated software switching solutions in the user space running on Intel® architecture. His contributions to Open vSwitch with DPDK include tunneling acceleration and enabling hardware acceleration in OVS-DPDK.