The Data Plane Development Kit (DPDK) provides high-performance packet processing libraries and user space drivers. Open vSwitch* (OvS) is integrated with DPDK and provides an option to use a DPDK-optimized virtual host (vhost) path in OvS. Using OvS with DPDK (OvS-DPDK) provides a huge increase in network packet throughput and much lower latencies.
Several performance hot-spot areas inside OvS were also optimized using the DPDK packet processing libraries. For example, the forwarding plane has been optimized to run in user space as separate threads of the vswitch daemon (vswitchd). Implementation of DPDK-optimized vHost guest interface(s) allows for high-performance VM-to-VM (virtual machine to virtual machine) or PHY-VM-PHY (physical machine to virtual machine to physical machine) type use cases.
This article shows step-by-step how to configure OvS-DPDK for inter-VM application use cases. Specifically, we create an OvS vSwitch bridge with two DPDK vhost-user ports. Each port is hooked up to a separate VM. We then run a simple iperf3 throughput test to determine the performance. We compare the performance with that of a non-DPDK OvS configuration, so we can see how much improvement OvS-DPDK gives us.
We configure OvS-DPDK with two vhost-user ports and allocate them to two VMs. We then run a simple iPerf3* test case. The following diagram captures the setup.
The software prerequisites for this tutorial are shown in the table below. In addition, you will need a test machine with an Intel® processor equipped with Intel® Virtualization Technology (Intel® VT) for IA-32, Intel® 64 and Intel® Architecture (Intel® VT-x) and Intel® VT for Directed I/O (Intel® VT-d) in order to create and run a VM. The system we are using in this demo is a two-socket, 28 cores per socket, enabled server, giving us 56 cores total. The CPU model used is an Intel® Xeon® Platinum 8180 processor 2.50 GHz.
|Linux*||3.6 or newer|
|GCC* (GNU Compiler Collection)||4.9 or newer|
|QEMU*||2.2 or newer|
Follow these steps to prepare your system:
sudo dnf groupinstall "Development Tools" sudo dnf groupinstall "Virtualization" sudo dnf install qemu sudo dnf install automake tunctl kernel-tools pciutils hwloc numactl sudo dnf install libpcap-devel sudo dnf install numactl-devel sudo dnf install libtool
To start, we download and untar the DPDK in our home directory with the following commands:
wget http://fast.dpdk.org/rel/dpdk-17.08.1.tar.xz tar xf dpdk.tar.gz
To build the DPDK, run the following commands, which will configure the DPDK build, export an environment variable DPDK_DIR, and then build the DPDK.
cd dpdk-stable-17.08.1 export DPDK_DIR=`pwd`/build make config T=x86_64-native-linuxapp-gcc sed -ri 's,(PMD_PCAP=).*,\1y,' build/.config make
To build the OvS with DPDK we must first download the OvS and untar the file with the following commands:
wget http://openvswitch.org/releases/openvswitch-2.8.1.tar.gz tar -xzvf openvswitch-2.8.1.tar.gz
With the DPDK target environment built, we now can build it with DPDK support enabled. The standard documentation for OvS with DPDK build is the OvS with DPDK installation guide. Here we cover the basic steps.
cd openvswitch-2.8.1/ export OVS_DIR=`pwd` sudo ./boot.sh sudo ./configure --with-dpdk="$DPDK_DIR/" CFLAGS="-g -Ofast" sudo make 'CFLAGS=-g -Ofast -march=native' -j10
We now have full OvS built with DPDK support enabled. All the standard OvS utilities can be found under $OVS_DIR/utilities/, and OvS DB under $OVS_DIR/ovsdb/. We will use the utilities under these locations for the next steps.
Before we can start the OvS daemon “ovs-vswitchd”, we need to initialize the OvS DB and start ovsdb-server. The following commands show how to clear/create a new OvS DB and ovsdb_server instance.
sudo pkill -9 ovs sudo rm -rf /usr/local/var/run/openvswitch sudo rm -rf /usr/local/etc/openvswitch/ sudo rm -f /usr/local/etc/openvswitch/conf.db mkdir -p /usr/local/etc/openvswitch mkdir -p /usr/local/var/run/openvswitch cd $OVS_DIR sudo ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db ./vswitchd/vswitch.ovsschema sudo ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach sudo ./utilities/ovs-vsctl --no-wait init
To configure Fedora for optimal use of OvS-DPDK, we need to change the GRUB command-line options that are passed to Fedora at boot time for our system. To do this we edit the following config file:
Change the setting GRUB_CMDLINE_LINUX_DEFAULT to the following:
GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=2048 iommu=pt intel_iommu=on isolcpus=1-27,29-55"
This makes GRUB aware of the new options to pass to Fedora during boot time. We set
isolcpus so that the Linux* scheduler is restricted to two physical cores. Later, we will allocate the remaining cores to the DPDK. Also, we set the number of pages and page size for hugepages. For details on why hugepages are required and how they can help to improve performance, please see the explanation in the Getting Started Guide for Linux on dpdk.org.
isolcpus setting varies depending on how many cores are available per CPU.
After the file has been updated run the following commands:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg sudo reboot
A reboot applies the new settings. If you haven’t already done so, during the boot enter the BIOS and enable Intel® VT-x and Intel® VT-d
Once logged back into your Fedora session, you’ll create a mount path for your HugePages, one of default pagesize and the other of pagesize set to 2 MBs:
mkdir -p /mnt/huge mkdir -p /mnt/huge_2mb mount -t hugetlbfs hugetlbfs /mnt/huge mount -t hugetlbfs none /mnt/huge_2mb -o pagesize=2MB
To ensure that the changes are in effect, run the commands below:
grep HugePages_ /proc/meminfo cat /proc/cmdline
If the changes took place, your output from the above commands should look similar to the image below:
View HugePage tables.
Since the OvS daemon “ovs-vswitchd” and OvS database are not persistent between reboots, we must start them manually. To use VFIO (virtual function I/O), both the kernel and BIOS must support and be configured to use I/O virtualization. By enabling Intel® VT-d and loading the VFIO-PCI driver, I/O performance for the VMs will improve. For the data access by the VMs will bypass the hypervisor:
sudo modprobe vfio-pci sudo modprobe openvswitch cd $OVS_DIR sudo ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach sudo ./vswitchd/ovs-vswitchd unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
The OvS database contains user set options for OvS and the DPDK. To pass in arguments to the DPDK we use the command-line utility as follows:
‘sudo ovs-vsctl ovs-vsctl set Open_vSwitch . <argument>’ .
To configure OvS to use DPDK, enter the following command:
sudo ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
Once the OvS is set up to use DPDK, we need to change one OvS setting and two important DPDK configuration settings.
pmd-cpu (poll mode drive-mask: PMD (poll-mode driver) threads can be created and pinned to CPU cores by explicitly specifying pmd-cpu-mask. These threads poll the DPDK devices for new packets instead of having the NIC driver send an interrupt when a new packet arrives.
dpdk-lcore-mask: Specifies the CPU cores on which dpdk lcore threads should be spawned. A hex string is expected. dpdk-socket-mem: Comma-separated list of memory to preallocate from hugepages on specific sockets.
The following commands are used to configure these settings:
cd $OVS_DIR sudo ./utilities/ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10000001 sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xffffffeffffffe sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
For dpdk-lcore-mask we used a mask of 0xffffffeffffffe to specify the CPU cores on which dpdk-lcore should spawn. In our system, we have the dpdk-lcore threads spawn on all cores, except cores 0 and 28. Those cores are reserved for the Linux* scheduler. Similarly, for the pmd-cpu-mask, we used the mask 0x10000001 to spawn 1 pmd thread for non-uniform memory access (NUMA) Node 0, and another pmd thread for NUMA Node 1. Lastly, since we have a two-socket system, we allocate 1 GB of memory per NUMA Node; that is, “1024, 1024”. For a single-socket system, the string would just be “1024”.
For our sample test case, we will create a bridge and add two DPDK vhost-user ports. To create an OvS bridge and two DPDK ports, run the following commands:
cd $OVS_DIR sudo ./utilities/ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev sudo ./utilities/ovs-vsctl add-port br0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser sudo ./utilities/ovs-vsctl add-port br0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser
To ensure that the bridge and vhost-user ports have been properly set up and configured, run the command:
sudo ./utilities/ovs-vsctl show
If all is successful you should see output like the image below:
OvS show command output.
To bind your NIC device to the DPDK, you must run the dpdk-devbind.py command. For example, to bind eth1 from the current driver and move to use the vfio-pci driver, run dpdk-devbind.py --bind=vfio-pci eth1. To use the vfio-pci driver, run modsprobe to load it and its dependencies.
This is what it looked like on my system, with 2 x 10 Gb interfaces available:
sudo modprobe vfio-pci sudo cp $DPDK_DIR/usertools/dpdk-devbind.py /usr/bin/ sudo dpdk-devbind --bind=vfio-pci enp61s0f0
To check whether the NIC cards you specified are bound to the DPDK, run the command:
sudo dpdk-devbind.py --status
Output of script to bind the NICs.
Creating VMs is out of scope for this article, but a how-to introduction can be read more. Once we have two VMs created (in this example, virtual disks centos7vm1.qcow2 and centosvm2.qcow2), the following commands show how to use the DPDK vhost-user ports we created earlier.
Ensure that the QEMU* version on the system is v2.2.0 or above, as discussed under “DPDK vhost-user Prerequisites” in the OvS DPDK Install Guide.
sudo qemu-system-x86_64 -m 1024 -smp 4 -cpu host -hda /home/user/centos7vm1.qcow2 -boot c -enable-kvm -no-reboot -net none -nographic \ -chardev socket,id=char1,path=/run/openvswitch/vhost-user1 \ -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce \ -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 \ -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc sudo qemu-system-x86_64 -m 1024 -smp 4 -cpu host -hda /home/user/centosvm2.qcow2 -boot c -enable-kvm -no-reboot -net none -nographic \ -chardev socket,id=char2,path=/run/openvswitch/vhost-user2 \ -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce \ -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2 \ -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc
In the previous step, we configured two VMs, each with a Virtio* NIC that is connected to the OvS-DPDK bridge.
Configure the NIC IP address on both VMs to be on the same subnet. Install iPerf3 from http://software.es.net/iperf, and then run a simple network test case. On one VM, start iPerf3 in server mode
iperf3 -s and run the iperf3 client on the other VM,
iperf3 –c server_ip. The network throughput and performance varies, depending on your system hardware capabilities and configuration.
iPerf performance using OvS-DPDK.
To configure two VMs with tap devices on non-DPDK OvS bridge (br0), refer to the instructions in the document Open vSwitch with KVM. Then start the VMs using the same images we used previously, for example:
iPerf performance for OvS without DPDK.
We can see that the OvS-DPDK transfer rate is roughly ~1.45x greater than OvS without DPDK.
Since the OvS daemon, device bind of NIC, hugetable mount, and drivers are not persistent between reboots, place the following commands in a shell script, and then run it after a reboot.
sudo modprobe vfio-pci sudo modprobe openvswitch sudo mount -t hugetlbfs hugetlbfs /mnt/huge sudo mount -t hugetlbfs none /mnt/huge_2mb -o pagesize=2MB cd $OVS_DIR sudo ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach sudo ./vswitchd/ovs-vswitchd unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x123 sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0x123 sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024" sudo dpdk-devbind.py --bind=vfio-pci <ethX>
Although Fedora 27 does not have OvS and DPDK packages in its repository, it is easy to build. In this article we discussed how to build OvS with DPDK, configure, and use OvS- DPDK for enhanced network throughput performance. We also covered how to configure a simple OvS-DPDK bridge with DPDK vhost-user ports for an inter-VM application use case. By becoming familiar with this simple use case, you’ll know how to deploy to physical hosts in a production environment.
Yaser Ahmed is a software engineer at Intel Corporation who has an MS degree in Applied Statistics from DePaul University and a BS degree in Electrical Engineering from the University of Minnesota.
Ashok Emani is a Senior Software Engineer at Intel Corporation with over 14 years of work experience spanning Embedded/Systems programming, Storage/IO technologies, Computer architecture, Virtualization and Performance analysis/benchmarking.
Los compiladores Intel pueden o no optimizar al mismo nivel para los microprocesadores que no son Intel en optimizaciones que no son exclusivas de los microprocesadores Intel. Estas optimizaciones incluyen los conjuntos de instrucciones SSE2, SSE3 y SSSE3, y otras optimizaciones. Intel no garantiza la disponibilidad, funcionalidad o eficacia de ninguna optimización en microprocesadores que no sean fabricados por Intel. Las optimizaciones dependientes del microprocesador en este producto fueron diseñadas para usarse con microprocesadores Intel. Ciertas optimizaciones no específicas de la microarquitectura Intel se reservan para los microprocesadores Intel. Consulte las guías de referencia y para el usuario para obtener más información acerca de los conjuntos de instrucciones específicos cubiertos por este aviso.
Revisión del aviso n.° 20110804