Using Open vSwitch* with DPDK for Inter-VM NFV Applications

Overview

The Data Plane Development Kit (DPDK) provides high-performance packet processing libraries and user space drivers. Open vSwitch* (OvS) is integrated with DPDK and provides an option to use a DPDK-optimized virtual host (vhost) path in OvS. Using OvS with DPDK (OvS-DPDK) provides a huge increase in network packet throughput and much lower latencies.

Several performance hot-spot areas inside OvS were also optimized using the DPDK packet processing libraries. For example, the forwarding plane has been optimized to run in user space as separate threads of the vswitch daemon (vswitchd). Implementation of DPDK-optimized vHost guest interface(s) allows for high-performance VM-to-VM (virtual machine to virtual machine) or PHY-VM-PHY (physical machine to virtual machine to physical machine) type use cases.

This article shows step-by-step how to configure OvS-DPDK for inter-VM application use cases. Specifically, we create an OvS vSwitch bridge with two DPDK vhost-user ports. Each port is hooked up to a separate VM. We then run a simple iperf3 throughput test to determine the performance. We compare the performance with that of a non-DPDK OvS configuration, so we can see how much improvement OvS-DPDK gives us.

We configure OvS-DPDK with two vhost-user ports and allocate them to two VMs. We then run a simple iPerf3* test case. The following diagram captures the setup.


Figure: Test configuration.

Requirements

The software prerequisites for this tutorial are shown in the table below. In addition, you will need a test machine with an Intel® processor equipped with Intel® Virtualization Technology (Intel® VT) for IA-32, Intel® 64 and Intel® Architecture (Intel® VT-x) and Intel® VT for Directed I/O (Intel® VT-d) in order to create and run a VM. The system used in this demo is a two-socket server with 28 cores per socket, giving us 56 cores total. The CPU is an Intel® Xeon® Platinum 8180 processor at 2.50 GHz.

Software                            Version
Linux*                              3.6 or newer
GCC* (GNU Compiler Collection)      4.9 or newer
QEMU*                               2.2 or newer
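
As a quick sanity check (optional), you can confirm that the processor exposes the required virtualization features before going further; the exact output depends on your platform:

grep -c vmx /proc/cpuinfo
lscpu | grep -i virtualization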

Install the Prerequisites

Follow these steps to prepare your system:

sudo dnf groupinstall "Development Tools"
sudo dnf groupinstall "Virtualization"
sudo dnf install qemu
sudo dnf install automake tunctl kernel-tools pciutils hwloc numactl
sudo dnf install libpcap-devel
sudo dnf install numactl-devel
sudo dnf install libtool

Building DPDK

To start, we download and untar the DPDK in our home directory with the following commands:

wget http://fast.dpdk.org/rel/dpdk-17.08.1.tar.xz
tar xf dpdk-17.08.1.tar.xz

To build the DPDK, run the following commands, which will configure the DPDK build, export an environment variable DPDK_DIR, and then build the DPDK.

cd dpdk-stable-17.08.1
export DPDK_DIR=`pwd`/build
make config T=x86_64-native-linuxapp-gcc
sed -ri 's,(PMD_PCAP=).*,\1y,' build/.config
make
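
As an optional check, the build output should now be under the build/ directory that $DPDK_DIR points to; with this build target, the DPDK libraries and the testpmd test application typically land in the following locations:

ls $DPDK_DIR/lib | head
ls -l $DPDK_DIR/app/testpmd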

Building OvS-DPDK

To build OvS with DPDK, we first download and untar OvS with the following commands:

wget http://openvswitch.org/releases/openvswitch-2.8.1.tar.gz
tar -xzvf openvswitch-2.8.1.tar.gz

With the DPDK target environment built, we can now build OvS with DPDK support enabled. The standard documentation for building OvS with DPDK is the OvS with DPDK installation guide; here we cover the basic steps.

cd openvswitch-2.8.1/
export OVS_DIR=`pwd`
sudo ./boot.sh
sudo ./configure --with-dpdk="$DPDK_DIR/" CFLAGS="-g -Ofast"
sudo make 'CFLAGS=-g -Ofast -march=native' -j10

We now have full OvS built with DPDK support enabled. All the standard OvS utilities can be found under $OVS_DIR/utilities/, and OvS DB under $OVS_DIR/ovsdb/. We will use the utilities under these locations for the next steps.
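
Before going further, an optional quick check is to print the version strings of the freshly built binaries:

$OVS_DIR/vswitchd/ovs-vswitchd --version
$OVS_DIR/utilities/ovs-vsctl --version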

Create OvS DB and Start OvS DB-Server

Before we can start the OvS daemon “ovs-vswitchd”, we need to initialize the OvS DB and start ovsdb-server. The following commands show how to clear/create a new OvS DB and ovsdb-server instance.

sudo pkill -9 ovs
sudo rm -rf /usr/local/var/run/openvswitch
sudo rm -rf /usr/local/etc/openvswitch/
sudo rm -f /usr/local/etc/openvswitch/conf.db
sudo mkdir -p /usr/local/etc/openvswitch
sudo mkdir -p /usr/local/var/run/openvswitch
cd $OVS_DIR
sudo ./ovsdb/ovsdb-tool create /usr/local/etc/openvswitch/conf.db ./vswitchd/vswitch.ovsschema
sudo ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
sudo ./utilities/ovs-vsctl --no-wait init
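
At this point ovsdb-server should be running and reachable over its Unix socket; an optional way to verify is to check for the process and query the (still mostly empty) configuration:

ps -C ovsdb-server -o pid,cmd
sudo ./utilities/ovs-vsctl --no-wait show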

Configure Fedora* 27 for OvS-DPDK

To configure Fedora for optimal use of OvS-DPDK, we need to change the GRUB command-line options that are passed to Fedora at boot time for our system. To do this we edit the following config file:

/etc/default/grub

Change the setting GRUB_CMDLINE_LINUX_DEFAULT to the following:

GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=2048 iommu=pt intel_iommu=on isolcpus=1-27,29-55"

This makes GRUB aware of the new options to pass to Fedora during boot time. We set isolcpus so that the Linux* scheduler is restricted to two physical cores. Later, we will allocate the remaining cores to the DPDK. Also, we set the number of pages and page size for hugepages. For details on why hugepages are required and how they can help to improve performance, please see the explanation in the Getting Started Guide for Linux on dpdk.org.

Note: The isolcpus setting varies depending on how many cores are available per CPU.
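
To see how the cores in your system map to sockets and NUMA nodes, and so choose an appropriate isolcpus range, commands such as the following can help (numactl was installed earlier as part of the prerequisites):

lscpu | grep -E 'Socket|Core|NUMA'
numactl --hardware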

After the file has been updated, run the following commands:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

A reboot applies the new settings. If you haven’t already done so, enter the BIOS during boot and enable Intel® VT-x and Intel® VT-d.

Once logged back into your Fedora session, create two mount points for your hugepages: one using the default page size (1 GB, as set above) and the other using a 2 MB page size:

sudo mkdir -p /mnt/huge
sudo mkdir -p /mnt/huge_2mb
sudo mount -t hugetlbfs hugetlbfs /mnt/huge
sudo mount -t hugetlbfs none /mnt/huge_2mb -o pagesize=2MB
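
If you prefer these mounts to persist across reboots instead of re-mounting them manually (the helpful initialization script at the end of this article takes the manual approach), one option is to add hugetlbfs entries to /etc/fstab; a sketch, assuming the same mount points as above:

nodev /mnt/huge hugetlbfs defaults 0 0
nodev /mnt/huge_2mb hugetlbfs pagesize=2MB 0 0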

To ensure that the changes are in effect, run the commands below:

grep HugePages_ /proc/meminfo
cat /proc/cmdline

If the changes took place, your output from the above commands should look similar to the image below:


Figure: View of the HugePage tables.

Configuring OvS-DPDK Settings

Since the OvS daemon “ovs-vswitchd” and the OvS database server are not persistent between reboots, we must start them manually. To use VFIO (virtual function I/O), both the kernel and the BIOS must support and be configured to use I/O virtualization. With Intel® VT-d enabled and the vfio-pci driver loaded, I/O performance for the VMs improves because data access by the VMs bypasses the hypervisor:

sudo modprobe vfio-pci
sudo modprobe openvswitch
cd $OVS_DIR
sudo ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
sudo ./vswitchd/ovs-vswitchd unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
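
If vfio-pci later refuses to bind a device, an optional first check is to confirm that the kernel actually enabled the IOMMU with the command-line options we set earlier:

dmesg | grep -e DMAR -e IOMMU
ls /sys/kernel/iommu_groups/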

The OvS database contains user-set options for OvS and the DPDK. To pass arguments to the DPDK, we use the command-line utility as follows:

sudo ovs-vsctl set Open_vSwitch . <argument>

To configure OvS to use DPDK, enter the following command:

sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
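
Once ovs-vswitchd has processed this option, OvS 2.7 and later expose read-only status fields that you can query to confirm that DPDK initialized successfully (an optional check; the exact fields available depend on your OvS version):

sudo ./utilities/ovs-vsctl get Open_vSwitch . dpdk_initialized
sudo ./utilities/ovs-vsctl get Open_vSwitch . dpdk_version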

Once the OvS is set up to use DPDK, we need to change one OvS setting and two important DPDK configuration settings.

OvS Settings

pmd-cpu-mask: PMD (poll-mode driver) threads can be created and pinned to CPU cores by explicitly specifying pmd-cpu-mask. These threads poll the DPDK devices for new packets instead of having the NIC driver send an interrupt when a new packet arrives.

DPDK Settings

dpdk-lcore-mask: Specifies the CPU cores on which dpdk lcore threads should be spawned. A hex string is expected.
dpdk-socket-mem: Comma-separated list of memory to preallocate from hugepages on specific sockets.

Configure the Settings

The following commands are used to configure these settings:

cd $OVS_DIR
sudo ./utilities/ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10000001
sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xffffffeffffffe
sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"

For dpdk-lcore-mask we used a mask of 0xffffffeffffffe to specify the CPU cores on which DPDK lcore threads should be spawned. On our system, the dpdk-lcore threads spawn on all cores except cores 0 and 28, which are reserved for the Linux* scheduler. Similarly, for the pmd-cpu-mask, we used the mask 0x10000001 to spawn one PMD thread for non-uniform memory access (NUMA) node 0 and another PMD thread for NUMA node 1. Lastly, since we have a two-socket system, we allocate 1 GB of memory per NUMA node; that is, “1024,1024”. For a single-socket system, the string would just be “1024”.
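
If you prefer to derive such masks rather than write them by hand, a short shell arithmetic sketch (using the same core numbers as above) shows where the two hex values come from:

# bit N set in a mask means "use CPU core N"; cores 0 and 28 carry the PMD threads here
printf 'pmd-cpu-mask:    0x%x\n' $(( (1 << 0) | (1 << 28) ))
# all 56 cores except cores 0 and 28 for the lcore threads
printf 'dpdk-lcore-mask: 0x%x\n' $(( 0xffffffffffffff & ~( (1 << 0) | (1 << 28) ) ))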

Creating an OvS-DPDK Bridge and Ports

For our sample test case, we will create a bridge and add two DPDK vhost-user ports. To create an OvS bridge and two DPDK ports, run the following commands:

cd $OVS_DIR
sudo ./utilities/ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
sudo ./utilities/ovs-vsctl add-port br0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
sudo ./utilities/ovs-vsctl add-port br0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser

To ensure that the bridge and vhost-user ports have been properly set up and configured, run the command:

sudo ./utilities/ovs-vsctl show

If all is successful you should see output like the image below:


Figure: OvS show command output.
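
Because dpdkvhostuser ports have OvS act as the server side of the vhost-user socket, the corresponding socket files should now exist in the OvS run directory; listing them (optional) also confirms the exact paths to hand to QEMU in the next steps:

ls -l /usr/local/var/run/openvswitch/ | grep vhost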

Binding Devices to DPDK

To bind your NIC device to the DPDK, you must run the dpdk-devbind.py script. For example, to unbind eth1 from its current driver and move it to the vfio-pci driver, run dpdk-devbind.py --bind=vfio-pci eth1. Before using the vfio-pci driver, run modprobe to load it and its dependencies.

This is what it looked like on my system, with 2 x 10 Gb interfaces available:

sudo modprobe vfio-pci
sudo cp $DPDK_DIR/usertools/dpdk-devbind.py /usr/bin/
sudo dpdk-devbind.py --bind=vfio-pci enp61s0f0

To check whether the NIC cards you specified are bound to the DPDK, run the command:

sudo dpdk-devbind.py --status


Figure: Output of the script to bind the NICs.
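
If you are unsure which PCI address belongs to a given interface, ethtool (if installed) can map an interface name to its PCI bus address before you bind it; once an interface is bound to vfio-pci it disappears from the kernel's network device list. The interface name here is simply the one from our system:

ethtool -i enp61s0f0 | grep bus-info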

Using DPDK vhost-user Ports with VMs

Creating VMs is out of scope for this article, but plenty of how-to introductions are available online. Once we have two VMs created (in this example, virtual disks centos7vm1.qcow2 and centosvm2.qcow2), the following commands show how to use the DPDK vhost-user ports we created earlier.

Ensure that the QEMU* version on the system is v2.2.0 or above, as discussed under “DPDK vhost-user Prerequisites” in the OvS DPDK Install Guide.

sudo qemu-system-x86_64 -m 1024 -smp 4 -cpu host -hda /home/user/centos7vm1.qcow2 -boot c -enable-kvm -no-reboot -net none -nographic \
-chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user1 \
-netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce \
-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 \
-object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc

sudo qemu-system-x86_64 -m 1024 -smp 4 -cpu host -hda /home/user/centosvm2.qcow2 -boot c -enable-kvm -no-reboot -net none -nographic \
-chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user2 \
-netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce \
-device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2 \
-object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc

DPDK vhost-user Inter-VM Test Case with iperf3

In the previous step, we configured two VMs, each with a Virtio* NIC that is connected to the OvS-DPDK bridge.

Configure the NIC IP address on both VMs to be on the same subnet. Install iPerf3 from http://software.es.net/iperf, and then run a simple network test case. On one VM, start iPerf3 in server mode (iperf3 -s) and run the iperf3 client on the other VM (iperf3 -c server_ip). The network throughput and performance vary, depending on your system hardware capabilities and configuration.
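
As an illustration only, assuming the Virtio NIC shows up as eth0 inside each guest and using a private 10.0.0.0/24 subnet (both assumptions; adjust to your guests), the per-VM steps could look like this:

# on VM1 (iperf3 server)
sudo ip addr add 10.0.0.1/24 dev eth0
sudo ip link set eth0 up
iperf3 -s

# on VM2 (iperf3 client)
sudo ip addr add 10.0.0.2/24 dev eth0
sudo ip link set eth0 up
iperf3 -c 10.0.0.1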

OvS using DPDK


Figure: iPerf performance using OvS-DPDK.

To configure two VMs with tap devices on a non-DPDK OvS bridge, refer to the instructions in the document Open vSwitch with KVM, and then start the VMs using the same images we used previously.
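
For reference, a minimal sketch of that non-DPDK setup (bridge and tap names here are illustrative, and the bridge is named br1 to avoid clashing with the DPDK bridge created earlier) looks roughly like this, with each tap device then handed to its VM through a standard -netdev tap option:

sudo ./utilities/ovs-vsctl add-br br1
sudo ip tuntap add dev tap0 mode tap
sudo ip tuntap add dev tap1 mode tap
sudo ip link set tap0 up
sudo ip link set tap1 up
sudo ./utilities/ovs-vsctl add-port br1 tap0
sudo ./utilities/ovs-vsctl add-port br1 tap1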

OvS without DPDK

Figure: iPerf performance for OvS without DPDK.

We can see that the OvS-DPDK transfer rate is roughly 1.45x greater than that of OvS without DPDK.

Helpful Initialization Script

Since the OvS daemons, the NIC binding to DPDK, the hugepage mounts, and the kernel modules are not persistent between reboots, place the following commands in a shell script and run it after each reboot.

sudo modprobe vfio-pci
sudo modprobe openvswitch
sudo mount -t hugetlbfs hugetlbfs /mnt/huge
sudo mount -t hugetlbfs none /mnt/huge_2mb -o pagesize=2MB

cd $OVS_DIR
sudo ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
sudo ./vswitchd/ovs-vswitchd unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xffffffeffffffe
sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0x10000001
sudo ./utilities/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
sudo dpdk-devbind.py --bind=vfio-pci <ethX>

Summary

Although Fedora 27 does not have OvS and DPDK packages in its repository, they are easy to build. In this article, we discussed how to build, configure, and use OvS-DPDK for enhanced network throughput performance. We also covered how to configure a simple OvS-DPDK bridge with DPDK vhost-user ports for an inter-VM application use case. By becoming familiar with this simple use case, you’ll know how to deploy OvS-DPDK to physical hosts in a production environment.

About the Authors

Yaser Ahmed is a software engineer at Intel Corporation who has an MS degree in Applied Statistics from DePaul University and a BS degree in Electrical Engineering from the University of Minnesota.

Ashok Emani is a Senior Software Engineer at Intel Corporation with over 14 years of work experience spanning Embedded/Systems programming, Storage/IO technologies, Computer architecture, Virtualization and Performance analysis/benchmarking.


Comments

Hi Ashok,

Got it. I bound both ports. It's working now. Thanks a ton.

But I am stuck on running the VM with the vhost-user port. I am using Ubuntu 14.04. I created an image with the following command:

qemu-img create -f qcow2 ubuntuvm1.qcow2 20G

Then I ran the command you mentioned, changing the image name, and got the following error:

ERROR:/build/qemu-_D3HGx/qemu-2.0.0+dfsg/qom/object.c:437:object_new_with_type: assertion failed: (type != NULL)

I have tried many ways but have not been successful. Do I need to change anything else? Or can you suggest a command so that I can use a KVM image created from libvirt, which has the format xxx.img? Please help!


biju, you will need to ensure that all devices in the same IOMMU group are bound to VFIO; please see

https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1559408

Check your "dpdk_nic_bind --status" output to confirm.

Hi Ashok,

I could not successfully run ovs-vswitchd. Please see the attached log below. It says "VFIO group is not viable!"

I could successfully bind vfio-pci driver to port 04:00.0.

I am using Ubuntu 14.04 and the latest version of Open vSwitch.

Please note that I have run Open vSwitch with DPDK using the igb_uio driver on the same platform before. Clearly it is a problem with the vfio-pci driver. Please help!

 

 

/ovs$ sudo vswitchd/ovs-vswitchd --dpdk -c 0x2 -n 4 --socket-mem 2048 -- unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
2016-04-08T17:58:49Z|00001|dpdk|INFO|No -vhost_sock_dir provided - defaulting to /usr/local/var/run/openvswitch
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Detected lcore 6 as core 6 on socket 0
EAL: Detected lcore 7 as core 7 on socket 0
EAL: Detected lcore 8 as core 0 on socket 0
EAL: Detected lcore 9 as core 1 on socket 0
EAL: Detected lcore 10 as core 2 on socket 0
EAL: Detected lcore 11 as core 3 on socket 0
EAL: Detected lcore 12 as core 4 on socket 0
EAL: Detected lcore 13 as core 5 on socket 0
EAL: Detected lcore 14 as core 6 on socket 0
EAL: Detected lcore 15 as core 7 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 16 lcore(s)
EAL: Searching for IVSHMEM devices...
EAL: No IVSHMEM configuration found!
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0x400000000 bytes
EAL: Virtual area found at 0x7f6a00000000 (size = 0x400000000)
EAL: Requesting 2 pages of size 1024MB from socket 0
EAL: TSC frequency is ~1999998 KHz
EAL: Master lcore 1 is ready (tid=96ca1700;cpuset=[1])
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL:   probe driver: 8086:15ad rte_ixgbe_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:03:00.1 on NUMA socket 0
EAL:   probe driver: 8086:15ad rte_ixgbe_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
EAL:   0000:04:00.0 VFIO group is not viable!
EAL: Error - exiting with code: 1
  Cause: Requested device 0000:04:00.0 cannot be used

 

Hi Ashok,

I followed your guide to set up the same environment as yours; unfortunately, I can't ping successfully between the 2 VMs.

I use almost the same command to run the VM, except without the "-nographic" option, because I found I can't start the VM if I enable "-nographic". My command is as follows:

sudo qemu-system-x86_64 -m 1024 -smp 4 -cpu host -hda ~/../soc2/ubuntu_VM1.img -boot c -enable-kvm -no-reboot -net none \
-chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user1 \
-netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce \
-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 \
-object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc

I use DPDK 2.2 & OVS v2.5.0,

The ip address at one VM is 10.156.24.211/24, the other is 10.156.22.212/24.

I even tried to set up 2 flows; it still can't ping successfully.

sudo utilities/ovs-ofctl show br0

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000000af7837b6e
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(dpdk0): addr:00:0a:f7:83:7b:6e
     config:     0
     state:      0
     current:    10GB-FD
     advertised: 1GB-HD 1GB-FD
     speed: 10000 Mbps now, 0 Mbps max
 2(vhost-user1): addr:00:00:00:00:00:00
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 3(vhost-user2): addr:00:00:00:00:00:00
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br0): addr:00:0a:f7:83:7b:6e
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max

sudo ovs-ofctl dump-flows br0
    
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=211.153s, table=0, n_packets=171, n_bytes=7182, idle_age=0, dl_src=00:00:00:00:00:01 actions=output:3
 cookie=0x0, duration=204.738s, table=0, n_packets=130, n_bytes=5460, idle_age=0, dl_src=00:00:00:00:00:02 actions=output:2
 cookie=0x0, duration=91735.874s, table=0, n_packets=8992, n_bytes=381408, idle_age=1668, hard_age=65534, priority=0 actions=NORMAL

Could you please help to take a look in your free time? Thanks very much in advance!

Thanks Ashok

Now I got it.

You are using OvS in standalone mode as a legacy switch, and the VM ports are not DPDK ports.

If possible, can you make a clear distinction between ivshmem, kni and native, and also between vfio and uio? Thanks in advance.


Ali,

Once the VM(s) are powered up, we can configure a static IP for the virtio NIC from the VM console. No flows were created on the vSwitch; the default/NORMAL mode is used.

Hi Ashok

I might be wrong, but I think there are two unclear points here.

1) There must be flow entries in the OvS+DPDK bridge for the vhost-user port(s), which you didn't mention.

2) How can one get a static IP for a DPDK port on the QEMU VMs? I cannot understand how you use iperf3 with the DPDK ports.

Thanks in advance.
