Building vhost-user For OVS Today Using DPDK 2.0.0

Published: 06/09/2015   Last Updated: 06/09/2015

UPDATE: 16 June 2015 - The vhost-user patch has been accepted and committed upstream, so the pre-commit patch no longer needs to be included with the helper script.

I recently discovered that DPDK support for vhost-user offloading was added to the mainline of Open vSwitch. In case you are new to vhost-user, the DPDK 2.0.0 documentation provides a great introduction.

Our goal is to build, install, and configure the faster data path between two VMs. This post guides you step by step through building OVS with DPDK 2.0.0 and preparing for a test of fast VM-to-VM and VM-to-host packet processing. Then we review a script that pulls the whole build together and a script that enables and configures OVS on host boot. So let's get started.

 

Adding Build Components

This list will vary depending on the state of the system you find yourself working on, so feel free to send in notes about other packages you found vital to your success. Here we have chosen CentOS and Ubuntu to verify this procedure.

Required Packages

Ubuntu

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install git
sudo apt-get install fuse libfuse-dev
sudo apt-get install dh-autoreconf
sudo apt-get install openssl
sudo apt-get install libssl-dev

CentOS

sudo yum update
sudo yum install git
sudo yum install openssl-devel
sudo yum install rpm-build
sudo yum install redhat-rpm-config
sudo yum install fuse fuse-devel

 

Download DPDK

Now we are going to grab a copy of DPDK v2.0.0 and unpack it.

wget http://dpdk.org/browse/dpdk/snapshot/dpdk-2.0.0.tar.gz
tar xvzpf dpdk-2.0.0.tar.gz
cd dpdk-2.0.0/

 

Modify DPDK Build Configuration for Our Use Case

Next, we need to configure the SDK for our use case. For use with OVS and OpenStack, we are going to build a single combined library with vhost-user support. Keep in mind that the default DPDK configuration covers many more deployment use cases, but we only need to enable the options for our specific interests. There are two ways to make the modifications:

Option 1: Edit SDK configuration files:

Update `config/common_linuxapp` so that DPDK generates a single library file. This modification is also required for the IVSHMEM build. We'll talk about this in a minute.

CONFIG_RTE_BUILD_COMBINE_LIBS=y

Update `config/common_linuxapp` so that DPDK is built with vhost libraries:

CONFIG_RTE_LIBRTE_VHOST=y
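If you prefer not to edit the file by hand, the same two changes can be made from the shell; a quick sketch using sed (verify the resulting file before building):

sed -i 's/CONFIG_RTE_BUILD_COMBINE_LIBS=n/CONFIG_RTE_BUILD_COMBINE_LIBS=y/' config/common_linuxapp
sed -i 's/CONFIG_RTE_LIBRTE_VHOST=n/CONFIG_RTE_LIBRTE_VHOST=y/' config/common_linuxapp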

Option 2: Use the following patch:

--- config/common_linuxapp 2014-12-19 15:38:39.000000000 -0800
+++ config/common_linuxapp.new 2015-04-13 18:52:18.411217460 -0700
@@ -81,7 +81,7 @@
 #
 # Combine to one single library
 #
-CONFIG_RTE_BUILD_COMBINE_LIBS=n
+CONFIG_RTE_BUILD_COMBINE_LIBS=y
 CONFIG_RTE_LIBNAME="intel_dpdk"
 #
@@ -372,7 +372,7 @@
 # fuse-devel is needed to run vhost.
 # fuse-devel enables user space char driver development
 #
-CONFIG_RTE_LIBRTE_VHOST=n
+CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 #

Let's stop for a moment and discuss the IVSHMEM build option. If you are enamored with the prospect of building the fastest vSwitch available, this might be of interest.

The DPDK IVSHMEM library provides fast zero-copy data sharing among virtual machines, host-to-guest or guest-to-guest. The solution leverages QEMU's IVSHMEM mechanism, and using it comes at a cost to both security and data integrity.

From a security perspective, any virtual machine using the shared memory needs to be trusted; as the name implies, the memory is shared. Having all the virtual machines share this piece of memory also opens up the possibility of corrupted data. In addition, there are no hooks in OpenStack to take advantage of this fast method of packet processing.

If, after weighing these trade-offs carefully, you still want to build the fastest vSwitch, I will show you how. To opt out, just replace x86_64-ivshmem-linuxapp-gcc with x86_64-native-linuxapp-gcc in the commands below.

To build and install the library which includes IVSHMEM (shared memory):

make config T=x86_64-ivshmem-linuxapp-gcc
make install T=x86_64-ivshmem-linuxapp-gcc
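For reference, the equivalent build without IVSHMEM simply uses the native target instead:

make config T=x86_64-native-linuxapp-gcc
make install T=x86_64-native-linuxapp-gcc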

Build and Install the eventfd_link Driver

Here are a couple of great videos explaining the role of eventfd_link and how it fits into the packet processing architecture.

Open vSwitch 2014 Fall Conference: Accelerating the Path to the Guest presented by: Maryam Tahhan, Kevin Traynor, and Mark Gray, Intel
Accelerating Network Intensive Workloads Using the DPDK netdev presented by Gerald Rogers, Intel

Build and install eventfd_link

cd dpdk-2.0.0/lib/librte_vhost/eventfd_link/
make
sudo insmod eventfd_link.ko
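A quick check with standard Linux tooling confirms the module actually loaded:

lsmod | grep eventfd_link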

Build and Integrate OVS with DPDK for vhost-user

For now, let's pull the trunk of OVS. The build can be pinned to another commit tag with different patches; here we will find the latest integration with DPDK v2.0.0, with the vhost-user patches applied earlier this month.

git clone https://github.com/openvswitch/ovs.git
cd ovs
./boot.sh
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-dpdk=../dpdk-2.0.0/x86_64-ivshmem-linuxapp-gcc/ --enable-ssl

The configure options above try to match the configuration of the host's packaged environment.
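With configure done, the build itself is a plain make; adding -j speeds it up on a multi-core host (installation is covered later, together with the build script):

make -j "$(nproc)"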

Setup System Boot

Add the following options to the kernel boot line: edit /etc/default/grub and append the following to GRUB_CMDLINE_LINUX=

iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=8
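For example, on a host with no other custom kernel options the resulting line in /etc/default/grub would look like this (keep any options your system already has in front):

GRUB_CMDLINE_LINUX="iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=8"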

Tell grub about the changes

Ubuntu

update-grub

CentOS

grub2-mkconfig --output=/boot/grub2/grub.cfg

Reboot the host and ensure VT-d (Intel Virtualization Technology for Directed I/O) is enabled in the BIOS settings.

Mount the hugetlbfs filesystem

mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
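To make the mount persistent across reboots, an /etc/fstab entry along these lines can be used instead; this is optional, and the rc.local approach shown later in this post works just as well:

none /dev/hugepages hugetlbfs pagesize=1G 0 0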

Setup DPDK Devices

DPDK devices can be set up using either the VFIO (for DPDK 1.7+) or UIO modules. UIO requires inserting an out-of-tree driver, igb_uio.ko, that ships with DPDK.

Setup for both methods is described below and could be placed in the system's startup file, /etc/rc.local for example.

# for example
RTE_SDK="${HOME}/dpdk-2.0.0"
RTE_TARGET="x86_64-ivshmem-linuxapp-gcc"

# Install and bind UIO interface eth0
sudo modprobe uio
sudo insmod ${RTE_SDK}/${RTE_TARGET}/kmod/igb_uio.ko
sudo ${RTE_SDK}/tools/dpdk_nic_bind.py --bind=igb_uio eth0

# Install and bind VFIO interface eth1
# NOTE: VFIO needs to be supported in the kernel and the BIOS.
sudo modprobe vfio-pci
sudo chmod a+x /dev/vfio
sudo chmod 0666 /dev/vfio/*
sudo ${RTE_SDK}/tools/dpdk_nic_bind.py --bind=vfio-pci eth1

Here only the UIO driver was looked at.
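Either way, the binding can be verified with the same tool's --status option (sample output appears in the troubleshooting section at the end of this post):

sudo ${RTE_SDK}/tools/dpdk_nic_bind.py --status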

OVS Schema Upgrades

Take care of schema changes on upgrades: stop the switch, save the database, and convert it to the new schema.

/etc/init.d/openvswitch-switch stop
ovsdb-tool convert /etc/openvswitch/conf.db vswitchd/vswitch.ovsschema
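Backing up the database file before converting is a good idea; a minimal example, assuming the default path used above:

cp /etc/openvswitch/conf.db /etc/openvswitch/conf.db.backup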

These are the set of patches applied by the script.

cd ${home}/src/${ovs_path}
git checkout 7762f7c39a8f5f115427b598d9e768f9336af466
patch -p1 <../../dpdk-vhost-user-2.patch
patch -p1 <../../ovs-ctl-add-dpdk.patch

dpdk-vhost-user-2.patch is the "netdev-dpdk: add dpdk vhost-user ports" patch proposed by Ciara Loftus.

ovs-ctl-add-dpdk.patch adds ovs-ctl --DPDK_OPTS="" processing on the command line and does not work yet. Add these patches, and any others you need, today, and remove them as the features appear upstream.

 

Starting ovs-vswitchd with DPDK Options

ovs-vswitchd needs to be started with DPDK options suited to your multi-core system. These are options passed to the DPDK library to assign memory, cores, and ports; OVS uses them to initialize the DPDK libraries in a way that is both fast and secure.

We want to rerun ovs-vswitchd against DPDK 2.0.0 with DPDK options tuned for our system. Here the core mask 0x0FF8 reserves cores 3 through 11 of the 12 cores on socket 0 of a dual-socket system, and --socket-mem 1024,0 reserves NUMA memory on socket 0 only.

killall ovs-vswitchd
ovs-vswitchd --dpdk -c 0x0FF8 -n 4 --socket-mem 1024,0 -- \
unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info \
--mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log \
--pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
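A quick sanity check that the daemon restarted with the DPDK options is to look at the process list and the log file named above (standard tools, nothing DPDK-specific):

ps -ef | grep ovs-vswitchd
tail /var/log/openvswitch/ovs-vswitchd.log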

Restart Open vSwitch

This varies from system to system. On older systems it might work like this:

Ubuntu

/etc/init.d/openvswitch-switch restart

CentOS

/etc/init.d/openvswitch restart

And now, with systemd, these files are involved in the configuration:

/usr/lib/systemd/system/openvswitch-nonetwork.service
/usr/share/openvswitch/scripts/ovs-ctl

Enable and start the service using systemd:

systemctl status openvswitch.service
systemctl enable openvswitch.service
systemctl start openvswitch.service
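To confirm the freshly built binaries are the ones now in use, check the reported version; in this walkthrough it is 2.3.90:

ovs-vswitchd --version
ovs-vsctl --version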

A Build Script Putting It All Together

 

Now that we have identified all the individual tasks, we can put everything in a script that builds mainline OVS with the patches for DPDK 2.0.0 netdev and vhost-user. This script pins to a specific git commit hash. Get the script and build now.

wget https://raw.githubusercontent.com/xsited/ssg/master/scripts/build_ovs_dpdk.sh
chmod +x build_ovs_dpdk.sh
./build_ovs_dpdk.sh

The binaries are installed over those of the host system package.

cd src/ovs/
make install

Later we can use the package manager to remove and reinstall the package, easily returning the system to the previous, default version.

apt-get purge openvswitch-switch
apt-get install openvswitch-switch

NOTE: Save your OVS db if you have one to save.

 

A Script to Run On Host Boot

Once Open vSwitch is built and installed, we need a startup procedure to enable vhost-user. The following describes the file rc.local.include, which is generated after executing the build script and serves as a hint on how to start and run vhost-user on your system.

Load UIO Drivers

# Change DPDK_HOME for your system
DPDK_HOME=
modprobe uio
insmod ${DPDK_HOME}/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko

Install eventfd_link

insmod ${DPDK_HOME}/lib/librte_vhost/eventfd_link/eventfd_link.ko

Mount hugepages

mount -t hugetlbfs -o pagesize=1G none /dev/hugepages

Unbind interfaces

The interface names may differ; this example uses my system's interfaces.

${DPDK_HOME}/tools/dpdk_nic_bind.py --bind=igb_uio p514p1
${DPDK_HOME}/tools/dpdk_nic_bind.py --bind=igb_uio p514p2
${DPDK_HOME}/tools/dpdk_nic_bind.py --status

There may be DPDK-enabled 1G interfaces available on some of today's laptops. Give it a try.

Restart ovs-vswitchd with DPDK

Make sure the DPDK options come first in the list and are terminated with the bare double dash.

killall ovs-vswitchd
ovs-vswitchd --dpdk -c 0x0FF8 -n 4 --socket-mem 1024,0 -- unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor

 

Configure a Two Port Bridge

Once ovs-vswitchd is enabled with DPDK and running, we can build a switch br0 and add the ports dpdk0 and dpdk1.

ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
ovs-vsctl show
f3f6693e-03aa-4d36-ae09-ff4a17689467
Bridge "br0"
    Port "br0"
        Interface "br0"
        type: internal
    Port "dpdk1"
        Interface "dpdk1"
        type: dpdk
    Port "dpdk0"
        Interface "dpdk0"
        type: dpdk
    ovs_version: "2.3.90"

Traffic should now flow, and the system is ready for testing VM packet processing. Initial performance looks promising.
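Since the whole point of this exercise is vhost-user, a vhost-user port for a VM can be added to the same bridge. The port name vhost-user-1 below is just an example; the corresponding socket is created under the OVS run directory (here /var/run/openvswitch) for QEMU to connect to:

ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuser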

Other Troubleshooting Tips

Here are a set of commands that came in handy when troubleshooting.

Check DirectIO

dmesg | grep -e DMAR -e IOMMU
[ 0.000000] ACPI: DMAR 0x00000000BDFA9618 000120 (v01 INTEL S2600WP 06222004 INTL 20090903)
[ 0.122335] dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0466 ecap f020de
[ 0.122342] dmar: IOMMU 1: reg_base_addr ebffc000 ver 1:0 cap d2078c106f0466 ecap f020de
[ 0.122470] IOAPIC id 2 under DRHD base 0xfbffe000 IOMMU 0
[ 0.122472] IOAPIC id 0 under DRHD base 0xebffc000 IOMMU 1
[ 0.122473] IOAPIC id 1 under DRHD base 0xebffc000 IOMMU 1

Check Hugepages

grep -i huge /proc/meminfo
AnonHugePages: 1312768 kB
HugePages_Total: 60
HugePages_Free: 59
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB

 

Get Binding Status

tools/dpdk_nic_bind.py --status
Network devices using DPDK-compatible driver 
============================================
0000:05:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=igb_uio unused=
0000:05:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=igb_uio unused=

Network devices using kernel driver
===================================
0000:02:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=p785p1 drv=ixgbe unused=igb_uio
0000:02:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=p785p2 drv=ixgbe unused=igb_uio
0000:08:00.0 'I350 Gigabit Network Connection' if=eth0 drv=igb unused=igb_uio *Active*
0000:08:00.1 'I350 Gigabit Network Connection' if=eth1 drv=igb unused=igb_uio *Active*

Other network devices
=====================
<none>

 
