Building vhost-user For OVS Today Using DPDK 2.0.0
Published: 06/09/2015  Last Updated: 06/09/2015
UPDATE: 16 June 2015 - The vhost-user patch has been accepted and committed upstream, so the pre-commit patch no longer needs to be included with the helper script.
I recently discovered that DPDK support for vhost-user offloading was added to mainline Open vSwitch. If you are new to vhost-user, the DPDK 2.0.0 documentation provides a great introduction.
Our goal is to build, install, and configure a faster data path between two VMs. This post guides you step-by-step through building OVS with DPDK 2.0.0 and preparing to test fast VM-to-VM and VM-to-host packet processing. Then we review a script that pulls the build together and a script to enable and configure OVS on host boot. So let's get started.
Adding Build Components
This list will vary depending on the state of the system you find yourself working on, so feel free to send in notes about other packages you found vital to your success. Here we have chosen CentOS and Ubuntu to verify this procedure.
Required Packages
Ubuntu: sudo apt-get update
CentOS: sudo yum update
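The exact package set depends on your starting image. As a rough sketch (the package names here are assumptions, not a verified list), the build below needs git, a compiler toolchain, autotools, kernel headers, fuse, and OpenSSL development files:
Ubuntu: sudo apt-get install -y git build-essential autoconf automake libtool linux-headers-$(uname -r) libfuse-dev libssl-dev
CentOS: sudo yum install -y git gcc make autoconf automake libtool kernel-devel fuse-devel openssl-devel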
Download DPDK
Now we are going to grab a copy of DPDK v.2.0.0 and unroll it.
wget http://dpdk.org/browse/dpdk/snapshot/dpdk-2.0.0.tar.gz
tar xvzpf dpdk-2.0.0.tar.gz
cd dpdk-2.0.0/
Modify DPDK Build Configuration for Our Use Case
Next, we need to configure the SDK for our use case. For use with OVS and OpenStack we are going to use the single-library build with vhost-user support. Keep in mind that the default DPDK configuration file covers many more deployment use cases, but there are specific options we have to enable. There are two ways to make the modifications:
Option 1: Edit SDK configuration files:
Update `config/common_linuxapp` so that DPDK generates a single library file. This modification is also required for the IVSHMEM build, which we'll talk about in a minute.
CONFIG_RTE_BUILD_COMBINE_LIBS=y
Update `config/common_linuxapp` so that DPDK is built with vhost libraries:
CONFIG_RTE_LIBRTE_VHOST=y
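If you would rather script these two edits (for example inside a build script), a minimal sed sketch against the stock DPDK 2.0.0 config/common_linuxapp should do the same thing:
sed -i -e 's/^CONFIG_RTE_BUILD_COMBINE_LIBS=n/CONFIG_RTE_BUILD_COMBINE_LIBS=y/' -e 's/^CONFIG_RTE_LIBRTE_VHOST=n/CONFIG_RTE_LIBRTE_VHOST=y/' config/common_linuxapp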
Option 2: Apply the following patch:
--- config/common_linuxapp 2014-12-19 15:38:39.000000000 -0800
+++ config/common_linuxapp.new 2015-04-13 18:52:18.411217460 -0700
@@ -81,7 +81,7 @@
#
# Combine to one single library
#
-CONFIG_RTE_BUILD_COMBINE_LIBS=n
+CONFIG_RTE_BUILD_COMBINE_LIBS=y
CONFIG_RTE_LIBNAME="intel_dpdk"
#
@@ -372,7 +372,7 @@
# fuse-devel is needed to run vhost.
# fuse-devel enables user space char driver development
#
-CONFIG_RTE_LIBRTE_VHOST=n
+CONFIG_RTE_LIBRTE_VHOST=y
CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
#
So let's stop for a moment and discuss the IVSHMEM build option. If you are enamored with the prospect of building the fastest vSwitch available, this might be of interest.
The DPDK IVSHMEM library provides fast zero-copy data sharing among virtual machines, either host-to-guest or guest-to-guest, by leveraging QEMU's IVSHMEM mechanism. The use of this mechanism comes at a cost to both security and data integrity.
From a security perspective, the virtual machine using shared memory needs to be trusted. As the name implies, the memory behind this fast mechanism is shared. Also, having all the virtual machines share this piece of memory opens up the possibility of corrupted data. In addition, there are no hooks in OpenStack to take advantage of this fast method of packet processing.
If, after weighing all these trade-offs carefully, you still want to build the fastest vSwitch, I will show you how. To opt out, just replace x86_64-ivshmem-linuxapp-gcc with x86_64-native-linuxapp-gcc.
To build and install the library which includes IVSHMEM (shared memory):
make config T=x86_64-ivshmem-linuxapp-gcc
make install T=x86_64-ivshmem-linuxapp-gcc
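As a quick sanity check, the combined DPDK library should now exist under the build target. The file name below follows from the CONFIG_RTE_LIBNAME setting shown in the patch above, so adjust it if your configuration differs:
ls x86_64-ivshmem-linuxapp-gcc/lib/libintel_dpdk.a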
Build and Install the eventfd_link Driver
Here are a couple of great videos explaining the role of eventfd_link and how it fits into the packet processing architecture.
Open vSwitch 2014 Fall Conference: Accelerating the Path to the Guest presented by: Maryam Tahhan, Kevin Traynor, and Mark Gray, Intel
Accelerating Network Intensive Workloads Using the DPDK netdev presented by Gerald Rogers, Intel
Build and install eventfd_link
cd dpdk-2.0.0/lib/librte_vhost/eventfd_link/
make
sudo insmod eventfd_link.ko
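To confirm the module loaded, a quick check:
lsmod | grep eventfd_link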
Build and Integrate OVS with DPDK for vhost-user
For now, let's pull the trunk of OVS; you can pin to a different commit or tag and apply different patches if you prefer. Here we find the latest integration with DPDK v2.0.0, with the vhost-user patches already applied earlier this month.
git clone https://github.com/openvswitch/ovs.git
cd ovs
./boot.sh
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-dpdk=../dpdk-2.0.0/x86_64-ivshmem-linuxapp-gcc/ --enable-ssl
These configure flags try to match the configuration of the host's packaged OVS environment.
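With configure done, build and install in the usual autotools way (a sketch; the parallelism level is up to you):
make -j "$(nproc)"
sudo make install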
Setup System Boot
Add the following options to the kernel boot line: edit /etc/default/grub and append them to GRUB_CMDLINE_LINUX=
iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=8
Tell grub about the changes
Ubuntu: update-grub
CentOS: grub2-mkconfig --output=/boot/grub2/grub.cfg
Reboot the host and ensure the BIOS settings are enabled for VT-d (DirectIO).
Mount the hugetlbfs filesystem
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
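To make the mount persistent across reboots, an /etc/fstab entry along these lines should work (a sketch; verify against your distribution's conventions):
nodev /dev/hugepages hugetlbfs pagesize=1G 0 0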
Setup DPDK Devices
DPDK devices can be set up using either the VFIO (for DPDK 1.7+) or UIO modules. UIO requires inserting an out-of-tree driver, igb_uio.ko, which is available in DPDK.
Setup for both methods is described below and could be placed in the system's startup file, /etc/rc.local for example.
# for example
RTE_SDK="${HOME}/dpdk-2.0.0"
RTE_TARGET="x86_64-ivshmem-linuxapp-gcc"
# Install and bind UIO interface eth0
sudo modprobe uio
sudo insmod ${RTE_SDK}/${RTE_TARGET}/kmod/igb_uio.ko
sudo ${RTE_SDK}/tools/dpdk_nic_bind.py --bind=igb_uio eth0
# Install and bind VFIO interface eth1
# NOTE: VFIO needs to be supported in the kernel and the BIOS.
sudo modprobe vfio-pci
sudo chmod a+x /dev/vfio
sudo chmod 0666 /dev/vfio/*
sudo ${RTE_SDK}/tools/dpdk_nic_bind.py --bind=vfio-pci eth1
In this walkthrough, only the UIO method was used.
OVS Schema Upgrades
Take care of schema changes on upgrades: stop the switch, save the database, and then convert it.
/etc/init.d/openvswitch-switch stop
ovsdb-tool convert /etc/openvswitch/conf.db vswitchd/vswitch.ovsschema
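Before the convert step above, it is worth keeping a copy of the database so you can roll back if needed (path assumed from the default shown above):
cp /etc/openvswitch/conf.db /etc/openvswitch/conf.db.backup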
These are the patches applied by the script.
cd ${home}/src/${ovs_path}
git checkout 7762f7c39a8f5f115427b598d9e768f9336af466
patch -p1 <../../dpdk-vhost-user-2.patch
patch -p1 <../../ovs-ctl-add-dpdk.patch
dpdk-vhost-user-2.patch is the "netdev-dpdk: add dpdk vhost-user ports" patch proposed by Ciara Loftus.
ovs-ctl-add-dpdk.patch adds ovs-ctl --DPDK_OPTS="" processing on the command line and does not work yet. Add these patches and any others today, and remove them as the features appear upstream tomorrow.
Starting ovs-vswitchd with DPDK Options
ovs-vswitchd needs to be started with DPDK options tuned for your multi-core system. These options are passed through OVS to the DPDK library to assign memory, cores, and ports, so that DPDK is initialized to be fast and secure.
We want to rerun ovs-vswitchd against DPDK 2.0.0 with DPDK_OPTS= values tuned for our system. Here the core mask 0x0FF8 reserves cores 3 through 11 of the 12 cores on socket 0 of a dual-socket system, and 1024 MB of hugepage memory is reserved on socket 0 only (--socket-mem 1024,0).
killall ovs-vswitchd
ovs-vswitchd --dpdk -c 0x0FF8 -n 4 --socket-mem 1024,0 -- \
unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info \
--mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log \
--pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
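A quick way to confirm the daemon came up with DPDK support is to check the log file named in the command above, which should contain the DPDK EAL initialization messages for cores and hugepage memory:
tail -n 50 /var/log/openvswitch/ovs-vswitchd.log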
Restart Open vSwitch
This varies from system to system. On older systems it might work like this:
Ubuntu: /etc/init.d/openvswitch-switch restart
CentOS: /etc/init.d/openvswitch restart
On newer systems with systemd, these files are involved in the configuration:
/usr/lib/systemd/system/openvswitch-nonetwork.service
/usr/share/openvswitch/scripts/ovs-ctl
Enable and start using systemd
systemctl status openvswitch.service
systemctl enable openvswitch.service
systemctl start openvswitch.service
A Build Script Putting It All Together
Now that we have identified all the individual tasks, we can put them in a script to build mainline OVS with patches for DPDK 2.0.0 netdev and vhost-user. This script pins to a specific git commit hash. Get the script and build now.
wget https://raw.githubusercontent.com/xsited/ssg/master/scripts/build_ovs_dpdk.sh
chmod +x build_ovs_dpdk.sh
./build_ovs_dpdk.sh
The binaries are installed over the host system's packaged version.
cd src/ovs/
make install
Later, we can use the package manager to remove and reinstall the package to easily return the system to the previous, default version.
apt-get purge openvswitch-switch
apt-get install openvswitch-switch
NOTE: Save your OVS db if you have one to save.
A Script to Run On Host Boot
Once Open vSwitch is built and installed, we need a startup procedure to enable vhost-user. This section describes the file rc.local.include, which is generated after executing the build script and serves as a hint on how to start and run vhost-user on your system.
Load UIO Drivers
# Change DPDK_HOME for your system
DPDK_HOME=
modprobe uio
insmod ${DPDK_HOME}/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
Install eventfd_link
insmod ${DPDK_HOME}/lib/librte_vhost/eventfd_link/eventfd_link.ko
Mount hugepages
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
Bind interfaces to the DPDK driver
The interface names may differ; this example uses my system's interfaces.
${DPDK_HOME}/tools/dpdk_nic_bind.py --bind=igb_uio p514p1
${DPDK_HOME}/tools/dpdk_nic_bind.py --bind=igb_uio p514p2
${DPDK_HOME}/tools/dpdk_nic_bind.py --status
There may be DPDK-enabled 1G interfaces available on some of today's laptops. Give it a try.
Restart ovs-vswitchd with DPDK
Make sure the dpdk options are first in the list and terminated with the bare double dash.
killall ovs-vswitchd
ovs-vswitchd --dpdk -c 0x0FF8 -n 4 --socket-mem 1024,0 -- unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
Configure a Two Port Bridge
Once ovs-vswitchd is running with DPDK enabled, we can create a bridge br0 and add ports dpdk0 and dpdk1.
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
ovs-vsctl show
f3f6693e-03aa-4d36-ae09-ff4a17689467
Bridge "br0"
Port "br0"
Interface "br0"
type: internal
Port "dpdk1"
Interface "dpdk1"
type: dpdk
Port "dpdk0"
Interface "dpdk0"
type: dpdk
ovs_version: "2.3.90"
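Since the goal is the VM-to-VM path, the next step is to add vhost-user ports for the guests. With the vhost-user patch applied, the port type is dpdkvhostuser; the port names below are just examples, and each port should create a socket under /var/run/openvswitch (the run directory used above) that QEMU attaches to:
ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuser
ovs-vsctl add-port br0 vhost-user-2 -- set Interface vhost-user-2 type=dpdkvhostuser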
Traffic should now flow and the system is ready for testing VM packet processing. Initial performance looks promising.
Other Troubleshooting Tips
Here are a set of commands that came in handy when troubleshooting.
Check DirectIO
dmesg | grep -e DMAR -e IOMMU
[ 0.000000] ACPI: DMAR 0x00000000BDFA9618 000120 (v01 INTEL S2600WP 06222004 INTL 20090903)
[ 0.122335] dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0466 ecap f020de
[ 0.122342] dmar: IOMMU 1: reg_base_addr ebffc000 ver 1:0 cap d2078c106f0466 ecap f020de
[ 0.122470] IOAPIC id 2 under DRHD base 0xfbffe000 IOMMU 0
[ 0.122472] IOAPIC id 0 under DRHD base 0xebffc000 IOMMU 1
[ 0.122473] IOAPIC id 1 under DRHD base 0xebffc000 IOMMU 1
Check Hugepages
grep -i huge /proc/meminfo
AnonHugePages: 1312768 kB
HugePages_Total: 60
HugePages_Free: 59
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
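Since --socket-mem reserves memory per NUMA node, it can also help to check the per-node 1G page counts in sysfs (path assumed from a standard kernel layout):
cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages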
Get Binding Status
tools/dpdk_nic_bind.py --status
Network devices using DPDK-compatible driver
============================================
0000:05:00.0 'Ethernet Controller 10-Gigabit X540-AT2' drv=igb_uio unused=
0000:05:00.1 'Ethernet Controller 10-Gigabit X540-AT2' drv=igb_uio unused=
Network devices using kernel driver
===================================
0000:02:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=p785p1 drv=ixgbe unused=igb_uio
0000:02:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' if=p785p2 drv=ixgbe unused=igb_uio
0000:08:00.0 'I350 Gigabit Network Connection' if=eth0 drv=igb unused=igb_uio *Active*
0000:08:00.1 'I350 Gigabit Network Connection' if=eth1 drv=igb unused=igb_uio *Active*
Other network devices
=====================
<none>