This article describes the concept of vHost User non-uniform memory access (NUMA) awareness, how it can be tested, and the benefits the feature brings to Open vSwitch (OVS) with the Data Plane Development Kit (DPDK). This article was written with users of OVS in mind who wish to know more about the feature. It may also be beneficial to users who are configuring a multi-socket virtualized OVS DPDK setup that uses vHost User ports as the guest access method for virtual machines (VMs) and want to configure and verify the optimal setup.
Note: At the time of writing, vHost User NUMA awareness in OVS with DPDK is only available on the OVS master branch. Users can download the OVS master branch as a zip here. Installation steps for OVS with DPDK are available here.
vHost User NUMA awareness was introduced in DPDK v2.2 to address a limitation in the DPDK: the inefficient allocation of vHost memory in setups with multiple NUMA nodes. To understand the limitation the feature addresses, one must first understand the three different types of memory that vHost User devices comprise (see Figure 1).
|#||Memory managed by||Description|
|1||DPDK||Device tracking memory|
|2||OVS||Backend buffers (mbufs)|
|3||QEMU||Guest memory (device and memory buffers)|
Figure 1: Table describing the different types of vHost User memory in Open vSwitch* with the Data Plane Development Kit.
For an optimized data path, all three memory types should be allocated on the same node. However, this was not possible before DPDK v2.2, because the device tracking structures for every device (managed by the DPDK) had to come from the same node, even if the devices themselves were attached to VMs on different nodes. As a result, device tracking memory and guest memory could end up on different nodes, introducing additional Intel® QuickPath Interconnect (QPI) traffic and a potential performance issue (see Figure 2).
Figure 2: Dual-node Open vSwitch* with the Data Plane Development Kit configuration before vHost NUMA awareness capability.
In DPDK v2.2 and later, vHost structures are dynamically associated with guest memory. This means that when the device memory is first allocated, it resides in a temporary memory structure. It stays there until information about the guest memory is communicated from QEMU* to the DPDK. The DPDK uses this information to derive the NUMA node ID that the guest memory of the vHost User device resides on. The DPDK can then allocate a permanent memory structure on this correct node, allowing for the guest memory and device tracking memory to be located on the same node.
One last type of memory needs to be correctly allocated, which is the back-end buffers, or ‘mbufs’. These are allocated by OVS and in order to ensure an efficient data path, they must also be allocated from the same node as the guest memory and device tracking memory. This is now achieved by the DPDK sending the NUMA node information of the guest to OVS, and then OVS allocating memory for these buffers on the correct node. Before the addition of this feature, these buffers were always allocated on the node of the DPDK master lcore, which wasn’t always the same node that the vHost User device was on.
The final piece of the puzzle involves the placement of OVS poll mode driver (PMD) threads. PMD threads are the threads that do the heavy lifting in OVS and perform tasks such as continuous polling of input ports for packets, classifying packets once received, and executing actions on the packets once they are classified. Before this feature was introduced in OVS, the PMD threads servicing vHost User ports had to all be pinned to cores on the same NUMA node, that node being that of the DPDK master lcore. However, now PMD threads can be placed on the same node as the device’s memory buffers, guest memory, and device tracking memory. Figure 3 depicts this optimal memory profile for vHost User devices in OVS with the DPDK in a multiple NUMA node setup.
Figure 3: Dual node Open vSwitch* with the Data Plane Development Kit configuration with vHost NUMA awareness capability.
The test environment requires a host platform with at least two NUMA nodes. The host is running an instance of OVS with DPDK and has two vHost User devices configured on the switch, ‘vhost0’ and ‘vhost1’. Two VMs are running on separate NUMA nodes, ‘VM0’ and ‘VM1’. ‘vhost0’ is attached to ‘VM0’ and ‘vhost1’ is attached to ‘VM1’. Figure 2 shows this configuration.
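One quick way to confirm that a host meets the two-node requirement is to count the node directories that Linux exposes under sysfs. The following is a minimal sketch, not part of the original procedure: the function name `count_numa_nodes` and its optional root argument are hypothetical and exist only for illustration, and it assumes a Linux host that exposes `/sys/devices/system/node`.

```shell
# Count NUMA nodes by listing nodeN directories under a sysfs-style root.
# The optional first argument (hypothetical, handy for testing) overrides
# the default root of /sys/devices/system/node.
count_numa_nodes() {
    root="${1:-/sys/devices/system/node}"
    count=0
    for d in "$root"/node[0-9]*; do
        # The glob is left unexpanded when nothing matches, so verify
        # that each candidate really is a directory before counting it.
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Print the number of NUMA nodes on this host.
count_numa_nodes
```

A result of 2 or more indicates that the platform is suitable for the test described here. Each `nodeN` directory also contains a `cpulist` file that shows which cores belong to that node.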
The setup used in this article consists of the following hardware and software components:
|Processor||Intel® Xeon® processor E5-2695 v3 @ 2.30 GHz|
|Data Plane Development Kit||v16.04|
Before installing DPDK and OVS, ensure that the NUMA libraries are installed on the system. For example, to install these on a Fedora OS, use:
sudo yum install numactl-libs
sudo yum install numactl-devel
Ensure the DPDK is built with the following configuration option enabled:
Now OVS can be built and linked with the DPDK.
Configure the switch as described in the “Test Environment” section, with two vHost User ports. Configure the ‘pmd-cpu-mask’ to enable PMD threads to be pinned to cores in both NUMA nodes. For example in a 28-core system where cores 0–13 are located on NUMA node 0 and 14–27 are located on NUMA node 1, set the following mask to enable one core on each node:
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=10001
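The mask value is a hexadecimal bitmap in which bit N selects core N. As a rough sketch of how such a value can be derived from a list of core IDs (the helper name `cores_to_mask` is hypothetical, and the choice of cores 0 and 16 is an assumption based on the 28-core layout in the example above):

```shell
# Build a hexadecimal CPU mask with one bit set for each core ID
# passed as an argument.
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

# Core 0 sits on node 0 and core 16 on node 1 in the example layout.
cores_to_mask 0 16   # prints 10001
```

The printed value matches the mask used in the command above: bit 0 and bit 16 are set, which gives one PMD core on each node.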
Before launching the VMs, check the PMD distribution using the ‘pmd-rxq-show’ utility.
Because the VMs are not yet launched and information about the guest memory is not yet known, the PMD threads associated with the vHost User ports will be located on the same NUMA node:
pmd thread numa_id 0 core_id 0:
        port: dpdkvhostuser1    queue-id: 0
        port: dpdkvhostuser0    queue-id: 0
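Now launch two VMs, VM0 on node 0 and VM1 on node 1. To ensure the intended placement of the VM cores, use the ‘taskset’ command with a mask that selects a core on the intended node. For example, on the 28-core layout described above, 0x2 selects core 1 (node 0) and 0x4000 selects core 14 (node 1):
sudo taskset 0x2 qemu-system-x86_64 -name VM0 -cpu …
sudo taskset 0x4000 qemu-system-x86_64 -name VM1 -cpu …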
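Because an affinity mask is easy to misread, it can help to decode it back into core IDs before launching a VM. The sketch below is illustrative only; the helper name `mask_to_cores` is hypothetical.

```shell
# Print the core IDs selected by a CPU affinity mask, one per line.
# The argument may be decimal or 0x-prefixed hexadecimal.
mask_to_cores() {
    mask=$(( $1 ))
    core=0
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            echo "$core"
        fi
        mask=$(( mask >> 1 ))
        core=$(( core + 1 ))
    done
}

mask_to_cores 0x2      # prints 1
mask_to_cores 0x2000   # prints 13
mask_to_cores 0x4000   # prints 14
```

Similar-looking masks can land on different nodes. On a layout where cores 0–13 belong to node 0 and cores 14–27 to node 1, 0x2000 selects core 13 on node 0, whereas 0x4000 selects core 14 on node 1.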
Check the logs of the VMs. VM1 will print a log similar to the following:
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
VHOST_CONFIG: reallocate dev from 0 to 1 node
This means that the device tracking memory has been moved from the temporary memory structure on the original node (0) to a permanent structure on the correct node (1).
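When the log is long, the relocation messages can be picked out automatically. The following sketch assumes the message format shown in the sample log above and reads the log text from standard input; the function name `show_reallocations` is hypothetical, and no particular log file location is assumed.

```shell
# Report vHost structures that were moved between NUMA nodes.
# Scans every word on each input line, so it tolerates log prefixes
# such as timestamps in front of the VHOST_CONFIG tag.
show_reallocations() {
    awk '{
        for (i = 1; i + 5 <= NF; i++)
            if ($i == "reallocate" && $(i + 2) == "from" && $(i + 4) == "to")
                printf "%s: node %s -> node %s\n", $(i + 1), $(i + 3), $(i + 5)
    }'
}

# Example run against the sample messages from this article.
show_reallocations <<'EOF'
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
VHOST_CONFIG: reallocate dev from 0 to 1 node
EOF
```

For the sample input, this prints one line for the virtqueue (`vq`) and one for the device (`dev`), each showing a move from node 0 to node 1. No output means that no relocation message was found in the text supplied.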
Another way to verify successful relocation is to check the PMD distribution again using the ‘pmd-rxq-show’ utility:
pmd thread numa_id 1 core_id 20:
        port: dpdkvhostuser1    queue-id: 0
pmd thread numa_id 0 core_id 0:
        port: dpdkvhostuser0    queue-id: 0
‘dpdkvhostuser1’ is now serviced by a thread on NUMA node 1, which is the node on which the VM it is attached to is running.
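With many ports, reading this output by eye becomes error prone. As a small sketch, the output can be reduced to one "port node" pair per line. It assumes the output format shown in this article, and the function name `port_numa_map` is hypothetical.

```shell
# Reduce pmd-rxq-show style output to "port numa_id" pairs.
# Tracks the most recent numa_id seen and prints it next to each port.
port_numa_map() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "numa_id") numa = $(i + 1)
            if ($i == "port:")   print $(i + 1), numa
        }
    }'
}

# Example run against the sample output from this article.
port_numa_map <<'EOF'
pmd thread numa_id 1 core_id 20:
        port: dpdkvhostuser1    queue-id: 0
pmd thread numa_id 0 core_id 0:
        port: dpdkvhostuser0    queue-id: 0
EOF
```

For the sample input, this prints `dpdkvhostuser1 1` followed by `dpdkvhostuser0 0`. On a live system the function can be fed from the utility directly, which in OVS is exposed as `ovs-appctl dpif-netdev/pmd-rxq-show`.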
In this article we described and showed how DPDK and OVS dynamically reallocate memory and relocate threads according to how the test environment is set up. We also demonstrated the different ways you can verify the correct operation of the vHost User NUMA awareness feature.
For more details on the DPDK vHost library, refer to the DPDK documentation.
For more information on configuring vHost User in Open vSwitch, refer to INSTALL.DPDK.rst.
Have a question? Feel free to follow up with the query on the Open vSwitch discussion mailing thread.
To learn more about OVS with DPDK, check out the following videos and articles on Intel® Developer Zone and Intel® Network Builders University.
Ciara Loftus is a network software engineer with Intel. Her work is primarily focused on accelerated software switching solutions in user space running on Intel® architecture. Her contributions to OVS with DPDK include the addition of vHost Cuse and vHost User ports and NUMA-aware vHost User.