OpenStack* Enhanced Platform Awareness

What is it and how can it be used to improve NFV performance?

Overview

Enhanced Platform Awareness (EPA) is a concept that relies on a set of OpenStack Nova* features called Host Aggregates and Availability Zones. To understand EPA, read the white paper OpenStack* Enhanced Platform Awareness - Enabling Virtual Machines to Automatically Take Advantage of Advanced Hardware Capabilities(https://01.org/sites/default/files/page/openstack-epa_wp_fin.pdf).

The Nova scheduler uses Host Aggregates and Availability Zones to determine which host a guest should be launched on, based on the capabilities of the host and the requested features of the virtual machine (VM). Most of these features appeared in the Grizzly release and have evolved through Kilo into a set of filters and weights the Nova scheduler uses to determine where a VM should be deployed. For an NFV implementation example, read the Intel® Network Builders white paper A Path to Line-Rate-Capable NFV Deployments with Intel® Architecture and the OpenStack* Kilo Release.

This paper is a hands-on walkthrough whose goal is to demonstrate how EPA can be used in OpenStack deployments, and to help enable EPA in your lab by unlocking the mysteries behind Flavors, Host Aggregates, and Availability Zones.

OpenStack Architecture

Within the OpenStack architecture, most of the focus is on the Nova* component; we also briefly look at how Glance might play a role in simplifying future configurations.

Flavors, Filter Attributes, and Extra Specifications

It is useful to understand the standard flavor attributes, the extra specifications the scheduler acts on, and any other value-add filter attributes that are available as criteria for the scheduler.

IMAGE_ID="c32ac737-1788-4420-b200-2a107d5ad335"
nova boot --flavor 2 --image $IMAGE_ID testinstance

In this example, flavor 2 represents a flavor template with the name m1.bigger, which contains the following:

  • 1 vCPU
  • 2 GB RAM
  • 10 GB root disk
  • 20 GB ephemeral disk
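A flavor like this could be created with the nova flavor-create command; the following is only a sketch, reusing the id and sizes from the example above:

# name, id, RAM (MB), root disk (GB), vCPUs; the ephemeral disk is given separately
nova flavor-create --ephemeral 20 m1.bigger 2 2048 10 1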

Managing Flavors

A flavor is a guest instance type or virtual hardware template. A flavor specifies a set of VM resources, such as the number of virtual CPUs, the amount of memory, and the disk space assigned to a VM instance. An example of an instance flavor is a kernel zone with 16 virtual CPUs and 16384 MB of RAM.

For more information:

IBM documents extra specification attributes for some of its technology in this example (http://www-01.ibm.com/support/knowledgecenter/SST55W_4.3.0/liaca/liacaflavorextraspecs.html)

Extra Specifications

A flavor might include properties that are in addition to the base flavor properties. These extra specifications are key-value pairs that can be used to provide advanced configuration in addition to the configuration provided by the base flavor properties. This configuration is specific to the hypervisor.

An advanced configuration provided with flavor extra specifications might include the examples below. Key-value pairs appended to the extra specifications field are consumed either by the scheduler filters or, as in the quota examples that follow, by the hypervisor driver.

Configure the I/O limits for the specified instance type:
nova-manage flavor set_key --name m1.small --key quota:disk_read_bytes_sec --value 10240000
nova-manage flavor set_key --name m1.small --key quota:disk_write_bytes_sec --value 10240000
Configure the CPU limits for the specified instance type:
nova-manage flavor set_key --name m1.small --key quota:cpu_quota --value 5000
nova-manage flavor set_key --name m1.small --key quota:cpu_period --value 2500
Configure the bandwidth limits for instance network traffic:
nova-manage flavor set_key --name m1.small --key quota:vif_inbound_average --value 10240
nova-manage flavor set_key --name m1.small --key quota:vif_outbound_average --value 10240

For more information:

OpenStack Introduction to Image Flavors: http://docs.openstack.org/openstack-ops/content/flavors.html

Modifying Flavor Specifications (from Oracle's Installing and Configuring OpenStack in Oracle® Solaris 11.2 document): http://docs.oracle.com/cd/E36784_01/html/E54155/flavoredit.html

Driving in the Fast Lane – CPU Pinning and NUMA Topology Awareness in OpenStack Compute by Steve Gordon, Sr. Technical Product Manager, Red Hat: http://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/

Extra Specification and Namespaces

extra_specs is an overloaded parameter that contains key-value pairs. If an extra specs key contains a colon (:), anything before the colon is treated as a namespace, and anything after the colon is treated as the key to be matched.

Here is an example using the capabilities namespace (a scoped key):

nova flavor-key 1 set capabilities:vcpus='>= 6'
nova flavor-key 1 set capabilities:vcpus_used='== 0'
nova flavor-show 1

+----------------------------+-----------------------------------------------------------------------+  
| Property                   | Value                                                                 |  
+----------------------------+-----------------------------------------------------------------------+  
| OS-FLV-DISABLED:disabled   | False                                                                 |  
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                     |  
| disk                       | 0                                                                     |  
| extra_specs                | {u'capabilities:vcpus': u'>= 6', u'capabilities:vcpus_used': u'== 0'} |   
| id                         | 1                                                                     |  
| name                       | m1.tiny                                                               |  
| os-flavor-access:is_public | True                                                                  |  
| ram                        | 512                                                                   |  
| rxtx_factor                | 1.0                                                                   |  
| swap                       |                                                                       |  
| vcpus                      | 1                                                                     |  
+----------------------------+-----------------------------------------------------------------------+  

Namespaced extra_specs keys are useful when multiple filters are enabled and key conflicts must be avoided.

Filtering Strategies

The Filter Scheduler uses filters to select the host that is ultimately chosen to launch the target guest. There are many filtering options, considerable configuration flexibility, and even the ability to roll your own filters. In addition to the documentation, the best way to learn about the filters and the way they work is to study the source code in the nova/scheduler/filters directory of the Nova repository.

From the many filters available, there are a few for which it is important to understand how they work and the method used to configure them.

RamFilter

The OpenStack developer documentation highlights the simplicity of the filters by reviewing the ExactRamFilter (https://github.com/openstack/nova/blob/master/nova/scheduler/filters/exact_ram_filter.py), which requires the host to have exactly the amount of memory requested by the guest image.

from oslo_log import log as logging

from nova.scheduler import filters

LOG = logging.getLogger(__name__)


class ExactRamFilter(filters.BaseHostFilter):
    """Exact RAM Filter."""

    def host_passes(self, host_state, filter_properties):
        """Return True if host has the exact amount of RAM available."""
        instance_type = filter_properties.get('instance_type')
        requested_ram = instance_type['memory_mb']
        if requested_ram != host_state.free_ram_mb:
            LOG.debug("%(host_state)s does not have exactly "
                      "%(requested_ram)s MB usable RAM, it has "
                      "%(usable_ram)s.",
                      {'host_state': host_state,
                       'requested_ram': requested_ram,
                       'usable_ram': host_state.free_ram_mb})
            return False

        return True

The class method host_passes returns True if the memory requested by the guest is exactly the amount of free RAM available on the host being evaluated for selection. A slightly more complex and more useful version of this filter, the RamFilter, uses the ram_allocation_ratio option to compare requested RAM against physical RAM multiplied by an allocation ratio, which defaults to 1.5.
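As a rough sketch, both the allocation ratio and the active filters are controlled in /etc/nova/nova.conf; the values below are only illustrative:

# Allow virtual RAM to oversubscribe physical RAM by the default 1.5 ratio
ram_allocation_ratio = 1.5
# Include the ratio-aware RamFilter among the enabled filters
scheduler_default_filters = RamFilter,ComputeFilter,AvailabilityZoneFilter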

JsonFilter

This filter allows operators to write rules matching host capabilities based on a simple JSON-like syntax. The supported operators for comparing host state properties are “=”, “<”, “>”, “in”, “<=”, and “>=”, and they can be combined with “not”, “or”, and “and”. Make sure the JsonFilter is added to the scheduler_default_filters parameter in /etc/nova/nova.conf to enable this functionality.

The example below, taken from the unit tests, passes only hosts with at least 1024 MB of free RAM and at least 200 GB of free disk space.

['and',
['>=', '$free_ram_mb', 1024],
['>=', '$free_disk_mb', 200 * 1024]
]

Several filters use the scheduler_hints parameter, passed with the --hint option of the nova boot command, when launching the guest instance. For the JsonFilter the query is passed as a JSON string:

nova boot --image cirros-0.3.1-x86_64-uec --flavor 1 \
--hint query='[">=","$free_ram_mb",1024]' test-instance

ComputeCapabilitiesFilter

The ComputeCapabilitiesFilter will only pass hosts whose capabilities satisfy the requested specifications. All hosts are passed if no extra_specs are specified.

Recalling the earlier discussion of extra specifications and namespaces: to avoid conflicts when the AggregateInstanceExtraSpecsFilter is also enabled, use the capabilities namespace when adding extra specifications intended for the ComputeCapabilitiesFilter, as sketched below.
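A minimal sketch of this scoping, reusing the m1.bigger flavor from earlier: keys in the capabilities namespace are evaluated by the ComputeCapabilitiesFilter, while keys scoped with aggregate_instance_extra_specs are evaluated by the AggregateInstanceExtraSpecsFilter.

# Matched by ComputeCapabilitiesFilter against host state and capabilities
nova flavor-key m1.bigger set capabilities:vcpus_used='== 0'
# Matched by AggregateInstanceExtraSpecsFilter against host aggregate metadata
nova flavor-key m1.bigger set aggregate_instance_extra_specs:fastnic=true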

ImagePropertiesFilter

The ImagePropertiesFilter filters hosts based on properties defined on the instance's image. It passes hosts that can support the specified image properties contained in the instance. Properties include the architecture, hypervisor type, hypervisor version (for Xen* hypervisor type only), and virtual machine mode.

For example, an instance might require a host that runs an ARM*-based processor and QEMU* as the hypervisor. You can decorate an image with these properties by using:

glance image-update $img-uuid --property architecture=arm \
   --property hypervisor_type=qemu

The image properties that the filter checks for are:

  • architecture: Describes the machine architecture required by the image. Examples are i686, x86_64, arm, and ppc64.
  • hypervisor_type: Describes the hypervisor required by the image. Examples are xen, qemu, and xenapi.
  • hypervisor_version_requires: Describes the hypervisor version required by the image. The property is supported for the Xen hypervisor type only. It can be used to enable support for multiple hypervisor versions and to prevent instances with newer Xen tools from being provisioned on an older version of a hypervisor. If available, the property value is compared to the hypervisor version of the compute host.

To filter the available hosts by the hypervisor version, add the hypervisor_version_requires property on the image as metadata and pass an operator and a required hypervisor version as its value:

glance image-update img-uuid --property hypervisor_type=xen \
   --property hypervisor_version_requires=">=4.3"

vm_mode: Describes the hypervisor application binary interface (ABI) required by the image. Examples: xen for the Xen 3.0 paravirtual ABI, hvm for the native ABI, uml for the User Mode Linux paravirtual ABI, and exe for the container virt executable ABI.

A host can support multiple hypervisor configurations. For example, a host could advertise both [u'i686', u'qemu', u'hvm'] and [u'x86_64', u'qemu', u'hvm']; a guest whose image properties are [u'x86_64', u'qemu', u'hvm'] could then be deployed on that host.

Image properties for the guest instance can also be defined as a subset. For example, if the image properties for a guest are [u'x86_64', u'hvm'], the guest can be deployed on a host whose supported configuration is [u'x86_64', u'qemu', u'hvm'].
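The image properties themselves are set with glance; the following sketch tags an image so the ImagePropertiesFilter will only pass hosts advertising the [u'x86_64', u'qemu', u'hvm'] combination discussed above (the property values are illustrative):

glance image-update $img-uuid --property architecture=x86_64 \
   --property hypervisor_type=qemu --property vm_mode=hvm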

Filter Weights

The Filter Scheduler uses weights during the evaluation and selection process to give more or less preferential treatment to a host.

The Filter Scheduler weighs hosts based on the configuration option scheduler_weight_classes, which defaults to nova.scheduler.weights.all_weighers and selects all available weighers, such as the RamWeigher. Hosts are then weighed and sorted, and the host with the largest weight wins.

The Filter Scheduler builds a local list of acceptable hosts by repeated filtering and weighing. Each time the scheduler places an instance on a host, that host's resources are consumed and adjusted accordingly for the next selection evaluation. This becomes more important when a large number of instances is requested, because the weights are recomputed for each request.

In the end, the Filter Scheduler sorts the selected hosts by their weight and provisions instances on them.
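As a sketch, the weighing behavior is configured in /etc/nova/nova.conf; the values shown are the defaults and are listed only for illustration:

# Load all available weighers (the default)
scheduler_weight_classes = nova.scheduler.weights.all_weighers
# A positive multiplier spreads instances across hosts; a negative one stacks them
ram_weight_multiplier = 1.0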

Availability Zones and Host Aggregates

Host Aggregation

Host aggregates are a way to group hosts that have a particular feature or capability. The grouping criteria can range from something as simple as installed memory or CPU type to something as complex as NUMA topology. The grouping used for host aggregates in OpenStack can be completely arbitrary.

To create a host aggregate we use the nova aggregate-create command:


$ nova aggregate-create rack-aggregate1 
+----+-----------------+-------------------+-------+----------+ 
| Id | Name            | Availability Zone | Hosts | Metadata | 
+----+-----------------+-------------------+-------+----------+ 
| 1  | rack-aggregate1 | None              |       |          | 
+----+-----------------+-------------------+-------+----------+ 

This creates a host aggregate that is not associated with an availability zone. The next command creates a host aggregate that is also exposed to operators as an availability zone.

$ nova aggregate-create rack-aggregate2 tokyo-az 
+----+-----------------+-------------------+-------+----------+ 
| Id | Name            | Availability Zone | Hosts | Metadata | 
+----+-----------------+-------------------+-------+----------+ 
| 2  | rack-aggregate2 | tokyo-az          |       |          | 
+----+-----------------+-------------------+-------+----------+ 

This command creates an aggregate that is exposed as the tokyo-az availability zone rather than being placed in the default availability zone. Availability zones defined this way can be used to further subdivide the cloud and can be specified as an optional parameter when launching a guest with the nova boot command.

Next, add a host to the host aggregate rack-aggregate2. Since this host aggregate defines the availability zone tokyo-az, adding a host to this aggregate makes it a part of the tokyo-az availability zone.

$ nova aggregate-add-host 2 stack-compute1
Aggregate 2 has been successfully updated.

+----+-----------------+-------------------+---------------------+------------------------------------+ 
| Id | Name            | Availability Zone | Hosts               | Metadata                           | 
+----+-----------------+-------------------+---------------------+------------------------------------+ 
| 2  | rack-aggregate2 | tokyo-az          | [u'stack-compute1'] | {u'availability_zone': u'tokyo-az'}| 
+----+-----------------+-------------------+---------------------+------------------------------------+

So availability zones and host aggregates both segregate groups of hosts, but an administrator would use host aggregates to group hosts that have unique hardware or special performance characteristics.

Host aggregates are not explicitly exposed to operators. Instead, administrators map flavors to host aggregates by setting metadata on a host aggregate and matching flavor extra specifications, as sketched below. The scheduler then matches guest launch requests for an instance of the given flavor to a host aggregate with the same key-value pair in its metadata. Compute nodes or hosts can be in more than one host aggregate.
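A minimal sketch of that mapping, assuming a hypothetical ssd capability and the aggregate with id 1: the flavor's scoped extra specification must match the aggregate's metadata for the AggregateInstanceExtraSpecsFilter to pass hosts in that aggregate.

# Tag the aggregate with the capability
nova aggregate-set-metadata 1 ssd=true
# Require the same key-value pair on the flavor
nova flavor-key m1.bigger set aggregate_instance_extra_specs:ssd=true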

Command-Line Interface

The nova command-line tool supports the following aggregate-related commands.

nova aggregate-list

Print a list of all aggregates.

nova aggregate-create <name> [availability-zone]

Create a new aggregate named <name>, optionally in the availability zone [availability-zone] if specified. The command returns the ID of the newly created aggregate. Hosts can be made available to multiple host aggregates. Be careful when adding a host to an additional host aggregate if the host is also in an availability zone, and pay attention when using the aggregate-set-metadata and aggregate-update commands, to avoid confusing users who boot instances in different availability zones. An error occurs if you try to add a host to an aggregate whose availability zone conflicts with the host's existing availability zone.

nova aggregate-delete <id>

Delete an aggregate with id <id>

nova aggregate-details <id>

Show details of the aggregate with id <id>

nova aggregate-add-host <id> <host>

Add a host with name <host> to the aggregate with id <id>

nova aggregate-remove-host <id> <host>

Remove the host with name <host> from the aggregate with id <id>

nova aggregate-set-metadata <id> <key=value> [<key=value> ...]

Add or update metadata (key-value pairs) associated with the aggregate with id <id>

nova aggregate-update <id> <name> [<availability_zone>]

Update the name and availability zone (optional) for the aggregate.

nova host-list

List all hosts by service.

nova host-update --maintenance [enable | disable] <host>

Put the host into, or take it out of, maintenance mode.

Availability Zones

An availability zone is a way to specify a particular location in which a guest should boot.

The most common usage for availability zones is to group together hosts that share infrastructure, such as a common network or power source. As the number of hosts grows, availability zones may also be defined by geographic location.
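For reference, the zones currently defined can be listed with the nova client:

nova availability-zone-list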

To specify the availability zone in which your guest will be launched, add the availability-zone parameter to the nova boot command:

nova boot --flavor 2 --image 1fe4b52c-bda5-11e2-a40b-f23c91aec05e \ 
   --availability-zone tokyo-az testinstance 
nova show testinstance 
+-------------------------------------+--------------------------------------------------------------+ 
| Property                            | Value                                                        | 
+-------------------------------------+--------------------------------------------------------------+ 
| status                              | BUILD                                                        | 
| updated                             | 2013-05-21T19:46:06Z                                         | 
| OS-EXT-STS:task_state               | spawning                                                     | 
| OS-EXT-SRV-ATTR:host                | styx                                                         | 
| key_name                            | None                                                         | 
| image                               | cirros-0.3.1-x86_64-uec(64d985ba-2cfa-434d-b789-06eac141c260)| 
| private network                     | 10.0.0.2                                                     | 
| hostId                              | f038bdf5ff35e90f0a47e08954938b16f731261da344e87ca7172d3b     | 
| OS-EXT-STS:vm_state                 | building                                                     | 
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                            | 
| OS-EXT-SRV-ATTR:hypervisor_hostname | styx                                                         | 
| flavor                              | m1.bigger (2)                                                | 
| id                                  | 107d332a-a351-451e-9cd8-aa251ce56006                         | 
| security_groups                     | [{u'name': u'default'}]                                      | 
| user_id                             | d0089a5a8f5440b587606bc9c5b2448d                             | 
| name                                | testinstance                                                 | 
| created                             | 2013-05-21T19:45:48Z                                         | 
| tenant_id                           | 6c9cfd6c838d4c29b58049625efad798                             | 
| OS-DCF:diskConfig                   | MANUAL                                                       | 
| metadata                            | {}                                                           | 
| accessIPv4                          |                                                              | 
| accessIPv6                          |                                                              | 
| progress                            | 0                                                            | 
| OS-EXT-STS:power_state              | 0                                                            | 
| OS-EXT-AZ:availability_zone         | tokyo-az                                                     |
| config_drive                        |                                                              | 
+-------------------------------------+--------------------------------------------------------------+

This example specifies that the m1.bigger flavor instance will be launched in the tokyo-az availability zone (the Tokyo data center). The availability zone for a host is set in the nova.conf file using node_availability_zone. The following options can also be configured for availability zones in the /etc/nova/nova.conf file:

default_availability_zone = nova

The default availability zone for compute nodes.

default_schedule_zone = None

The availability zone to use when the user does not specify one.

internal_service_availability_zone = internal

The availability zone with which to associate internal services.

Administrators are able to optionally expose a host aggregate as an availability zone.

Availability zones are different from host aggregates in that they are explicitly exposed to the operator, and a host can only be in a single availability zone. Administrators can use default_availability_zone to configure the availability zone where instances will be scheduled when the user fails to specify one.
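Putting the options above together, the availability zone settings in nova.conf might look like the following sketch (the default_schedule_zone value is hypothetical):

default_availability_zone = nova
default_schedule_zone = tokyo-az
internal_service_availability_zone = internal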

The Scheduler and Filters

Overview

Defining Workflow Activities for Deploying a Guest

Host aggregates are a way for the scheduler to know where to place a guest based on certain characteristics. In this example, we want to deploy a guest in a specific rack in the Tokyo data center.

Here is the workflow for using host aggregates:

1. Check whether the scheduler has the host aggregate filters enabled.

$ cat /etc/nova/nova.conf | grep scheduler_default_filters
scheduler_default_filters=AggregateInstanceExtraSpecsFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter

For this particular filter configuration the scheduler will evaluate the following:

  • Do hosts belong to a host aggregate whose metadata matches the flavor's extra specifications? (AggregateInstanceExtraSpecsFilter)
  • Are hosts in the requested availability zone? (AvailabilityZoneFilter)
  • Do hosts have sufficient RAM available? (RamFilter)
  • Are hosts operational, enabled, and able to service the request? (ComputeFilter)
  • Do hosts satisfy the extra specifications associated with the instance type? (ComputeCapabilitiesFilter)
  • Do hosts satisfy any architecture, hypervisor type, or VM mode properties specified in the instance's image properties? (ImagePropertiesFilter)

Additional filters can be found in the Scheduler section of the OpenStack Configuration Reference (http://docs.openstack.org/liberty/config-reference/content/section_compute-scheduler.html).

2. Create the host aggregate:

nova aggregate-create rack-aggregate1 tokyo-az

This command creates a new aggregate in the tokyo-az availability zone and returns its id.

+----+----------------+-------------------+-------+----------+ 
| Id | Name           | Availability Zone | Hosts | Metadata | 
+----+----------------+-------------------+-------+----------+ 
| 1  | rack-aggregate1| tokyo-az          |       |          | 
+----+----------------+-------------------+-------+----------+

3. Add metadata describing the host aggregate's characteristics, using the rack-aggregate1 id (1):

nova aggregate-set-metadata 1 fastnic=true

4. Add hosts to aggregate rack-aggregate1 so the scheduler can launch guests on them.

nova aggregate-add-host 1 styx
nova aggregate-add-host 1 kerberos

+----+------------------+-------------------+-----------------------+----------------------+ 
| Id | Name             | Availability Zone | Hosts                 | Metadata             | 
+----+------------------+-------------------+-----------------------+----------------------+ 
| 1  | rack-aggregate1  | tokyo-az          | [u'styx', u'kerberos']| {u'fastnic': u'true'}| 
+----+------------------+-------------------+-----------------------+----------------------+

5. Create a flavor m1.bigger and apply the fastnic property that matches the aggregate metadata:

nova flavor-create m1.bigger 42 16384 80 4
nova-manage instance_type set_key --name=m1.bigger --key=fastnic --value=true

This creates the new flavor and sets the extra_specs property, as you can see with the flavor-show command:

nova flavor-show m1.bigger

+----------------------------+----------------------------+ 
| Property                   | Value                      | 
+----------------------------+----------------------------+ 
| OS-FLV-DISABLED:disabled   | False                      | 
| OS-FLV-EXT-DATA:ephemeral  | 0                          | 
| disk                       | 80                         | 
| extra_specs                | {u'fastnic': u'true'}      | 
| id                         | 42                         | 
| name                       | m1.bigger                  | 
| os-flavor-access:is_public | True                       | 
| ram                        | 16384                      | 
| rxtx_factor                | 1.0                        | 
| swap                       |                            | 
| vcpus                      | 4                          | 
+----------------------------+----------------------------+

6. Operators can use the flavor to ensure their guests are launched on a host in rack-aggregate1:

$ nova boot --image f69a1e3e-bdb1-11e2-a40b-f23c91aec05e --flavor m1.bigger testinstance

Now that the tokyo-az availability zone has been defined and contains at least one host, a user can boot an instance and request this availability zone.

$ nova boot --flavor 42 --image 64d985ba-2cfa-434d-b789-06eac141c260 \
   --availability-zone tokyo-az testinstance
$ nova show testinstance

+-------------------------------------+----------------------------------------------------------------+ 
| Property                            | Value                                                          | 
+-------------------------------------+----------------------------------------------------------------+ 
| status                              | BUILD                                                          | 
| updated                             | 2015-05-21T11:36:02Z                                           | 
| OS-EXT-STS:task_state               | spawning                                                       | 
| OS-EXT-SRV-ATTR:host                | devstack                                                       | 
| key_name                            | None                                                           | 
| image                               | cirros-0.3.1-x86_64-uec (64d985ba-2cfa-434d-b789-06eac141c260) | 
| private network                     | 10.0.0.2                                                       | 
| hostId                              | f038bdf5ff35e90f0a47e08954938b16f731261da344e87ca7172d3b       | 
| OS-EXT-STS:vm_state                 | building                                                       | 
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                              | 
| OS-EXT-SRV-ATTR:hypervisor_hostname | styx                                                           | 
| flavor                              | m1.bigger (42)                                                 | 
| id                                  | 107d332a-a351-451e-9cd8-aa251ce56006                           | 
| security_groups                     | [{u'name': u'default'}]                                        | 
| user_id                             | d0089a5a8f5440b587606bc9c5b2448d                               | 
| name                                | testinstance                                                   | 
| created                             | 2015-05-21T11:36:02Z                                           | 
| tenant_id                           | 6c9cfd6c838d4c29b58049625efad798                               | 
| OS-DCF:diskConfig                   | MANUAL                                                         | 
| metadata                            | {}                                                             | 
| accessIPv4                          |                                                                | 
| accessIPv6                          |                                                                | 
| progress                            | 0                                                              | 
| OS-EXT-STS:power_state              | 0                                                              | 
| OS-EXT-AZ:availability_zone         | tokyo-az                                                       | 
| config_drive                        |                                                                | 
+-------------------------------------+----------------------------------------------------------------+ 

The above examples show how host aggregates provide an API-driven mechanism for cloud administrators to define availability zones. The other use case host aggregates serve is tagging a group of hosts with a type of capability. When creating custom flavors, you can set a requirement for a capability. When a request is made to boot an instance of that type, the scheduler will only consider hosts in host aggregates tagged with this capability in their metadata.

We can add some metadata to the original host aggregate we created that was not also an availability zone, rack-aggregate1.

$ nova aggregate-set-metadata 1 fastnic=true
Aggregate 1 has been successfully updated.

+----+-----------------+-------------------+-------+----------------------------+ 
| Id | Name            | Availability Zone | Hosts | Metadata                   | 
+----+-----------------+-------------------+-------+----------------------------+ 
| 1  | rack-aggregate1 | None              | []    | {u'fastnic': u'true'}      | 
+----+-----------------+-------------------+-------+----------------------------+

The scheduler in this case knows the following:

  • flavor m1.bigger requires fastnic to be true
  • all hosts in rack-aggregate1 have fastnic=true
  • kerberos and styx are hosts in rack-aggregate1

The scheduler starts the new guest on whichever of the two hosts is weighted more favorably, that is, the one with more available resources.

There are some other considerations for choosing between host aggregates and availability zones. An OpenStack operator can only use availability zones, while only the administrator can set up host aggregates, which most likely need to be set up ahead of time. Here are some guidelines for when to use each construct.

  • If there is a physical separation between hosts, use availability zones.
  • If there is a hardware capabilities separation between hosts, use host aggregates.
  • If hosts within a particular grouping are spread across multiple locations, use host aggregates to group together hosts from multiple availability zones by creating a host aggregate with the desired metadata in each zone.
  • If operators want to group guests, use availability zones because they can be specified without administrative assistance.

Availability zones enable operators to choose from a group of hosts. Host aggregates enable an administrator to specify the way host hardware is utilized.  

The Nova scheduler is responsible for determining on which host or compute node a guest instance is launched, based on a series of configurable filters and weights. In the next release, Liberty, work is underway to decouple the scheduler from Nova and to create an object for image metadata. The current scheduler framework plays a significant role in resource utilization.

Also new is the caching scheduler, which uses the existing facilities for applying scheduler filters and weights but caches the list of available hosts. When a user request is passed to the caching scheduler it attempts to perform scheduling based on the list of cached hosts, with a view to improving scheduler performance.
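If you want to try it, the caching scheduler is selected with the scheduler_driver option; the following is a sketch using the driver's class path:

scheduler_driver = nova.scheduler.caching_scheduler.CachingScheduler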

A new scheduler filter, AggregateImagePropertiesIsolation, has been introduced. The new filter schedules instances to hosts based on matching namespace-scoped image properties with host aggregate properties. Hosts that do not belong to any host aggregate remain valid scheduling targets for instances based on all images. The new Nova service configuration keys aggregate_image_properties_isolation_namespace and aggregate_image_properties_isolation_separator are used to determine which image properties are examined by the filter.
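A sketch of enabling this filter in /etc/nova/nova.conf; the namespace value is hypothetical, and the separator shown is the default:

scheduler_default_filters = AggregateImagePropertiesIsolation,AvailabilityZoneFilter,RamFilter,ComputeFilter
aggregate_image_properties_isolation_namespace = nfv
aggregate_image_properties_isolation_separator = .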

Setting Up Filtering Use Cases Specific for the Intel® Architecture Platform
Trusted Compute Pools

The OpenStack Folsom release introduced trusted compute pools, in which an attestation server is consulted when launching a guest instance to determine whether the target host is trusted to run the guest. See the following:

https://github.com/openstack/nova/blob/master/nova/scheduler/filters/trusted_filter.py

1. Set the following values in nova.conf to enable the TrustedFilter:

scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,ComputeFilter,TrustedFilter

2. Add the trusted computing section to nova.conf

[trusted_computing]       
server=10.10.10.10       
port=8181       
server_ca_file=/etc/nova/ssl.10.1.71.206.crt       
api_url=/AttestationService/resources/PoolofHosts       
auth_blob=i-am-openstack 

3. Add the "trusted" requirement to an existing flavor by running

nova-manage instance_type set_key m1.tiny trust:trusted_host trusted

4. Restart the nova-compute and nova-scheduler services.

PCI Passthrough and SR-IOV

Please make sure your compute node has PCI passthrough support enabled, as described at http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM.

Configure Nova

  • Compute node:

pci_passthrough_whitelist: White list of PCI devices available to VMs.

For example:

pci_passthrough_whitelist=[{ "vendor_id":"8086","product_id":"1520"}]

specifies that all PCI devices in the platform with vendor_id 0x8086 and product_id 0x1520 will be assignable to instances.

  • Controller node:

pci_alias: An alias for a PCI passthrough device requirement.

For example:

pci_alias={"vendor_id":"8086", "product_id":"1520", "name":"a1"}

defines the PCI alias 'a1', which represents a request for PCI devices with vendor_id 0x8086 and product_id 0x1520.

  • Scheduler node:

Enable the PCI devices filter. For example:

scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter

Create a flavor

Note: You don't need this step for SR-IOV NIC support.

For additional information on passing a PCI device request through port creation, read https://wiki.openstack.org/wiki/SR-IOV-Passthrough-For-Networking.

Configure a flavor that requests PCI devices. For example:

nova flavor-key m1.large set "pci_passthrough:alias"="a1:2"

This updates the flavor to require two PCI devices, each with vendor_id 0x8086 and product_id 0x1520.

Create a VM

For additional information on passing a PCI device request through a port-id, read https://wiki.openstack.org/wiki/SR-IOV-Passthrough-For-Networking.

nova boot --image new1 --key_name test --flavor m1.large 123

This creates a VM named 123 with the PCI requirements of the m1.large flavor. The image should contain the drivers for the assigned devices; "test" is the key pair.

Check the assigned instance

nova show 123

Check the VM status until it becomes active. Then use:

nova ssh --private 123 -i test.pem

to log in to the guest; 'lspci' will show all of the assigned devices.

How to check PCI status with PCI API patches

The PCI API patches extend the servers and os-hypervisors APIs to show PCI information for instances and compute nodes, and also provide a resource endpoint to show PCI information.

  • Get the patches from https://github.com/yjiang5/pci_api.git, apply the patch or copy the extension plug-in files, update the policy file with the two new policies below, and restart the nova-api service.

"compute_extension:instance_pci": "",
"compute_extension:pci": "rule:admin_api",

  • Try the PCI API.

nova pci-list node_id

shows all PCI devices on the compute node with the given node_id. (Use nova hypervisor-list to list all compute nodes in the system.)

nova list

shows the PCI assignment for an instance; the 'os-pci:pci' field contains the id of the assigned PCI device.

nova pci-show id

shows the details of a PCI device.

PCI passthrough use notes

  • alias "device_type"

Defining device_type in an alias is optional; currently there is no way to discover the type of a PCI device from the hypervisor, so do not define device_type in an alias for now:

pci_alias={"vendor_id":"8086", "product_id":"1520", "name":"a1", "device_type":"NIC"}

If an alias with device_type is defined in nova.conf, device_type becomes part of the PCI request specification, and the scheduler will fail to find a compute node that satisfies the request. This behavior might be improved with an enhanced scheduler that can be configured to ignore the device type.

For more information:

https://wiki.openstack.org/wiki/Pci_passthrough

https://wiki.openstack.org/wiki/SR-IOV-Passthrough-For-Networking

Summary

We hope the information and examples in this paper have helped you understand how EPA Flavors, Host Aggregates, and Availability Zones can be used to optimize your OpenStack deployment. Check out the articles and videos in the next section to learn more.

For more information:

Whitepapers and Related Documentation

OpenStack* Enhanced Platform Awareness - Enabling Virtual Machines to Automatically Take Advantage of Advanced Hardware Capabilities (https://01.org/sites/default/files/page/openstack-epa_wp_fin.pdf).

Developing High-Performance, Flexible SDN & NFV Solutions with Intel® Open Network Platform Server Reference Architecture (http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/software-defined-networks-onp-paper.pdf)

Video Lectures

Divide and Conquer: Resource Segregation in the OpenStack Cloud (https://www.youtube.com/watch?v=H6I3fauKDb0)

OpenStack Enhancements to Support NFV Use Cases (https://www.youtube.com/watch?v=5hZmE8ZCLLo)

Deliver Cloud-ready Applications to End-users Using the Glance Artifact Repository (https://www.youtube.com/watch?v=mbRrWFMBlLM)
