Telco-Grade Service Chaining

Introduction

Service chaining is an emerging set of technologies and processes that enable telecom service providers to configure network services dynamically in software without having to make changes to the network at the hardware level. Network Function Virtualization (NFV) is an initiative to virtualize and cloudify telecom services that are currently being carried out by proprietary software and hardware.

In order for service chaining to be fit for purpose for NFV deployments, better methods are required to improve resilience and availability of service chains.

To highlight these gaps, a team from the NPG Communications Infrastructure Division in Intel Shannon (Ireland) developed two proof-of-concept (PoC) demos. These demos were presented in the Intel booth at NFV World Congress 2016.

  • Deterministic Service Chaining Using a Virtualized Service Assurance Manager (vSAM)
  • Service Chain Performance Monitoring Using Network Service Header (NSH)

While the demonstration implementations were developed with prototype-level code, the team hopes to advance these concepts through open-source implementations. Integration of vSAM into the OpenStack Tacker project is under consideration with PoC partners, as is contribution of the deterministic service chaining logic to the OpenDaylight project. For the Service Chain Performance Monitoring PoC, the plan is to contribute the key components for NSH time-stamping as sample applications to the Data Plane Development Kit (DPDK) open-source project, pending approval of the IETF draft.

Concepts were demonstrated for Virtual Gi-LAN (vGi-LAN) but may also be applied to other use cases such as Virtual Customer Premises Equipment (vCPE) and Mobile Edge Cloud (MEC).

Deterministic Service Chaining Using vSAM

A Gi-LAN is the part of the network that connects the mobile network to data networks, such as the Internet and operator cloud services (see Figure 1). The Gi-LAN contains assorted appliances running applications such as Deep Packet Inspection (DPI), Firewall, URL Filtering, Network Address Translation (NAT), Video and Web optimizers, Session Border Controllers, and so on. In an NFV deployment these service functions run as Virtual Network Functions (VNFs), and it is advantageous for service providers to chain these VNFs as service chains for flexibility and maintainability.

Figure 1: Gi-LAN overview

For service chaining in NFV use cases such as vGi-LAN to meet the 5 x 9s (99.999%) reliability requirement, better methods are required to monitor the performance of service chain entities. Furthermore, an open API is required to inform a controller of significant events that impact service chain performance. Finally, intelligent controllers with deterministic logic are needed to perform remedial actions based on such an API.
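A 99.999% ("five nines") availability target leaves a very small downtime budget. The short calculation below is illustrative arithmetic only and is not part of the PoC; the function name is a placeholder chosen for this example.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Return the maximum yearly downtime, in minutes, for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)
print(f"{five_nines:.2f} minutes per year")  # prints "5.26 minutes per year"
```

A budget of roughly five minutes per year is why continuous monitoring and fast, automated re-routing matter more here than manual intervention.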

Figure 2 shows an overview of the vSAM concept, which may address these requirements. vSAM performs real-time monitoring of key performance indicators (KPIs) such as CPU usage, memory usage, network I/O, and disk I/O for VNFs, and bandwidth usage, packet loss, delay, and delay variation for WAN links between NFV sites. It also provides an API that enables an intelligent service chaining controller to use this KPI information to ensure that service chains use the optimum path across multiple sites.

The vSAM PoC was implemented as an ETSI NFV PoC in collaboration with Telenor, Brocade, and Creanord. See reference [2] in the Links section for the ETSI wiki page for this PoC, which includes the PoC proposal and final report.

Figure 2: vSAM overview

Figure 3 shows a more detailed view of how vSAM would operate in a network, taking the example of three Gi-LAN sites.

For the purpose of the PoC, three Gi-LAN sites were simulated with traffic initiated through Gi-LAN A. The VNFs on the Gi-LAN B and C sites effectively act as hot backups for Gi-LAN A, and they are being continuously monitored for their suitability as alternate paths for Gi-LAN A service chains for voice and video.

Figure 3: vSAM proof-of-concept architecture overview

The key points of the architecture are as follows:

  • An implementation of vSAM was co-developed by Brocade and Intel. vSAM uses southbound HTTP REST APIs to monitors, which provide KPI data related to the health of WAN links between sites and VNF resource usage. It also has a northbound API to a service chaining controller, through which it can send notifications in case of KPI violations.
  • A virtual network probe product from Creanord was integrated with vSAM through an HTTP REST API. The probe uses OAM protocols such as TWAMP, Y.1731, and UDP Echo to measure latency, jitter, and packet loss between Gi-LAN sites.
  • A lightweight VNF resource-usage monitor based on the collectd libvirt plug-in was implemented to track KPIs for CPU usage, memory usage, disk I/O, and network interface load for each VNF in a Gi-LAN site. This component was also integrated with vSAM through an HTTP REST API.
  • An intelligent SFC (Service Function Chaining) controller was developed by Brocade to support deterministic service chaining based on vSAM.
  • A MongoDB* NoSQL database was used as the vSAM KPI data repository. VNF and site interconnect WAN link KPI data for all three sites is written to this repository by the vSAM instance on each site.
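The threshold check that drives vSAM notifications can be sketched as follows. This is a minimal illustration, not the PoC code: the entity names, KPI names, threshold values, and payload fields are all hypothetical, and the sketch only builds the notification as a dictionary rather than sending it over the northbound API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiSample:
    entity: str   # hypothetical identifier, e.g. "gilan-a/cdn"
    kpi: str      # hypothetical KPI name, e.g. "cpu_percent"
    value: float


# Hypothetical thresholds; the values used in the PoC are not given in this article.
THRESHOLDS = {"cpu_percent": 80.0, "latency_ms": 50.0}


def find_violations(samples, thresholds=THRESHOLDS):
    """Return the samples whose value exceeds the threshold configured for their KPI."""
    return [s for s in samples if s.kpi in thresholds and s.value > thresholds[s.kpi]]


def build_notification(sample, thresholds=THRESHOLDS):
    """Build a notification payload (plain dict) for one KPI violation."""
    return {
        "event": "kpi_violation",
        "entity": sample.entity,
        "kpi": sample.kpi,
        "value": sample.value,
        "threshold": thresholds[sample.kpi],
    }


samples = [
    KpiSample("gilan-a/cdn", "cpu_percent", 95.0),
    KpiSample("gilan-a/fw", "cpu_percent", 20.0),
    KpiSample("link/a-b", "latency_ms", 12.0),
]
for violation in find_violations(samples):
    print(build_notification(violation))
```

Keeping detection (vSAM) separate from the remedial decision (the controller) matches the split described above: vSAM only reports that a threshold was crossed, and the controller decides what to do about it.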

Figures 4 and 5 show the infrastructure view for the vSAM demo system simulation of three Gi-LANs as deployed in the Intel lab. The system is deployed on three Intel® Wildcat Pass Servers with Dual Xeon E5-2699 v3 18-core (Haswell) CPUs and Intel® 82599 10GbE (Niantic) NICs. The servers are interconnected by a Brocade TurboIron* 24 x 10GbE Switch.

Figure 4: Overview of the vSAM proof-of-concept demo infrastructure

Figure 5: Overview of the vSAM proof-of-concept demo infrastructure – Gi-LAN A

The key points of the infrastructure are as follows:

  • Brocade Vyatta* virtual routers were used to simulate data center routers, such as the Data Center Interconnect (DCI), Data Center Edge (DCE), and Internet Peering Router (IPR). The Gi-LAN vRouter simulates the routing of traffic coming from the mobile network via the Packet Gateway.
  • The Creanord EchoVault* vProbe is attached to the DCI vRouter for sending test traffic between Gi-LAN sites to monitor the health of the site interconnect links.
  • The host shown in the diagram is a virtual machine (VM) that simulates a physical server. VNFs run as Linux* containers under this VM.
  • OSS components (on the right side of the host VM) run as a mix of Linux containers and VMs and communicate over the OSS control plane subnet.

Data-plane traffic is routed as follows (uplink direction):

  • The Gi-LAN vRouter performs the first level of classification to determine whether traffic should be routed through service chains, that is, if traffic is detected as video or voice.
  • The DCE vRouter determines whether traffic should be routed on the video or voice service chains and forwards traffic to the first VNF node in the service chain.
  • If the service chain can be fulfilled by local VNFs, the service-chain path will be a number of hops along the local VNFs.
  • If one of the local VNFs is unavailable or overloaded, the service chain may be re-directed by the DCE vRouter to another site’s VNF and the service-chain path is completed there.

    NOTE: The next hop for a VNF’s uplink traffic is determined by its IP routing table, which is set by the service chaining controller via an HTTP REST API.
  • The last node in the service chain routes traffic out to the IPR and thus out to the Internet.

For the downlink direction, traffic passes through vRouters and service chain VNFs in the reverse direction. The service chaining controller also sets the next hop in VNFs for downlink traffic.

A graphical user interface (GUI) was developed in collaboration with Armour Interactive for the purpose of demonstrating PoC use-cases.

The GUI screen represents three Gi-LAN sites and displays the status of the VNFs on all sites, the status of the WAN links between sites, and the status of Gi-LAN A service chains for voice and video. The status information rendered on the screen is read from the vSAM repository, which holds KPI information for the three sites. An HTTP REST API was developed to allow VNFs and inter-site links to be impaired.

Figures 6–8 show screenshots for the key use cases.

The first screenshot (see Figure 6) demonstrates normal conditions, when all local VNFs are healthy. In this case the video (blue line) and voice (pink line) service chains are fulfilled within the local Gi-LAN A site.

As Figure 6 shows, video traffic is flowing through the Content Delivery Network (CDN) in Gi-LAN A.

Figure 6: vSAM GUI screenshot - normal conditions

The screenshot in Figure 7 shows how the CDN VNF on Gi-LAN A can be impaired using the impairment API.

The impairment API call runs Linux stress commands on the CDN VNF on Gi-LAN A to artificially increase CPU usage. The CPU usage KPI now exceeds the preconfigured threshold value in vSAM, so vSAM notifies the service chaining controller of a KPI violation event.

The service chaining controller analyzes KPI data in the vSAM repository and determines that the video service chain path should now be routed through the CDN on Gi-LAN B.

As Figure 7 shows, video traffic is flowing through the CDN in Gi-LAN B.

Figure 7: vSAM GUI screenshot – Gi-LAN A CDN impaired

In the Figure 8 GUI screenshot, again using the impairment API, the WAN link between Gi-LAN A and Gi-LAN B is impaired. The API call runs Linux traffic control commands on Gi-LAN A to add latency on the vProbe link with Gi-LAN B.
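The article does not name the exact traffic control commands used. One common way to add latency on Linux is the `netem` queueing discipline of `tc`; the sketch below assumes that approach and only builds the argument lists without executing them. The interface name `eth1` and the delay value are hypothetical placeholders.

```python
def netem_delay_command(interface: str, delay_ms: int) -> list:
    """Build a `tc` argument list that adds a fixed delay on an interface (assumes netem)."""
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    return ["tc", "qdisc", "add", "dev", interface, "root", "netem", "delay", f"{delay_ms}ms"]


def netem_clear_command(interface: str) -> list:
    """Build a `tc` argument list that removes the root qdisc, undoing the impairment."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


# Hypothetical usage: impair, then restore, the interface facing Gi-LAN B.
print(" ".join(netem_delay_command("eth1", 200)))
print(" ".join(netem_clear_command("eth1")))
```

Running such commands requires root privileges on the host that owns the interface, which is one reason to wrap them behind an API as the PoC does.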

The latency KPI on the link between Gi-LAN A and Gi-LAN B now exceeds the preconfigured threshold value in vSAM, which triggers a notification by vSAM to the service-chaining controller of a KPI violation event.

The service-chaining controller analyzes KPI data in the vSAM repository and determines that the video service chain path should now be routed through the CDN on Gi-LAN C.

As Figure 8 shows, video traffic is flowing through the CDN in Gi-LAN C.
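The controller's decision across the three use cases can be sketched as a simple deterministic rule: walk the sites in preference order and take the first one whose VNF is healthy and whose link from the local site is within its latency threshold. This is an assumed simplification for illustration, not Brocade's controller logic; the function name, thresholds, and KPI values are hypothetical.

```python
def select_site(preferred_order, vnf_cpu, link_latency_ms, local_site,
                cpu_max=80.0, latency_max_ms=50.0):
    """Return the first site with a healthy VNF and a healthy link from local_site."""
    for site in preferred_order:
        if vnf_cpu[site] > cpu_max:
            continue  # VNF on this site violates its CPU threshold
        if site != local_site and link_latency_ms[(local_site, site)] > latency_max_ms:
            continue  # WAN link to this site violates its latency threshold
        return site
    return None  # no site can currently fulfil the chain


order = ["A", "B", "C"]
healthy_links = {("A", "B"): 10.0, ("A", "C"): 12.0}
impaired_links = {("A", "B"): 200.0, ("A", "C"): 12.0}

# Normal conditions: the chain stays local.
print(select_site(order, {"A": 30.0, "B": 25.0, "C": 20.0}, healthy_links, "A"))
# CDN on Gi-LAN A impaired: the chain moves to Gi-LAN B.
print(select_site(order, {"A": 95.0, "B": 25.0, "C": 20.0}, healthy_links, "A"))
# CDN on A and the A-B link both impaired: the chain moves to Gi-LAN C.
print(select_site(order, {"A": 95.0, "B": 25.0, "C": 20.0}, impaired_links, "A"))
```

Because the rule is deterministic, the same KPI state always yields the same path, which makes the controller's behavior predictable and testable.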

Figure 8: vSAM GUI screenshot – Gi-LAN A CDN and Gi-LAN A-B link impaired

Service Chain Performance Monitoring by NSH

A second PoC was developed and demonstrated at NFV World Congress 2016 to illustrate how Network Service Header (NSH) can be used for real-time inline performance monitoring of service chains.

NSH is a new protocol for service chaining that enables information about a service chain to be carried as headers in the actual data packet. NSH supports user-defined metadata, which the PoC team used to define a header structure for packet time-stamping. This enables packets to be time-stamped at significant points, such as VNF ingress and egress points, as they traverse a service chain.
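To make the idea of carrying timestamps as packet metadata concrete, the sketch below packs and unpacks one timestamp record as fixed-width binary fields. The 12-byte layout (VNF ID, ingress/egress flag, padding, seconds, nanoseconds) is entirely hypothetical and is not the format defined in the IETF draft or used in the PoC.

```python
import struct

# Hypothetical 12-byte record in network byte order:
#   VNF ID (16 bits), point (8 bits), 1 pad byte, seconds (32 bits), nanoseconds (32 bits)
TS_RECORD = struct.Struct("!HBxII")
INGRESS, EGRESS = 0, 1


def pack_timestamp(vnf_id: int, point: int, seconds: int, nanoseconds: int) -> bytes:
    """Serialize one timestamp record into bytes suitable for a metadata field."""
    return TS_RECORD.pack(vnf_id, point, seconds, nanoseconds)


def unpack_timestamp(data: bytes) -> tuple:
    """Parse one record back into (vnf_id, point, seconds, nanoseconds)."""
    return TS_RECORD.unpack(data)


record = pack_timestamp(vnf_id=7, point=INGRESS, seconds=1_460_000_000, nanoseconds=250_000)
print(len(record), unpack_timestamp(record))
```

A fixed-width layout like this keeps per-packet processing cheap, which matters when each VNF on the chain appends or updates a record at line rate.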

An IETF draft has been submitted for this NSH time-stamping feature. See reference [4] in the Links section for this IETF draft.

Figure 9 shows an overview of the Service Chaining Performance Monitoring demo system. The use case of Gi-LAN service chaining is again used to demonstrate this concept, with three Gi-LAN sites simulated on three Intel® Wildcat Pass Servers with Dual Xeon E5-2699 v3 18-core (Haswell) CPUs. A fourth Haswell server is used for running NSH time-stamping applications such as the controller, database, REST API and GUI.

Figure 9: Overview of service chain performance monitoring demo

A traffic generator continuously generates a preconfigured set of flows through service chains that traverse VNFs on the three Haswell servers shown in the figure. Each server uses two Intel® X710 4x10GbE (Fortville) NICs with modified firmware supporting NSH filtering for Service Function platforms.

VNFs are simulated by Data Plane Development Kit (DPDK) sample applications with Fortville SR-IOV network interfaces. Service chains are set up by statically configuring the next hop for a chain of VNFs.

DPDK-based IEEE 1588 Precision Time Protocol (PTP) is used to synchronize time across all VNFs with microsecond precision. The master PTP thread runs in the NSH Time-stamping Gateway node shown in Figure 9, and PTP client threads running in each VNF synchronize their time from it. See reference [6] in the Links section for further detail on the DPDK PTP feature.
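For readers unfamiliar with PTP, the standard IEEE 1588 delay request-response calculation estimates a client's clock offset from four timestamps, assuming the path delay is the same in both directions. The sketch below shows that textbook arithmetic only; it is not DPDK code, and the function name and sample values are chosen for this example.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Textbook IEEE 1588 calculation, assuming a symmetric path.

    t1: master sends Sync (master clock)
    t2: client receives Sync (client clock)
    t3: client sends Delay_Req (client clock)
    t4: master receives Delay_Req (master clock)
    Returns (client offset from master, mean one-way path delay).
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay


# Example in microseconds: client clock runs 5 us ahead, one-way delay is 20 us.
offset, delay = ptp_offset_and_delay(t1=1000, t2=1025, t3=1100, t4=1115)
print(offset, delay)  # prints "5.0 20.0"
```

The client subtracts the computed offset from its clock; any asymmetry between the two directions shows up directly as offset error, which bounds the precision of the per-hop delay measurements.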

The NSH Time-stamping Gateway node as shown in Figure 9 provides an API that can be used to start and stop insertion of time-stamping control headers into packets. This informs VNFs to time-stamp packets at their ingress and egress points as packets traverse a chain.

The last VNF node in a service chain forwards the NSH time-stamped packet to the Time-stamp Database (TS-DB) node as shown in Figure 9. The TS-DB node strips the time-stamp information from the packet and writes it to a MongoDB* database. Database queries based on service chain ID or VNF ID can then be run on this database to determine the delay along a service chain or within a VNF.
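Given ingress and egress timestamps for each VNF in chain order, the per-VNF and per-vLink delays follow by subtraction. The sketch below works on in-memory tuples rather than a MongoDB query; the record shape, VNF names, and values are hypothetical, with timestamps in microseconds.

```python
def hop_delays(records):
    """Compute per-VNF and per-vLink delays from timestamp records.

    records: list of (vnf_name, ingress_ts, egress_ts) tuples in chain order.
    Returns (vnf_delays, link_delays), where link_delays is keyed by (from_vnf, to_vnf).
    """
    vnf_delays = {name: egress - ingress for name, ingress, egress in records}
    link_delays = {
        (prev[0], nxt[0]): nxt[1] - prev[2]  # next ingress minus previous egress
        for prev, nxt in zip(records, records[1:])
    }
    return vnf_delays, link_delays


chain = [("DPI", 100, 130), ("FW", 150, 210), ("NAT", 225, 240)]
vnf_delays, link_delays = hop_delays(chain)
print(vnf_delays)   # prints "{'DPI': 30, 'FW': 60, 'NAT': 15}"
print(link_delays)  # prints "{('DPI', 'FW'): 20, ('FW', 'NAT'): 15}"
```

These subtractions are only meaningful because all VNFs share a synchronized clock; without that, the vLink delays in particular would include the clock offset between hosts.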

For the PoC demo, an impairment API was implemented to simulate performance degradation of a VNF or a vLink in order to show how this is detected by Service Chain Performance Monitoring.

Again a GUI was developed in collaboration with Armour Interactive to demonstrate PoC use-cases.

Figure 10: Service chaining performance monitoring GUI – Hop-by-hop graph for normal conditions

As with the vSAM GUI, a view of three Gi-LAN sites is presented. For this PoC, service chains are static, as the objective is to demonstrate the concept of service-chain performance monitoring using NSH. Both the NSH Time-stamping Gateway API for NSH time-stamping of flows and the impairment API can be called from the GUI, so that specific NSH time-stamping tests can be managed. An HTTP REST API was also implemented to allow the GUI to gather service chain timestamp data from the TS-DB in order to render the service chain performance graph seen in Figure 10.

The GUI screenshot in Figure 10 shows the hop-by-hop view for all three service chains (Chain 1, Chain 2, and Chain 3). The graph is rendered from NSH time-stamping data in the TS-DB database. The Y-axis of the graph is the delay as packets traverse the service chain, and the X-axis shows the VNFs that packets traverse along the chain. The dots on the Y-axis show the delay between the ingress and egress points in a VNF, while the dashed line indicates the delay in the vLinks between VNFs.

Thus, a delay in VNF processing or the vLinks between VNFs can be diagnosed in real time.  This is illustrated in the GUI screenshot in Figure 11 by using the VNF impairment API to artificially insert a delay into VNF processing in the SBC VNF on Gi-LAN B (on Chain 1) and the FW VNF on Gi-LAN C (on Chain 3).

The effect on the service chain performance graph of the increased delay in VNF processing for Gi-LAN B SBC and Gi-LAN C FW is immediately visible in the longer line between VNF ingress and egress points on the Y-axis.

Figure 11: Service chaining performance monitoring GUI – Hop-by-hop graph for impaired VNFs

In the initial phase of development, the NSH time-stamping data is used only to render service chain performance graphs on a GUI. However, an API could be provided to vSAM to monitor service chain KPI thresholds and to inform a controller to adapt service chains to address performance hotspots.

Conclusion

The Telco-grade Service Chaining PoC team received a lot of positive feedback on the demos presented at NFV World Congress 2016 from several service providers as well as software and equipment vendors. The general feedback was that the concepts are innovative and will enhance service assurance for NFV deployments.

In particular, customers agreed that to run service chains in a production environment with 5 x 9s (99.999%) reliability, this minimal level of monitoring of service chain entities (and possibly more) is required.

The concepts presented in the demos described here enable new ways to address the stringent requirements for service assurance in NFV deployments.

Links

  1. NFV World Congress 2016 keynote on Telco-grade Service Chaining (Rory Browne)
    http://www.layer123.com/download&doc=Intel-0416-Browne-Telco_Grade_Service_Chaining
  2. ETSI NFV PoC wiki page - Virtualized Service Assurance Management (vSAM) in vGi-LAN:
    http://nfvwiki.etsi.org/index.php?title=Virtualised_service_assurance_management_in_vGi-LAN
  3. Brocade blog on vSAM:
    http://community.brocade.com/t5/SDN-NFV/Service-Aware-Transport-for-Multi-site-NFV-Resiliency/ba-p/84943
  4. NSH Time-stamping IETF Draft:
    https://tools.ietf.org/html/draft-browne-sfc-nsh-timestamp-00
  5. NSH IETF Draft:
    https://tools.ietf.org/pdf/draft-ietf-sfc-nsh-04.pdf
  6. Data Plane Development Kit (DPDK)
    http://dpdk.org

About the Author

Brendan Ryan is a senior software engineer in the Communication Infrastructure Division of Intel’s Network Platform Group (NPG), based in Intel Shannon (Ireland). Brendan has over 20 years’ experience in telecoms software development and has recently been working on PoC development and customer enablement towards the adoption of SDN and NFV technologies in telecoms networks.
