Case Study - Authentication and Authorization for the Autonomous Vehicle Data Center Platform

Abstract

The Autonomous Vehicle Data Center (AVDC) platform supports research and development of technology for the fully autonomous car. Platform security involves controlling access to data (both user data and petabytes (PB) of sensor data), to processes such as machine learning and simulation tasks, and to the physical compute, storage, and networking resources. The whole system must be protected from accidental or malicious tampering and hijacking. This case study discusses the factors that influenced our choice of an authentication and authorization solution.

AVDC Platform

The AVDC platform handles data ingestion, storage, machine-learning training and inference, simulation, and web access. The web portal lets users launch tasks, including manual data labeling tools, and monitor their progress. Vehicles drive into an ingestion garage where data recorded during a drive are uploaded into the data center. Depending on the variety and number of sensors contained in a vehicle, up to 20 terabytes (TB) of data may be recorded per vehicle per hour.

The platform employs an Apache Hadoop* ecosystem to provide big data support and a microservice architecture to support data ingestion, machine learning, and simulation tasks. To meet the platform's security needs, we needed an authentication and authorization mechanism that works across both the Hadoop and microservice solutions.

Figure 1. Autonomous Vehicle Data Center platform - components and tasks

Existing Authentication and Authorization Solutions

JSON Web Token

JSON Web Token (JWT) is an open standard that defines a compact and self-contained way to securely transmit information between parties as a JSON object [1, 2]. A JWT concatenates Base64url-encoded header, payload, and signature segments, separated by periods, in the form "header.payload.signature."

The digital signature can be computed with a shared secret (symmetric cryptography using an HMAC algorithm [3]) or with a public-private key pair (asymmetric cryptography using the RSA algorithm). The signature can be used to verify both the integrity of the claims and their authorship. Additionally, the token can be encrypted to keep the granted claims private.
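The shared-secret variant can be sketched with the Python standard library alone. This is a minimal illustration, assuming the HS256 (HMAC with SHA-256) algorithm; the function names and the demo secret are hypothetical, and a production system would normally use a maintained JWT library and also check the header's algorithm and the token's expiry.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a Base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build "header.payload.signature" using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    mac = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(mac)


def verify_hs256(token: str, secret: bytes) -> Optional[dict]:
    """Return the payload if the signature matches, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        return None  # not exactly three segments
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        return None
    return json.loads(b64url_decode(payload_b64))
```

Any party that holds the secret can both issue and verify tokens, which is why the shared secret must be protected on every application that checks signatures.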

Next, we explore authentication and authorization solutions that use JWT. In both solutions, the authentication service itself is typically deployed in high availability (HA) mode.

OAuth 2

OAuth 2 is an open authorization framework accessible via HTTP. It follows a delegation model in which an authorization service issues tokens that capture access scope, token type, and validity time interval, among other attributes. OAuth 2 provides authorization flows for web, desktop, and mobile applications [4, 5].
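As a rough illustration of those token attributes, the sketch below models a token response in Python. The field names access_token, token_type, expires_in, and scope follow the OAuth 2 specification; the AccessToken class, the from_token_response helper, and the scope strings used with them are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    value: str
    token_type: str
    scope: frozenset      # individual scope strings
    expires_at: float     # absolute expiry, seconds since the epoch

    def is_valid(self, now=None) -> bool:
        """True while the validity interval has not elapsed."""
        return (time.time() if now is None else now) < self.expires_at

    def allows(self, required_scope: str) -> bool:
        """True if the token carries the given scope."""
        return required_scope in self.scope


def from_token_response(response: dict, issued_at: float) -> AccessToken:
    """Build an AccessToken from a decoded token-endpoint response.

    Assumes the response includes expires_in; scope is a
    space-delimited string, as in the OAuth 2 specification.
    """
    return AccessToken(
        value=response["access_token"],
        token_type=response["token_type"],
        scope=frozenset(response.get("scope", "").split()),
        expires_at=issued_at + response["expires_in"],
    )
```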

A single authorization service can be shared by multiple applications, such as Twitter* or an email service. With OAuth 2, the users are registered with the Authorization Service (or with a backend identity service that the Authorization Service uses), as opposed to being registered with each of the applications, which in turn reduces the number of personal data exposure points.

OAuth 2 supports multiple grant types, where the grant type determines the type of access. Examples of grant types are Authorization Code, Implicit, Password, Client Credentials, Device Code, and Refresh Token [6]. For example, web applications, which typically want the user to be aware of the access request and therefore challenge the user for a username and password, employ the Authorization Code grant. Longer-running microservices that support user-initiated web services could be transparent to the user, requiring only an implicit grant.
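To make the Authorization Code grant more concrete, the following sketch builds the URL to which a web application would redirect the user's browser to start the flow. The query parameter names come from the OAuth 2 specification; the endpoint, client identifier, redirect URI, and scope names in the usage note are placeholders, not values from the AVDC platform.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scopes):
    """Return (url, state) for the first leg of the Authorization Code grant."""
    # The state value is echoed back on the redirect so the application can
    # match the response to this request and reject forged callbacks.
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",      # selects the Authorization Code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),    # space-delimited scope list
        "state": state,
    })
    return authorize_endpoint + "?" + query, state
```

For example, build_authorization_url("https://auth.example.com/authorize", "web-portal", "https://portal.example.com/callback", ["data.search"]) returns a URL for the browser redirect together with the state value that the application should store and compare when the user returns with the authorization code.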

Apache Knox*

The Apache Knox Gateway* [7] is a system that provides centralized authentication, access, and auditing for Apache Hadoop* services. Because it is a single entry point, only a single firewall port needs to be opened to access the entire Hadoop cluster; in addition, it hides the Hadoop cluster topology from attackers. Knox can be configured to work with an LDAP/AD [8] backend for identity and access credentials.
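The single-entry-point idea can be illustrated with a small sketch of how client URLs are formed. It assumes the commonly documented Knox URL layout of gateway host, port, gateway path, topology name, and service path, with 8443 and "gateway" as assumed defaults; the host name and topology name in the usage note are placeholders.

```python
def knox_url(gateway_host, topology, service_path, port=8443, gateway_path="gateway"):
    """Build a client-facing URL that goes through the gateway.

    The client only ever sees the gateway host and port; the hosts that
    actually run the Hadoop services never appear in the URL.
    """
    return "https://{}:{}/{}/{}/{}".format(
        gateway_host, port, gateway_path, topology, service_path.lstrip("/")
    )
```

With this layout, knox_url("knox.example.com", "default", "webhdfs/v1/tmp?op=LISTSTATUS") produces a WebHDFS request addressed to the gateway rather than to a NameNode, so only the gateway port needs to be reachable from outside the cluster.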

Comparison of OAuth 2 and Apache Knox*

PROS

OAuth 2

  • Industry standard, very flexible.
  • Works with different resource types and protocols.
  • Supports application-determined granular access control based on user roles.
  • Supports multilayer authorization control.
  • Authorization delegation model reduces user data exposure.

Apache Knox*

  • Protects and controls access to Hadoop services possibly across multiple clusters.
  • Clients need to interact with a single service across their Hadoop ecosystem.

CONS

OAuth 2

  • The Authorization Service is a single point of failure; an outage could bring down access to multiple applications. To mitigate this, it is typically deployed in high availability (HA) mode with load balancing.

Apache Knox*

  • The Knox Gateway is a single point of failure; an outage could deny access to all dependent Hadoop services. To mitigate this, it is deployed in HA mode.
  • Focused on Hadoop ecosystem.
  • Requires development of custom plugins for non-Hadoop applications.
  • Microservices can be accessed directly if their endpoints are known.
  • Low granularity access control.
  • Difficult to provide multilayered authorization control.

AVDC-Adopted Authentication and Authorization Solution

Before we introduced microservices on our platform, we adopted Knox for authentication and authorization. However, the custom plugins that we needed to develop for our microservices soon became a development burden, which prompted the decision to replace Knox with OAuth 2. OAuth's uniform token support across HTTP and other resources made it a more generally applicable solution. Further, it lets individual applications define fine-grained access control. For example, a single user might have an admin role in one application, read-only access in another, and no access at all in a third. OAuth essentially provides the ability to define multiple roles and what those roles translate to with respect to each application.
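The per-application role example above can be sketched as follows. The "roles" claim, the role names, and the application names are assumptions made for illustration; the source does not specify how the AVDC tokens encode roles.

```python
from enum import IntEnum


class Role(IntEnum):
    NONE = 0
    READ = 1
    ADMIN = 2


def role_for(claims: dict, application: str) -> Role:
    """Look up the user's role for one application.

    Applications that are not listed, or roles that are not recognized,
    fall back to Role.NONE.
    """
    name = claims.get("roles", {}).get(application, "NONE")
    return Role.__members__.get(name, Role.NONE)


def is_allowed(claims: dict, application: str, required: Role) -> bool:
    """True if the user's role for the application meets the requirement."""
    return role_for(claims, application) >= required


# Hypothetical token payload: admin in one application, read-only in
# another, and no entry (hence no access) for a third.
example_claims = {
    "sub": "alice",
    "roles": {"labeling": "ADMIN", "data-search": "READ"},
}
```

Each application interprets only its own entry, so the same token yields different capabilities in different applications.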

Figure 2. AVDC authentication and authorization data flow

The following steps illustrate control flow on the AVDC platform.

  1. User provides credentials to the web portal. These are sent to the Authorization Service.
  2. The Authorization Service contacts the Active Directory instance to establish whether the credentials are valid.
  3. The Authorization Service returns a JWT signed using its private key if the credentials are valid.
  4. The returned token is saved in the web portal for the duration of the session.
  5. When any service on the AVDC platform (for example, a data search request, launching a machine-learning training task, or other) is accessed through the web portal, the token saved in the session is forwarded along with the request.
  6. The AVDC Authentication and Authorization service retrieves the token and uses the Authorization Service's public key to establish whether the signature is valid. If it is valid, the respective applications use the access scope-related information in the token payload to determine which functionality is accessible and provide it (step 6a in Figure 2). For Cloudera*-based services, the request is passed on with the token stripped and replaced with the Kerberos* [9] principal and the keytab associated with the service (step 6b in Figure 2).
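Step 6 can be sketched as a small dispatch function. This is an illustration under stated assumptions, not the platform's implementation: the Request type, the service names, the principal string, and the "scope" and "exp" claims are hypothetical, the expiry check is an added assumption, and signature verification is passed in as a callable that stands in for the RSA public-key check.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    target: str                                # hypothetical service name
    token: Optional[str] = None
    kerberos_principal: Optional[str] = None
    scope: tuple = ()


# Hypothetical mapping of Cloudera-based services to their service principals.
KERBEROS_PRINCIPALS = {"hdfs-search": "search/avdc@EXAMPLE.COM"}


def handle(request: Request,
           verify: Callable[[str], Optional[dict]],
           now: Optional[float] = None) -> Request:
    """Validate the token, then forward the request as in step 6a or 6b.

    verify returns the token's claims if the signature is valid, else None.
    """
    claims = verify(request.token) if request.token else None
    if claims is None:
        raise PermissionError("missing or invalid token")
    # Assumption: the token carries an "exp" claim in epoch seconds.
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise PermissionError("token expired")

    principal = KERBEROS_PRINCIPALS.get(request.target)
    if principal is not None:
        # Step 6b: strip the token and attach the service's Kerberos principal.
        return Request(request.target, None, principal)
    # Step 6a: keep the token; the application decides what the scope allows.
    return Request(request.target, request.token, None,
                   tuple(claims.get("scope", "").split()))
```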

The AVDC platform uses an LDAP/AD identity backend to maintain user information and access capabilities. The identity service returns Kerberos tokens in JWT format. The token signature is based on an RSA public-private key pair, which avoids having to share secrets with each of the registered applications and protect them there. Instead, the Authorization Service's public key is distributed to each registered application so that it can verify the token signature.
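The benefit of the asymmetric scheme is that applications hold only the public key, so they can check tokens but cannot mint them. The toy sketch below illustrates this with textbook RSA (no padding) over two small, well-known Mersenne primes. It is deliberately insecure and is not what a JWT library does; real RS256 signatures use PKCS #1 padding, keys of 2048 bits or more, and a vetted cryptography library.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, private_exponent: int = D) -> int:
    """Only the holder of the private exponent can produce this value."""
    return pow(digest_as_int(message), private_exponent, N)


def verify(message: bytes, signature: int, public_exponent: int = E) -> bool:
    """Anyone holding (N, E) can check the signature without any secret."""
    return pow(signature, public_exponent, N) == digest_as_int(message)
```

Here the Authorization Service would keep D private and call sign on the token's "header.payload" bytes, while each registered application would receive only N and E and call verify.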

Summary

The AVDC platform uses the OAuth 2 authentication and authorization framework in conjunction with JWT and RSA public-private key signatures because this combination is easy to use across both Hadoop resources and microservices. OAuth 2 also facilitates fine-grained access control, enabling our platform to provide a rich set of access capabilities across applications that span data ingestion, machine learning, and simulation.

References

1. Introduction to JSON Web Tokens

2. 5 Easy Steps to Understanding JSON Web Tokens

3. Hash-based message authentication codes (HMAC)

4. Introduction to OAuth 2

5. OAuth 2 Authorization Framework

6. A Guide to OAuth 2 Grants

7. Announcing Apache Knox 1.0.0!

8. LDAP user authentication using Microsoft Active Directory - IBM

9. Kerberos Principals and Keytabs - Cloudera
