A Kerberos-Based Big Data Security Solution

Alibaba improved user identity management for its Cloud E-MapReduce (EMR) and Cloud HBase services, and lowered maintenance costs, by adopting a new authentication framework developed by Intel. The Hadoop* Authentication Service (HAS) is a pluggable authentication framework that helps organizations integrate their existing enterprise identity management systems with Kerberos, a widely used, secure network authentication protocol.
HAS is based on Apache Kerby, a Java implementation of Kerberos, and Intel will contribute it to that project for release in Apache Kerby 2.0. In the Hadoop big data ecosystem, Kerberos is the only built-in secure user authentication mode. Most open source data components support Kerberos and can enable Kerberos authentication for both services and users. However, Kerberos authentication on big data platforms poses two challenges:
  1. The Kerberos support in the Java* Development Kit (JDK) / Java Runtime Environment (JRE) does not cover the complete set of encryption and checksum types, and its Generic Security Service Application Program Interface (GSSAPI) / Simple Authentication and Security Layer (SASL) layer is hidden, which makes it difficult to modify or extend.
  2. As used by Hadoop, Kerberos supports no authentication mechanism other than passwords, which makes it difficult to connect an existing identity authentication system to the Kerberos authentication flow.
HAS addresses these challenges by providing a complete authentication solution for the Hadoop open source ecosystem. It is built on a Java implementation of the Kerberos protocol and integrates with existing authentication and authorization systems, which lets Hadoop and Spark* use authentication modes beyond native Kerberos. Because identity information stays in the existing system, it does not have to be maintained separately, which reduces complexity and risk.
[Diagram: enterprise identity systems, HAS, and Hadoop]
HAS uses the Key Distribution Center (KDC) provided by Apache Kerby (a Java implementation of Kerberos) to efficiently implement this authentication solution for the Hadoop open source big data ecosystem.
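The two-stage flow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not HAS or Kerby code: `verify_with_identity_system`, `issue_ticket`, the `Ticket` fields, and the `EXAMPLE.COM` realm are all hypothetical. It only shows the ordering: the existing identity system checks the user's credentials first, and only then does the KDC side issue a Kerberos-style ticket for the matching principal, so no password has to be copied into a Kerberos database.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for an existing enterprise identity store (e.g. LDAP).
# In this sketch the records are never copied into a Kerberos database.
ENTERPRISE_USERS = {"alice": "s3cret"}


def verify_with_identity_system(user: str, password: str) -> bool:
    """Hypothetical plugin callback: ask the existing system to verify the user."""
    expected = ENTERPRISE_USERS.get(user)
    return expected is not None and expected == password


@dataclass(frozen=True)
class Ticket:
    principal: str     # Kerberos-style principal name, user@REALM
    expires_at: float  # absolute expiry time in seconds since the epoch


def issue_ticket(user: str, password: str,
                 verify: Callable[[str, str], bool],
                 realm: str = "EXAMPLE.COM",
                 lifetime_s: int = 8 * 3600,
                 now: Optional[float] = None) -> Optional[Ticket]:
    """Issue a ticket only after the external check succeeds (illustrative only)."""
    if not verify(user, password):
        return None  # the external system rejected the credentials
    issued = time.time() if now is None else now
    return Ticket(principal=f"{user}@{realm}", expires_at=issued + lifetime_s)


if __name__ == "__main__":
    print(issue_ticket("alice", "s3cret", verify_with_identity_system, now=0.0))
    print(issue_ticket("alice", "wrong", verify_with_identity_system, now=0.0))
```

Passing the verifier in as a callable keeps the ticket-issuing step unaware of which identity system sits behind it, which is the property the plugin mechanism is meant to provide.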
HAS offers these key advantages:
  1. Hadoop services keep using the standard Kerberos authentication mechanism. Counterfeit nodes cannot communicate with nodes inside the cluster because they do not hold the keys that are distributed to legitimate nodes in advance, which prevents malicious use of the Hadoop cluster.
  2. Hadoop users can continue to log in with a familiar authentication mode. HAS is compatible with MIT Kerberos, so users can also authenticate with passwords or keytabs before accessing the corresponding services.
  3. The HAS plugin mechanism connects HAS to existing authentication systems, and multiple authentication plugins can be implemented to meet user requirements.
  4. Security administrators do not need to synchronize user account information to the Kerberos database, which reduces maintenance costs and the risk of information leakage.
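Advantage 1 rests on legitimate services holding long-term keys (typically in keytabs) that were distributed in advance. The toy Python sketch below illustrates only that idea; it is not Kerberos and omits encryption, timestamps, and replay protection. An HMAC over a message stands in for a ticket sealed with a service key, and `SERVICE_KEY`, `seal`, and `accept` are hypothetical names. A node that never received the key cannot produce a tag the service accepts.

```python
import hashlib
import hmac
import secrets

# Hypothetical long-term service key, distributed to legitimate nodes in
# advance (in a real cluster this role is played by keys held in keytabs).
SERVICE_KEY = secrets.token_bytes(32)


def seal(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with the given key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def accept(key: bytes, message: bytes, tag: bytes) -> bool:
    """Service-side check; compare_digest avoids leaking timing information."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    msg = b"datanode-7 wants to join"
    legit_tag = seal(SERVICE_KEY, msg)               # node that holds the key
    forged_tag = seal(secrets.token_bytes(32), msg)  # counterfeit node guesses a key
    print(accept(SERVICE_KEY, msg, legit_tag))
    print(accept(SERVICE_KEY, msg, forged_tag))
```

The first check succeeds, while the forged tag is rejected with overwhelming probability, because guessing a random 256-bit key is infeasible.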
[Diagram: HAS client and server]
HAS is compatible with the Kerberos protocol, so every component in the Hadoop ecosystem that supports Kerberos can use the Kerberos authentication mode that HAS provides.
HAS provides a series of interfaces and tools to help simplify deployment. It also provides interfaces to help users implement plugins to integrate Kerberos with other user identity management systems. 
HAS currently supports ZooKeeper, MySQL, and LDAP. It will be contributed to the Apache Kerby project, and according to the community plan, HAS functionality will be released in Kerby 2.0.
For more information, read the technical white paper, “Big Data Security Solution Based on Kerberos” posted on the Intel Developer Zone.