Installing Apache Zeppelin* on Cloudera Distribution of Hadoop*

Published: 07/27/2015, Last Updated: 07/27/2015

Data science is not a new discipline. However, with the growth of big data and the adoption of big data technologies, the demand for better-quality data has grown sharply. Today data science is applied to nearly every facet of life: product validation through fault prediction, genome sequence analysis, personalized medicine through population studies and the Patient 360 view, credit card fraud detection, improved customer experience through sentiment analysis and purchase patterns, weather forecasting, detection of cyberattacks or terrorist attacks, aircraft maintenance that uses predictive analytics to repair critical parts before they fail, and many more. Every day, data scientists detect patterns in data and provide actionable insights that influence organizational change.

The data scientist’s work broadly involves the acquisition, cleanup, and analysis of data. Because data science is a cross-functional discipline, this work also involves communication, collaboration, and interaction with other people, both inside and possibly outside the organization. This is one reason why the “notebook” features in data analysis tools are gaining popularity: they make it easier to organize, share, and interactively work with long workflows. IPython* Notebook is a great example, but it is limited to the Python* language. Apache Zeppelin* is a new web-based notebook that enables data-driven, interactive data analytics and visualization, with the added bonus of supporting multiple languages, including Python*, Scala*, Spark SQL, Hive*, Shell, and Markdown. Zeppelin also provides Apache Spark* integration by default, using Spark’s fast, in-memory, distributed data processing engine to accomplish data science at lightning speed.

In this paper we describe how to install and configure Apache Zeppelin on the Cloudera Distribution of Apache Hadoop*, providing access to Hadoop and Spark.

Download complete PDF Apache-Zeppelin.pdf
