Intel® Distribution for Python: pyDAAL module

By Sergey A Maydanov, Published: 08/21/2017, Last Updated: 08/21/2017

What is it?

pyDAAL is a free package which implements Python bindings to the Intel® Data Analytics Acceleration Library, or Intel® DAAL.

Why do I need it?

If you’re a data scientist or machine learning specialist who cares about the performance and scalability of your data analytics work, this is a must-have package.

It is your bridge from fast machine learning prototyping in Python on a laptop or workstation to large, full-scale deployment on multiple nodes. Intel DAAL is natively optimized for C/C++ and, for convenience, also provides bindings for Python and Java/Scala.

Built on highly tuned Intel MKL kernels, Intel DAAL brings machine learning performance to HPC levels, so you can run high-performance data analytics seamlessly in any environment, whether MPI*-based, Spark*/Hadoop*-based, or Dask*-based.

It provides advanced algorithms for machine learning and the other stages of the data analysis pipeline, and it supports both basic and complex usage scenarios, from batch analytics to online and distributed analytics.
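To make the three processing modes concrete, here is a minimal sketch in plain Python. It does not use the pyDAAL API; the helper names (`partial_moments`, `merge`, `finalize`, `split`) are hypothetical and exist only for this illustration. The idea it shows is that an algorithm expressed as mergeable partial results can run unchanged on a whole data set (batch), on blocks arriving over time (online), or on blocks held by separate nodes (distributed).

```python
import random
from functools import reduce


def partial_moments(block):
    """Summarize a block of numbers as (count, sum, sum of squares)."""
    return len(block), sum(block), sum(x * x for x in block)


def merge(a, b):
    """Combine two partial results; the order of merging does not matter."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def finalize(partial):
    """Turn a merged partial result into (mean, population variance)."""
    n, s, ss = partial
    mean = s / n
    # Sum-of-squares formula: simple, but not the most numerically robust choice.
    return mean, ss / n - mean * mean


def split(values, parts):
    """Cut a list into contiguous blocks of near-equal size."""
    size = -(-len(values) // parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Batch: the whole data set is available at once.
batch = finalize(partial_moments(data))

# Online: blocks arrive one at a time and update a running state.
state = (0, 0.0, 0.0)
for block in split(data, 10):
    state = merge(state, partial_moments(block))
online = finalize(state)

# Distributed: each "node" summarizes its own block, then one step merges them.
node_partials = [partial_moments(part) for part in split(data, 4)]
distributed = finalize(reduce(merge, node_partials))

print("batch:      ", batch)
print("online:     ", online)
print("distributed:", distributed)
```

All three modes agree up to floating-point rounding, because each one only changes when and where the partial results are produced, not how they are combined.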

How is it different from Scikit-Learn*?

Scikit-learn implements a large number of machine learning algorithms, and as of now it offers a broader selection of them than Intel DAAL.

On the other hand, Intel DAAL is more than a set of machine learning algorithms. It implements and optimizes the full data analysis pipeline: loading data, transforming and filtering it, and analyzing and modeling it with classic statistical and machine learning techniques as well as advanced deep learning methods.

Intel DAAL also covers more of the complex usage scenarios: online data analysis, end-to-end analytics on edge devices, and distributed analytics on multiple nodes.

At the same time, certain Scikit-learn algorithms shipped with the Intel Distribution for Python take advantage of pyDAAL, bringing Scikit-learn performance to new levels.

Like Scikit-learn, Intel DAAL is an open source project with community contributions that aims to enhance and extend data analytics in an open fashion.

How do I get it?

If you’re a user of the Intel® Distribution for Python*, it’s already there: it comes preinstalled and works out of the box for your machine learning and data analysis problems.

If you’re not a user of the Intel® Distribution for Python*, you can install it with conda from Anaconda* Cloud.

If you want to build your own version of pyDAAL, you can do so from the sources shipped with the Intel DAAL product. See Getting Started with Intel Data Analytics Acceleration Library (Python section) for instructions on how to build pyDAAL on Windows*, Linux* and OS X*.
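Whichever route you take, a quick way to confirm that the package is visible to your interpreter is to look it up without importing it. The sketch below assumes that pyDAAL is imported under the package name `daal`; the helper `pydaal_available` is a hypothetical name used only for this example.

```python
import importlib.util


def pydaal_available():
    """Return True if a top-level package named `daal` can be located.

    Assumption: pyDAAL is exposed to Python under the import name `daal`.
    """
    return importlib.util.find_spec("daal") is not None


if pydaal_available():
    print("pyDAAL found in this environment.")
else:
    print("pyDAAL not found; install it with conda or build it from sources.")
```

The lookup only locates the package, so it is cheap to run and does not load any native libraries.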

How can I use it?

There are plenty of Intel DAAL materials available online.

You can also find plenty of pyDAAL examples and samples here.
