Intel® HPC Developer Conference 2017

High Productivity Languages

Stalk the Interactive Terabyte with R

The Programming with Big Data in R (pbdR) project enables native, low-overhead use of scalable high-performance computing (HPC) libraries for R data analysis on large systems. This includes distributed dense linear algebra, data distribution and redistribution functions for data wrangling, and high-level use of Message Passing Interface (MPI) collectives.

George Ostrouchov, Drew Schmidt, and Michael Matheson, Oak Ridge National Laboratory

Presentation (PDF)


High-Performance Computing with Python* and Anaconda*

The Python community has been working for many years to ensure that engineers who use Python can take full advantage of their concurrent hardware, both on a single motherboard and across machines in a cluster or cloud. This talk highlights the Anaconda* technologies that help make this possible in practice today, including Conda*, Numba*, and dask, and shows how they relate to other solutions. It also outlines the motivation for a new research effort called Plures to enable better cross-language interoperability.

Travis Oliphant, Anaconda

Presentation (PDF)


Harness the Power of High-Performance Computing for R

We discuss the modern computing platforms and programming environments available for R applications, and introduce the techniques and skills needed to solve large-scale, complex problems with R.

Zhiyong Zhang, Stanford University

Presentation (PDF)


Accelerate Scientific Python with Optimizations from Intel
(26 min)

Get an overview of Intel® Distribution for Python*, which contains optimizations to core computational packages such as NumPy, SciPy, scikit-learn, and Numba. These optimizations allow certain common Python workflows to run at near-native code performance on a range of Intel® processors.

Oleksandr Pavlyk, Intel

Presentation (PDF)


Parallel Computing with Python and the Numba* Compiler
(24 min)

See how to get higher performance in Python on multicore systems using Numba (a compiler for numerical Python functions) and explore its various features, including releasing the Global Interpreter Lock, automatic multithreading, and compatibility with dask* and Apache Spark*.

Stanley Seibert, Anaconda

Presentation (PDF)


Manage Data Science at Scale
(24 min)

Predictive analytics and artificial intelligence have become critical competitive capabilities. Yet IT teams struggle to provide the support that data science teams need to succeed. Learn how leading banks, insurers, pharmaceutical companies, and others manage data science at scale.

Albert Chow, Domino Data Lab

Presentation (PDF)


Mixed-Language Debugging (Python, C, and C++ with TotalView*)

Debugging Python together with C and C++ can be difficult. This talk explains how TotalView* makes it easier with a mixed-language debugger that gives a view of both languages on the same platform.

Jasmit Singh, Rogue Wave Software

Presentation (PDF)


Python Applications in the NERSC Exascale Science Applications Program for Data

This talk discusses the challenges faced and early lessons learned in porting real data-intensive science codes using Python to Intel® Xeon Phi™ processors.

Rollin Thomas, National Energy Research Scientific Computing Center (NERSC)

Presentation (PDF)


HyperLoom: A Platform for Defining and Executing Scientific Pipelines in Distributed Environments

Real-world applications often encompass end-to-end data processing pipelines composed of a large number of interconnected computational tasks of varying granularity. This talk introduces HyperLoom, a platform for defining and executing such pipelines in distributed environments using a Python API.

Vojtech Cima, IT4Innovations National Supercomputing Center

Presentation (PDF)