Visualization Development

These presentations include hands-on experiments (using materials provided or bringing your own) and expert assistance from the visualization team. They are appropriate for computational scientists in all domains and at all skill levels (beginner, intermediate, and advanced).

Use the OSPRay API with Data-Distributed Applications

Using the OSPRay API with data-distributed applications addresses data that is too big to fit on one node, already distributed (in situ), or both.

Will Usher, Jefferson Amstutz, and Jim Jeffers, Intel
Silvio Rizzi, Argonne National Laboratory
Valerio Pascucci, Scientific Computing and Imaging Institute, University of Utah

Presentation (PDF)

CSP Project Featuring VTK-m

VTK-m is a single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms. It reduces the challenges of writing highly concurrent algorithms by using data-parallel primitives, and it makes it easier for simulation codes to take advantage of these parallel visualization and analysis tasks on a wide range of current and next-generation hardware.

Robert Maynard, Kitware Inc.
Jim Jeffers, Intel

Presentation (PDF)

Modernize VisIt with VTK and Software Defined Visualization (SDVis)

VisIt is an open source, turnkey application for data analysis and visualization of mesh-based data. It provides an infrastructure for parallel post-processing that scales from desktops to massive HPC clusters, pushing the envelope of scientific visualization software while making effective use of Intel® architecture.

Jim Jeffers, Intel
Hank Childs, University of Oregon

Presentation (PDF)

In-Situ Open Lab Recap and Insight into the Future of Visualization

Software Defined Visualization (SDVis) is a set of open source libraries that improve the visual fidelity, performance, and efficiency of prominent visualization solutions, with a particular emphasis on supporting the rapidly growing "big data" usage on workstations through HPC supercomputing clusters without the memory limitations and cost of GPU-based solutions.

Jim Jeffers, Intel
Paul Navratil, Texas Advanced Computing Center

Presentation (PDF)

Accelerate Big Data Processing with High-Performance Computing (HPC) Technologies

Explore opportunities and challenges in accelerating big data middleware with new design case studies of Apache Hadoop*, Apache Spark*, Memcached, and TensorFlow*. We share the associated benefits and examine the interplay among interconnects, storage systems (NVM and SSD), parallel file systems (Lustre), and multicore Intel® Xeon® platforms.

Dhabaleswar K. (DK) Panda and Xiaoyi Lu, The Ohio State University

Presentation (PDF)

Java* for HPC—A Story of Performance

Java* and the Java virtual machine (JVM) have features that make them appealing for HPC developers, for example, parallel processing via streams for quick and easy development, and runtime compiler optimizations like auto-vectorization. Also, the Vector API is becoming a viable approach for vector programming in Java. Prerequisites: Intel® Advanced Vector Extensions (Intel® AVX), vector programming.

Razvan Lupusoru and Vivek Deshpande, Intel

Presentation (PDF)

Accelerate Workloads Using the Acceleration Stack for Intel® Xeon® Processors with FPGAs

The acceleration stack for Intel® Xeon® processors with FPGAs is a robust collection of software, firmware, and tools, designed and distributed by Intel to make it easier to develop and deploy Intel® FPGAs for workload optimization in the data center.

David Munday, Intel

Presentation (PDF)

Reconfigurable Accelerator Platform with FPGAs

Emerging workloads are increasing the opportunities in the data center for FPGA algorithm, networking, and data access acceleration. This presentation looks at how FPGAs are efficiently accelerating key HPC workloads, such as genomics, machine learning, video analytics, and big data analytics.

Mike Strickland, Intel

Presentation (PDF)

A Physics-Based Approach to Unsupervised Discovery of Spatiotemporal Structures (PDF)

As part of the Big Data Center at Lawrence Berkeley National Laboratory (LBNL), our goal is unsupervised discovery of coherent structures in terabyte-scale climate data. We outline the current progress and future challenges of scaling our Python* implementation, accelerated with Numba*, to a fully distributed execution on Cori II.

Adam Rupe and James P. Crutchfield, UC Davis

Mr. Prabhat and Karthik Kashinath, National Energy Research Scientific Computing Center (NERSC) and Lawrence Berkeley National Laboratory (LBNL)

Accelerated Reverse Time Migration with Optimized I/O (PDF)

Reverse time migration is used for generating a seismic image of the underground layers of the earth. Here, we review two implementations (one using the standard approach, the other performing an extra propagation) that use random velocity boundaries and reduce I/O communications.

Philippe Thierry, Intel

Kareem Metwaly, Khaled El-Amrawi, Essam Algizawy, Mohamed Mahmoud, and Mohamed ElBasyouni, Brightskies Technologies

Accelerate Eigen Math Library for Automated Driving Workloads (PDF)

Small dense matrix-matrix multiplication (DGEMM) is a primary compute kernel in several automated driving workloads. We evaluate and improve Eigen’s performance on small DGEMMs on the Intel® Xeon® processor through LIBXSMM and Intel® Math Kernel Library (Intel® MKL) with MKL_DIRECT_CALL.
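The small-DGEMM kernel at the heart of this work can be illustrated with a minimal scalar reference (a hypothetical C++ sketch of C = A × B with illustrative dimensions and naming; this naive triple loop is what libraries like LIBXSMM and Intel® MKL replace with specialized small-matrix kernels):

```cpp
#include <array>
#include <cstddef>

// Naive reference for a small dense matrix-matrix product C = A * B
// (row-major storage). Optimized small-GEMM libraries dispatch this
// exact computation to vectorized, size-specialized kernels.
template <std::size_t M, std::size_t K, std::size_t N>
void small_gemm(const std::array<double, M * K>& A,
                const std::array<double, K * N>& B,
                std::array<double, M * N>& C) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}
```

Compile-time dimensions (as in the template above) mirror how small-GEMM libraries exploit known sizes to generate specialized code.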

Steena Monteiro and Gaurav Bansal, Intel

An Efficient SIMD Implementation of Pseudo-Verlet Lists for Neighbour Interactions (PDF)

A pseudo-Verlet list of pair-wise interactions aims to reduce the number of spurious pair-wise distance calculations between neighbouring cells. We present a Single Instruction Multiple Data (SIMD) implementation with speed-ups of 2.24, 2.43, and 4.07 times over the scalar algorithm using Intel® AVX instruction sets.
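The underlying neighbour-list idea can be sketched as a plain cutoff search (a hypothetical scalar C++ sketch; the poster's pseudo-Verlet sorting between cells and the SIMD layout are not reproduced here):

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Build a plain Verlet (neighbour) list: for each particle, record the
// indices of all other particles within the cutoff radius. A pseudo-Verlet
// variant additionally sorts particles along the axis between neighbouring
// cells to reject distant pairs early, and a SIMD version evaluates several
// candidate distances per instruction.
std::vector<std::vector<std::size_t>>
build_verlet_list(const std::vector<Vec3>& p, double cutoff) {
    const double cutoff2 = cutoff * cutoff;  // compare squared distances
    std::vector<std::vector<std::size_t>> nbrs(p.size());
    for (std::size_t i = 0; i < p.size(); ++i)
        for (std::size_t j = i + 1; j < p.size(); ++j) {
            const double dx = p[i].x - p[j].x;
            const double dy = p[i].y - p[j].y;
            const double dz = p[i].z - p[j].z;
            if (dx * dx + dy * dy + dz * dz <= cutoff2) {
                nbrs[i].push_back(j);
                nbrs[j].push_back(i);
            }
        }
    return nbrs;
}
```

Comparing squared distances avoids a square root per pair, which is also the scalar baseline the SIMD speed-ups are measured against.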

James Willis, Matthieu Schaller, and Pedro Gonnet, Durham University

Astrophysics Simulation on Intel® Xeon Phi™ Processors (PDF)

Explore a series of results from a multiyear collaboration among supercomputer center specialists, the HPC solutions developer RSC Group, and Intel engineers to develop AstroPhi, an astrophysical code for simulating the dynamics of different astrophysical objects.

Igor Chernykh and Igor Kulikov, Institute of Computational Mathematics and Mathematical Geophysics SB RAS

Automated Systolic Array Architecture Synthesis for High-Throughput Convolutional Neural Networks (CNN) Inference on FPGAs (PDF)

See how to implement convolutional neural networks (CNN) on an FPGA using a systolic array architecture, which can achieve high clock frequency under high-resource utilization. The experimental results show that the framework is able to generate the accelerator for real-life CNN models.

Jim Wu, Falcon Computing Solutions

Xuechao Wei, Falcon Computing Solutions, Inc., Center for Energy-Efficient Computing & Applications, and Peking University, China

Cody Hao Yu, Falcon Computing Solutions, Inc. and Computer Science Department, University of California, Los Angeles

Peng Zhang, Youxiang Chen, Yuxin Wang, and Han Hu, Falcon Computing Solutions, Inc.

Yun Liang, Center for Energy-Efficient Computing & Applications, and Peking University, China

Jason Cong, Falcon Computing Solutions, Inc., Computer Science Department, University of California, Los Angeles, Center for Energy-Efficient Computing & Applications, and Peking University, China

Containerize Deep Learning Workloads in Intel® Xeon® Processor E3 Cluster for AI Web Applications (PDF)

Today's web applications are data intensive and demand environments like TensorFlow* to execute their workloads. Containers are well suited to provide the framework and compute resources, such as CPU and memory, for each workload: they decouple the application environment from the host machine and encapsulate all dependencies in a single portable unit. Nomad is a state-of-the-art tool for scheduling Docker* containers. A test model workload is generated from Model Zoo for each framework.

Srivignessh Pacham Sri Srinivasan, Intel® Student Ambassador

Contention-Aware Virtual Machine Scheduling via Runtime Performance Monitoring (PDF)

The ever-increasing availability of hardware resources delivered with each generation of computing systems comes with an increase in the complexity of managing those resources. This research presents findings on the possible performance impact on virtual machines (VMs) when they are managed by the default Linux* scheduler as regular host processes.

Gildo Torres and Chen Liu, Clarkson University

CPU-Based Multi-Object Recognition & Training System Using Intel® AVX (PDF)

With current systems, the ability to detect multiple objects in an image requires large amounts of CPU and GPU usage. This solution includes object recognition and training on an Intel® processor-based embedded device. It primarily uses histogram of oriented gradients (HOG) features and a support vector machine (SVM) for object detection and training.

Maneesh Tewani, Harish Subramony, Tejaswini Sirlapu, Suresh Nampalli, and Rey Nicolas, Intel

DeepTutor: A State-of-the-Art Conversational Intelligent Tutoring System

This presentation highlights the development of dialogue-based intelligent tutoring systems, as well as results from large-scale, after-school experiments with high school students. These experiments revealed that tutoring systems are as effective as average human tutors.

Vasile Rus, The University of Memphis

Face It: The Artificially Intelligent Hairstylist (PDF)

Face It is a mobile application that uses machine learning to determine a user’s face shape, and then combines this information with data input by the user to give the user a personalized set of hairstyles.

Pallab Paul, Intel® Student Ambassador

Galactos: Computing the Anisotropic Three-Point Correlation Function for Two Billion Galaxies (PDF)

Galactos is a high-performance implementation of the three-point correlation function of galaxies in the universe, optimized for Intel® Xeon Phi™ processors. It reaches 39 percent of peak performance on a single node, and scales to the full Cori* system, achieving 9.8 PFLOPS (peak) across 9636 nodes.

Brian Friesen, Brian Austin, Deborah Bard, Jack Deslippe, and Mr. Prabhat, NERSC

Pradeep Dubey, Mostofa Ali Patwary, Nadathur Satish, and Narayanan Sundaram, Intel

Zachary Slepian, LBNL

Daniel J. Eisenstein, Harvard-Smithsonian Center for Astrophysics

Highly Efficient Data Movement for AI Training by Implementing the Mnemonic Durable Data Model (PDF)

To efficiently process data-intensive artificial intelligence modeling on high-performance computing (HPC) systems, we focus on largely reducing data movement overhead costs across storage and memory through implementation of the durable data model (DDM) from Mnemonic, incubated at Apache*. This session explores ways to improve efficiency.

Helin Cao, Gang Wang, and Jianying Lang, Intel

Intel® Xeon Phi™ Processor Acceleration of FETI Solvers (PDF)

See how to efficiently accelerate FETI solvers using Intel® Xeon Phi™ processors and coprocessors with the local Schur complement method, which converts the sparse matrices generated by a finite element method (FEM) into dense matrices.

Lubomír Říha, Tomáš Brzobohatý, Michal Merta, Alexandros Markopoulos, Ondřej Meca, Tomáš Kozubek, and Vít Vondrák, IT4Innovations, VSB – Technical University of Ostrava

Micromanage Threads for Faster MD Simulations (PDF)

This poster presents a new threading framework under development, called the Static Thread Scheduler (STS), which provides flexibility in assigning threads to tasks and eliminates overhead by being static rather than dynamic.

John Eblen and Jeremy C. Smith, The University of Tennessee, Knoxville

Message Passing Interface (MPI) and OpenMP* Parallelization of the Hartree-Fock Method for the Intel® Xeon Phi™ Processor (PDF)

Modern OpenMP* threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm.

Vladimir Mironov, Lomonosov Moscow State University
Yuri Alexeev, Argonne National Laboratory, Leadership Computing Facility

Mark S. Gordon and Kristopher Keipert, Iowa State University

Alexander Moskovsky, RSC Technologies

Michael D’mello, Intel

Optimization of Striped Smith-Waterman and Pair HMM Algorithms (PDF)

Use of Intel® Core™ processors and Intel® AVX-512 instruction sets to optimize the Striped Smith-Waterman and Pair HMM algorithms used in whole genome sequencing pipelines resulted in a significant increase in throughput compared to both the classical and the Intel® Advanced Vector Extensions 2 (Intel® AVX2) optimized implementations.
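For reference, the Smith-Waterman recurrence being optimized can be sketched in scalar form (a hedged C++ sketch with illustrative scoring parameters; the striped SIMD formulation computes the same recurrence across vector lanes):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Scalar reference of Smith-Waterman local alignment scoring with a
// linear gap penalty. H[i][j] holds the best local alignment score
// ending at a[i-1], b[j-1]; clamping at zero is what makes the
// alignment local rather than global.
int smith_waterman_score(const std::string& a, const std::string& b,
                         int match = 2, int mismatch = -1, int gap = -1) {
    std::vector<std::vector<int>> H(a.size() + 1,
                                    std::vector<int>(b.size() + 1, 0));
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int diag = H[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match
                                                              : mismatch);
            H[i][j] = std::max({0, diag, H[i - 1][j] + gap,
                                H[i][j - 1] + gap});
            best = std::max(best, H[i][j]);
        }
    return best;  // best local alignment score over the whole DP table
}
```

The striped variant re-orders this same dynamic program so that each SIMD instruction updates several query positions at once.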

Unsal Gokdag and Mehmet Zorlu, Seven Bridges Genomics

Optimization of the AIREBO Many-Body Potential for Intel® Xeon Phi™ Coprocessors (PDF)

The Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential is an example of a many-body potential that models carbon and hydrocarbons. Learn about a vectorized and optimized AIREBO implementation for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors, and its integration into the popular open source molecular dynamics code LAMMPS.

Markus Höhnerbach, RWTH Aachen University

Optimization of the Boundary Element Method for Intel® Architectures (PDF)

See the results of optimization and parallelization of the boundary element method for the Intel® Xeon® and Intel® Xeon Phi™ platforms. The focus is on the SIMD vectorization and shared and distributed memory parallelization. The efficiency of the techniques is demonstrated by experiments on multiple architectures.

Michal Merta, Jan Zapletal, and Lukas Maly, IT4Innovations National Supercomputing Center

Reconfigurable Accelerator Platform with FPGAs (PDF)

Emerging workloads are increasing the opportunities in the data center for FPGA algorithm, networking, and data access acceleration. This presentation looks at how FPGAs are efficiently accelerating key HPC workloads, such as genomics, machine learning, and big data analytics.

Mike Strickland, Intel

Scalable Amber Molecular Dynamics Implementation for Intel® Architecture (PDF)

The ability to design drugs purely through simulation is the holy grail of computational drug discovery. These simulations are complex, which limits the ability to gain meaningful insights in a reasonable time frame. Therefore, an important goal is to optimize Amber software for better scalability.

Tareq Malas and Ashraf Bhuiyan, Intel
Charles Lin, University of California, San Diego

SCIPHI - Score-P and Cube Extensions for Intel® Xeon Phi™ Processors (PDF)

To improve specific tool support for Intel® Xeon Phi™ processors within the areas of vectorization and memory, we present extensions to the highly scalable Score-P measurement system and the Cube report explorer.

Christian Feld, Marc Schlütter, Pavel Saviankou, Michael Knobloch, and Bernd Mohr, Jülich Supercomputing Centre