These presentations include hands-on experiments (using materials provided or bringing your own) and expert assistance from the visualization team. They are appropriate for computational scientists of all domains and skill levels (beginner, intermediate, and advanced).
Use the OSPRay API with Data-Distributed Applications
Using the OSPRay API with data-distributed applications addresses data that is too big to fit on one node, already distributed (in situ), or both.
Will Usher, Jefferson Amstutz, and Jim Jeffers, Intel
Silvio Rizzi, Argonne National Laboratory
Valerio Pascucci, Scientific Computing and Imaging Institute, University of Utah
CSP Project Featuring VTK-m
VTK-m is a single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms. It reduces the challenges of writing highly concurrent algorithms by using data-parallel algorithms, and it makes it easier for simulation codes to take advantage of parallel visualization and analysis tasks on a wide range of current and next-generation hardware.
Robert Maynard, Kitware Inc.
Jim Jeffers, Intel
Modernize VisIt with VTK and Software Defined Visualization (SDVis)
VisIt is an open source, turnkey application for data analysis and visualization of mesh-based data. It provides an infrastructure for parallel post-processing that scales from desktops to massive HPC clusters, pushing the envelope of scientific visualization software by making effective use of Intel® architecture.
Jim Jeffers, Intel
Hank Childs, University of Oregon
In-Situ Open Lab Recap and Insight into the Future of Visualization
Software Defined Visualization (SDVis) is a set of open source libraries that improve the visual fidelity, performance, and efficiency of prominent visualization solutions – with a particular emphasis on supporting the rapidly growing “Big Data” usage from workstations through HPC supercomputing clusters without the memory limitations and cost of GPU-based solutions.
Jim Jeffers, Intel
Paul Navratil, Texas Advanced Computing Center
Accelerate Big Data Processing with High-Performance Computing (HPC) Technologies
Explore opportunities and challenges in accelerating big data middleware with new design case studies of Apache Hadoop*, Apache Spark*, Memcached, and TensorFlow*. We share the associated benefits, such as the interplay between interconnects, storage systems (NVM and SSD), parallel file systems (Lustre), and multicore Intel® Xeon® platforms.
Dhabaleswar K. (DK) Panda and Xiaoyi Lu, The Ohio State University
Java* for HPC—A Story of Performance
Java* and the Java virtual machine (JVM) have features that make them appealing for HPC developers, for example parallel processing via streams for quick and easy development, and runtime compiler optimizations like auto-vectorization. Also, the Vector API is becoming a viable approach for vector programming in Java. Prerequisites: Intel® Advanced Vector Extensions (Intel® AVX), vector programming.
Razvan Lupusoru and Vivek Deshpande, Intel
Accelerate Workloads Using the Acceleration Stack for Intel® Xeon® Processors with FPGAs
The acceleration stack for Intel® Xeon® processors with FPGAs is a robust collection of software, firmware, and tools. These are designed and distributed by Intel to make it easier to develop and deploy Intel® FPGAs for workload optimization in the data center.
David Munday, Intel
Reconfigurable Accelerator Platform with FPGAs
Emerging workloads are increasing the opportunities in the data center for FPGA algorithm, networking, and data access acceleration. This presentation looks at how FPGAs are efficiently accelerating key HPC workloads, such as genomics, machine learning, video analytics, and big data analytics.
Mike Strickland, Intel
As part of the Big Data Center at Lawrence Berkeley National Laboratory (LBNL), our goal is unsupervised discovery of coherent structures in terabyte-scale climate data. We outline the current progress and future challenges of scaling our Python* implementation, accelerated with Numba*, to a fully distributed execution on Cori II.
Adam Rupe and James P. Crutchfield, UC Davis
Mr. Prabhat and Karthik Kashinath, National Energy Research Scientific Computing Center (NERSC) and Lawrence Berkeley National Laboratory (LBNL)
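As a rough, illustrative sketch of the approach described above (not the presenters' actual code), the typical pattern is to write the inner kernel in plain Python and let Numba* JIT-compile it. The `local_complexity` statistic below is a hypothetical stand-in for a coherent-structure measure; a fallback decorator keeps the sketch runnable even without Numba installed.

```python
import numpy as np

try:
    from numba import njit  # JIT compilation, as in the work described above
except ImportError:         # plain-Python fallback so the sketch still runs
    def njit(func):
        return func

@njit
def local_complexity(field, window):
    # Toy stand-in for a coherent-structure statistic: variance over a
    # sliding window along a 1D field. The explicit loop is the part
    # Numba compiles to fast machine code.
    n = field.shape[0]
    out = np.empty(n - window + 1)
    for i in range(n - window + 1):
        seg = field[i:i + window]
        m = seg.mean()
        out[i] = ((seg - m) ** 2).mean()
    return out

field = np.sin(np.linspace(0.0, 20.0, 1000)) + 0.1
result = local_complexity(field, 50)
```

Scaling this pattern to a fully distributed run on Cori II is then a matter of partitioning the field across ranks, which the abstract identifies as the open challenge.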
Reverse time migration is used to generate a seismic image of the underground layers of the earth. Here, we review two implementations (one using the standard approach, the second performing an extra propagation) that use random velocity boundaries and reduce I/O communications.
Philippe Thierry, Intel
Kareem Metwaly, Khaled El-Amrawi, Essam Algizawy, Mohamed Mahmoud, and Mohamed ElBasyouni, Brightskies Technologies
Small dense matrix-matrix multiplication (DGEMM) is a primary compute kernel in several automated driving workloads. We evaluate and improve Eigen’s performance on small DGEMMs on the Intel® Xeon® processor through LIBXSMM and Intel® Math Kernel Library (Intel® MKL) with MKL_DIRECT_CALL.
Steena Monteiro and Gaurav Bansal, Intel
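The session above works in C++ through Eigen, LIBXSMM, and Intel® MKL with MKL_DIRECT_CALL; as a language-neutral sketch only, the core idea (amortizing per-call overhead on many tiny matrix products by batching them into one call) can be shown with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n = 10000, 4                      # many tiny 4x4 GEMMs
A = rng.standard_normal((batch, n, n))
B = rng.standard_normal((batch, n, n))

# Naive: one call per small matrix -- per-call overhead dominates the
# arithmetic at these sizes.
C_loop = np.stack([a @ b for a, b in zip(A, B)])

# Batched: one call over the whole stack, analogous in spirit to the
# specialized small-GEMM paths (LIBXSMM, MKL_DIRECT_CALL) the talk covers.
C_batched = A @ B
```

Both paths compute identical results; only the dispatch overhead per matrix changes.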
The acceleration stack for Intel® Xeon® processors with FPGAs is a robust collection of software, firmware, and tools designed and distributed by Intel. This collection is designed to make it easier to develop and deploy Intel® FPGAs for workload optimization in the data center.
David Munday, Intel
A pseudo-Verlet list of pair-wise interactions aims to reduce the number of spurious pair-wise distance calculations between neighbouring cells. We present a Single Instruction Multiple Data (SIMD) implementation with a speed-up of 2.24, 2.43, and 4.07 times over the scalar algorithm using Intel® AVX instruction sets.
James Willis, Matthieu Schaller, and Pedro Gonnet, Durham University
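The presenters' implementation uses hand-vectorized Intel® AVX kernels in C; purely as an illustration of the data-parallel idea, the sketch below contrasts a scalar pair loop between two neighbouring cells with a vectorized pass that evaluates all pairwise distances at once, the way a SIMD kernel processes one register lane per pair. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
cell_i = rng.random((64, 3))           # particle positions in cell i
cell_j = rng.random((64, 3))
cell_j[:, 0] += 1.0                    # neighbouring cell, offset along x
r_cut = 0.4

# Scalar reference: test every pair individually.
pairs_scalar = sum(
    1
    for p in cell_i
    for q in cell_j
    if np.sum((p - q) ** 2) < r_cut ** 2
)

# Data-parallel version: all pairwise squared distances in one pass,
# then a vectorized cutoff test -- the work a SIMD kernel does per lane.
d2 = np.sum((cell_i[:, None, :] - cell_j[None, :, :]) ** 2, axis=-1)
pairs_vector = int(np.count_nonzero(d2 < r_cut ** 2))
```

The pseudo-Verlet list described in the abstract goes further by sorting particles so that most pairs beyond the cutoff are never even tested.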
This talk explores how various applications have leveraged key features of Charm++ to productively achieve high performance and scalability with portability across a wide range of systems.
Phil Miller, Charmworks, Inc.
Explore a series of results from a multiyear collaboration between specialists from the supercomputer center, the HPC solutions developer RSC Group, and engineers from Intel to develop AstroPhi, an astrophysical code for simulating the dynamics of different astrophysical objects.
Igor Chernykh and Igor Kulikov, Institute of Computational Mathematics and Mathematical Geophysics SB RAS
Automated Systolic Array Architecture Synthesis for High-Throughput Convolutional Neural Networks (CNN) Inference on FPGAs (PDF)
See how to implement convolutional neural networks (CNN) on an FPGA using a systolic array architecture, which can achieve high clock frequency under high-resource utilization. The experimental results show that the framework is able to generate the accelerator for real-life CNN models.
Jim Wu, Falcon Computing Solutions
Xuechao Wei, Falcon Computing Solutions, Inc., Center for Energy-Efficient Computing & Applications, and Peking University, China
Cody Hao Yu, Falcon Computing Solutions, Inc. and Computer Science Department, University of California, Los Angeles
Peng Zhang, Youxiang Chen, Yuxin Wang, and Han Hu, Falcon Computing Solutions, Inc.
Yun Liang, Center for Energy-Efficient Computing & Applications, and Peking University, China
Jason Cong, Falcon Computing Solutions, Inc., Computer Science Department, University of California, Los Angeles, Center for Energy-Efficient Computing & Applications, and Peking University, China
Containerize Deep Learning Workloads in Intel® Xeon® Processor E3 Cluster for AI Web Applications (PDF)
Today's web applications are data intensive and demand environments like TensorFlow* to execute their workloads. Containers are therefore well suited to provide the framework and compute resources, such as CPU and memory, for each workload: a container decouples the application environment from the host machine and encapsulates all dependencies in a single portable unit. Nomad is a state-of-the-art tool for scheduling Docker* containers. A test model workload is generated from the Model Zoo for each framework.
Srivignessh Pacham Sri Srinivasan, Intel® Student Ambassador
The ever-increasing availability of hardware resources delivered with each generation of computing systems comes with an increase in the complexity of managing those resources. This research presents findings on the possible performance impact on virtual machines (VMs) when they are managed by the default Linux* scheduler as regular host processes.
Gildo Torres and Chen Liu, Clarkson University
With current systems, detecting multiple objects in an image requires large amounts of CPU and GPU usage. This solution includes object recognition and training on an Intel® processor-based embedded device, primarily using histograms of oriented gradients (HOG) and support vector machines (SVM) for object detection and training.
Maneesh Tewani, Harish Subramony, Tejaswini Sirlapu, Suresh Nampalli, and Rey Nicolas, Intel
This session discusses the application of convolutional generative adversarial networks to the simulation of particle energy showers in electromagnetic calorimeters.
Sofia Vallecorsa, CERN
Andrea Zanetti, Intel
This presentation highlights the development of dialogue-based intelligent tutoring systems, as well as results from large-scale, after-school experiments with high school students. These experiments revealed that tutoring systems are as effective as average human tutors.
Vaile Rus, The University of Memphis
Aclectic Systems Inc. is developing Polymath*, a hardware- and software-integrated supercomputing appliance explicitly designed for physically based visual effects and light field-based productions. Colossus* and Enthalpy* are simulation and volumetric rendering tools developed by Aclectic.
Yahya Mirza, Jason Lefley, and Todd Anderson, Aclectic Systems Inc.
We present a systematic approach to transform QMCPACK to better exploit the new hardware features of modern CPUs in portable and maintainable ways.
Amrita Mathuriya and Jeongnim Kim, Intel
Ye Luo and Anouar Benali, Argonne National Laboratory
Raymond C. Clay III and Luke Shulenburger, Sandia National Laboratories
Optimizations of Python* and NumPy by Intel improve performance over native Python and NumPy in applying evolutionary algorithms to NP hard problem instances.
Justin Shenk, Osnabrück University
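To illustrate the kind of workload the abstract above refers to (a sketch under assumptions, not the presenter's code), an evolutionary algorithm written entirely in vectorized NumPy operations evaluates and mutates the whole population per generation; these array-wide operations are exactly what the Intel-optimized NumPy accelerates. The objective and parameters below are illustrative.

```python
import numpy as np

def evolve(fitness, dim=10, pop=200, gens=100, sigma=0.3, seed=0):
    # Simple truncation-selection evolution strategy, fully vectorized:
    # the entire population is evaluated and mutated in bulk array ops.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((pop, dim))
    for _ in range(gens):
        f = fitness(X)                        # vectorized evaluation
        elite = X[np.argsort(f)[: pop // 4]]  # keep the best quarter
        parents = elite[rng.integers(0, len(elite), pop)]
        X = parents + sigma * rng.standard_normal((pop, dim))
        sigma *= 0.98                         # simple step-size decay
    return X[np.argmin(fitness(X))]

sphere = lambda X: np.sum(X ** 2, axis=1)     # toy benchmark objective
best = evolve(sphere)
```

On NP-hard problem instances the fitness function dominates the runtime, which is where the optimized NumPy kernels pay off.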
Face It is a mobile application that uses machine learning to determine a user’s face shape, and then combines this information with data input by the user to give the user a personalized set of hairstyles.
Pallab Paul, Intel® Student Ambassador
The computational fluid dynamics (CFD) department at ONERA has been developing CFD software for decades, both for its own research and for industrial partners.
Nicolas Alferez, ONERA
Galactos is a high-performance implementation of the three-point correlation function of galaxies in the universe, optimized for Intel® Xeon Phi™ processors. It reaches 39 percent of peak performance on a single node, and scales to the full Cori* system, achieving 9.8 PFLOPS (peak) across 9636 nodes.
Brian Friesen, Brian Austin, Deborah Bard, Jack Deslippe, and Mr. Prabhat, NERSC
Pradeep Dubey, Mostofa Ali Patwary, Nadathur Satish, and Narayanan Sundaram, Intel
Zachary Slepian, LBNL
Daniel J. Eisenstein, Harvard- Smithsonian Center for Astrophysics
Global Extensible Open Power Manager (GEOPM): A Scalable Open Runtime Framework for Power Management (PDF)
This poster presents empirical results demonstrating up to 30 percent improvements in the time-to-solution of CORAL system procurement benchmarks on a cluster of Intel® Xeon Phi™ processors.
Siddhartha Jana, Asma Al-Rawi, Steve Sylvester, Christopher Cantalupo, Brad Geltz, Brandon Baker, and Jonathan Eastep, Intel
To efficiently process data-intensive artificial intelligence modeling on high-performance computing (HPC) systems, we focus on greatly reducing data-movement overhead across storage and memory by implementing the durable data model (DDM) from Mnemonic, incubated at Apache*. This session explores ways to improve efficiency.
Helin Cao, Gang Wang, and Jianying Lang, Intel
See how to efficiently accelerate FETI solvers using Intel® Xeon Phi™ processors and coprocessors with the local Schur complement method, which converts sparse matrices generated by a finite element method (FEM) into dense matrices.
Lubomír Říha, Tomáš Brzobohatý, Michal Merta, Alexandros Markopoulos, Ondřej Meca, Tomáš Kozubek, and Vít Vondrák, IT4Innovations, VSB – Technical University of Ostrava
This poster presents a new threading framework under development, called the Static Thread Scheduler (STS), which provides flexibility in assigning threads to tasks and eliminates overhead by being static rather than dynamic.
John Eblen and Jeremy C. Smith, The University of Tennessee, Knoxville
Message Passing Interface (MPI) and OpenMP* Parallelization of the Hartree-Fock Method for the Intel® Xeon Phi™ Processor (PDF)
Modern OpenMP* threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm.
Vladimir Mironov, Lomonosov Moscow State University
Yuri Alexeev, Argonne National Laboratory, Leadership Computing Facility
Mark S. Gordon and Kristopher Keipert, Iowa State University
Alexander Moskovsky, RSC Technologies
Michael D’mello, Intel
Using Intel® Core™ processors and Intel® AVX-512 instruction sets to optimize the Striped Smith-Waterman and Pair HMM algorithms used in whole-genome sequencing pipelines resulted in a significant increase in throughput compared to both the classical and the Intel® Advanced Vector Extensions 2 (Intel® AVX2) optimized implementations.
Unsal Gokdag and Mehmet Zorlu, Seven Bridges Genomics
The Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential is an example of a many-body potential that models carbon and carbohydrates. Learn about a vectorized and optimized AIREBO implementation for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors, and its integration into the popular open source molecular dynamics code LAMMPS.
Markus Höhnerbach, RWTH Aachen University
See the results of optimization and parallelization of the boundary element method for the Intel® Xeon® and Intel® Xeon Phi™ platforms. The focus is on the SIMD vectorization and shared and distributed memory parallelization. The efficiency of the techniques is demonstrated by experiments on multiple architectures.
Michal Merta, Jan Zapletal, and Lukas Maly, IT4Innovations National Supercomputing Center
Learn about code transformations that reduce the amount of data transferred through the high-performance network and memory hierarchy and increase code vectorization.
John Dennis, National Center for Atmospheric Research
OpenFOAM* is a well-known and popular software package for solving partial differential equations (PDE) and is used by industry, researchers, and academia to solve a variety of computational fluid dynamics (CFD) problems and other physical problems.
Sonia Gupta, Prasad Pawar, Ravi Ojha, and Manoj Nambiar, Tata Consultancy Services Ltd.
Michael Klemm, Intel
Emerging workloads are increasing the opportunities in the data center for FPGA algorithm, networking, and data access acceleration. This presentation looks at how FPGAs are efficiently accelerating key HPC workloads, such as genomics, machine learning, and big data analytics.
Mike Strickland, Intel
The ability to design drugs purely through simulation is the holy grail of computational drug discovery. These simulations are complex, which limits the ability to gain meaningful insights in a reasonable time frame. An important goal, therefore, is to optimize the Amber software for better scalability.
Tareq Malas and Ashraf Bhuiyan, Intel
Charles Lin, University of California, San Diego
To improve specific tool support for Intel® Xeon Phi™ processors within the areas of vectorization and memory, we present extensions to the highly scalable Score-P measurement system and the Cube report explorer.
Christian Feld, Marc Schlütter, Pavel Saviankou, Michael Knobloch, and Bernd Mohr, Jülich Supercomputing Centre
Find out about the latest synchronization features of Intel® Xeon® processors and how to use them directly from OpenMP* codes.
Jim Cownie, Intel
Stencil computation is an important class of algorithms used in a large variety of scientific applications. This talk describes a software framework that simplifies the tasks of defining stencil functions, generating high-performance code targeted especially for Intel® Xeon® and Intel® Xeon Phi™ processors.
Chuck Yount, Intel
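To make the stencil idea concrete (a minimal sketch only, not the framework the talk describes), a 5-point Jacobi stencil can be written either as explicit loops or as whole-array expressions; a stencil framework takes a definition like the second form and lowers it to tuned SIMD code for Intel® Xeon® and Intel® Xeon Phi™ processors:

```python
import numpy as np

def jacobi_step_loops(u):
    # Scalar reference: the 5-point stencil as explicit loops (Jacobi:
    # the update reads only the old array u).
    v = u.copy()
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            v[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j]
                              + u[i, j - 1] + u[i, j + 1])
    return v

def jacobi_step_sliced(u):
    # The same stencil over whole arrays -- the declarative form a
    # stencil framework can vectorize and tile for cache reuse.
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                            + u[1:-1, :-2] + u[1:-1, 2:])
    return v

u = np.zeros((32, 32))
u[0, :] = 1.0                 # hot top boundary of a toy heat problem
step = jacobi_step_sliced(u)
```

Both forms compute the same update; the performance gap between them on large grids is what motivates generating the vectorized code automatically.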