Compiler, Architecture, and Tools Conference 2018 Recap

Published: 04/20/2018, Last Updated: 04/20/2018


The seventh Compiler, Architecture, and Tools Conference (CATC) took place at Intel in Haifa, Israel, with around 130 attendees from Europe, Asia, and the US. It hosted two keynote speakers (one from industry and the other from academia) and included four serial sessions on compilation, tools, new architectures, and systems.

Conference Program

December 17, 2018

8:45 - 9:00 Opening
9:00 - 10:00

Keynote Talk: From Programs to Interpretable Deep Models and Back

Prof. Eran Yahav, Technion, Israel Institute of Technology

Review how deep learning of programs provides (preliminary) augmented programmer intelligence. See how to perform tasks like code completion, code summarization, and captioning. Learn about a general path-based representation of source code that can be used across programming languages and learning tasks, and discuss how this representation enables different learning algorithms. Find out about techniques for extracting interpretable representations from deep models, thus shedding light on what has been learned in various tasks.

View the Presentation

Session 1: Compilers and Languages
10:20 - 10:40

The Future of C++ Directions: Towards Heterogeneous C++

Michael Wong, Codeplay Software Ltd.*

C++20 is a major release with many new features. It focuses particularly on heterogeneous computing. The presentation walks through the C++ directions document.

View the Presentation

10:40 - 11:00

PARALLator: Auto Parallelizer of Sequential Code

David Livshin, Dalsoft

Tools used for auto parallelization either cannot convert complex code or only help with creating a parallel version. This presentation discusses an approach that handles code written in a high-level language (such as C and Fortran) and doesn't require user assistance.

View the Presentation

11:00 - 11:20

Interleaved Loads and Stores in LLVM Compiler

Michael Zuckerman, Aurora Labs*

Review a general solution to the interleave and deinterleave problems for different element sizes (byte, word, dword), vectorization factors (VF), and strides. The solution involves generating a cyclic matrix that can be manipulated to solve the problem.
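To make the problem statement concrete, here is a minimal scalar sketch of what deinterleaving and interleaving mean for a given stride. It illustrates only the data movement that a vectorizer has to reproduce with shuffles; it is not the cyclic-matrix solution from the talk, and the function names are hypothetical.

```python
def deinterleave(data, stride):
    """Split interleaved data into `stride` separate streams."""
    if len(data) % stride != 0:
        raise ValueError("length must be a multiple of the stride")
    # Stream `lane` holds elements lane, lane + stride, lane + 2 * stride, ...
    return [data[lane::stride] for lane in range(stride)]


def interleave(streams):
    """Inverse of deinterleave: merge equal-length streams element by element."""
    if len({len(s) for s in streams}) > 1:
        raise ValueError("all streams must have the same length")
    return [value for group in zip(*streams) for value in group]


# Stride-3 example: packed RGB pixels split into one stream per channel.
pixels = ["r0", "g0", "b0", "r1", "g1", "b1", "r2", "g2", "b2", "r3", "g3", "b3"]
channels = deinterleave(pixels, stride=3)
print(channels[0])  # ['r0', 'r1', 'r2', 'r3']
```

A vectorized version performs the same rearrangement on whole registers, which is why the element size, VF, and stride all affect the shuffle sequence the compiler must generate.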

View the Presentation

11:20 - 11:40

Coarse Grain High-Level Synthesis: A Technique for Reducing MUX Complexity

Yosi Ben-Asher, University of Haifa

Learn about reducing the number of circuit MUX gates that high-level compilers synthesize. Devise a fast high-level synthesis (HLS) compiler that accelerates sequential C programs on machines that use Intel® Xeon® processors and Intel® FPGA.

View the Presentation

Session 2: Debug and Optimizations Tools
12:00 - 12:20

Full-Stack Automatic Optimization: Compiler Flags, Operating Systems, and Application Settings

Tomer Morad, Concertio

Compilers, processors, firmware, and applications expose hundreds of tunable settings. As a result, manually discovering the optimal configuration is extremely hard. This talk presents the Concertio approach to automatic, static, and dynamic tuning.

View the Presentation

12:20 - 12:40

How Top-Down Microarchitecture Analysis (TMA) Addresses Challenges in Modern Servers and Enhancements Coming in Ice Lake Processors

Ahmad Yasin, Intel

Review the top-down microarchitecture analysis (TMA) method and its handling of cycle accounting in modern out-of-order cores. This talk illustrates some performance problems that call for truly top-down-oriented metrics, and presents recent challenges in modern data centers along with the performance monitoring unit (PMU) enhancements that address them.
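As background for readers new to TMA, the following is a rough sketch of the top-level breakdown, which attributes every pipeline issue slot to one of four categories. It assumes a 4-wide pipeline and follows the level-1 formulas from the published TMA method as commonly described; the parameter names are illustrative stand-ins for PMU counter values, and nothing here reflects the Ice Lake enhancements covered in the talk.

```python
def tma_level1(cycles, uops_issued, uops_retired_slots,
               uops_not_delivered, recovery_cycles, width=4):
    """Return the four top-level TMA fractions (they sum to 1.0).

    All arguments are raw event counts over the same interval.
    `width` is the assumed number of issue slots per cycle.
    """
    slots = width * cycles  # total issue slots available
    frontend_bound = uops_not_delivered / slots
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever is left over is attributed to the back end.
    backend_bound = 1.0 - (frontend_bound + bad_speculation + retiring)
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


# Made-up counts, for illustration only.
breakdown = tma_level1(cycles=1000, uops_issued=2200, uops_retired_slots=2000,
                       uops_not_delivered=800, recovery_cycles=50)
for name, fraction in breakdown.items():
    print(f"{name}: {fraction:.1%}")
```

The analysis is top-down because a tool drills into the sub-metrics of a category only when that category dominates the breakdown.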

View the Presentation

12:40 - 13:00

Visualization Tool for the Programmable Macro Array (PMA) Accelerator of the Mobileye* System for Autonomous Drive

Arie Tal, Magdy Keadan, Uri Levy, Intel

The programmable macro array (PMA) enables computation density nearing that of fixed-function hardware accelerators without sacrificing programmability. This talk presents a new visualization tool to help with the programming of the PMA accelerator.

View the Presentation

14:00 - 15:00 Keynote Talk: The Mobileye* Approach to Autonomous Driving

Dr. Gaby Hayon, senior vice president of research and development, Mobileye

This talk presents key principles of the Mobileye approach to enabling human-like driving decisions safely. The talk also introduces primary concepts, the current status, and high-level future plans.

Session 3: Architecture
15:20 - 15:40

Highlighted Paper from the MICRO 18 Conference

Interthread Communication in Multithreaded, Reconfigurable Coarse-Grain Array

Dani Voitsechov, Oron Port, Yoav Etsion, Technion, Israel Institute of Technology

This talk introduces direct inter-thread communication for massively multithreaded reconfigurable coarse-grain arrays (RCGAs), where intermediate values are communicated directly through the compute fabric on a point-to-point basis. The talk also introduces proposed extensions to the programming model (CUDA) and execution model, as well as the hardware primitives that facilitate the communication.

15:40 - 16:00

BLARe: Bandwidth-Latency Aware Routing for Heterogeneous Network on Chip (NoC)

Ravi Venkatesan, Manikantan R., Leon Polishuk, Intel

This talk reviews a study of heterogeneous systems that have both latency-sensitive cores and bandwidth-sensitive accelerators. The study presents a novel, topology-aware, flexible routing scheme, which trades latency for bandwidth for relevant agents connected to the fabric. Such distribution reduces traffic congestion, increases fabric utilization, and delivers 40% more bandwidth for the accelerators. Latency-sensitive cores continue to use the latency-optimized routing algorithm without any performance impact.

View the Presentation

16:00 - 16:20

RASSA: Resistive Accelerator for Approximate Long Read DNA Mapping

Roman Kaplan, Leonid Yavits, Ran Ginosar, Technion, Israel Institute of Technology

DNA read mapping is a computationally expensive bioinformatics task, required for genome assembly and consensus polishing. It finds the best-fitting location for each DNA read on a long reference sequence. A novel resistive approximate similarity search accelerator (RASSA) exploits charge distribution and parallel in-memory processing to reflect a mismatch count between DNA sequences.
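For intuition, the task described above can be sketched in software as a brute-force search: slide the read along the reference and keep the position with the fewest mismatches. This is only a reference model of the computation, assuming a plain per-base mismatch count and no insertions or deletions; it is not the RASSA design, which evaluates the counts in parallel inside resistive memory. The function names and sequences are made up for illustration.

```python
def mismatch_count(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_location(reference, read):
    """Return (position, mismatches) of the best-fitting placement of `read`.

    Ties are resolved in favor of the leftmost position.
    """
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")
    best_pos, best_mismatches = 0, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = mismatch_count(window, read)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


reference = "ACGTTGCATGCCGTA"
read = "TGCATCC"  # matches the reference at offset 4 except for one base
print(best_location(reference, read))  # (4, 1)
```

The loop costs time proportional to the reference length times the read length, which is the kind of expense that motivates in-memory acceleration for long reads.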

16:20 - 16:40

Memristive Memory Processing Unit for Real In-Memory Processing

Rotem Ben-Hur, Ronny Ronen, Shahar Kvatinsky, Technion, Israel Institute of Technology

Data transfer between memory and processor in conventional architecture is the primary performance and energy bottleneck in modern computing systems. A new computer architecture, called a memristive memory processing unit (mMPU), enables real in-memory processing (based on a unit that can both store and process data using the same cell) and substantially reduces the necessity of moving data in computing systems.

View the Presentation

Session 4: Binary Analysis and Translation
17:00 - 17:20

Reverse the Linking Process

Joel Nider, IBM Research in Haifa

This paper introduces the concept of an unlinker: a new tool that reverts a fully linked executable to a set of object files for further manipulation. These object files are functionally equivalent to the original set used to produce the executable, and can be manipulated further before being linked into a new executable. This fully automated tool is a powerful addition to the reverse engineering tool set.

View the Presentation

17:20 - 17:40

Hardware-Assisted Call Stacks for Performance Monitoring

Vitaly Slobodskoy, Andrey Isakov, Pavel Gerasimov, Intel

In performance analysis tools, providing call stacks for hotspot functions is a natural way to expose the flow of the analyzed application. Software methods for collecting call stacks add collection overhead and reduce precision. Intel® processors have dedicated registers, called last branch records (LBR), that record the code branches taken. Learn more about this mechanism.

View the Presentation

17:40 - 18:00

Performance Characterization of Simultaneous Multithreading for an Online Document Search Application

Yanos Sazeides, Department of Computer Science, University of Cyprus

This work reports the results of a performance characterization of simultaneous multithreading (SMT) when executing an online document search application. This report finds that in many situations SMT can help decrease both average and tail latency for the application and server type used in this study.

18:00 Closing
