Intel® System Studio 2016 Update 4 - What's New

What's New in Intel® System Studio 2016 Update 4

1. Intel® C++ Compiler

  • Update to version 16.0.4
  • Support for Microsoft* Visual Studio* 2015 Update 3
  • Several fixes for reported problems
  • Documentation updates
  • More detailed information is available in the full product release notes (see section ‘6.1 Release Notes and User Guide Location’)

2. Intel® Math Kernel Library (Intel® MKL)

  • Update to version 11.3 Update 4

New features:

  • BLAS:
    • Introduced new packed matrix multiplication interfaces (?gemm_alloc, ?gemm_pack, ?gemm_compute, ?gemm_free) for single and double precisions.
    • Improved performance over standard S/DGEMM on Intel® Xeon® processor E5-xxxx v3 and later processors.
  • LAPACK:
    • Improved LU factorization, solve, and inverse (?GETR?) performance for very small sizes (<16).
    • Improved General Eigensolver (?GEEV and ?GEEVD) performance for the case when eigenvectors are needed.
    • Added Intel® Threading Building Blocks (Intel® TBB) parallelism for ?ORGQR/?UNGQR.
  • More detailed information on the Intel® MKL release notes webpage.
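The packed interfaces listed above split a matrix multiplication into a pack step and a compute step, which pays off when one matrix is reused across many multiplications. The following is a minimal sketch of that idea in plain C++. It does not call Intel® MKL, and the names PackedA, pack_a, and gemm_with_packed, as well as the packed layout and the choice to apply alpha at pack time, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical packed form of a row-major m x k matrix A: the data is copied
// into a contiguous buffer with alpha already applied, so the cost of the
// copy and the scaling is paid once.
struct PackedA {
    std::size_t m;
    std::size_t k;
    std::vector<double> data;
};

// Pack step: copy A (leading dimension lda) into the contiguous buffer.
PackedA pack_a(const std::vector<double>& a, std::size_t m, std::size_t k,
               std::size_t lda, double alpha) {
    assert(m > 0 && k > 0 && lda >= k && a.size() >= (m - 1) * lda + k);
    PackedA pa{m, k, std::vector<double>(m * k)};
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            pa.data[i * k + p] = alpha * a[i * lda + p];
    return pa;
}

// Compute step: C = packed(A) * B + beta * C, with B (k x n) and C (m x n)
// stored row-major and contiguous. The packed matrix is only read here, so
// it can be reused for any number of different B matrices.
void gemm_with_packed(const PackedA& pa, const std::vector<double>& b,
                      std::size_t n, double beta, std::vector<double>& c) {
    assert(b.size() == pa.k * n && c.size() == pa.m * n);
    for (std::size_t i = 0; i < pa.m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t p = 0; p < pa.k; ++p)
                acc += pa.data[i * pa.k + p] * b[p * n + j];
            c[i * n + j] = acc + beta * c[i * n + j];
        }
    }
}
```

In this sketch, pack_a is called once and gemm_with_packed is then called for each new B. The real interfaces use their own internal packed format; consult the Intel® MKL reference for the actual signatures.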

3. Intel® Integrated Performance Primitives (Intel® IPP)

4. Intel® Threading Building Blocks (Intel® TBB)

  • Update to version 4.4 Update 6
  • Changes: For 64-bit platforms, quadrupled the worst-case limit on the amount of memory the Intel® TBB allocator can handle.
  • Bugs fixed: Fixed a memory corruption in the memory allocator when it meets internal limits.
  • Fixed the memory allocator on 64-bit platforms to align memory to 16 bytes by default for all allocations bigger than 8 bytes.
  • Fixed parallel_scan to provide the correct result if the initial value of an accumulator is not the operation identity value.
  • As a workaround for crashes in the Intel® TBB library compiled with GCC 6, added -flifetime-dse=1 to compilation options on Linux* OS
  • More detailed information on the Intel® TBB release notes webpage
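To illustrate what the parallel_scan fix above means, here is a minimal serial sketch of a block-wise prefix sum in plain C++. It does not use the Intel® TBB API; the function name blocked_inclusive_scan and the two-pass structure are assumptions chosen to mirror how a parallel scan typically reduces blocks and then rescans them. The point is that per-block reductions start from the identity, while the caller's initial value enters exactly once as the first carry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum computed block by block (hypothetical helper).
std::vector<long> blocked_inclusive_scan(const std::vector<long>& in,
                                         long initial, std::size_t block) {
    assert(block > 0);
    const long identity = 0;  // identity value of operator+
    const std::size_t nblocks = (in.size() + block - 1) / block;

    // Pass 1: reduce every block independently, starting from the identity.
    std::vector<long> block_sum(nblocks, identity);
    for (std::size_t b = 0; b < nblocks; ++b)
        for (std::size_t i = b * block; i < in.size() && i < (b + 1) * block; ++i)
            block_sum[b] += in[i];

    // Pass 2: scan each block, seeded with the carry from all earlier blocks.
    // The initial value is used only here, as the carry into the first block.
    std::vector<long> out(in.size());
    long carry = initial;
    for (std::size_t b = 0; b < nblocks; ++b) {
        long running = carry;
        for (std::size_t i = b * block; i < in.size() && i < (b + 1) * block; ++i) {
            running += in[i];
            out[i] = running;
        }
        carry += block_sum[b];
    }
    return out;
}
```

With an initial value of 10 and the input 1, 2, 3, 4, 5, every output element is offset by 10 regardless of the block size, which is the behavior a scan with a non-identity initial value should have.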
     

5. Intel® System Debugger

  • Update to version 2016 Update 4 (internal version U1629)
  • Fix for AET decoder crash after multiple start/stop cycles
  • Several fixes for reported problems

New features for System Trace:

  • Architectural Event Traces (AET) support added
  • CSME verbosity can be set to ‘Verbose’ or ‘Normal’ in the configuration editor.
  • Eclipse* Neon (4.6) supported
  • New buttons for selecting or deselecting all trace sources in the Event Distribution View (EDV).
  • More detailed information is available in the full product release notes (see section ‘6.1 Release Notes and User Guide Location’)

6. Intel® Graphics Performance Analyzers (Intel® GPA)

 

What's New in Intel® System Studio 2016 Update 3

1. Intel® C++ Compiler:

  • Annotated source listing
    • This feature annotates source files with compiler optimization reports. The listing format may be specified as either text or html.
  • New attribute, pragma, and compiler options for code alignment
  • Additional C++14 features supported
  • Additional C11 features supported
  • New and Changed Compiler Options

View the full release notes for more details.

2. Intel® Math Kernel Library (Intel® MKL):

  • Introduced Deep Neural Network (DNN) primitives, including convolution, normalization, activation, and pooling functions, intended to accelerate convolutional neural networks (CNNs) and deep neural networks (DNNs) on Intel® architecture
  • The SP2DP interface library is removed
  • Removed pre-compiled BLACS library for MPICH v1; MPICH users can still build the BLACS library with MPICH support via Intel MKL MPI wrappers
  • Sparse BLAS:
    • Improved performance of parallel BSRMV functionality on processors supporting the Intel® Advanced Vector Extensions 2 (Intel® AVX2) instruction set
  • Intel MKL PARDISO:
    • Added support for mkl_progress in Parallel Direct Sparse Solver for Clusters
  • DFT:
    • Improved performance of batched 1D FFT with large batch size on processors supporting the Intel® Advanced Vector Extensions (Intel® AVX), Intel® Advanced Vector Extensions 2 (Intel® AVX2), and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction sets
  • Data Fitting:
    • Introduced 2 new storage formats for interpolation results (DF_MATRIX_STORAGE_SITES_FUNCS_DERS, DF_MATRIX_STORAGE_SITES_DERS_FUNCS)

3. Intel® Integrated Performance Primitives (Intel® IPP):

  • Added new APIs (Intel® IPP 64x functions) to support 64-bit data length in the image and signal processing domains:
    • This release provides the 64x functions for memory allocation, image addition, subtraction, multiplication, division, resizing, and filtering operations.
    • The Intel® IPP 64x functions are implemented as wrappers over Intel® IPP functions operating on 32-bit sizes by using tiling and multithreading. The 64x APIs support external threading for Intel® IPP functions, and are provided in the form of source and pre-built binaries.
  • Added integration wrappers for some image processing and computer vision functions. The wrappers provide easy-to-use C and C++ APIs for Intel® IPP functions, and they are available as a separate download in the form of source and pre-built binaries.
  • Performance and Optimization:
    • Extended optimization for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set on Intel® Many Integrated Core Architecture (Intel® MIC Architecture). Please see the Intel® IPP Functions Optimized for Intel® AVX-512 article for more information.
    • Extended optimization for Intel® AVX-512 instruction set on Intel® Xeon® processors.
    • Extended optimization for Intel® Advanced Vector Extensions 2 (Intel® AVX2) instruction set on the 6th Generation Intel® Core™ processors. Please see the Intel® IPP Functions Optimized for Intel® AVX2 article for more information.
    • Extended optimization for Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) instruction set on Intel® Atom™ processors.
  • Signal Processing:
    • Added the ippsIIRIIR functions that perform zero-phase digital IIR filtering.
    • Added 64-bit data length support to the ippsSortRadixAscend functions.
  • Image Processing:
    • Added the ippiScaleC functions to support image data scaling and shifting for different data types.
  • Data Compression:
    • Added the patch files for the zlib compression and decompression functions. The patches provide drop-in optimization with Intel® IPP functions, and support zlib versions 1.2.5.3, 1.2.6.1, 1.2.7.3, and 1.2.8.
  • Removed the tutorial from the installation package; its sample code and documentation are now provided online (https://software.intel.com/en-us/product-code-samples).
  • Threading Notes: Intel® IPP threaded libraries are not installed by default, but they are available through custom installation, so code written with these libraries still works as before. However, the multi-threaded libraries are deprecated, and moving to external threading is recommended. Your feedback on this is welcome.
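The notes above describe the Intel® IPP 64x functions as wrappers that apply tiling over functions operating on 32-bit sizes. A minimal sketch of that tiling pattern in plain C++ might look like the following. It does not call Intel® IPP; add_32 and add_64x are hypothetical names, and the max_tile parameter exists only so the tiling can be exercised on small buffers.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Stand-in for a primitive whose length argument is a 32-bit int
// (hypothetical, not an Intel IPP function).
void add_32(const float* a, const float* b, float* dst, int len) {
    for (int i = 0; i < len; ++i) dst[i] = a[i] + b[i];
}

// Hypothetical 64-bit-length wrapper: walks the buffers in tiles whose
// length always fits in an int and forwards each tile to the 32-bit primitive.
void add_64x(const float* a, const float* b, float* dst, std::int64_t len,
             std::int64_t max_tile = INT_MAX) {
    assert(len >= 0 && max_tile > 0 && max_tile <= INT_MAX);
    for (std::int64_t off = 0; off < len; off += max_tile) {
        const std::int64_t remaining = len - off;
        const int tile =
            static_cast<int>(remaining < max_tile ? remaining : max_tile);
        add_32(a + off, b + off, dst + off, tile);
    }
}
```

Because the tiles do not overlap, each call to add_32 is independent of the others, which is what makes it possible to hand tiles to separate threads under an external threading model.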

4. Intel® Threading Building Blocks:

  • Removed a few cases of excessive user data copying in the flow graph.
  • Improved robustness of concurrent_bounded_queue::abort() in case of simultaneous push and pop operations.
  • Modified parallel_sort to not require a default constructor for values and to use iter_swap() for value swapping.
  • Added support for creating or initializing a task_arena instance that is connected to the arena currently used by the thread
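The parallel_sort change above means a sort can work on value types that cannot be default-constructed, as long as elements are only exchanged in place. The following plain C++ sketch illustrates that property with a serial quicksort. It is not the Intel® TBB implementation; Job and swap_only_sort are hypothetical names used for illustration.

```cpp
#include <algorithm>  // std::iter_swap
#include <cassert>
#include <vector>

// Hypothetical value type with no default constructor.
struct Job {
    explicit Job(int p) : priority(p) {}
    int priority;
};

bool operator<(const Job& a, const Job& b) { return a.priority < b.priority; }

// In-place quicksort (Lomuto partition) for random-access iterators.
// Elements are rearranged only through std::iter_swap, so the algorithm
// never needs to default-construct a value.
template <typename It>
void swap_only_sort(It first, It last) {
    assert(first <= last);
    if (last - first < 2) return;
    It pivot = last - 1;
    It store = first;
    for (It it = first; it != pivot; ++it) {
        if (*it < *pivot) {
            std::iter_swap(it, store);
            ++store;
        }
    }
    std::iter_swap(store, pivot);  // move the pivot to its final position
    swap_only_sort(first, store);
    swap_only_sort(store + 1, last);
}
```

Calling swap_only_sort(v.begin(), v.end()) on a std::vector<Job> sorts it by priority even though Job() does not compile.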

Preview Features:

  • Added template class opencl_node to the flow graph API. It allows a flow graph to offload computations to OpenCL* devices.
  • Extended join_node to use type-specified message keys. It simplifies the API of the node by obtaining message keys via functions associated with the message type (instead of node ports).
  • Added static_partitioner that minimizes overhead of parallel_for and parallel_reduce for well-balanced workloads.
  • Improved template class async_node in the flow graph API to support user settable concurrency limits.
  • Class global_control supports the value of 1 for max_allowed_parallelism.
  • Added tbb::flow::async_msg, a special message type to support communications between the flow graph and external asynchronous activities.
  • async_node modified to support use with C++03 compilers
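The static_partitioner entry above refers to dividing work up front for well-balanced workloads. As a rough illustration of static partitioning in general, the plain C++ sketch below splits an index range into one contiguous chunk per worker before any work starts. It does not use the Intel® TBB API, and static_chunks is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split [0, n) into `workers` contiguous half-open chunks whose sizes differ
// by at most one element. The split is decided once, up front, so no further
// range splitting or rebalancing happens while the loop body runs.
std::vector<std::pair<std::size_t, std::size_t>>
static_chunks(std::size_t n, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t base = n / workers;
    const std::size_t extra = n % workers;  // first `extra` chunks get one more
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + len);
        begin += len;
    }
    return chunks;
}
```

This kind of fixed split avoids scheduling overhead but cannot compensate when some iterations take much longer than others, which is why it suits well-balanced workloads.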

Bugs fixed:

  • Fixed a bug in dynamic memory allocation replacement for Windows* OS.
  • Fixed excessive memory consumption on Linux* OS caused by enabling zero-copy realloc.

5. Intel® System Debugger:

  • Support for Eclipse* 4.5 (Mars.2) for the trace viewer. The package is also included in the Intel® System Studio installation package for optional installation.
  • Support for debug format Dwarf4
  • SMM support for Intel® Core™ based processors debugging.
  • A new EFI script and three buttons are added for loading PEI/DXE modules easily in System Debug

6. Intel® VTune™ Amplifier for Systems

  • Support for the next generation Intel® Xeon® Processor E5 v4 Family (formerly codenamed "Broadwell-EP")
  • Detection of the OpenCL™ 2.0 Shared Virtual Memory (SVM) usage types per kernel instance
  • Driverless event-based sampling collection for uncore events enabled for the Memory Access analysis.
  • Support for the Microsoft* Visual Studio* 2015 Update 2
  • Preview features:
  • Disk Input and Output analysis that monitors utilization of the disk subsystem, CPU and processor buses, helps identify long latency of I/O requests and imbalance between I/O and compute operations
  • GPU Hotspots analysis targeted for GPU-bound applications and providing options to analyze execution of OpenCL™ kernels and Intel Media SDK tasks
  • Basic Hotspots analysis extended to support Python* applications running via the Launch Application or Attach to Process modes.

Intel® Energy Profiler for Windows:

  • Update to version v1.14.1
  • Extended collection start time information to include microseconds to better enable correlation with event trace logs.
  • Corrected reporting of Gfx P-states on Intel® 6th Generation Core™ (formerly code-named “Skylake”) platform.

7. Intel® Inspector

  • No update vs. Update 2

8. Intel® Graphics Performance Analyzers (Intel® GPA)

  • New Features for Analyzing Microsoft DirectX* Applications

Intel GPA now provides alpha-level support for DirectX* 12 application profiling. This version has limited profiling and debug capabilities and might be unstable on some workloads. You can find more details regarding the supported features below.

    • Graphics Frame Analyzer provides detailed GPU hardware metrics for Intel® graphics. For third-party GPUs, GPU Duration and graphics pipeline statistics metrics are available.
    • DirectX states, Geometry, Shader code, Static and dynamic textures, Render targets resources are available for frame-based analysis in Graphics Frame Analyzer.
    • Simple Pixel Shader, Disable Erg(s) performance experiments, Highlighting and Disable draw calls visual experiments are available in Graphics Frame Analyzer
    • Time-based GPU metrics for Intel graphics, CPU metrics, Media and Power metrics in System Analyzer.
    • System Analyzer HUD includes support for hotkeys, the same set of metrics as in System Analyzer, messages and settings.

Note: In order to capture DirectX 12 application frames, enable the Force DirectX12 injection option in the Graphics Monitor Preferences dialog box.

Note: System memory consumption is expected to be high in this release at both time of capture and during playback. Needed memory is related to workload and frame complexity and varies greatly. 8GB is minimum, 16GB is recommended, with some workloads requiring more.

  • New Features for Analyzing OpenGL/OpenGL ES* Applications
    • Enabled support for GPU hardware metrics in System Analyzer and Graphics Frame Analyzer on the 6th Generation Intel® Core™ Processors for Ubuntu* targets.
    • Several OpenGL API calls (e.g. glTexImage2D, glReadPixels, glCopyTexImage2D, etc.) are now represented as ergs in Graphics Frame Analyzer, which allows you to measure GPU metrics for them and see the inputs and outputs used.
  • Resource History was implemented in Graphics Frame Analyzer. When you select a particular texture or program in the Resource viewer, colored markers appear in the bar chart, indicating the ergs where these resources are used. The color of these markers corresponds to the type of the resource: input, execution, or output.

View the full release notes for more details.

What's New in Intel® System Studio 2016 Update 2

1. Intel® C++ Compiler: 

  • Intrinsics for the Short Vector Random Number Generator (SVRNG) Library
    • The Short Vector Random Number Generator (SVRNG) library provides intrinsics for the IA-32 and Intel® 64 architecture running on supported operating systems. The SVRNG library partially covers both standard C++ and the random number generation functionality of the Intel® Math Kernel Library (Intel® MKL). Complete documentation may be found in the Intel® C++ Compiler 16.0 User and Reference Guide.
  • Intel® SIMD Data Layout Templates (Intel® SDLT)
    • Intel® SDLT is a library that helps you leverage SIMD hardware and compilers without having to be a SIMD vectorization expert.
    • Intel® SDLT can be used with any compiler supporting ISO C++11, Intel® Cilk™ Plus SIMD extensions, and #pragma ivdep
  • New C++14 and C11 features supported 
  • And many others ... For a full list of new features please refer to the Composer Edition product release notes 

2. Intel® Math Kernel Library (Intel® MKL)

  • Introduced the mkl_finalize function to facilitate usage models in which Intel MKL dynamic libraries, or third-party dynamic libraries statically linked with Intel MKL, are loaded and unloaded explicitly
  • Introduced sorting algorithm
  • Performance improvements for BLAS, LAPACK, ScaLAPACK, Sparse BLAS 
  • Several new features for Intel MKL PARDISO
  • Added Intel® TBB threading support for all BLAS level-1 functions and OpenMP* threading support for some of them.

3.  Intel® Integrated Performance Primitives (Intel® IPP) 

  • Image Processing:
    • Added the contiguous volume format (C1V) support to the following 3D data processing functions: ipprWarpAffine, ipprRemap, and ipprFilter.
    • Added the ippiFilterBorderSetMode function to support high accuracy rounding mode in ippiFilterBorder.
    • Added the ippiCopyMirrorBorder function for copying the image values by adding the mirror border pixels.
    • Added mirror border support to the following filtering functions: ippiFilterBilateral, ippiFilterBoxBorder, ippiFilterBorder, ippiFilterSobel, and ippiFilterScharr.
    • Kernel coefficients in the ippiFilterBorder image filtering functions are used in direct order, which is different from the ippiFilter functions in the previous releases.
  • Computer Vision:
    • Added 32-bit floating point input data support to the ippiSegmentWatershed function
    • Added mirror border support to the following filtering functions: ippiFilterGaussianBorder, ippiFilterLaplacianBorder, ippiMinEigenVal, ippiHarrisCorner, ippiPyramidLayerDown, and ippiPyramidLayerUp. 
  • Signal Processing:
    • Added the ippsThreshold_LTAbsVal function, which uses the vector absolute value.
    • Added the ippsIIRIIR64f functions to perform zero-phase digital IIR filtering.
  • The multi-threaded libraries only depend on the OpenMP* libraries; their dependencies on the other Intel® Compiler runtime libraries were removed 
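Zero-phase IIR filtering, mentioned above for the ippsIIRIIR64f functions, is commonly achieved by filtering the signal forward and then filtering the result backward so that the phase delays cancel. The plain C++ sketch below shows that forward-backward idea with a simple one-pole filter. It does not call Intel® IPP; one_pole and zero_phase are hypothetical names, and the zero initial state at both ends is a simplifying assumption that a production implementation would likely handle differently.

```cpp
#include <algorithm>  // std::reverse
#include <cassert>
#include <cstddef>
#include <vector>

// One-pole low-pass filter: y[n] = (1 - a) * x[n] + a * y[n-1],
// starting from a zero initial state.
std::vector<double> one_pole(const std::vector<double>& x, double a) {
    assert(a >= 0.0 && a < 1.0);
    std::vector<double> y(x.size());
    double prev = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        prev = (1.0 - a) * x[n] + a * prev;
        y[n] = prev;
    }
    return y;
}

// Forward-backward filtering: run the filter, reverse the output, run the
// same filter again, and reverse back. The delay introduced by the forward
// pass is undone by the backward pass.
std::vector<double> zero_phase(const std::vector<double>& x, double a) {
    std::vector<double> y = one_pole(x, a);  // forward pass
    std::reverse(y.begin(), y.end());
    y = one_pole(y, a);                      // backward pass
    std::reverse(y.begin(), y.end());
    return y;
}
```

Applied to a single impulse far from the ends of the buffer, one_pole alone produces a response only after the impulse, whereas zero_phase produces a response that peaks at the impulse position and is symmetric around it up to rounding and edge effects.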

4.  Intel® System Debugger: 

  • Unified installer for all components of the Intel® System Debugger (system debug, system trace, and WinDbg* extension)
  • Support for Eclipse* 4.4 (Luna) integration with Intel® Trace Viewer
  • New ‘Trace Profiles’ feature for System Trace Viewer to configure the destination for streaming mode for:
    • BIOS Reserved Trace Memory
    • Intel® Trace Hub Memory
    • Streaming to DCI-Closed Chassis Adapter (BSSB CCA)
  • Tracing to memory support (Intel® Trace Hub or system DRAM memory) for 6th Gen Intel® Core™ processors (PCH) via Intel® XDP3 JTAG probe.
  • Various stability bug fixes in Trace Viewer:
    • Handling of decoder-instance-parameters
    • Crash on stop capture
    • Errors resulting from renaming capture files
    • Fix for persistent page up/down navigation
    • Decoding linked files containing spaces in path
    • Sporadic Eclipse error when switching target
  • Trace Viewer improvements:
    • Event distribution viewer
    • New progress bar when stopping a trace to memory
    • Rules are now saved in the Eclipse workspace and restored during Eclipse restart
    • Improved memory download with wrapping enabled
  • Debugging support for Intel® Xeon® Processor D-1500 Product Family on the Grangeville platform.
  • System Debugger improvements: Export memory window to text file.

5. Intel® Graphics Performance Analyzers (Intel® GPA)

  • Added support for 32-bit and 64-bit applications on Android M (6.0, Marshmallow).
  • Intel Graphics Performance Analyzers are now in a single package for Windows users.
  • Added support for OS X 10.11 El Capitan.
  • Implemented texture storage parameters modification experiment - you can now change dimensions and sample count parameters for input textures without recompiling your app.
  • Can now export textures in KTX/DDS/PNG file formats.
  • And much more. View the full release notes for more details.

6. Intel® VTune™ Amplifier for Systems 

  • Added support for Ubuntu 14.04.3 for Intel® Energy Profiler (SoC Watch 2.1.1)
  • Support for the ITT Counters API used to observe user-defined global characteristic counters that are unknown to the VTune Amplifier
  • Support for the Load Module API used to analyze code that is loaded in an alternate location that is not accessible by the VTune Amplifier
  • For hardware event-based sampling analysis types, added an option to limit the collected data size by setting a timer that saves tracing data only for the specified last seconds of the data collection
  • New Arbitrary Targets group added to create command line configurations to be launched from a different host. This option is especially useful for microarchitecture analysis since it provides easy access to the hardware events available on a platform you choose for configuration.
  • Source/Assembly analysis available for OpenCL™ kernels (with no metrics data)
  • SGX Hotspots analysis support for identifying hotspots inside security enclaves for systems with the Intel Software Guard Extensions (Intel SGX) feature enabled
  • Metric-based navigation between call stack types replacing the former Data of Interest selection
  • Updated filter bar options, including the selection of a filtering metric used to calculate the contribution of the selected program unit (module, thread, and so on)
  • DRAM Bandwidth over-time and histogram data is scaled according to the maximum achievable DRAM bandwidth

7.  Intel® Inspector

  • Support for Fedora 23 and Ubuntu 15.10.

What's New in Intel® System Studio 2016 Update 1

1. Intel® C++ Compiler:

  • Enhancements for offloading to Intel® Graphics Technology
  • Added Intel® SIMD Data Layout Templates

2. Intel® Energy Profiler (SoC Watch):

  • Added support for collection of gfx-cstate and ddr-bw metrics on platforms based on Intel® Core™ architecture.

3. Intel® System Debugger:

  • New options for the debugger’s “Restart” command
  • System Trace Viewer:
    • New “Event Distribution View” feature
    • Several improvements in the Trace Viewer GUI.

What's New in Intel® System Studio 2016

  • Support for new platforms based on Airmont, Intel® Quark™, Edison, and SoFIA by various components.
  • Intel® C++ Compiler:
    • Enhanced C++11 feature support
    • Enhanced C++14 feature support
    • FreeBSD* support
  • Intel® VTune™ Amplifier for Systems:
    • Basic Hotspots, Locks and Waits and hardware event-based stack sampling collection supported for RT kernel and RT applications for Linux* targets
    • Hardware event-based stack sampling collection supported for kernel-mode threads
    • Support for Intel® Atom™ x7 Z8700 & x5 Z8500/X8400 processor series (Cherry Trail) including GPU analysis
    • KVM guest OS profiling based on the Linux* Perf tool
    • Analysis of applications in a virtualization environment (KVM) for Linux* kernels (version 3.2 and higher) and QEMU (version 1.4 and higher)
    • Remote event-based sampling analysis on SoFIA leveraging an existing sampling driver on the target
  • Intel® Threading Building Blocks (Intel® TBB):
    • Several C++11 improvements
    • Added 64-bit Android* support
  • Intel® Integrated Performance Primitives (Intel® IPP):
    • Extended optimization for Intel® Atom™ processors in the Computer Vision and Image Processing functions
    • Added optimization for Intel® Quark™ processors to the Cryptography functions
  • Intel® Math Kernel Library (Intel® MKL):
    • New ?GEMM_BATCH and (C/Z)GEMM3M_BATCH functions for performing multiple independent matrix-matrix multiply operations
    • New C-language version of the Intel® MKL reference manual
  • Intel® System Debugger:
    • Support for new platforms based on Airmont microarchitecture: Moorefield (Z35XX), Cherrytrail (Z8700), Braswell (N3700)
    • New supported targets: 6th Generation Intel® Core™ Processor Family, Intel® 100 Series Chipset.
  • For the 6th Generation Intel® Core™ Processor Family:
    • Intel® Debug Extensions for WinDbg* with Intel® Processor Trace support and JTAG debug support
    • System Trace support for Intel® Trace Hub
    • Intel® Debugger for Heterogeneous Compute
    • The debugger supports 64-bit host OS systems only and requires a 64-bit Java* Runtime Environment (JRE) to operate. See System Debugger release notes for more details.
  • The installation directories structure has changed. Several components link to common directories which are shared with other Intel® Software Development Products. 

Support

We look forward to your questions and feedback. Please don't hesitate to raise any questions you have or issues you run into. Thank you for helping us continuously improve Intel® System Studio.

Intel® Premier Support (registration is required): For secure, web-based, engineer-to-engineer support, visit our Intel® Premier Support web site. Once logged in, search for the product name Intel® System Studio.

For more complete information about compiler optimizations, see our Optimization Notice.