Development Tools

How do I determine at runtime which vector instructions are being used when compiling with -ax

In a few weeks, we will have another generation of Intel HPC systems. We will then have systems that support SSE4.2 (Nehalem, Westmere), AVX (Sandy Bridge, Ivy Bridge), and CORE-AVX2 (Haswell) optimizations. Since the compile nodes are being upgraded to Haswell as well, I want to tell users to specify something other than -xHost when using Intel Fortran, so that binaries are backward compatible and run on any of the clusters. I plan to tell users to compile with -xSSE4.2 -axCORE-AVX2,AVX.
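As a rough cross-check on each cluster, the sketch below reads the CPU feature flags that Linux reports in /proc/cpuinfo and prints the newest of the three targets the node could run. It is only an illustration: the function names and the flag-to-target mapping are assumptions made here, not part of the Intel compiler, and it reports what the hardware supports, not which code path the compiler's runtime dispatcher actually selects (the dispatcher may apply additional checks).

```python
from pathlib import Path

# Kernel flag names from /proc/cpuinfo mapped to the compiler targets named in
# the post. This mapping is an assumption for illustration; CORE-AVX2 implies
# more extensions than the two flags checked here.
TARGETS = [  # ordered from newest to oldest
    ("CORE-AVX2", {"avx2", "fma"}),
    ("AVX", {"avx"}),
    ("SSE4.2", {"sse4_2"}),
]


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def best_target(flags):
    """Return the newest target whose required flags are all present."""
    for name, required in TARGETS:
        if required <= flags:
            return name
    return None


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(best_target(cpu_flags(text)))
```

Running this on one node of each cluster gives a quick picture of which of the -ax code paths that node is able to execute.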


My questions are:


Project migration from IPP 8.x to IPP 9.0 -- need help on replacing removed functionality

Starting with IPP 9.0, Intel has completely removed all JPEG-related functions including Huffman and color space conversions. Color space conversions I can handle on my own, but I am not in the mood to implement multi-threaded JPEG and Huffman parts from scratch. I had a working solution with UIC and IPP 8.x, now I have a broken build.

Sure, I can choose between staying on 8.x, missing out on all performance improvements for new CPUs and all bug fixes and new features, and upgrading to 9.0 while losing the most important part of my image viewer application: (fast) JPEG decoding.

Finding functions which take the highest wall clock time

Hi all,

I've just started using VTune Amplifier to profile my code, and I have a very basic question. How do I find the functions that take the most wall clock time overall, and the most wall clock time per invocation? Basically, I have a Fortran application with regions of OpenMP parallel code and regions of serial code. My aim is to figure out whether any of the serial regions can be sped up by parallelizing them, for which I first need to find out which serial portions are the most time consuming.

Thanks in advance,


Profiling I/O-bound applications with VTune


I am currently using the basic analysis tools in the VTune profiler. I see that CPU-bound profiling is well supported and that lock and spin waiting time is also measured. What about I/O operations and syscall waiting times (i.e. wall clock times)? Are they measured in VTune at all? If so, which view is supposed to show them? Are they only available in the advanced analysis?


A specified class is not registered in the registration database

I keep getting the error "A specified class is not registered in the registration database" from the function "COMCreateObjectByGUID" whenever I try to call a .NET function from a Fortran function.

Business case: I have to call a .NET function from Fortran on Windows platforms

Solution approach:

Compiler error: undefined reference to `for_ifcore_version'

I wanted to compile a code with mpif90 (which I got from Open MPI built with the Intel compiler), but I get the error:

/opt/intel/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64/libifport.so.5: undefined reference to `for_ifcore_version'

I have the Intel Parallel Studio XE Composer Edition for Fortran and C++ (Linux) (parallel_studio_xe_2016.0.047) installed.

I started with loading the environment variables:

source /opt/intel/compilers_and_libraries/linux/bin/compilervars.sh -arch intel64 -platform linux

Installation of Intel Parallel Studio XE Composer Edition for Fortran and C++ (Linux) failed


I just tried to install Intel Parallel Studio XE Composer Edition for Fortran and C++ (Linux) (parallel_studio_xe_2016.0.047) but near the end it failed:


About Parallel Studio XE 2015 Composer/Cluster Editions

Some questions about Intel Parallel Studio XE 2015 Composer/Cluster Editions for Linux

1) Am I correct that there is now no "Fortran only" version (ifort without C) of Parallel Studio XE Cluster Edition?

2) Composer Edition includes MKL, and the MKL library includes PBLAS, which may be used on a cluster. Intel MKL PBLAS is based on MPI. May I build (using the Intel C compiler) any free MPI library, or must I use (and therefore purchase in addition to MKL) only Intel MPI?

Strange behavior of vlist argument in UDTIO

I have yet another problem with UDTIO. In my read procedure, I want to use the vlist argument to pass the length of a temporary character string to be used internally in the procedure. However, when I read more than one DT in a single format statement, the vlist for the second and following I/O items behaves strangely.
