Measure Intel® IPP Function Performance


By Ying Song, Intel Corporation

The Intel® Integrated Performance Primitives (Intel® IPP) is a cross-architecture software library that provides a broad range of library functions for image processing, signal processing, data compression, cryptography, and computer vision, as well as math support routines for such processing capabilities. Intel IPP is optimized for the wide range of Intel microprocessors.

One of the key advantages of Intel IPP is performance. This paper introduces a powerful performance test tool packaged with the Intel IPP library and demonstrates how you can use it to test the performance of each Intel IPP function on various Intel® Pentium and Itanium processor systems.

What is the Intel IPP Performance Test Tool?

The Intel IPP Performance Test Tool, available for both Windows* and Linux* on Intel Pentium and Itanium processors, is a timing system designed for testing the performance of Intel IPP functions on the same hardware platforms as the corresponding Intel IPP libraries. It consists of command line programs that test the performance of each IPP function in various ways.

You can control the course of the tests and generate the results in the desired format by using command line options. The results are saved in a .csv file for further processing with Microsoft Excel*. The course of timing is displayed on the console and can be saved in a .txt file. You can create a list of functions to be tested and set the parameters with which each function is called during the performance test. The list of functions and their parameters can either be defined in the .ini file or entered directly from the console.

Where to Find the Performance Test Tool?

Once you install the Intel IPP package, you can locate the performance test *.exe files in the \tools\perfsys directory.

The following table lists the .exe file to use for each Intel IPP domain. A file name containing “64” targets Intel Itanium processors, a file name containing “em64t” targets Intel Xeon processors with Intel® 64, and so on.


Executable Name    Domain
ps_ippi.exe        Image Processing
ps_ipps.exe        Signal Processing
ps_ippdc.exe       Data Compression
ps_ippcv.exe       Computer Vision
ps_ippvm.exe       Vector Math
ps_ippcc.exe       Color Conversion


How to Use the Performance Test Tool?

Command Line Format and Options

As we mentioned above, this performance test tool contains command line programs. Following is the command line format:

<ps_FileName>.exe [switch_1] [switch_2]… [switch_n]

A short reference for the command line options can be displayed on the console. To invoke it, enter -? or -h on the command line:
ps_ipps.exe -h

The command line options fall into six functional groups. You can enter switches in any order, separated by at least one space. Options such as -r, -V, -o, and -O can be entered several times with different file names.

1. Adjusting Console Input
   -A                             Prompt for all parameters from the console
   -B                             Batch mode

2. Managing Output
   -r [<file-name>]               Create csv-file and write PS results
   -V[<file-name>]                Add PS results to csv-file
   -o[<file-name>]                Create txt-file and write console output
   -O[<file-name>]                Add console output to txt-file
   -L<ERR|WARN|PARM|INFO|TRACE>   Set detail level of the console output
   -e                             Enumerate tests and exit
   -g[<file-name>]                Create a signal file after testing is completed
   -FL                            Write result as number of float operations per second.
                                  This option is supported for only a few functions,
                                  because users rarely need it.

3. Selecting Functions for Testing
   -f<or-pattern>                 Run tests of functions with pattern in name, case sensitive
   -f-<not-pattern>               Do not test functions with pattern in name, case sensitive
   -f+<and-pattern>               Run only tests of functions with pattern in name, case sensitive
   -f=<eq-pattern>                Run tests of functions with this full name, case sensitive

4. Operation with .ini Files
   -i[<file-name>]                Read PS parameters from ini-file
   -I[<file-name>]                Write PS parameters to ini-file and exit
   -P                             Read tested function names from ini-file

5. Direct Data Input
   -d<name>=<value>               Set PS parameter value

6. Multi-Thread Timing
   -MT<numThreads>                Run timing in several threads simultaneously
   -T<HIGH|NORMAL|LOW>            Set the thread priority; the priority level may be
                                  specified by entering only the first letter
      -T H[IGH]                   High priority. This is the default if multi-thread timing
                                  is not set, or if the number of threads is equal to 1.
      -T N[ORMAL]                 Normal priority. This is the default if the number of
                                  threads is greater than 1.
      -T L[OW]                    Low priority. Recommended for multi-thread timing if
                                  functions use OpenMP* technology.
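
The function-selection switches in group 3 can be read as case-sensitive substring filters. The following C sketch shows one plausible interpretation, not the tool's actual implementation. The names FilterKind, Filter, and is_selected are hypothetical, and the rule used to combine the four kinds of pattern (a name must match at least one -f or -f= pattern if any are given, must contain every -f+ pattern, and must contain no -f- pattern) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter kinds mirroring the -f, -f-, -f+ and -f= switches. */
typedef enum { F_OR, F_NOT, F_AND, F_EQ } FilterKind;

typedef struct {
    FilterKind  kind;
    const char *pattern;
} Filter;

/* Returns 1 if the function name passes all filters, 0 otherwise.
   The combination rule is an assumption, not documented tool behavior. */
int is_selected(const char *name, const Filter *filters, size_t count)
{
    int has_positive = 0;   /* at least one -f or -f= filter was given */
    int positive_hit = 0;   /* the name matched one of them            */
    size_t i;

    assert(name != NULL && (filters != NULL || count == 0));

    for (i = 0; i < count; i++) {
        const char *p = filters[i].pattern;
        int contains = strstr(name, p) != NULL;   /* case-sensitive search */

        switch (filters[i].kind) {
        case F_OR:                       /* -f<or-pattern>   */
            has_positive = 1;
            positive_hit |= contains;
            break;
        case F_EQ:                       /* -f=<eq-pattern>  */
            has_positive = 1;
            positive_hit |= (strcmp(name, p) == 0);
            break;
        case F_NOT:                      /* -f-<not-pattern> */
            if (contains) return 0;
            break;
        case F_AND:                      /* -f+<and-pattern> */
            if (!contains) return 0;
            break;
        }
    }
    /* With no positive filter, every name that was not excluded is kept. */
    return !has_positive || positive_hit;
}
```

Under these assumptions, the filters -fFind -fCompare -f-_16u would keep a name such as ippsFind_8u and drop ippsCompare_16u and ippsRemove_8u.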


Examples of Running the Performance Test Tool

The following examples illustrate how you can use this tool in different ways to generate Intel IPP function performance data on the targeted system.

Running in the Standard Mode

This is the simplest way to get a full set of performance data on your target system. For example:

ps_ippch.exe -B -v

All IPP string functions are tested by the default timing method on standard data (-B option). The results are generated in the ps_ippch.csv file.

Testing Selected Functions

This is the most common way to measure IPP functions. You can select a single function or several functions from one domain to get performance data. For example:

ps_ipps.exe -fFIRLMS_32f -V firlms.csv

This command measures the signal processing function FIRLMS_32f (-f option) and generates a .csv file named firlms.csv (-V option). Another example:

ps_ippch -B -V string.csv -fFind -fCompare -fRemove

This command outputs the data for the string-domain functions whose names contain Find, Compare, or Remove.

Retrieving Function Lists

Run the following command to find out the functions included in a domain:

ps_ipps.exe -e -o signal_list.txt

The output file signal_list.txt (-o option) will list all IPP signal functions (-e option). 

Launching the Performance Test Tool with the .ini File

Using the .ini file lets you avoid entering all the required parameters each time you run the console performance tool. You can also customize the parameters in the .ini file to simplify the tests. For example:

ps_ipps.exe -B -I

The ps_ipps.ini file is created after this first run (-I option).

ps_ipps.exe -i -v

Optionally, before running this command, you can modify the ps_ipps.ini file to choose the functions, the array of vectors, the length of vectors, and so on. The command tests all or a limited set of functions, reading the timing procedure and the function parameter values from the ps_ipps.ini file (-i option). It also generates the ps_ipps.csv output file (-v option).

Setting General Parameters in Performance Test Tool

While running the performance tool in console command line, you are prompted to enter the parameters for timing methods, options, and other common parameters. Use a set of meaningful parameters to influence the performance of the function.

The following table gives you a detailed description to enable you to choose the appropriate parameters:

Method:       Auto
Description:  Automatic selection of the number of function calls in the performance measurement cycle (default procedure).
Selection:    Enter “A” from {Auto|Manual|Statistic|Jeff}
Parameters:   Accuracy - The number of function calls in the cycle doubles each time until the results of the last two measurements coincide with the accuracy specified by the Accuracy value. This procedure repeats three times and the best result is printed. Repeating these tests is necessary to avoid random fluctuations generated by the system.

Method:       Manual
Description:  The user sets the necessary number of function calls in the measurement cycle.
Selection:    Enter “M” from {Auto|Manual|Statistic|Jeff}
Parameters:   NumLoops - Number of loops. The function is called NumLoops times in the measurement cycle. This is the only result printed and no other measurements are taken. This method is not recommended for testing all functions with full options, because the results are inaccurate or the process takes too much time.

Method:       Statistic
Description:  The user specifies the total number of processor clocks for the whole procedure and sets the required number of function calls to accumulate statistics.
Selection:    Enter “S” from {Auto|Manual|Statistic|Jeff}
Parameters:   NumCalls & TotalClocks - Each function is measured at least NumCalls times, and the measurements continue until the total number of processor clocks exceeds the specified value of TotalClocks. The result is the average taken across all measurements. This method may be used only if the system has a time stamp counter.

Method:       Jeff
Description:  A variation of the Auto method.
Selection:    Enter “J” from {Auto|Manual|Statistic|Jeff}
Parameters:   NumTestRepetitions - The number of function calls in the cycle doubles each time until the results of the last two measurements coincide with the accuracy specified by the Accuracy value. This procedure repeats NumTestRepetitions times and the best result is printed.
              Accuracy - see the Auto method.

You can use the last three procedures to check the results yielded by the first method if you doubt the reliability of the measurement results.
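
The Auto and Jeff procedures described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the tool's source code: the callback type MeasureFn, the functions converge_once and auto_timing, the max_calls safety limit, and the reading of Accuracy as a relative tolerance are all hypothetical. The function model_cost is a synthetic stand-in for a real measurement (a fixed per-call cost plus an overhead that is amortized over the calls), so that the sketch runs without Intel IPP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: runs the function under test n_calls times and
   returns the measured cost per call (for example, in CPU clocks). */
typedef double (*MeasureFn)(unsigned long n_calls, void *ctx);

/* Synthetic stand-in for a real measurement: 10 clocks per call plus
   100 clocks of fixed overhead spread over n_calls. */
double model_cost(unsigned long n_calls, void *ctx)
{
    (void)ctx;
    return 10.0 + 100.0 / (double)n_calls;
}

/* One pass: double the number of calls until two successive results agree
   within the relative tolerance `accuracy`, or max_calls is reached. */
static double converge_once(MeasureFn measure, void *ctx,
                            double accuracy, unsigned long max_calls)
{
    unsigned long n = 1;
    double prev = measure(n, ctx);

    while (n <= max_calls / 2) {
        double cur, diff;
        n *= 2;
        cur  = measure(n, ctx);
        diff = cur > prev ? cur - prev : prev - cur;
        if (diff <= accuracy * cur)
            return cur;               /* last two measurements coincide */
        prev = cur;
    }
    return prev;   /* did not converge; report the last measurement */
}

/* Repeat the converging pass `repetitions` times and keep the best
   (smallest) result. Auto corresponds to repetitions = 3; Jeff takes
   the count from NumTestRepetitions. */
double auto_timing(MeasureFn measure, void *ctx, double accuracy,
                   int repetitions, unsigned long max_calls)
{
    double best = 0.0;
    int r;

    assert(measure != NULL && accuracy > 0.0);
    assert(repetitions > 0 && max_calls >= 1);

    for (r = 0; r < repetitions; r++) {
        double result = converge_once(measure, ctx, accuracy, max_calls);
        if (r == 0 || result < best)
            best = result;
    }
    return best;
}
```

Calling auto_timing(model_cost, NULL, 0.05, 3, 1UL << 20) stops doubling once the overhead term changes the result by less than 5 percent. In a real measurement the callback would time the function under test, for example with a time stamp counter.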

Setting Function Parameters

In the course of the timing procedure the parameters of IPP functions should be specified. Certain parameters like array addresses, array element values, and parameters that have a small influence on the performance of the functions are defined within the test and you cannot change them.

Other parameters including the vector length, image size, scale factor, and function-specific parameters may significantly affect the function performance. The test varies their values to obtain more detailed measurements of the performance. The test specifies the parameters that are variable for a given function.

Most of the tests have several variable parameters. The test measures performance for all possible combinations of parameter values. For example, if the vector length has five possible values and scale factor has three possible values for a certain function the total number of performance measurements is 5*3 = 15. The values of variable parameters are written into the .csv file in the same order as they were prompted to be entered from the console, or placed by the PS in the .ini file.
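
The way the number of measurements grows with the number of variable parameters can be sketched in C with two nested loops. The function name enumerate_combinations and the sample parameter values in the usage note are hypothetical; the sketch only illustrates the 5*3 = 15 arithmetic above and does not call the test tool or Intel IPP.

```c
#include <assert.h>
#include <stdio.h>

/* Visits every combination of two variable parameters (vector length and
   scale factor), prints it, and returns the number of measurements that
   would be taken. */
int enumerate_combinations(const int *lengths, int n_lengths,
                           const int *scales, int n_scales)
{
    int count = 0;
    int i, j;

    assert(lengths != NULL && scales != NULL);
    assert(n_lengths > 0 && n_scales > 0);

    for (i = 0; i < n_lengths; i++) {
        for (j = 0; j < n_scales; j++) {
            /* A real test would time the function here with these values. */
            printf("length=%d scale=%d\n", lengths[i], scales[j]);
            count++;
        }
    }
    return count;
}
```

With five example lengths such as {64, 128, 256, 512, 1024} and three example scale factors such as {-1, 0, 1}, the function visits 15 combinations, one per row of performance data.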

You can set the values for the specific function parameters either from the console or in the .ini file. There are several ways to specify them in the course of the timing procedure. This paper does not include these details. If you have any questions, refer to the “Reference” section and contact us.

Interpretation of the Performance Output Data

The output data file is in .csv format, which can be viewed with a spreadsheet program such as Microsoft Excel. The output performance data contains both the elapsed execution time in microseconds and the more commonly used unit, clocks per element (cpe). Following is a snapshot of the first few rows of an output .csv file produced by running the example mentioned above. It also includes test system information such as the processor, the operating system, the Intel IPP library version, and the start time:


When viewing the .csv file, you may notice some columns with the headings nLps, Clocks, and per.

The nLps column contains the number of repetitions in the loop; in this example, 16 iterations were measured. The Clocks column is read together with the per column, which gives the unit, for example clocks per element (cpe). The Time column specifies the time spent executing the function.

The output for function ippsFIRLMS (illustrated above) displays the performance data on the 32f data type; it averages 30 clocks per cpMac (defined below). It also implies that it takes 0.82 usec to run on a loop with 16 iterations.

The following table gives a detailed description of the units used in the column “per” for all IPP domains:

Unit in column “per”    Interpretation
E                       Clocks per element (cpe)
Element                 Per element
cpMac                   cpMac = numClocks/numOfAdd&MulPairs. This unit is used for algorithms in which the number of multiplications equals the number of additions (for example, a convolution algorithm). The result is divided by the number of such multiplication + addition pairs (mostly used in ippSP).
e_krnPnt                e_krnPnt = numClocks/(numOfElements * numOfKernelPoints). Per element and filter kernel point (mostly used in ippSP).
Px                      Per pixel
Pxch                    Per pixel per channel
all                     numClocks. Non-standard unit in ippVC tests; the clock count is not normalized (per 1).
matrix value            Per matrix value (for ippMX)
vector value            Per vector value (for ippMX)
Value                   numClocks. Non-standard unit in ippMX tests; the clock count is not normalized (per 1).
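
The normalizations in the table above can be written out as small C helpers. This is an illustrative sketch: the function names are hypothetical, the sample numbers in the usage note are made up, and the conversion from clocks to microseconds (clocks divided by the CPU frequency in MHz) is an assumption about how the Time column relates to the clock count rather than a documented property of the tool.

```c
#include <assert.h>

/* "E": clocks per element (cpe). */
double clocks_per_element(double num_clocks, double num_elements)
{
    assert(num_elements > 0.0);
    return num_clocks / num_elements;
}

/* "cpMac": clocks per multiplication + addition pair. */
double clocks_per_mac(double num_clocks, double num_mac_pairs)
{
    assert(num_mac_pairs > 0.0);
    return num_clocks / num_mac_pairs;
}

/* "e_krnPnt": clocks per element and filter kernel point. */
double clocks_per_elem_kernel_point(double num_clocks, double num_elements,
                                    double num_kernel_points)
{
    assert(num_elements > 0.0 && num_kernel_points > 0.0);
    return num_clocks / (num_elements * num_kernel_points);
}

/* Assumed conversion: a CPU running at cpu_mhz executes cpu_mhz clocks
   per microsecond, so elapsed microseconds = clocks / MHz. */
double clocks_to_usec(double num_clocks, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return num_clocks / cpu_mhz;
}
```

For instance, 480 clocks spent on 16 multiply-add pairs give clocks_per_mac(480.0, 16.0) = 30 clocks per pair, and 3000 clocks on a 3000 MHz processor correspond to 1 microsecond.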



Intel IPP provides a powerful performance test tool that simplifies performance benchmarking of Intel IPP functions on various Intel-based microprocessors. Additionally, it provides a comprehensive data set to help you analyze performance data.



Appendix: Measuring Intel IPP Performance with ippGetCpuClocks

In some cases, you can also call the Intel IPP common function ippGetCpuClocks in your application to measure the performance of an Intel IPP function or of any other function. The ippGetCpuClocks function reads the current state of the time stamp counter (TSC) register and returns its value. Subtracting the value read before the function call from the value read after it gives a very accurate measurement of the elapsed clocks.

Following is a flow chart to illustrate how to use this Intel IPP common function:

Note: ippCoreGetCpuClocks() is no longer used in Intel IPP; please use ippGetCpuClocks() instead.

Following is an example that shows how to use ippGetCpuClocks to measure function performance:

// Measure FFT performance //

#include <stdio.h>
#include <math.h>
#include "ipp.h"

#define Nord 10                 // FFT order: 2^10 = 1024 points
#define LEN  1024
#ifndef IPP_PI                  // normally already provided by the IPP headers
#define IPP_PI    ( 3.14159265358979323846 )
#endif

int main()
{
 // Get the version of IPP on this machine
 const IppLibraryVersion* lib;
 lib = ippsGetLibVersion();

 printf("CPU       : %.4s\n",lib->targetCpu);  // 4-character array, limit the width
 printf("Name      : %s\n",lib->Name);
 printf("Version   : %s\n",lib->Version);
 printf("Build date: %s\n",lib->BuildDate);

 // Measure FFT performance
 Ipp32f  Signal[LEN],SignalFft[LEN];
 Ipp32f  Amp=1;
 Ipp32f  fsample=51.2e6;
 Ipp32f  fc=3e6;
 int NoFFT=100000;
 int   i;
 IppsFFTSpec_R_32f* FftSpec;
 Ipp8u * pFFTInitBuf, *pFFTWorkBuf, *pFFTSpecBuf;
 int FftOrder=Nord;
 int FftFlag=IPP_FFT_DIV_FWD_BY_N;
 int SpecSize, SpecBufferSize, BufferSize;
 IppStatus Status;

 Ipp64u start,stop;             // ippGetCpuClocks returns an unsigned 64-bit count

 // Generate the input tone (a cosine wave)
 for (i=0;i<LEN;i++)
 {
   Signal[i]=(Ipp32f)(Amp*cos(2*IPP_PI*i*fc/fsample));
 }

 // Query the buffer sizes, then allocate the spec, init, and work buffers
 ippsFFTGetSize_R_32f(FftOrder, FftFlag, ippAlgHintNone, &SpecSize, &SpecBufferSize, &BufferSize);

 pFFTSpecBuf = ippsMalloc_8u(SpecSize);
 pFFTInitBuf = ippsMalloc_8u(SpecBufferSize);
 pFFTWorkBuf = ippsMalloc_8u(BufferSize);

 Status = ippsFFTInit_R_32f(&FftSpec, FftOrder, FftFlag, ippAlgHintNone, pFFTSpecBuf, pFFTInitBuf);
 if (Status != ippStsNoErr)
 {
   printf("FFT initialization failed: %d\n", (int)Status);
   return 1;
 }

 // Read the time stamp counter before and after the loop of FFT calls
 start = ippGetCpuClocks();
 for (i=0;i<NoFFT;i++)
   Status= ippsFFTFwd_RToPack_32f(Signal, SignalFft, FftSpec, pFFTWorkBuf);
 stop = ippGetCpuClocks();

 // Average clocks per element: total clocks / elements per FFT / number of FFTs
 float ippFFTPerf = (float) (stop-start)/LEN/NoFFT;
 printf ("FFT 32f: ipp=%.1f\n", ippFFTPerf);

 // Release the buffers allocated with ippsMalloc_8u
 ippsFree(pFFTWorkBuf);
 ippsFree(pFFTInitBuf);
 ippsFree(pFFTSpecBuf);

 return 0;
}