Measure Intel® IPP Function Performance

Introduction

By Ying Song, Intel Corporation

The Intel® Integrated Performance Primitives (Intel® IPP) is a cross-architecture software library that provides a broad range of library functions for video codecs (for example, H.264 and MPEG-4), audio, image coding (for example, JPEG, JPEG2000), image processing, signal processing, data compression, speech compression (for example, G.729, G.723, GSM AMR), cryptography, and computer vision, as well as math support routines for such processing capabilities. Intel IPP is optimized for a wide range of Intel microprocessors, including:

  • Intel® Pentium® 4 Processor
  • Intel® Pentium® 4 Processor with Hyper-Threading Technology
  • Intel® Pentium® M Processor component of Intel® Centrino® Mobile Technology
  • Intel® Xeon® Processor
  • Intel® Xeon® Processor with Intel® Extended Memory 64 Technology (Intel® EM64T)
  • Intel® Core™ Quad and Intel® Core™ 2 Duo Microarchitectures
  • Intel® Core™ i7 processor
  • Intel® Atom™ Processor
  • Intel® Itanium® Processor


One of the key advantages of Intel IPP is performance. This paper introduces a powerful performance test tool packaged with the Intel IPP library and demonstrates how you can use this tool to test the performance of each Intel IPP function on various Intel® Pentium® and Itanium® processor systems.


What is the Intel IPP Performance Test Tool?

The Intel IPP Performance Test Tool, available for both Windows* and Linux* on Intel Pentium and Itanium processors, is a timing system designed to test the performance of Intel IPP functions on the same hardware platforms that the corresponding Intel IPP libraries support. It contains command-line programs for testing the performance of each Intel IPP function in various ways.

You can control the course of the tests and generate the results in the desired format by using command-line options. The results are saved in a .csv file for further processing with Microsoft Excel*. The course of timing is displayed on the console and can be saved in a .txt file. You can create a list of functions to be tested and set the parameters with which each function should be called during the performance test. The list of functions to be tested and their parameters can either be defined in an .ini file or entered directly from the console.

Additionally, the tool ships with reference performance data in .csv format, covering all domains and CPU types supported by Intel IPP. For more information, see the reference data located in the subdirectory \tools\perfsys\data.


Where to Find the Performance Test Tool?

Once you install the Intel IPP package, you can locate the performance test *.exe files in the \tools\perfsys directory.

The following table lists the .exe file to use for each Intel IPP domain. An .exe file whose name contains “64” targets Intel Itanium processors, an .exe file whose name contains “em64t” targets Intel Xeon processors with Intel® 64, and so on.

Executable Name              Domain
ps_ippac(64/em64t).exe       Audio Coding
ps_ippi(64/em64t).exe        Image Processing
ps_ipps(64/em64t).exe        Signal Processing
ps_ippvc(64/em64t).exe       Video Coding
ps_ippdc(64/em64t).exe       Data Compression
ps_ippch(64/em64t).exe       Strings
ps_ippcp(64/em64t).exe       Cryptography
ps_ippcv(64/em64t).exe       Computer Vision
ps_ippj(64/em64t).exe        JPEG
ps_ippvm(64/em64t).exe       Vector Math
ps_ippm(64/em64t).exe        Small Matrix
ps_ippcc(64/em64t).exe       Color Conversion
ps_ippsc(64/em64t).exe       Speech Coding
ps_ippsr(64/em64t).exe       Speech Recognition
ps_ippr(64/em64t).exe        Realistic Rendering
ps_ippdi(64/em64t).exe       Data Integrity


How to Use the Performance Test Tool?

Command Line Format and Options

As mentioned above, the performance test tool consists of command-line programs. The command-line format is:

<ps_FileName>.exe [switch_1] [switch_2]… [switch_n]

A short reference for the command-line options can be displayed on the console. To invoke it, enter -? or -h on the command line:
ps_ipps.exe -h

The command-line options fall into six functional groups. You can enter switches in any order, separated by at least one space. Options such as -r, -V, -o, and -O can be entered several times with different file names.

Option                          Description

1. Adjusting Console Input
  -A                            Prompt for all parameters from the console
  -B                            Batch mode

2. Managing Output
  -r[<file-name>]               Create a .csv file and write PS results
  -V[<file-name>]               Add PS results to a .csv file
  -o[<file-name>]               Create a .txt file and write console output
  -O[<file-name>]               Add console output to a .txt file
  -L<ERR|WARN|PARM|INFO|TRACE>  Set the detail level of the console output
  -e                            Enumerate tests and exit
  -g[<file-name>]               Create a signal file after testing is completed
  -FL                           Write results as the number of floating-point operations per second. This option is supported for only a few functions, because users rarely require it.

3. Selecting Functions for Testing
  -f<or-pattern>                Run tests of functions whose names contain the pattern (case sensitive)
  -f-<not-pattern>              Do not test functions whose names contain the pattern (case sensitive)
  -f+<and-pattern>              Run only tests of functions whose names contain the pattern (case sensitive)
  -f=<eq-pattern>               Run tests of functions with exactly this full name (case sensitive)

4. Operation with .ini Files
  -i[<file-name>]               Read PS parameters from an .ini file
  -I[<file-name>]               Write PS parameters to an .ini file and exit
  -P                            Read the names of the functions to test from the .ini file

5. Direct Data Input
  -d<name>=<value>              Set a PS parameter value

6. Multi-Thread Timing
  -MT<numThreads>               Run timing in several threads simultaneously
  -T<HIGH|NORMAL|LOW>           Set the thread priority. The priority level may be specified by its first letter only:
      -T H[IGH]                 High priority. This is the default if multi-thread timing is not set or if the number of threads is equal to 1.
      -T N[ORMAL]               Normal priority. This is the default if the number of threads is greater than 1.
      -T L[OW]                  Low priority. Recommended for multi-thread timing if the functions use OpenMP* technology.

 


Examples of Running the Performance Test Tool

The following examples illustrate how you can use this tool in different ways to generate Intel IPP function performance data on the targeted system.

Running in the Standard Mode

This is the simplest way to get a full set of performance data on your target system. For example:

ps_ippch.exe -B -v

All IPP string functions are tested by the default timing method on standard data (-B option). The results are generated in the ps_ippch.csv file.

Testing Selected Functions

This is a common way to measure Intel IPP functions. You can select one specific function or several functions from one domain to get performance data. For example:

ps_ipps.exe -fFIRLMS_32f -V firlms.csv

This command measures the signal processing function FIRLMS_32f (-f option) and writes the results to a .csv file named firlms.csv (-V option). You can also select several functions at once:

ps_ippch -B -V string.csv -fFind -fCompare -fRemove

This command outputs data for the string-domain functions whose names contain Find, Compare, or Remove.

Retrieving Function Lists

Run the following command to find out the functions included in a domain:

ps_ippvc.exe -e -o vc_list.txt

The output file vc_list.txt (-o option) will list all IPP video coding functions (-e option). 

ps_ippvc.exe  -e -r H264.csv -f H264

The list of functions with names containing H264 (-f option) that may be tested (-e option) is displayed on the console and stored in the file H264.csv (-r option).

Launching the Performance Test Tool with the .ini File

Use an .ini file to avoid entering all the required parameters each time you run the console performance tool. You can also customize the parameters in the .ini file to simplify the tests. For example:

ps_ipps.exe -B -I

The ps_ipps.ini file is created after the first run (-I option).

ps_ipps.exe -i -v

This command tests all or a selected subset of functions, reading the timing procedure and the function parameter values from the ps_ipps.ini file (-i option), and generates the ps_ipps.csv output file (-v option). Optionally, before running the command, you can edit ps_ipps.ini to choose the functions, the array of vectors, the length of vectors, and so on.


Setting General Parameters in Performance Test Tool

When you run the performance tool from the command line, you are prompted to enter the timing method, its options, and other common parameters. Choose meaningful values, because these parameters affect the measured performance of the function.

The following table gives you a detailed description to enable you to choose the appropriate parameters:

Method: Auto
  Description: Automatic selection of the number of function calls in the performance measurement cycle (default procedure).
  Parameters: Enter “A” from {Auto|Manual|Statistic|Jeff}.
    Accuracy - The number of function calls in the cycle doubles each time until the results of the last two measurements agree to within the accuracy specified by the Accuracy value. This procedure repeats three times and the best result is printed. Repeating the test is necessary to filter out random fluctuations generated by the system.

Method: Manual
  Description: The user sets the number of function calls in the measurement cycle.
  Parameters: Enter “M” from {Auto|Manual|Statistic|Jeff}.
    NumLoops - Number of loops. The function is called NumLoops times in the measurement cycle. Only this single result is printed; no other measurements are taken. This method is not recommended for testing all functions with full options, because the results are either inaccurate or take too long to obtain.

Method: Statistic
  Description: The user specifies the total number of processor clocks for the whole procedure and sets the number of function calls required to accumulate statistics.
  Parameters: Enter “S” from {Auto|Manual|Statistic|Jeff}.
    NumCalls and TotalClocks - Each function is measured at least NumCalls times, and the measurements continue until the total number of processor clocks exceeds the specified value of TotalClocks. The result is the average taken across all measurements. This method may be used only if the system has a time stamp counter.

Method: Jeff
  Description: A variation of the Auto method.
  Parameters: Enter “J” from {Auto|Manual|Statistic|Jeff}.
    NumTestRepetitions - The number of function calls in the cycle doubles each time until the results of the last two measurements agree to within the accuracy specified by the Accuracy value. This procedure repeats NumTestRepetitions times and the best result is printed.
    Accuracy - See the Auto method above.



You can use the last three procedures to check the results yielded by the first method if you doubt the reliability of the measurement results.
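The Auto procedure described above can be sketched in C as follows. This is only an illustration of the doubling idea, not the tool's actual implementation. The names auto_time, auto_pass, measure_fn, and demo_measure are hypothetical, "agree within Accuracy" is assumed to mean a relative difference, "best result" is assumed to mean the lowest clocks per call, and max_calls is an added safety cap that the article does not mention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: runs the function under test n_calls times and
   returns the total elapsed clocks (for example, a difference of two
   time stamp counter reads). */
typedef double (*measure_fn)(unsigned long n_calls, void *ctx);

/* One pass: double the number of calls until two successive per-call
   results agree within the relative tolerance `accuracy` (assumption),
   or until the max_calls cap is reached. */
static double auto_pass(measure_fn measure, void *ctx, double accuracy,
                        unsigned long max_calls, unsigned long *n_loops)
{
    unsigned long n = 1;
    double prev = measure(n, ctx) / (double)n;

    while (n * 2 <= max_calls) {
        n *= 2;
        double cur = measure(n, ctx) / (double)n;
        double diff = cur > prev ? cur - prev : prev - cur;
        int converged = diff <= accuracy * cur;
        prev = cur;
        if (converged)
            break;
    }
    *n_loops = n;
    return prev;
}

/* Repeat the pass `repetitions` times and keep the lowest clocks per call.
   Three repetitions correspond to the Auto method; a configurable count
   corresponds to the NumTestRepetitions parameter of the Jeff method. */
double auto_time(measure_fn measure, void *ctx, double accuracy,
                 int repetitions, unsigned long max_calls,
                 unsigned long *n_loops)
{
    assert(measure != NULL && n_loops != NULL);
    assert(repetitions > 0 && accuracy > 0.0 && max_calls >= 1);

    double best = 0.0;
    for (int r = 0; r < repetitions; r++) {
        unsigned long n;
        double clocks = auto_pass(measure, ctx, accuracy, max_calls, &n);
        if (r == 0 || clocks < best) {
            best = clocks;
            *n_loops = n;
        }
    }
    return best;
}

/* Synthetic stand-in for a real measurement: 100 clocks per call plus a
   fixed overhead (in clocks) passed through ctx. Illustration only. */
static double demo_measure(unsigned long n_calls, void *ctx)
{
    double overhead = *(const double *)ctx;
    return 100.0 * (double)n_calls + overhead;
}
```

With the synthetic demo_measure and an overhead of 1000 clocks, calling auto_time(demo_measure, &overhead, 0.01, 3, 1UL << 20, &n_loops) stops doubling at 1024 calls, where the fixed overhead contributes less than 1% to the per-call result. A real measurement callback would wrap the function under test between two timer reads.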

Setting Function Parameters

In the course of the timing procedure the parameters of IPP functions should be specified. Certain parameters like array addresses, array element values, and parameters that have a small influence on the performance of the functions are defined within the test and you cannot change them.

Other parameters including the vector length, image size, scale factor, and function-specific parameters may significantly affect the function performance. The test varies their values to obtain more detailed measurements of the performance. The test specifies the parameters that are variable for a given function.

Most of the tests have several variable parameters. The test measures performance for all possible combinations of parameter values. For example, if the vector length has five possible values and scale factor has three possible values for a certain function the total number of performance measurements is 5*3 = 15. The values of variable parameters are written into the .csv file in the same order as they were prompted to be entered from the console, or placed by the PS in the .ini file.
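The combination count can be illustrated with a short C sketch. The function names below are hypothetical, and the iteration order (last parameter varying fastest) is an assumption; the article does not state the order in which the tool walks through the combinations.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARAMS 8  /* arbitrary limit for this sketch */

/* Total number of measurements is the product of the value counts. */
size_t count_combinations(const size_t *counts, size_t n_params)
{
    size_t total = 1;
    for (size_t p = 0; p < n_params; p++)
        total *= counts[p];
    return total;
}

/* Advance idx[] to the next combination, odometer style.
   Returns 1 if a new combination is available, 0 after the last one. */
int next_combination(size_t *idx, const size_t *counts, size_t n_params)
{
    for (size_t p = n_params; p-- > 0; ) {
        if (++idx[p] < counts[p])
            return 1;
        idx[p] = 0;  /* this position wrapped around; carry to the left */
    }
    return 0;
}

/* Visit every combination once and return how many were visited. */
size_t visit_all(const size_t *counts, size_t n_params)
{
    size_t idx[MAX_PARAMS] = {0};
    size_t visited = 0;

    assert(n_params <= MAX_PARAMS);
    if (count_combinations(counts, n_params) == 0)
        return 0;  /* a parameter with no values yields no measurements */

    do {
        visited++;  /* a real test would time the function here using idx[] */
    } while (next_combination(idx, counts, n_params));
    return visited;
}
```

For the example in the text, a counts array of {5, 3} (five vector lengths, three scale factors) gives count_combinations(counts, 2) == 15, and visit_all walks through the same 15 combinations.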

You can set the values of function-specific parameters either from the console or in the .ini file. There are several ways to specify them in the course of the timing procedure; this paper does not cover those details. If you have any questions, refer to the “References” section and contact us.


Interpretation of the Performance Output Data

The output data file is in .csv format, which can be viewed with a spreadsheet program such as Microsoft Excel. The output performance data contains the elapsed execution time in microseconds as well as the more commonly used unit, clocks per element (cpe). Following is a snapshot of the first few rows of an output .csv file produced by running the example mentioned above. It also includes test system information such as the processor, operating system, Intel IPP library version, and start time:




When viewing the .csv file, you may notice some columns with the headings nLps, Clocks, and per.

The nLps column contains the number of repetitions in the loop; for example, 16 iterations were measured. The Clocks column is read together with the per column, which gives the unit of normalization, for example clocks per element (cpe). The Time column specifies the time spent executing the function.

The output for function ippsFIRLMS (illustrated above) displays the performance data for the 32f data type; it averages 30 clocks per multiply-add pair (cpMac, defined below). It also shows an execution time of 0.82 microseconds, measured over a loop of 16 iterations.

The following table gives a detailed description of the units used in the “per” column for all Intel IPP domains:

Unit in column “per”    Interpretation
E                       Clocks per element (cpe)
Element                 Per element
cpMac                   cpMac = numClocks/numOfAdd&MulPairs. This unit is used for algorithms in which the number of multiplications equals the number of additions (for example, a convolution algorithm). The result is divided by the number of such multiplication + addition pairs (mostly used in ippSP).
e_krnPnt                e_krnPnt = numClocks/(numOfElements * numOfKernelPoints). Per element and filter kernel point (mostly used in ippSP).
Px                      Per pixel
Pxch                    Per pixel per channel
all                     numClocks. Non-standard unit in ippVC tests; the clock count is not normalized (per 1).
matrix value            Per matrix value (for ippMX)
vector value            Per vector value (for ippMX)
Value                   numClocks. Non-standard unit in ippMX tests; the clock count is not normalized (per 1).
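The normalizations in the table above amount to simple divisions. The following C sketch shows them under stated assumptions: the function names are hypothetical helpers, not Intel IPP functions, and the conversion from clocks to microseconds assumes you know the processor frequency in MHz (cpu_mhz), which the article does not discuss.

```c
#include <assert.h>
#include <stddef.h>

/* "E": clocks per element (cpe). */
double clocks_per_element(double num_clocks, size_t num_elements)
{
    assert(num_elements > 0);
    return num_clocks / (double)num_elements;
}

/* "cpMac": clocks per multiplication + addition pair. */
double clocks_per_mac(double num_clocks, size_t num_mac_pairs)
{
    assert(num_mac_pairs > 0);
    return num_clocks / (double)num_mac_pairs;
}

/* "e_krnPnt": clocks per element and filter kernel point. */
double clocks_per_kernel_point(double num_clocks, size_t num_elements,
                               size_t num_kernel_points)
{
    assert(num_elements > 0 && num_kernel_points > 0);
    return num_clocks / ((double)num_elements * (double)num_kernel_points);
}

/* Convert a clock count to microseconds: one MHz is one clock per
   microsecond, so time_us = clocks / cpu_mhz (cpu_mhz is an assumed input). */
double clocks_to_usec(double num_clocks, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return num_clocks / cpu_mhz;
}
```

For example, 30720 clocks spent on a 1024-element vector give clocks_per_element(30720.0, 1024) == 30.0, and 3000 clocks on a 3000 MHz processor correspond to 1 microsecond.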

 


Conclusion

Intel IPP provides a powerful performance test tool that simplifies benchmarking Intel IPP functions on various Intel microprocessors. Additionally, it provides a comprehensive reference data set to help you analyze performance data.


References

 


Appendix: Measuring Intel IPP Performance with ippGetCpuClocks

In some cases, you can also call the Intel IPP common function ippGetCpuClocks in your application to measure the performance of an Intel IPP function or of any other function. The ippGetCpuClocks function reads the current state of the time stamp counter (TSC) register and returns its value. Subtracting the value read before the function call from the value read after it gives a very accurate measurement of the elapsed time in processor clocks.

Following is a flow chart to illustrate how to use this Intel IPP common function:


Note: ippCoreGetCpuClocks() is no longer used in Intel IPP; please use ippGetCpuClocks() instead.

Following is an example to show you how to use this ippGetCpuClocks to measure function performance:

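// Measure FFT performance //

#include <stdio.h>
#include <math.h>
#include "ipp.h"

#define Nord    10      // FFT order: LEN = 2^Nord
#define LEN     1024
#ifndef IPP_PI          // normally already defined by the IPP headers
#define IPP_PI  ( 3.14159265358979323846 )
#endif

int main()
{
    // Get the version of IPP on this machine
    const IppLibraryVersion* lib = ippsGetLibVersion();

    printf("Intel® Performance Primitives\n");
    printf("CPU       : %.4s\n", lib->targetCpu);  // 4-character field
    printf("Name      : %s\n", lib->Name);
    printf("Version   : %s\n", lib->Version);
    printf("Build date: %s\n", lib->BuildDate);

    // Measure FFT performance
    Ipp32f  Amp = 1;
    Ipp32f  fsample = 51.2e6f;
    Ipp32f  fc = 3e6f;
    int     NoFFT = 100000;
    int     i;
    IppsFFTSpec_R_32f* FftSpec;
    int     FftOrder = Nord;
    int     FftFlag = IPP_FFT_DIV_FWD_BY_N;
    IppStatus Status;
    Ipp64u  start, stop;    // ippGetCpuClocks returns an unsigned 64-bit count

    // Allocate the buffers with ippsMalloc so they can be released with ippsFree
    Ipp32f* Signal    = ippsMalloc_32f(LEN);
    Ipp32f* SignalFft = ippsMalloc_32f(LEN);
    if (Signal == NULL || SignalFft == NULL)
    {
        printf("Memory allocation failed\n");
        if (Signal != NULL) ippsFree(Signal);
        if (SignalFft != NULL) ippsFree(SignalFft);
        return 1;
    }

    // Generate sine wave
    for (i = 0; i < LEN; i++)
    {
        Signal[i] = Amp * (Ipp32f)cos(2 * IPP_PI * i * fc / fsample);
    }

    Status = ippsFFTInitAlloc_R_32f(&FftSpec, FftOrder, FftFlag, ippAlgHintNone);
    if (Status != ippStsNoErr)
    {
        printf("FFT initialization failed: %s\n", ippGetStatusString(Status));
        ippsFree(Signal);
        ippsFree(SignalFft);
        return 1;
    }

    // Read the time stamp counter before and after the timed loop
    start = ippGetCpuClocks();
    for (i = 0; i < NoFFT; i++)
    {
        ippsFFTFwd_RToPack_32f(Signal, SignalFft, FftSpec, NULL);
    }
    stop = ippGetCpuClocks();

    // Clocks per element, averaged over all FFT calls
    float ippFFTPerf = (float)(stop - start) / LEN / NoFFT;
    printf("FFT 32f: ipp=%.1f\n", ippFFTPerf);

    ippsFFTFree_R_32f(FftSpec);
    ippsFree(Signal);
    ippsFree(SignalFft);

    return 0;
}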

 


For more complete information about compiler optimizations, see our Optimization Notice.