Ambiguity in analysis

Hi, I'm using the trial version of VTune Amplifier XE (the code profiler), and there is something in the results I can't figure out. I was profiling the bzip2 benchmark from SPEC CPU2006. When the analysis finished, I opened the Bottom-up tab and clicked on the function with the longest CPU time ("fallbackSort"). The call stack "BZ2_bzWrite<-compressStream<-spec_compress<-main<-_tmainCRTStartup<-BaseThreadInitThunk<-RtlInitializeExceptionChain<-RtlInitializeExceptionChain" then appeared under it. When I click on that call stack, VTune opens the source code with a yellow pointer on one line, but the "CPU Time" column next to that line is blank. Is that correct? (I've attached an image of the result.)

Thank you very much.

Hello,

It seems that everything was OK before entering source view.

Is "fallbackSort" a hot function? When you go to its source view, VTune should jump to the source line with the highest CPU time. Note that BZ2_bzCompress may consume CPU time, but that time is not counted in the caller's line unless BZ2_bzCompress is an inlined function. You can find BZ2_bzCompress in the Bottom-up report, then go to its source view to display the source lines with BZ2_bzCompress's CPU time.

Please feel free to attach the result directory if you have other questions (run a short Hotspots analysis to reduce the size of the result).

Regards, Peter
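Peter's point about inlining can be reproduced on a small scale. Below is a minimal C sketch, assuming GCC/Clang ("__attribute__((noinline))") or MSVC ("__declspec(noinline)"); "NOINLINE", "checksum_block", and "process" are hypothetical names, not bzip2 code. Because "checksum_block" is kept out of line, a sampling profiler should attribute the loop's CPU time to checksum_block's own source lines, while the call line in "process" shows little self time. If the callee were inlined, those samples would land in the caller instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable "never inline" marker for MSVC and GCC/Clang. */
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

/* Hypothetical hot callee. It is kept out of line, so profiler samples
   taken inside this loop are attributed to these lines, not the caller. */
NOINLINE unsigned long checksum_block(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + buf[i];
    return sum;
}

/* Hypothetical caller. Its call line accumulates only the (small) cost of
   the call itself; the callee's time shows up under checksum_block. */
unsigned long process(const unsigned char *buf, size_t len)
{
    return checksum_block(buf, len);
}
```

To try it, one could call "process" in a long loop under a Hotspots analysis and compare the source views of the two functions.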

OK, I will send it. Just to be sure: I used "CC=cl /Z7". Is that enough, or do I have to include more compiler options?

I've attached the analysis results.

Attachment: vtune_28_2__29.7z (126.75 KB)

The "/Z7" compiler option is fine for generating debug info, but it stores that info in the object files rather than in a PDB file. The linker option "/DEBUG" is also needed.

Your .7z file doesn't include the .amplxe file. I need it to open the result, along with the other contents of the result directory.

I've added the linker option "/DEBUG" to my config file (my compiler option is now "cl /Z7 /DEBUG"), but nothing changed. I've also attached the complete result folder.

Attachment: r011hs.7z (112.1 KB)

Thank you for the test result, it's very helpful. I noticed:
1) All hot functions are flat (no subroutines) when I use the Top-down Tree view of the report.
2) In the Bottom-up report with "Thread/Function/Call Stack" grouping, no hot function has a parent in the "mainCRTStartup" thread; with "Function/Call Stack" grouping, every hot function has a parent function. Changing the grouping mode gives a different result.

I guess the stack unwinding may be incorrect, but I don't know why. First, check whether you used an inlining option to build; if so, see this article: http://software.intel.com/en-us/articles/display-inline-functions-in-hot...

We need your help: please tell us all the compiler options (or the Makefile); the binary with its PDB file would also be appreciated. I'm curious about the hot line in function fallbackSort. Can you attach blocksort.c so I can step into source view?

This is a valuable report, and worth investigating. Thank you.

Regards, Peter

Hi again. We compiled the source code of 401.bzip2 from SPEC CPU2006 with Visual Studio 2010. I don't know whether I am allowed to send the source code. The config file for building the binary is "windows-em64t-icl". The compiler options are:

################################################################
# Compiler section
################################################################
CC = cl /Z7 /DEBUG
CXX = cl /Z7 /DEBUG
FC = ifort
OBJ = .obj
int=default:
EXTRA_LDFLAGS = /F512000000
fp=default:
EXTRA_LDFLAGS = /F950000000

...401.bzip2...=default:
PORTABILITY = -D_Complex= -DSPEC_CPU_P64
################################################################
# Baseline Tuning Flags
# default baseline for int and fp 2006
################################################################
default=default=default=default:
OPTIMIZE= -fast
CXXOPTIMIZE= -Qcxx_features
sw_base_ptrsize = 64-bit
sw_peak_ptrsize = Not Applicable

default=peak=default=default:
OPTIMIZE= -fast -Qauto_ilp32
sw_peak_ptrsize = 32/64-bit
PASS1_CFLAGS= -Qprof_gen
PASS2_CFLAGS= -Qprof_use
PASS1_CXXFLAGS= -Qprof_gen
PASS2_CXXFLAGS= -Qprof_use
PASS1_FFLAGS= -Qprof_gen
PASS2_FFLAGS= -Qprof_use
PASS1_LDFLAGS= -Qprof_gen
PASS2_LDFLAGS= -Qprof_use

To compile and build the code, we used this command in the Windows command prompt (as written in the SPEC CPU2006 installation guide):

C:\cpu2006>runspec --action=build --tune=base --config=windows-em64t-icl.cfg 401.bzip2

Another thing that may be important: we used the Visual Studio compiler, not the Intel compiler.

Finally, I have two more questions:
1- Is there any difference between the Intel compiler and Visual Studio?
2- Should the benchmark be built in release mode (optimized) or debug mode? Which one is correct?
Please let me know if there is a way to help identify the problem.

I tried the 2013 version, but as far as I can tell nothing has changed. As you can see in the picture below, the analysis output is the same as in the previous version, and nothing is shown in the "CPU Time" column next to the function I'm interested in. Would you please help me? Thanks a lot.

If the events don't display in the BZ2_bzCompress source code, this looks like the symptom of interprocedural optimization. Of course, that optimization may be tuned specifically for this benchmark, but you may have to turn off that optimization to get useful source event views. If you switch to asm view, you may see events from instructions coming from a melange of source lines.
I don't know whether using /Z7 in place of the usual /Zi will affect this.

Thanks Tim.
Here is a tip for displaying inlined functions when using the Intel C++ compiler with the -O2, -O3, or -ipo optimization switches. Read this article

The 2013 beta has fixed the issue in the 2011 version where call stack info could not be displayed in the Bottom-up report.

Following up Peter's suggestion, I see that ICL has an option /debug:inline-debug-info which looks worth trying in this situation. I guess it crept in gradually; I never saw its use suggested before. However, the original poster was using Microsoft CL. As we have been diverted into discussing Intel compilers, I'll mention they are restricted to working in /Z7 mode. /Zi doesn't switch to the separate .pdb mode, supposedly due to lack of Microsoft support for 3rd party compilers. This occasionally created some issues since Microsoft began preferring /Zi. Anyway, it's something of a dilemma since the Microsoft compilers don't handle debugging in identical modes to those available to Intel compilers, so one may expect to require experiments such as /Zi vs. /Z7 and /GL- /Qip-.
You should find some interesting comments on the relative functionality of /Zi and /Z7 for CL by firing up your search engine.

Thanks a lot. First, I should say that we are using the Visual Studio compiler (not the Intel C++ compiler).

Peter, in the Bottom-up window I can also see the CPU times. According to this window, "BZ2_bzWrite" is the hotspot. I also want to find out which statement in this function takes the most CPU time, so I double-click the "BZ2_bzWrite" line, and it shows the picture I posted earlier. It says that "BZ2_bzCompress" is taking the CPU time. What should I do next?

I don't have your binary or the source file for BZ2_bzCompress, and since you used MS VC++, I don't know whether BZ2_bzCompress was inlined or not. It is very easy for you to check the disassembly: if the function was inlined, there is no "call xxx" instruction in the caller.

1. If BZ2_bzCompress was inlined, all of its CPU time should be counted in the caller.
2. If BZ2_bzCompress was not inlined, its CPU time is counted in BZ2_bzCompress itself; please go to the source view of BZ2_bzCompress.

Regards, Peter
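The disassembly check Peter suggests can be scripted roughly as follows. This is a minimal C sketch, assuming the caller's disassembly is available as plain text with one instruction per line (for example, copied from the assembly view or from a disassembler's output). The functions "listing_calls" and "has_call_word" are hypothetical, and real listings may show decorated or address-only call targets that this simple text match would miss.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True for characters that can appear inside an identifier. */
static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* True if the first len characters of line contain "call" as a whole word
   (so "callback" does not count). */
static bool has_call_word(const char *line, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (strncmp(line + i, "call", 4) != 0)
            continue;
        bool starts_word = (i == 0) || !is_ident_char(line[i - 1]);
        bool ends_word = (i + 4 == len) || !is_ident_char(line[i + 4]);
        if (starts_word && ends_word)
            return true;
    }
    return false;
}

/* True if some line of the listing is a call that mentions callee.
   No such line suggests the callee was inlined (or called indirectly). */
bool listing_calls(const char *listing, const char *callee)
{
    assert(listing != NULL && callee != NULL);
    size_t name_len = strlen(callee);
    const char *line = listing;
    for (;;) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        if (has_call_word(line, len)) {
            /* Search for the callee name within this line only. */
            for (size_t i = 0; i + name_len <= len; i++)
                if (strncmp(line + i, callee, name_len) == 0)
                    return true;
        }
        if (eol == NULL)
            return false;
        line = eol + 1;
    }
}
```

A "false" result is only a hint: an indirect call through a register or a differently decorated name would also produce it, so the assembly view is still worth a look.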

Following your suggestion to use the /Z7 compiler option, we added this parameter to our config file. To deal with inline functions, we also used the compiler option /Ob0 (for MS Visual Studio), which disables inline expansion. Our config file for compiling the code now contains something like this:

CC = cl /Z7 /Ob0 /DEBUG

Is that correct?

(Before that, we used only /Z7, without /Ob0.)

If you build the binary that way, you should see the hot lines in the hot function BZ2_bzCompress().

In your old r011hs result, I saw the hot function BZ2_decompress() with a CPU time of 0.559, but BZ2_bzCompress() was not shown. Is it possible that there was very little compression work, under 10 ms?
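The "under 10 ms" question comes down to sampling arithmetic: a time-based sampler records roughly one sample per sampling interval of CPU time, so a function that runs for less than one interval may collect no samples at all. Below is a minimal C sketch, assuming a 10 ms interval (the figure implied by the question) and that the 0.559 CPU time for BZ2_decompress() is in seconds; the function names are hypothetical.

```c
#include <assert.h>

/* Approximate number of samples for a function that consumes cpu_time_ms
   of CPU time when one sample is taken every interval_ms. */
double expected_samples(double cpu_time_ms, double interval_ms)
{
    assert(interval_ms > 0.0);
    return cpu_time_ms / interval_ms;
}

/* Nonzero if the function is expected to collect at least min_samples
   samples, i.e. it is likely to appear in the report at all. */
int likely_visible(double cpu_time_ms, double interval_ms, double min_samples)
{
    return expected_samples(cpu_time_ms, interval_ms) >= min_samples;
}
```

Under these assumptions, 559 ms of CPU time gives about 56 samples, while 8 ms of compression work gives an expected 0.8 samples and could easily go unreported.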

Hi! In the end we decided to use the Intel C++ compiler (to see information about inlined functions). In the config file we used these options for compilation (on Windows):

CC = icl /O2 /Z7 /Qinline-debug-info
CXX = icl /O2 /Z7 /Qinline-debug-info

With these options, our analysis result finally changed. This is the result image:

After the analysis finished, we clicked on "fallbackQsort3", then on "fallbackSort<-BZ2_compressBlock<-handle_compress", and then on "BZ2_bzWrite<-compressStream<-spec_compress<-main<-_tmainCRTStartup<-_tmainCRTStartup". We then see the image above, which points to the line "progress = handle_compress ( strm )". Is that right? I mean, can we trust that this line of code is the hotspot?
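One way to read the view described above is to separate self time from total (inclusive) time. The sketch below is a simplified model, not VTune's implementation: each sample is a call stack with the innermost frame first, "self" counts stacks whose innermost frame is the function, and "total" counts stacks that contain it anywhere. In such a model, BZ2_bzWrite can have a large total count but almost no self count: its call line "progress = handle_compress ( strm )" is on the hot path, while the cycles are actually spent deeper, in fallbackSort and fallbackQsort3. The "Stack" type and function names are hypothetical; the frame names are taken from the call stacks in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8

/* One sampled call stack; frames[0] is the innermost (executing) function. */
typedef struct {
    const char *frames[MAX_DEPTH];
    size_t depth;
} Stack;

/* Samples whose innermost frame is fn: the function's self time. */
size_t self_samples(const Stack *stacks, size_t n, const char *fn)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (stacks[i].depth > 0 && strcmp(stacks[i].frames[0], fn) == 0)
            count++;
    return count;
}

/* Samples where fn appears anywhere on the stack: its total time.
   Each sample is counted once even if fn appears several times. */
size_t total_samples(const Stack *stacks, size_t n, const char *fn)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t d = 0; d < stacks[i].depth; d++) {
            if (strcmp(stacks[i].frames[d], fn) == 0) {
                count++;
                break;
            }
        }
    }
    return count;
}
```

With three samples, two taken inside the sort routines and one inside BZ2_bzWrite itself, BZ2_bzWrite gets a total of 3 but a self count of only 1.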

It's great that you used the method described in the KB article to verify hot inlined functions. -Peter
