After Thread Checker, an improvement?

I installed Inspector XE and VTune XE over VS 2008 to check them out.

For Inspector XE and VTune XE:

The user interface is not very user friendly: in standalone mode there is a lot of fiddling to set up the file to process, and a lot of repetition to tell it where to find things. It does not remember previous choices, so you have to restart at the root of the system each time.

It is not possible to define a subset of what is to be inspected on large projects. Previous versions had a configuration utility that let you select a subset of the entire application. The only answer to this is the "Modules to exclude" setting in Inspector (not in VTune), but there is no list to pick from as in the previous version; the module names must be typed in explicitly.

Results on using Inspector XE

With a simple OpenMP for loop missing its reduction and private clauses, it does not find the data races.

If I remove the data race condition, it then finds the simultaneous use of a non-reentrant library function, but it reports it as a data race problem.

It reports a very cryptic stack cross access condition which remains unexplained.

I found the need to redefine things on successive runs, such as the location of source files.

Thread Checker no longer runs properly; I get a "call support" type error.

The app being inspected with Inspector XE contained User Event API functions. Could these pose a problem, and are they supported in the new XE products?

Conclusions

Inspector seems pretty close to Thread Checker; however, its failure to detect blatant OMP data races needs explaining.

This is after spending an afternoon on the two products. I may have missed a lot of issues, or I may simply be reacting negatively to lots of changes. It would be interesting to get feedback from other users.

I have posted similar comments on the VTune XE forum.

Michele

Hi Michele,

1. User friendliness: I agree that functions such as "Go back" / "Go forward" and "Bookmark" would be an improvement. Currently the user can use "Zoom-in/Filter on Selection" to step into the part of the result of interest, then exit VTune Amplifier XE 2011 (saving the current result); on re-entering the tool, the last data will be displayed.

2. Yes. The user has to type the "Excluded" modules into the list. Support for reading the excluded modules from a file would be better :-)

3. Please provide a test case to reproduce the problem in Inspector XE for OpenMP. Also, please use the new User Sync APIs (see details in $(VTuneAmplifier XE 2011)\include\libittnotify.h):
__itt_event_create
__itt_event_start
__itt_event_end

Regards, Peter

Peter,

I created a simple test without user event APIs. Same results: it does not detect the data races and reports just one cryptic stack cross access.

Here is the code (stack cross access is bold):

// PiIntegrationNumerique.cpp : Defines the entry point for the console application.
//

#include <stdio.h>   // printf
#include <time.h>    // clock, clock_t
#include "omp.h"

#define ITERATIONS 500000 // Time for 500000000: 12 sec

double pi;

int main(int argc, char* argv[]) {

    clock_t tempsDebut;
    clock_t tempsFin;
    printf("PiCalcul parallelise Sections - valeur iterations = %d\n\n", ITERATIONS);
    tempsDebut = clock( );
    double sum = 0.0;
    static double step = 1.0/(double) ITERATIONS;
    double x;

    #pragma omp parallel //for reduction(+:sum) private (x)
    for (int i=0; i< ITERATIONS; i++){
        x = ((double)i+0.5)*step;
        sum += 4.0/(1.0 + x*x);
    }
    pi = step * sum;

    tempsFin = clock( );
    printf("Pi integration numerique = %f\n",pi);
    printf("temps calcul de Pi = %f secondes\n",(double)(tempsFin - tempsDebut)/1000.);
    return 0;
}

step, x, and sum are data races - they are not reported.

This is a simple OMP parallelization of a for loop. Inspector is just not doing its job - the openmp switch in the project configuration has been turned on. Compiled with Intel C++ 12.0.0.104.

Michele

Hi Michele,

Please use "#pragma omp parallel for //reduction(+:sum) private (x)" instead of "#pragma omp parallel //for reduction(+:sum) private (x)"

If you use Intel C++ Compiler 12.0, please make sure the compiler option "/Qopenmp" is used. Another option, "/Qopenmp-report", tells you at compile time whether the OMP loop was parallelized or not.

I verified this problem (thanks for your test code). Inspector XE 2011 reports a "thread stack access" problem, and you can right-click the problematic code line to have the problem explained: "Occurs when a thread accesses stack memory of a different thread." Variable "x" is not protected.

If you use "#pragma omp parallel for reduction(+:sum) private (x)", Inspector XE 2011 will not report this problem.
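For reference, here is a minimal sketch of the loop with both clauses enabled. It assumes the loop is moved into a helper function; the name compute_pi is introduced here for illustration only and is not part of the test case above. With reduction(+:sum) each thread accumulates into its own copy of sum, and with private(x) each thread gets its own x, so neither variable is shared inside the loop.

```cpp
#include <assert.h>

// Hypothetical helper (not in the original test case): midpoint-rule
// integration of 4/(1+x^2) over [0,1], which approximates pi.
double compute_pi(int iterations) {
    assert(iterations > 0);
    const double step = 1.0 / (double)iterations;
    double sum = 0.0;
    double x;

    // reduction(+:sum): every thread works on a private partial sum,
    // and the partial sums are combined when the loop ends.
    // private(x): every thread gets its own copy of x.
    #pragma omp parallel for reduction(+:sum) private(x)
    for (int i = 0; i < iterations; i++) {
        x = ((double)i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Calling compute_pi(ITERATIONS) from main and printing the result should give the same value as a serial run, up to floating-point rounding in the order of summation.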

Regards, Peter

Peter,

You are correct in that you are reproducing exactly what I reported, i.e. that the message is simply not as good as the write-write conflict that Thread Checker would have reported. As a reminder, Thread Checker would report read-write, write-read and write-write conflicts. Inspector reports a stack cross access without specifying which variable is involved. If the line contained several variables, Inspector would not tell you which one is the culprit.

Furthermore, Inspector is not reporting a problem on 'sum +=', where there is clearly a data access conflict on this variable.

As for removing the comment // before reduction, you are correct in that Inspector does not report a data race problem because adding the reduction and private clauses removes these problems.
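What the two clauses do can be sketched by writing the privatization out by hand. This is only an illustration of the idea, not how any particular compiler implements reduction; compute_pi_manual and local_sum are names made up for this sketch.

```cpp
#include <assert.h>

// Hypothetical helper: same integration as the test case, with the
// per-thread copies written explicitly instead of using clauses.
double compute_pi_manual(int iterations) {
    assert(iterations > 0);
    const double step = 1.0 / (double)iterations;
    double sum = 0.0;  // shared, only updated inside the critical section

    #pragma omp parallel
    {
        // Declared inside the parallel region: one instance per thread.
        double local_sum = 0.0;

        #pragma omp for
        for (int i = 0; i < iterations; i++) {
            // Declared inside the loop body: private to each thread.
            double x = ((double)i + 0.5) * step;
            local_sum += 4.0 / (1.0 + x * x);
        }

        // One guarded update per thread instead of one unguarded
        // update per iteration.
        #pragma omp critical
        sum += local_sum;
    }
    return step * sum;
}
```

In the commented-out form of the pragma, by contrast, every thread reads and writes the same x and the same sum on each iteration, which is where the conflicts come from.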

I have added the switch /Qopenmp-report - the parallel region is indeed being parallelized, and all reporting options are on (detect data races and detect data races on stack accesses). x is a data race on the stack; sum is a data race on global memory. They are not detected as such.

It seems to me that there is a problem as Inspector is simply not doing its job properly with x, and not doing it at all with sum.

Thanks for your efforts,
Michele

It seems that Inspector XE 2011 reports data race errors in a different way from the original Inspector. I have reported this problem to the development team and will update this thread as soon as I can.

Thanks for this report.

Regards, Peter

Inspector XE reports races and stack cross accesses in exactly the same way as Inspector, and pretty much in the same way as Thread Checker 3.1.

Inspector XE (and Parallel Inspector) have multiple "levels" of analysis. By default we do not do race detection on stack variables. Doing race detection on stack variables is extremely expensive, and races on stack variables tend to be considerably rarer than races on heap-allocated variables, so we make race detection on the stack an option that is turned off by default.

Instead we report "stack cross accesses" if we detect that there are any stack variables being shared. If you see no "stack cross access" reports then you know that your program is not sharing data on the stack at all and there is no reason to turn on race detection on the stack. We don't want to inundate the user with "stack cross access" reports so we only report it once per thread per remote stack accessed.

If you are seeing "stack cross access" reports and you suspect that there is something non-deterministic going on with your stack variables, you should turn on race detection for stack variables. In the Inspector XE GUI you would do this by selecting the "Locate Deadlocks and Data Races" analysis type and then choosing the "extremely thorough" option. Turning the "extremely thorough" option on will probably make the analysis run take considerably longer (at least 2x and maybe as much as 10x) than it would with the option turned off.

It looks to me like both variables x and sum are declared on the stack (they are both declared in main's scope).

Peter,

So the end of the story is that Intel modified its reporting presentation, but that does not explain why sum is not reported at all.

Inspector seems to remain very slow when thorough reporting is requested, which means that one should run Inspector on a data subset and do coverage analysis to ensure that all program areas concerned with parallelism have been tested.

To summarize (and we can close the issue unless someone else would like to comment):

- Reporting is less clear than in the previous version.
- Inspection runs in data race thorough mode remain slow requiring a data subset and coverage analysis.
- There remains an issue as to an unreported data race.

In my opinion, when dealing with data races, deadlocks and re-entrancy, I'd rather use the previous Thread Checker than the new Inspector XE version.

Michele

Frank is right! By default, Inspector XE 2011 doesn't report data races on stack accesses (checking local variables on the stack carries a heavy workload). The tool can detect data races on global variables.

If the user wants to detect data races on stack accesses, please use "Locate Deadlocks and Data Races" and change the Scope from "Normal" to "Extremely thorough". Note that this has a high overhead cost.

Hello,

I am new to Intel Inspector. Could you please tell me how you compiled the code, especially the flags, to make it work under Linux? The code itself works, but Inspector shows only the assembly code and not the source code. I am using the gcc compiler:

$ gcc -fopenmp test.c -o test

Thanks in advance.

Yours,
Emanuel

Simply add the "-g" option when building the program, for example:

$ gcc -g -fopenmp test.c -o test

Regards, Peter
