Sampling API Error: resume sampling collection failed.

slohn wrote:

Hi,

I am currently struggling with the error mentioned above.

One of our programmers integrated performance analysis into our framework to run a hotspot analysis (VTune Amplifier XE Update 8), which works quite nicely. But when I change the hotspot analysis to user-defined hardware event counting like this:

$AMPLXECMD -collect-with runsa -knob event-config="$HWEVENTS" -start-paused -follow-child \
-target-duration-type=medium -no-allow-multiple-runs -no-analyze-system \
-data-limit=500 -slow-frames-threshold=40 -fast-frames-threshold=100 \
-r=$OUTPUTDIR/r@@@ -- $APPLICATION

Then I receive the error message. Reducing the command did not help. The idea is to profile specific algorithms, which then appear in VTune as tasks. When an algorithm is started, a "before" hook calls some custom code, where we put:

taskId = __itt_event_create(typeName.c_str(), typeName.size());
__itt_event_start(state.event);

and if started:

__itt_event_end(state.parent_event);

just before the start, and so on. Between algorithms the profiling is paused and then resumed, which means pause and resume are called with high frequency. Is this a problem? How could I fix it?

After searching the web, I did not find any solution. Does anybody have an idea?

Thanks,
Stefan

Peter Wang (Intel) wrote:

I don't know why you used paused mode. Did you resume sampling in your code?
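For reference, a minimal sketch of resuming and pausing collection from code when the run is started with -start-paused. It uses the `__itt_resume()` and `__itt_pause()` calls from ittnotify.h. The names `USE_ITTNOTIFY`, `COLLECTION_RESUME`, `COLLECTION_PAUSE`, `run_algorithm` and `profile_batch` are made up for illustration; without `USE_ITTNOTIFY` the macros compile to no-ops so the file builds without VTune installed. Toggling once per batch rather than once per algorithm is only one possible arrangement, and the thread does not confirm that frequent toggling causes the error.

```c
#include <stddef.h>

#ifdef USE_ITTNOTIFY
#include "ittnotify.h"
#define COLLECTION_RESUME() __itt_resume()
#define COLLECTION_PAUSE()  __itt_pause()
#else
/* Assumption: no-op fallbacks so the sketch builds without libittnotify. */
#define COLLECTION_RESUME() ((void)0)
#define COLLECTION_PAUSE()  ((void)0)
#endif

/* Hypothetical stand-in for one algorithm of the framework. */
static double run_algorithm(size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += (double)i * 0.5;
    }
    return sum;
}

/* Resume collection once, run a batch of algorithms, then pause once,
   instead of toggling collection around every single algorithm. */
double profile_batch(size_t runs, size_t n)
{
    double total = 0.0;

    COLLECTION_RESUME();
    for (size_t r = 0; r < runs; ++r) {
        total += run_algorithm(n);
    }
    COLLECTION_PAUSE();

    return total;
}
```

Building with -DUSE_ITTNOTIFY and linking libittnotify.a (as in the gcc line of the matrix1.c example) switches the macros to the real calls.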

Second, event start/end is only marked in the timeline report, and it works only with user-mode sampling (Hotspots, Concurrency, and Locks and Waits analysis), NOT with PMU event-based sampling.

Here is a simple example, matrix1.c:

#include <stdio.h>
#include <time.h>
#include "ittnotify.h"

#define NUM 512

double a[NUM][NUM], b[NUM][NUM], c[NUM][NUM];
__itt_event event_matrix;

void multiply(void)
{
    unsigned int i, j, k;

    /* mark the start of the region in the timeline */
    __itt_event_start(event_matrix);
    for (i = 0; i < NUM; i++) {
        for (j = 0; j < NUM; j++) {
            c[i][j] = 0.0;
            for (k = 0; k < NUM; k++) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    /* mark the end of the region */
    __itt_event_end(event_matrix);
}

int main(void)
{
    clock_t start, stop;

    /* the second argument is the length of the event name (17 characters) */
    event_matrix = __itt_event_create("Mark matrix event", 17);

    // start timing the matrix multiply code
    start = clock();
    multiply();
    stop = clock();

    // print elapsed time
    printf("Elapsed time = %lf seconds\n",
           ((double)(stop - start)) / CLOCKS_PER_SEC);
    return 0;
}

gcc -g matrix1.c -I/opt/intel/vtune_amplifier_xe_2011/include /opt/intel/vtune_amplifier_xe_2011/lib64/libittnotify.a -lpthread -ldl -o matrix1

# amplxe-cl -collect hotspots -- ./matrix1
Elapsed time = 0.740000 seconds
Using result path `/home/peter/problem_report/r001hs'
Executing actions 75 % Generating a report
Summary
-------

Elapsed Time: 0.761
CPU Time: 0.750
Executing actions 100 % done

Open the result in amplxe-gui and note the "User Task" mark in the timeline report.

slohn wrote:

Thanks,

I think this clarifies why it is not running.

The only point is that using tasks gives me another way to group processing time, right? But if I use frames, I can do the same, and user event sampling is covered as well, isn't it? So what is the difference between events and frames?

Thanks,
Stefan

Peter Wang (Intel) wrote:

Using __itt_frame is another approach for when you do the same (or similar) work in a loop, so that all performance data are classified per iteration. Please see this article.
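As an illustration of the per-iteration idea, here is a minimal sketch that wraps each loop iteration in a frame. It assumes the domain-based frame calls found in current ittnotify.h (`__itt_domain_create`, `__itt_frame_begin_v3`, `__itt_frame_end_v3`); older headers may expose a different frame interface. The names `USE_ITTNOTIFY`, `FRAME_INIT`, `FRAME_BEGIN`, `FRAME_END`, `do_iteration`, `run_frames`, `frames_marked_count` and the domain name "example.iterations" are made up for this example. Without `USE_ITTNOTIFY` the macros are no-ops, so the file builds without VTune installed.

```c
#include <stddef.h>

#ifdef USE_ITTNOTIFY
#include "ittnotify.h"
static __itt_domain *frame_domain;
#define FRAME_INIT()  (frame_domain = __itt_domain_create("example.iterations"))
#define FRAME_BEGIN() __itt_frame_begin_v3(frame_domain, NULL)
#define FRAME_END()   __itt_frame_end_v3(frame_domain, NULL)
#else
/* Assumption: no-op fallbacks so the sketch builds without libittnotify. */
#define FRAME_INIT()  ((void)0)
#define FRAME_BEGIN() ((void)0)
#define FRAME_END()   ((void)0)
#endif

/* Counts how many begin/end pairs were issued, for checking only. */
static int frames_marked = 0;

/* Hypothetical stand-in for the work done in one loop iteration. */
static double do_iteration(int step)
{
    return (double)step * 2.0;
}

/* Wrap every iteration in one frame so data is grouped per iteration. */
double run_frames(int iterations)
{
    double acc = 0.0;

    FRAME_INIT();
    for (int i = 0; i < iterations; ++i) {
        FRAME_BEGIN();
        acc += do_iteration(i);
        FRAME_END();
        ++frames_marked;
    }
    return acc;
}

int frames_marked_count(void)
{
    return frames_marked;
}
```

The contrast with the event example is that a frame is opened and closed once per iteration of repeated work, whereas the event in matrix1.c brackets a single region of interest.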

__itt_event provides APIs to mark "event start/end" in the timeline report around the critical code you run. You would usually use "zoom-in/filter on selection" to focus on this time range when reviewing the result.

Regards, Peter
