About VTune

Hello all,

I am a big fan of VTune, but this is the first time I have used it on MIC, and I am having difficulty extracting results, especially GFLOP/s and operational intensity. I followed the Intel article at http://software.intel.com/en-us/articles/best-know-method-estimating-flo....

For one thread, my command line is:

-bash-4.1$ amplxe-cl -collect-with runsa-knc -knob event-config=CPU_CLK_UNHALTED,VPU_ELEMENTS_ACTIVE,VPU_INSTRUCTIONS_EXECUTED -- ssh mic0 "export OMP_NUM_THREADS=1; export KMP_AFFINITY=compact; /users/ewartt/cyme/build/sandbox/main_mic"

float: 8[Byte],  sec 5.05472 <---------------------------------------- MY TIMER

amplxe: Using result path `/users/ewartt/cyme/build/r005runsa_knc'

amplxe: Executing actions 16 % Resolving module symbols                        

amplxe: Warning: Cannot locate file `/lib64/ld-2.14.90.so'.

amplxe: Executing actions 17 % Resolving information for `main_mic'            

amplxe: Warning: Cannot locate file `sep3_10.ko'.

amplxe: Executing actions 17 % Resolving information for `sep3_10'             

amplxe: Warning: Cannot locate file `/lib64/libc-2.14.90.so'.

amplxe: Executing actions 17 % Resolving information for `libc-2.14.90.so'     

amplxe: Warning: Cannot locate file `/bin/busybox'.

amplxe: Executing actions 18 % Resolving information for `busybox'             

amplxe: Warning: Cannot locate file `/boot/vmlinuz-2.6.38.8-g5f2543d'.

amplxe: Executing actions 18 % Resolving information for `vmlinux'             

amplxe: Warning: Cannot locate file `/lib64/libcrypto.so.10'.

amplxe: Executing actions 18 % Resolving information for `libcrypto.so.10'     

amplxe: Warning: Cannot locate file `/sbin/sshd'.

amplxe: Executing actions 19 % Resolving information for dangling locations    

amplxe: Warning: Cannot locate file `micscif.ko'.

amplxe: Executing actions 19 % Resolving information for `micscif'             

amplxe: Warning: Cannot locate file `/sep3.10/libabstract_mic_card.so'.

amplxe: Executing actions 20 % Resolving information for `libabstract_mic_card.

amplxe: Warning: Cannot locate file `/gpfs/apps/dommic/intel/composer_xe_2013.2.146/compiler/lib/mic/libiomp5.so'.

amplxe: Executing actions 50 % Generating a report                             

 

Collection and Platform Info

----------------------------

Parameter                 r005runsa_knc

------------------------  --------------------------------------------------------------------------------------------------------------

Application Command Line  ssh "mic0" "export OMP_NUM_THREADS=1; export KMP_AFFINITY=compact; /users/ewartt/cyme/build/sandbox/main_mic" 

Computer Name             dom37-mic0.login.cscs.ch

Environment Variables     

MPI Process Rank          

Operating System          Intel MIC Platform Software Stack release 2.1

Result Size               2734931

User Name                 ewartt

 

CPU

---

Parameter          r005runsa_knc

-----------------  -----------------------------

Frequency          1052000000

Logical CPU Count  240

Name               Intel(R) Xeon(R) E5 processor

 

Summary

-------

Elapsed Time:  16.548

CPU Usage:     0.811

 

Event summary

-------------

Hardware Event Type        Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample

-------------------------  -------------------------  --------------------------------  -----------------

CPU_CLK_UNHALTED           32124000000                8031                              2000000

VPU_ELEMENTS_ACTIVE        11484000000                5742                              1000000

VPU_INSTRUCTIONS_EXECUTED  1070000000                 535                               1000000

amplxe: Executing actions 100 % done

                                        

And for 4 threads (with compact affinity):

 amplxe-cl -collect-with runsa-knc -knob event-config=CPU_CLK_UNHALTED,VPU_ELEMENTS_ACTIVE,VPU_INSTRUCTIONS_EXECUTED -- ssh mic0 "OMP_NUM_THREADS=4 KMP_AFFINITY=compact; /users/ewartt/cyme/build/sandbox/main_mic"

 

-bash-4.1$ amplxe-cl -collect-with runsa-knc -knob event-config=CPU_CLK_UNHALTED,VPU_ELEMENTS_ACTIVE,VPU_INSTRUCTIONS_EXECUTED -- ssh mic0 "export OMP_NUM_THREADS=4; export KMP_AFFINITY=compact; /users/ewartt/cyme/build/sandbox/main_mic"

float: 8[Byte],  sec 1.63602 <----------------------- MY TIMER

amplxe: Using result path `/users/ewartt/cyme/build/r004runsa_knc'

amplxe: Executing actions 16 % Resolving module symbols                        

amplxe: Warning: Cannot locate file `/bin/coi_daemon'.

amplxe: Executing actions 16 % Resolving information for `coi_daemon'          

amplxe: Warning: Cannot locate file `/lib64/libcrypto.so.10'.

amplxe: Executing actions 17 % Resolving information for `libcrypto.so.10'     

amplxe: Warning: Cannot locate file `/sep3.10/sep_mic_server3.10'.

amplxe: Executing actions 18 % Resolving information for `main_mic'            

amplxe: Warning: Cannot locate file `sep3_10.ko'.

amplxe: Executing actions 18 % Resolving information for `sep3_10'             

amplxe: Warning: Cannot locate file `/lib64/libc-2.14.90.so'.

amplxe: Executing actions 19 % Resolving information for `libc-2.14.90.so'     

amplxe: Warning: Cannot locate file `/gpfs/apps/dommic/intel/composer_xe_2013.2.146/compiler/lib/mic/libiomp5.so'.

amplxe: Executing actions 19 % Resolving information for `libiomp5.so'         

amplxe: Warning: Cannot locate file `/boot/vmlinuz-2.6.38.8-g5f2543d'.

amplxe: Executing actions 20 % Resolving information for `vmlinux'             

amplxe: Warning: Cannot locate file `/lib64/ld-2.14.90.so'.

amplxe: Executing actions 50 % Generating a report                             

 

Collection and Platform Info

----------------------------

Parameter                 r004runsa_knc

------------------------  --------------------------------------------------------------------------------------------------------------

Application Command Line  ssh "mic0" "export OMP_NUM_THREADS=4; export KMP_AFFINITY=compact; /users/ewartt/cyme/build/sandbox/main_mic" 

Computer Name             dom37-mic0.login.cscs.ch

Environment Variables     

MPI Process Rank          

Operating System          Intel MIC Platform Software Stack release 2.1

Result Size               2767027

User Name                 ewartt

 

CPU

---

Parameter          r004runsa_knc

-----------------  -----------------------------

Frequency          1052000000

Logical CPU Count  240

Name               Intel(R) Xeon(R) E5 processor

 

Summary

-------

Elapsed Time:  13.149

CPU Usage:     1.168

 

Event summary

-------------

Hardware Event Type        Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample

-------------------------  -------------------------  --------------------------------  -----------------

CPU_CLK_UNHALTED           33728000000                8432                              2000000

VPU_ELEMENTS_ACTIVE        11302000000                5651                              1000000

VPU_INSTRUCTIONS_EXECUTED  1084000000                 542                               1000000

amplxe: Executing actions 100 % done                                           

 

My timer shows an OpenMP speedup of 3.09. If I compute the time as Time = CPU_CLK_UNHALTED / (#threads * Frequency), the results do not agree with my timer.

For the rate I use FLOP/s = VPU_ELEMENTS_ACTIVE / Time. With my timer I get 6.93 GFLOP/s (4 threads) and 2.274 GFLOP/s (1 thread). I should increase these by 20% to account for FMA (a rough estimate for my code), which gives 8.31 and 2.73 GFLOP/s respectively. I take the peak of a single core to be 1.052 * 8 * 2 = 16.83 GFLOP/s (double precision with FMA), so I should be at about 50% of the peak.
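The arithmetic above can be reproduced with a short script. This is only a sketch: it uses the counter values, timer readings, and frequency from the two reports, and the variable and function names are made up for illustration. It assumes, as the post does, that each VPU_ELEMENTS_ACTIVE count is one floating-point operation.

```python
# Values copied from the two VTune reports and the application timer above.
FREQ_HZ = 1.052e9  # "Frequency" in the CPU section

runs = {
    1: {"clk": 32_124_000_000, "vpu_elements": 11_484_000_000, "timer_s": 5.05472},
    4: {"clk": 33_728_000_000, "vpu_elements": 11_302_000_000, "timer_s": 1.63602},
}


def counter_time(clk, threads, freq_hz=FREQ_HZ):
    """Time = CPU_CLK_UNHALTED / (#threads * Frequency), in seconds."""
    return clk / (threads * freq_hz)


def gflops(vpu_elements, seconds):
    """FLOP/s = VPU_ELEMENTS_ACTIVE / Time, expressed in GFLOP/s."""
    return vpu_elements / seconds / 1e9


# Single-core peak as assumed in the post: frequency * 8 lanes * 2 (FMA).
peak = FREQ_HZ * 8 * 2 / 1e9

results = {}
for threads, r in runs.items():
    t_counter = counter_time(r["clk"], threads)
    results[threads] = {
        "t_counter": t_counter,
        "gflops_timer": gflops(r["vpu_elements"], r["timer_s"]),
        "gflops_counter": gflops(r["vpu_elements"], t_counter),
    }
    print(
        f"{threads} thread(s): counter time {t_counter:.2f} s, "
        f"timer {r['timer_s']:.2f} s, "
        f"{results[threads]['gflops_timer']:.2f} GFLOP/s with the timer, "
        f"{results[threads]['gflops_counter']:.2f} GFLOP/s with the counter time"
    )

print(f"assumed single-core peak: {peak:.2f} GFLOP/s")
```

With these inputs the counter-based times come out near 30.5 s and 8.0 s, well above the timer readings, which is the disagreement described above.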

Question: what do you think of this time estimate? Does anyone have an idea how to calculate the operational intensity (for the roofline model)?

Very best regards


Hi, 

I noticed that the elapsed times for your runs are 16.55 s and 13.15 s. Based on this, I am guessing that the time reported by your application covers only a portion of the runtime, not the whole run. The event counts displayed when the analysis completes, however, represent the entire application. This could be why the results don't agree with your timer. You need to filter your results to include only the compute regions that you are measuring. You can filter your Intel VTune Amplifier XE results using the -filter option.
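To see how much of each run the application's timer actually covers, the timer readings can be compared against the Elapsed Time values from the two summaries. A minimal sketch, with illustrative names:

```python
# (application timer in seconds, VTune "Elapsed Time" in seconds) per thread count,
# copied from the two reports in the original post.
runs = {
    1: (5.05472, 16.548),
    4: (1.63602, 13.149),
}

# Fraction of the profiled wall-clock time that falls inside the timed region.
coverage = {threads: timer_s / elapsed_s for threads, (timer_s, elapsed_s) in runs.items()}

for threads, share in coverage.items():
    print(f"{threads} thread(s): timer covers {share:.1%} of the elapsed time")
```

The timed region is roughly 31% of the elapsed time in the 1-thread run and roughly 12% in the 4-thread run, so most of the collected events fall outside it.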

Thank you, I had forgotten this point. Is there any way to delimit a kernel, as with Intel's IACA tool? I mean, to determine my metrics between given limits.

Best

++t

To determine metrics between given limits, I generally filter the result so that Intel VTune Amplifier XE only displays the relevant data. 

I generally use the Intel VTune Amplifier XE GUI to visualize my results. In the GUI, you can select the function or loop that you are interested in, right-click it, and choose "Filter In by Selection". Intel VTune Amplifier XE will then display the counter values for the selected region only.

You can also use the -filter option in the command line interface: 

$ amplxe-cl -report hotspots -filter function=function-name -result-dir /temp/test/r001hs

I hope this was what you were looking for. 
