[Yocto Linux] [GPU] cpugpu-concurrency remains stuck

Running a custom distro, I'm now able to run *VTune* remotely and get results.
Moreover, I am able to launch a GPU hotspot analysis, thanks again to Pavel.
But when I launch a "GPU/CPU concurrency" task, the process being analyzed starts and stops correctly, yet the VTune program remains stuck and never ends.
See the following `ps alx` output:

1136 ?        Ss     0:00 sh -c sh -c 'echo _pvi_ 1>&2 ; /opt/intel/vtune_amplifier_2018.2.0.551022/bin$(if [ `uname -m` = x86_64 ] || [ `uname -m` = amd64 ]; then echo 64; else echo 32;fi)/amplxe-runss -V 1>&2 ;
 1146 ?        Sl     0:00 /opt/intel/vtune_amplifier_2018.2.0.551022/bin64/amplxe-runss --result-dir /tmp/amplxe-results-root/root_adse3950/tmpTTnq5j/r006cgc --option-file /tmp/root@adse3950_r006cgc.opts
 1161 ?        D      0:00 /opt/intel/vtune_amplifier_2018.2.0.551022/bin64/sep -start -experimental -uem timer=10 -out /tmp/amplxe-results-root/root_adse3950/tmpTTnq5j/r006cgc/data.0/sep7f0d5a1a4700.20180514T1028
 1182 ?        S      0:00 /opt/intel/vtune_amplifier_2018.2.0.551022/bin64/sep -stop

If I understand this correctly, `amplxe` has detected that the program being analyzed has ended, then runs `sep -stop` to terminate `sep` (the collector process?)
and waits for it to end. But that process is stuck in the "D" state (uninterruptible sleep).
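
For anyone who wants to spot such stuck tasks quickly, here is a minimal sketch that filters `ps` output down to processes whose state starts with "D". The function name `list_d_state` is made up for illustration, and the usage line assumes a procps-style `ps` that accepts `-eo`; a BusyBox `ps` on a trimmed-down image may not.

```shell
#!/bin/sh
# list_d_state (hypothetical helper): read "PID STAT COMMAND" lines on stdin,
# skip the header line, and print only tasks in uninterruptible sleep ("D...").
list_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $2, $3 }'
}

# Usage on the target (assumes procps ps):
#   ps -eo pid,stat,comm | list_d_state
```

With the processes listed above, this would report PID 1161 (`sep -start`). If the kernel exposes it, `/proc/<pid>/stack` read as root can then show where that task is blocked.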

Here is an extract of the output from `ps -s`:

UID   PID          PENDING          BLOCKED          IGNORED           CAUGHT STAT TTY        TIME COMMAND
   0  1136 0000000000000000 0000000000010000 0000000000000004 0000000000010002 Ss   ?          0:00 sh -c sh -c 'echo _pvi_ 1>&2 ; /opt/intel/vtune_amplifier_2018.2.0.551022/bin$(if [ `uname -m` = x86_64 ] || [ `uname -m` = amd64 ]; then echo 64; else echo 32;fi)/amplxe-runss -V 1>&2 ; echo /_pvi_  1>&2 ;' chmod 600 /tmp/root@adse3950_r006cgc.opts ; mkdir -p /tmp/tmpTTnq5j ; mkdir -p /tmp/amplxe-results-root/root_adse3950/tmpTTnq5j/r006cgc/log/target ; sh -c 'cd "/home/root/datatest" && AMPLXE_LOG_DIR=/tmp/amplxe-results-root/root_adse3950/tmpTTnq5j/r006cgc/log/target /opt/intel/vtune_amplifier_2018.2.0.551022/bin$(if [ `uname -m` = x86_64 ] || [ `uname -m` = amd64 ]; then echo 64; else echo 32;fi)/amplxe-runss --result-dir /tmp/amplxe-results-root/root_adse3950/tmpTTnq5j/r006cgc --option-file /tmp/root@adse3950_r006cgc.opts'
    0  1146 0000000000000000 fffffffe7ffbfa37 0000000000000000 00000001c1004eae Sl   ?          0:00 /opt/intel/vtune_amplifier_2018.2.0.551022/bin64/amplxe-runss --result-dir /tmp/amplxe-results-root/root_adse3950/tmpTTnq5j/r006cgc --option-file /tmp/root@adse3950_r006cgc.opts
    0  1161 0000000000000000 0000000000000000 0000000000000002 0000000180000000 Dl   ?          0:00 /opt/intel/vtune_amplifier_2018.2.0.551022/bin64/sep -start -experimental -uem timer=10 -out /tmp/amplxe-results-root/root_adse3950/tmpTTnq5j/r006cgc/data.0/sep7f0d5a1a4700.20180514T102814.847811 -ec INST_RETIRED.ANY:sa=1600000,CPU_CLK_UNHALTED.CORE:sa=1600000,CPU_CLK_UNHALTED.REF_TSC:sa=1600000,UNC_SOC_All_BW, -d 0 -uem factor=10
    0  1182 0000000000000000 0000000000000000 0000000000000002 0000000180000000 S    ?          0:00 /opt/intel/vtune_amplifier_2018.2.0.551022/bin64/sep -stop




The only way out of this situation is to hard reset the board.

Any clue on how to avoid this?

 

 

 

 


Hi, you are very lucky!

First of all, let's check that sep works correctly on the target:

/opt/intel/vtune_amplifier_2018.2.0.551022/bin64/sep -start

The process should stop after 10 seconds. Does it?

Quote:

Pavel Gerasimov (Intel) wrote:

Hi, you are very lucky!

 

I'm going for the black belt!

Joking aside, my custom distro was designed on a strict diet. No magic on board. If some (kernel) option, library, or anything else is required but not clearly mentioned in the documentation, it will not be available. On this topic, is there any specification (the more technical, the better) that lists these requirements regardless of the distro name and would help me set up VTune fully?

Quote:

First of all, let's check that sep works correctly on the target:

/opt/intel/vtune_amplifier_2018.2.0.551022/bin64/sep -start

The process should stop after 10 seconds. Does it?

Yep, it did. To be clear, VTune does its job on the "gpu-hotspots" analysis. I wonder whether the "concurrency" analysis also gets stuck. I have to check this again.

"Concurrency" succeeded. But "General exploration" with "Analyze memory bandwidth" checked gets stuck in the same way that "GPU/CPU concurrency" did.

A panic occurred:

[ 1074.824442] ioremap: invalid physical address fffff00000
[ 1074.828935] ------------[ cut here ]------------
[ 1074.832643] WARNING: CPU: 0 PID: 2524 at /kernel-source//arch/x86/mm/ioremap.c:107 __ioremap_caller+0x2af/0x2d0
[ 1074.842464] Modules linked in: vtsspp(O) sep4_1(O) socperf2_0(O) pax(O) intel_rapl x86_pkg_temp_thermal coretemp igb spi_pxa2xx_platform mei_me pwm_lpss_pci pwm_lpss i915 mei uio
[ 1074.858825] CPU: 0 PID: 2524 Comm: sep Tainted: G           O    4.14.33-intel-pk-standard #3
[ 1074.866901] task: ffff8802762b4b00 task.stack: ffffc900004c8000
[ 1074.872063] RIP: 0010:__ioremap_caller+0x2af/0x2d0
[ 1074.875960] RSP: 0018:ffffc900004cbca0 EFLAGS: 00010286
[ 1074.880343] RAX: 000000000000002c RBX: 000000fffff00000 RCX: 0000000000000001
[ 1074.886860] RDX: 0000000080000001 RSI: ffffffff81dfb3a8 RDI: 00000000ffffffff
[ 1074.893381] RBP: ffffc900004cbd00 R08: ffff880275aab218 R09: 000000000000029b
[ 1074.899900] R10: 0000000000000008 R11: ffffffff813a7af0 R12: ffff880266958000
[ 1074.906418] R13: 0000000000100000 R14: 0000000000000000 R15: ffffffffffffffff
[ 1074.912938] FS:  00007f2d67fff700(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
[ 1074.920528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1074.925492] CR2: 0000000000000002 CR3: 000000026a324000 CR4: 00000000003406f0
[ 1074.932012] Call Trace:
[ 1074.933295]  ? uncore_Write_PMU+0x1e6/0x290 [socperf2_0]
[ 1074.937778]  ioremap_nocache+0x18/0x20
[ 1074.940510]  uncore_Write_PMU+0x1e6/0x290 [socperf2_0]
[ 1074.944799]  socperf_Service_IOCTL+0x170/0x900 [socperf2_0]
[ 1074.949573]  socperf_Device_Control+0x63/0xb0 [socperf2_0]
[ 1074.954250]  do_vfs_ioctl+0x9c/0x5f0
[ 1074.956788]  ? __fget+0x79/0xa0
[ 1074.958839]  SyS_ioctl+0x7f/0x90
[ 1074.960986]  do_syscall_64+0x65/0x120
[ 1074.963623]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1074.967810] RIP: 0033:0x36e78e8267
[ 1074.970150] RSP: 002b:00007f2d67ffe648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1074.977155] RAX: ffffffffffffffda RBX: 0000000000000018 RCX: 00000036e78e8267
[ 1074.983676] RDX: 0000000000000002 RSI: 0000000040086303 RDI: 0000000000000018
[ 1074.990194] RBP: 00007f2d71c7a468 R08: 0000000000000000 R09: 00000036e796bda0
[ 1074.996712] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2d3c000b40
[ 1075.003230] R13: 00000000000000e0 R14: 00000000006625f0 R15: 0000000000000000
[ 1075.009750] Code: c7 c7 8f 70 d8 81 c6 05 c7 bf 09 01 01 e8 cb 29 08 00 0f 0b 45 31 e4 e9 e7 fd ff ff 48 8d 37 48 c7 c7 18 6f d8 81 e8 b2 29 08 00 <0f> 0b 45 31 e4 e9 ce fd ff ff 8d 30 48 c7 c7 70 6f d8 81 e8 9a
[ 1075.029468] ---[ end trace c3bbb318181a11bf ]---
[ 1075.034196] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
[ 1075.041509] IP: uncore_Write_PMU+0x1f6/0x290 [socperf2_0]
[ 1075.046082] PGD 238018067 P4D 238018067 PUD 238019067 PMD 0
[ 1075.050954] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 1075.054363] Modules linked in: vtsspp(O) sep4_1(O) socperf2_0(O) pax(O) intel_rapl x86_pkg_temp_thermal coretemp igb spi_pxa2xx_platform mei_me pwm_lpss_pci pwm_lpss i915 mei uio
[ 1075.070721] CPU: 2 PID: 2524 Comm: sep Tainted: G        W  O    4.14.33-intel-pk-standard #3
[ 1075.078796] task: ffff8802762b4b00 task.stack: ffffc900004c8000
[ 1075.083959] RIP: 0010:uncore_Write_PMU+0x1f6/0x290 [socperf2_0]
[ 1075.089117] RSP: 0018:ffffc900004cbd20 EFLAGS: 00010246
[ 1075.093502] RAX: 0000000000000000 RBX: ffff8802669588f0 RCX: 0000000000000001
[ 1075.100021] RDX: 0000000080000001 RSI: 0000000000000090 RDI: 0000000000000000
[ 1075.106540] RBP: ffffc900004cbda0 R08: ffff880275aab218 R09: 0000000000000090
[ 1075.113059] R10: 0000000000000008 R11: ffffffff813a7af0 R12: ffff880266958000
[ 1075.119580] R13: ffff8802669588f0 R14: 0000000000000000 R15: ffffffffffffffff
[ 1075.126100] FS:  00007f2d67fff700(0000) GS:ffff88027fd00000(0000) knlGS:0000000000000000
[ 1075.133689] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1075.138654] CR2: 0000000000000090 CR3: 000000026a324000 CR4: 00000000003406e0
[ 1075.145173] Call Trace:
[ 1075.146458]  socperf_Service_IOCTL+0x170/0x900 [socperf2_0]
[ 1075.151235]  socperf_Device_Control+0x63/0xb0 [socperf2_0]
[ 1075.155916]  do_vfs_ioctl+0x9c/0x5f0
[ 1075.158455]  ? __fget+0x79/0xa0
[ 1075.160505]  SyS_ioctl+0x7f/0x90
[ 1075.162653]  do_syscall_64+0x65/0x120
[ 1075.165286]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1075.169475] RIP: 0033:0x36e78e8267
[ 1075.171816] RSP: 002b:00007f2d67ffe648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1075.178822] RAX: ffffffffffffffda RBX: 0000000000000018 RCX: 00000036e78e8267
[ 1075.185340] RDX: 0000000000000002 RSI: 0000000040086303 RDI: 0000000000000018
[ 1075.191859] RBP: 00007f2d71c7a468 R08: 0000000000000000 R09: 00000036e796bda0
[ 1075.198381] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2d3c000b40
[ 1075.204902] R13: 00000000000000e0 R14: 00000000006625f0 R15: 0000000000000000
[ 1075.211428] Code: ff 48 89 7b 20 0f 84 01 ff ff ff 4c 89 4d 88 4c 89 55 80 89 55 94 e8 3a 7c e5 e0 4c 8b 4d 88 48 89 43 58 48 8b 7b 48 4a 8d 34 08 <89> 3e 8b 36 4c 8b 55 80 48 83 3d 4a 36 00 00 00 8b 55 94 46 89
[ 1075.231177] RIP: uncore_Write_PMU+0x1f6/0x290 [socperf2_0] RSP: ffffc900004cbd20
[ 1075.237979] CR2: 0000000000000090
[ 1075.240225] ---[ end trace c3bbb318181a11c0 ]---
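
Reading the trace above: the first message says `ioremap` rejected physical address `fffff00000`, and the NULL pointer dereference that follows in `uncore_Write_PMU` appears to be the driver using the failed mapping. One quick sanity check is to compare the number of bits that address needs with the physical address width the CPU reports. This is only a sketch: the helper names `addr_bits` and `phys_bits_from_cpuinfo` are made up for illustration, and it assumes `/proc/cpuinfo` carries the usual x86 `address sizes` line.

```shell
#!/bin/sh
# addr_bits (hypothetical helper): number of bits needed to represent a
# physical address given as hex digits without the 0x prefix.
addr_bits() {
    n=$((0x$1))
    bits=0
    while [ "$n" -gt 0 ]; do
        n=$((n >> 1))
        bits=$((bits + 1))
    done
    echo "$bits"
}

# phys_bits_from_cpuinfo (hypothetical helper): read /proc/cpuinfo content on
# stdin and print the physical address width from the first "address sizes" line.
phys_bits_from_cpuinfo() {
    sed -n 's/^address sizes[^0-9]*\([0-9][0-9]*\) bits physical.*/\1/p' | head -n 1
}

# Usage on the target:
#   addr_bits fffff00000                      # prints 40
#   phys_bits_from_cpuinfo < /proc/cpuinfo
```

If the first number is larger than the second, the address lies outside what the CPU can address, which would be consistent with the "invalid physical address" warning.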

 

For the record, here are the kernel messages for the GPU hotspot analysis:

[  562.667041] vtss_cmd_start[cpu2]: Starting vtss++ collection
[  562.671928] vtss_cmd_start[cpu2]: HARDCFG: family: 6, model: 5c
[  562.677101] vtss_cmd_start[cpu2]: SYSCFG: kernel: 4.14.33
[  562.681860] vtss_transport_init[cpu2]: TRANSPORT: use UEC
[  562.693886] vtss_cea_init[cpu2]: KPTI: enabled
[  562.697774] vtss_pebs_init[cpu2]: PEBSv1: record size: 0x90, mask: 0x1
[  562.703635] vtss_cpuevents_init_pmu[cpu2]: PMU: counters: 5
[  562.708414] vtss_cpuevents_init_pmu[cpu2]: PMI: registered NMI handler
[  562.721281] probe_syscall_leave[cpu2]: register_kprobe('syscall_trace_leave') failed: -2
[  562.735308] probe_sched_process_exit[cpu2]: Unable register tracepoint: -1
[  562.751782] probe_sched_process_fork[cpu2]: Unable register tracepoint: -1
[  562.763849] probe_sched_process_fork[cpu2]: register_kretprobe('do_fork') failed: -22
[  562.809295] probe_sched_switch[cpu2]: Unable register tracepoint: -1
[  567.486029] CMCI storm detected: switching to poll mode
[  612.472361] vtss_transport_fini[cpu1]: TRANSPORT: stopped
[  612.484124] vtss_collection_fini[cpu2]: vtss++ collection stopped

and for the advanced hotspot analysis:

[  786.224187] vtss_cmd_start[cpu0]: Starting vtss++ collection
[  786.229080] vtss_cmd_start[cpu0]: HARDCFG: family: 6, model: 5c
[  786.234268] vtss_cmd_start[cpu0]: SYSCFG: kernel: 4.14.33
[  786.238998] vtss_transport_init[cpu0]: TRANSPORT: use UEC
[  786.252848] vtss_cea_init[cpu0]: KPTI: enabled
[  786.256489] vtss_ipt_init[cpu0]: IPT: enabled
[  786.260076] vtss_pebs_init[cpu0]: PEBSv1: record size: 0x90, mask: 0x1
[  786.265926] vtss_cpuevents_init_pmu[cpu0]: PMU: counters: 5
[  786.270716] vtss_cpuevents_init_pmu[cpu0]: PMI: registered NMI handler
[  786.282488] probe_syscall_leave[cpu0]: register_kprobe('syscall_trace_leave') failed: -2
[  786.295943] probe_sched_process_exit[cpu0]: Unable register tracepoint: -1
[  786.312877] probe_sched_process_fork[cpu0]: Unable register tracepoint: -1
[  786.324973] probe_sched_process_fork[cpu0]: register_kretprobe('do_fork') failed: -22
[  786.365538] probe_sched_switch[cpu0]: Unable register tracepoint: -1
[  786.381337] vtss_dump_ipt[cpu3]: vtss_transport_record_write() FAIL
[  811.496148] vtss_collection_fini[cpu1]: vtss++ collection stopped
[  811.508122] vtss_transport_fini[cpu1]: TRANSPORT: '1896-1904.1' lost 1 events
[  811.515010] vtss_transport_fini[cpu1]: TRANSPORT: stopped
[  868.767154] CMCI storm subsided: switching to interrupt mode

Hope this helps to figure out where the issue is.

Quote:

TurtleCrazy wrote:

A panic occurred:

Uhhuu... Dazed and confused...
There are definitely VTune Amplifier drivers on the stack... Unfortunately, I'm not very familiar with them.

Quote:

TurtleCrazy wrote:

On this topic, is there any specification (the more technical, the better) that lists these requirements regardless of the distro name and would help me set up VTune fully?

I'm afraid not... I have also discussed your problem with several team members.

Quote:

TurtleCrazy wrote:

Yep, it did. To be clear, VTune does its job on the "gpu-hotspots" analysis. I wonder whether the "concurrency" analysis also gets stuck. I have to check this again.

That's because the collection settings differ: the concurrency analysis does not use the sep driver.

Quote:

TurtleCrazy wrote:

But when I launch a "GPU/CPU concurrency" task, the process being analyzed starts and stops correctly, yet the VTune program remains stuck and never ends.

I think it's possible to use perf-based collection in your case. Try to unload the sep driver with the ./rmmod-sep script, specify a different path for the target package, and start a new analysis.

Let me be clearer. The panic output is linked to the "GPU/CPU concurrency" analysis. Each time this analysis is launched, the crash occurs immediately. There is not even a "vtss start" message or anything similar.

I guess that is what causes the `sep` process to get stuck in the "D" state.

The same crash occurred while testing "General exploration" with "Analyze memory bandwidth".

If I'm not clear enough, feel free to tell me.

Best Reply

We have multiple options for collecting performance data. Unfortunately, sep is not stable on your custom Yocto build. It's possible to use another data source, the Linux perf utility, which is supported in VTune Amplifier.

All micro-architecture analyses use the sep driver by default if it is loaded on the system. If it is not, VTune will try to use perf.

To switch to perf-based collection, you have to rmmod the sep driver from the system and prevent it from being loaded again. The easiest way to do this is to specify another directory for the target package, so that the prebuilt driver is not found in the default search directories and is not loaded.

Alternatively, rename the sepdk directory in the target package.
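
Before starting a new analysis, it may help to confirm that none of the VTune kernel modules are still loaded. A minimal sketch follows; the function name `loaded_vtune_modules` is made up for illustration, and the module names (`sep4_1`, `socperf2_0`, `pax`, `vtsspp`) are taken from the "Modules linked in" line of the trace posted earlier, so the version suffixes may well differ on another installation.

```shell
#!/bin/sh
# loaded_vtune_modules (hypothetical helper): read /proc/modules content on
# stdin and print the names of sampling-driver modules that are still loaded.
# The name patterns are an assumption based on the trace in this thread.
loaded_vtune_modules() {
    awk '$1 ~ /^(sep[0-9_]+|socperf[0-9_]+|pax|vtsspp)$/ { print $1 }'
}

# Usage on the target:
#   loaded_vtune_modules < /proc/modules
```

Empty output would suggest the drivers are gone and that the next collection should fall back to perf.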

Quote:

Pavel Gerasimov (Intel) wrote:

All micro-architecture analyses use the sep driver by default if it is loaded on the system. If it is not, VTune will try to use perf.

OK, the GPU/CPU concurrency analysis succeeded when using `perf`. Which is unfortunate, as I was already using perf directly before switching to VTune. :)

BTW, moving ahead, I noticed that it is the "Analyze memory bandwidth" option that causes the driver to panic. I'd appreciate it if you could pass this feedback on to the `sep` driver development team.

Regards,

 

 

Bandwidth profiling requires a restitched BIOS with the NPK setting enabled. The NPK setting can be changed from the BIOS page: Device Manager > System Setup > Debug Conf > NPK Debug Conf > North Peak Enable.

Thanks.

-Yang
