multithreaded application | waits and locks analysis

Hi,

I am analyzing a multithreaded application with one main thread and 100 child threads.

I am trying to collect locks and waits data. When I run this command line:

./amplxe-cl -collect locksandwaits -target-duration-type=medium -follow-child  --target-pid "pid"

I have this error:

amplxe: Error: Assertion '(blocked == tpss_tls_op_err_ok)' failed.[ASSERTION CONTEXT][CONTEXT END]. Please contact the technical support.
amplxe: Error: Assertion 'head[i].sp >= sp' failed.[ASSERTION CONTEXT]head[i].sp = 0xfec6c286, sp = 0x7f3e8e7f
[CONTEXT END]. Please contact the technical support.

Any ideas on how to fix it?

Thanks,

Mourad

First of all, did you try the locksandwaits analysis on the latest Update 12?

The second thing to verify: is this an issue specific to your program, or a general one? For example, write a simple program that does something in a while(true) loop, then test it with "amplxe-cl -collect locksandwaits -duration 10 -target-pid PID". Can you reproduce the problem? If the issue is specific to your program, go to https://premier.intel.com and submit the issue with a test case.

Another possibility is a privilege issue: you need to run the analysis as the same logged-on user that launched your target process.
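
A simple test target of the kind suggested above could look like the following. This is only a minimal sketch: it is written in Python for brevity (a native C/pthreads program would be closer to a typical profiled application), and the function name run_contention, the thread count of 100 (taken from the original question), and the 60-second duration are assumptions chosen for illustration.

```python
import os
import threading
import time


def run_contention(num_threads=100, duration_s=60.0):
    """Spawn worker threads that contend on one lock; return total iterations."""
    lock = threading.Lock()
    counter = [0]  # shared state, only modified while holding the lock
    deadline = time.monotonic() + duration_s

    def worker():
        while time.monotonic() < deadline:
            with lock:          # threads block here while another holds the lock
                counter[0] += 1
            time.sleep(0.001)   # short sleep so other threads can take the lock

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    # Print the PID so it can be passed to -target-pid from another console.
    print("PID:", os.getpid(), flush=True)
    print("iterations:", run_contention())
```

Start the script, note the printed PID, and attach with the command quoted above from a second console. If the assertion does not appear with this target, that supports the theory that the problem is specific to the original application.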

Hi Peter,

thanks for your reply.

Locks and waits analysis works fine with every other application on my system. Only my specific application fails, and I still get an assertion error like the one before, even when running:

# amplxe-cl -collect locksandwaits -duration 10 -target-pid 13151       
amplxe: Error: Assertion '((*desc)->l1.data.flags[tpss_ts_desc_fl_entered] == 0)' failed.[ASSERTION CONTEXT]ASSERT_TID: 13418
[tpss_ts_desc]: addr = 0x7f6fdee70dc0
 flags:
  cached    = 1
  recursive = 1
  entered   = 1
  acquired  = 0
  orphan    = 0
 magic = 48815
 link  = [prev:0x7f6fdee70d08|next:0x7f6fdee70e88]
 tid   = 13418
 [tpss_tsd]: addr = 0x7f6fdee47840
  mgr   = 0x7f7038cbd200
  link  = [prev:(nil)|next:(nil)]
  magic = 48815
  tid   = 13418
  state = 3
  desc  = 0x7f6fdee70dc0

[CONTEXT END]. Please contact the technical support.
amplxe: Using result path `/opt/intel/vtune_amplifier_xe_2013/r007lw'
amplxe: Executing actions 16 % Processing profile metrics and debug information
amplxe: Warning: Error 0x40000026 (Database interface error) -- Cannot run data transformation `Compute Concurrency'.
amplxe: Warning: Error 0x40000026 (Database interface error) -- Cannot run data transformation `Compute CPU Usage'.
amplxe: Executing actions 50 % Saving the result                               
amplxe: Warning: Skipped generation of report `summary': no valid license can be found (License file is not found. Make sure that your license file is in the correct location and readable.
Tip: Consider setting the license file location with the INTEL_LICENSE_FILE environment variable.).
amplxe: Executing actions 50 % done

Thanks,

M.B

Sorry to hear about this, but your report is valuable!

My suggestion is to submit this issue with a test case to Intel Premier for further investigation.

Hi Peter,

Maybe it is a privilege issue as well. I ran the same locks and waits command line as root (but got no data):

amplxe-cl -collect locksandwaits -target-duration-type=medium --target-pid 8900

amplxe: Using result path `/var/home/bouache/vtune/CLI_Install/r000lw'

amplxe: Executing actions 16 % Processing profile metrics and debug information

amplxe: Warning: Error 0x40000026 (Database interface error) -- Cannot run data transformation `Compute Concurrency'.

amplxe: Executing actions 30 % Processing profile metrics and debug information

amplxe: Warning: Error 0x40000026 (Database interface error) -- Cannot run data transformation `Compute CPU Usage'.

amplxe: Executing actions 50 % Generating a report                             

Collection and Platform Info

----------------------------

Parameter                 r000lw                                   

Try:

#amplxe-cl -collect locksandwaits -duration 60 -r /tmp/r000lw --target-pid 8900

# amplxe-cl -collect locksandwaits -duration 60 -r /tmp/r000lw --target-process application
amplxe: Error: Assertion '(blocked == tpss_tls_op_err_ok)' failed.[ASSERTION CONTEXT][CONTEXT END]. Please contact the technical support.

After waiting longer, I get this output. Do you think I can analyze it?

Thanks for your help Peter:

# amplxe-cl -collect locksandwaits -duration 60 -r /tmp/r000lw --target-process application
amplxe: Error: Assertion '(blocked == tpss_tls_op_err_ok)' failed.[ASSERTION CONTEXT][CONTEXT END]. Please contact the technical support.
amplxe: Using result path `/tmp/r000lw'
amplxe: Executing actions 16 % Processing profile metrics and debug information
amplxe: Warning: Error 0x40000026 (Database interface error) -- Cannot run data transformation `Compute Concurrency'.
amplxe: Executing actions 30 % Processing profile metrics and debug information
amplxe: Warning: Error 0x40000026 (Database interface error) -- Cannot run data transformation `Compute CPU Usage'.
amplxe: Executing actions 50 % Generating a report                             

Collection and Platform Info
----------------------------
Parameter                 r000lw                                                                                         
------------------------  -----------------------------------------------------------------------------------------------
Application Command Line                                                                                                 
Operating System          Red Hat Enterprise Linux Server release 6.3 (Santiago)
Computer Name             server1                                                                   
Result Size               1027124

CPU
---
Parameter          r000lw                                   
-----------------  -----------------------------------------
Name               Intel(R) Xeon(R) / Core i7 980X Processor
Logical CPU Count  24                                       

Summary
-------
Elapsed Time:  0.000
amplxe: Executing actions 100 % done

cd data.0/
[data.0]# ls
systemcollector-server.sc

It seems that the issue is specific to your program (process); please contact Intel Premier support.

This looks like a failed assertion statement. You can at least use GDB to check whether the virtual address 0x7f6fdee70dc0 contains null values.

ok thanks iliyapolak 

Can you help me with that?

Thanks,

--mb 

I can try, but bear in mind that I am more skilled in Windows debugging. Please provide a dump file of the failed process as prepared by GDB.

Sorry for the delay in answering.

Well, the dump file is huge. I was trying to debug it (by the way, do you know what tpss is doing?):

strings core | grep add-symbol-file

 add-symbol-file /opt/intel/vtune_amplifier_xe_2013/lib64/libtpsstool.so 0x7f0d00c89950 -s .data 0x7f0d0156f3a0 -s .bss 0x7f0d015a2f80

add symbol table from file "/opt/intel/vtune_amplifier_xe_2013/lib64/libtpsstool.so" at

.text_addr = 0x7f0d00c89950

.data_addr = 0x7f0d0156f3a0

.bss_addr = 0x7f0d015a2f80

(gdb) info registers

rax            0x11

rbx            0x79121

rcx            0x7f0d975f6920139696350914848

rdx            0x00

rsi            0x7f0d975f6920139696350914848

rdi            0x29d669

rbp            0x7f0d975f8d500x7f0d975f8d50

rsp            0x7f0d975f7c500x7f0d975f7c50

r8             0x00

r9             0x00

r10            0x88

r11            0x246582

r12            0x7b123

r13            0x7f0d975f7c50139696350919760

r14            0x7f0d975f8050139696350920784

r15            0x7f0d975f8450139696350921808

rip            0x7f0d00f91bb80x7f0d00f91bb8 <tpss_assert_raise_assert+760>

eflags         0x10217[ CF PF AF IF RF ]

cs             0x3351

ss             0x2b43

ds             0x00

es             0x00

fs             0x00

gs             0x6b107


Copyright (C) 2009-2013 Intel Corporation. All rights rese

 Thanks,

--mb

any idea? help!!!! 

amplxe-cl -collect locksandwaits -duration 60 -r /tmp/r000locksw --target-pid 7516
amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /tmp/r000locksw -command stop.

amplxe: Error: Assertion failed: thread_manager_impl482: (blocked == tpss_tls_op_err_ok) : BUG! : . Please contact the technical support.
amplxe: Collection detached.
amplxe: Using result path `/tmp/r000locksw'
amplxe: Executing actions 50 % Generating a report

> amplxe: Error: Assertion failed: thread_manager_impl482: (blocked == tpss_tls_op_err_ok) 

As I mentioned earlier, you need to get support from Intel Premier; the issue is specific to your process. For example, a developer may check the TPSS log. You can try the latest Update 13 before submitting the issue. By the way, TPSS traces things such as threading functions, I/O functions, and wait/signal functions.

Hi Mourad

Without the call stack information it is impossible to reconstruct the chain of events. Some register context values also look wrong, such as 0x7f0d975f7c500x7f0d975f7c50 or 0x7f0d975f8d500x7f0d975f8d50, but these are probably just two columns run together by the formatting.

I do not know what tpss is doing; I can only suppose that it is processing some kind of linked-list data structure.
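
The fused register values mentioned above can be separated mechanically. GDB's "info registers" prints each register's value in hex followed by a second column (decimal for general-purpose registers, hex again for pointer-style registers such as rbp, rsp, and rip), and the post lost the whitespace between the two. The sketch below is only an illustration; split_fused is a hypothetical helper name, and it assumes the second column is either a repeated "0x..." value or the decimal form of the first.

```python
def split_fused(value):
    """Split a fused 'hex + second column' string from `info registers`.

    Returns (hex_part, second_part), or None if no consistent split exists.
    """
    if not value.startswith("0x"):
        return None
    # Pointer-style registers (rbp, rsp, rip): the second column is hex again.
    second = value.find("0x", 2)
    if second != -1:
        return value[:second], value[second:]
    # Other registers: the second column is the same number in decimal, so try
    # every split point and keep the first one where both halves agree.
    for i in range(3, len(value)):
        hex_part, dec_part = value[:i], value[i:]
        if dec_part.isdigit() and int(hex_part, 16) == int(dec_part):
            return hex_part, dec_part
    return None


print(split_fused("0x7f0d975f6920139696350914848"))
print(split_fused("0x7f0d975f7c500x7f0d975f7c50"))
```

Applied to the posted dump, this recovers for example rbx = 0x79 (121) and rcx = 0x7f0d975f6920. Lines such as eflags, where the second column is a list of flag names, are not handled and return None.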

thanks Peter.

Is there any other way to do locks and waits analysis without attaching to the process?

Thanks,

By the way, I am doing the tests with U13.

thanks Peter.

--mb

I can get the call stack info through the general exploration:

amplxe-cl -collect snb-general-exploration -knob enable-stack-collection=true -target-duration-type=medium -duration 60 --target-process myapp.

How can this analysis help me solve the assertion problem with the L&W data?

Let me know if you want me to share the dump file with you.

Thanks iliyapolak

You have already posted a register context, and the rip register is crucial. What I need is a stack trace collected backward from the rip register. I do not know if it will help. Usually debugging on Windows is a lot easier because of the extensive support of windbg and its debugging extensions.

As far as I can tell, the assertion expects the entered flag in data.flags to be zero, and it fires because that flag is set. I think that the address 0x7f6fdee70dc0 could contain the value of the data.flags member. Earlier I advised you to check that memory address with GDB. It will not solve the problem, but it could shed some light on the cause of the error.
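
Both requests above (a stack trace and a look at the memory address) can be served by running GDB non-interactively on the core file. The sketch below only builds the command line; gdb_batch_argv is a hypothetical helper, and the paths "./application" and "core" are placeholders for the real binary and dump. Note one assumption that may not hold: the address 0x7f6fdee70dc0 was reported by a different run than the one that produced the core file, so it may not be mapped there. The address printed by the assertion in the same run should be used instead.

```python
import shlex


def gdb_batch_argv(executable, core, commands):
    """Build a non-interactive gdb command line that runs `commands` on a core file."""
    argv = ["gdb", "-batch"]
    for cmd in commands:
        argv += ["-ex", cmd]
    argv += [executable, core]
    return argv


# Hypothetical paths; replace with the real binary and core file.
argv = gdb_batch_argv(
    "./application",
    "core",
    [
        "bt",                    # call stack of the thread that raised the assertion
        "thread apply all bt",   # call stacks of every thread
        "x/8gx 0x7f6fdee70dc0",  # eight 8-byte words in hex at the descriptor address
    ],
)
print(shlex.join(argv))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run(argv, capture_output=True, text=True) to capture the output in a file that is much smaller than the core dump itself.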

> Is there any other way to do locks and waits analysis without attaching to the process?

Sure, try the following command:

amplxe-cl -collect locksandwaits -- <path to application executable and its command line>

It will start your application. That way it probably won't crash.

Anyway, it would be good to know whether the crash is related to attach mode only.

Can you please also tell the result of the following command:

amplxe-cl -collect hotspots -duration 10 -target-pid <PID>  

Does it cause this issue or not?

The hotspots analysis is working well. Thanks for your message.

--mb
