cilkview SIGSEGV

I'm trying to work with cilkview on GCC 4.8 (after a long and arduous road compiling the latter). However, depending on how I fiddle with the lib paths, I get either

./cilkview ~/fib
cilkview: generating scalability data
Cilkview Scalability Analyzer V2.0.0, Build 2516
C:Tool (or Pin) caused signal 11 at PC 0xb70e76a8
cilkview error: process killed by signal -10

OR

./cilkview ~/fib
cilkview: generating scalability data
Cilkview Scalability Analyzer V2.0.0, Build 2516
/home/leo/fib: error while loading shared libraries: libcilkrts.so.5: cannot open shared object file: No such file or directory

Whole Program Statistics
1) Parallelism Profile
Work : 40,673 instructions
Span : 40,673 instructions
Burdened span : 40,673 instructions
Parallelism : 1.00
Burdened parallelism : 1.00
Number of spawns/syncs: 0
Average instructions / strand : 40,673
Strands along span : 1
Average instructions / strand on span : 40,673
Total number of atomic instructions : 0
Frame count : 0

2) Speedup Estimate
2 processors: 0.74 - 1.00
4 processors: 0.66 - 1.00
8 processors: 0.62 - 1.00
16 processors: 0.60 - 1.00
32 processors: 0.60 - 1.00
64 processors: 0.59 - 1.00
128 processors: 0.59 - 1.00
256 processors: 0.59 - 1.00

Cilk Parallel Region(s) Statistics
1) Parallelism Profile
Work : 0 instructions
Span : 0 instructions
Burdened span : 0 instructions
Parallelism : -nan
Burdened parallelism : -nan
Number of spawns/syncs: 0
Average instructions / strand : 0
Strands along span : 0
cilkview error: process killed by signal -3

Any ideas what may be happening? What are the chances of us getting the sources of cilkscreen, cilkview and pin? It would be so much easier to do this! I'm on 32 bits. My uname -a is Linux anne 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:45:18 UTC 2012 i686 i686 i386 GNU/Linux

***SHAMELESS PLUG***: I still have the IRC channel at irc.freenode.net#cilkplus, do visit from time to time, just to talk!

I did a quick test to see if I could reproduce your test case.  No luck so far, although I only tried a 64-bit version of gcc...   

I think a bit more information might be helpful for tracking down what is going on.   
Just to double-check, what is the difference in the path variables you are setting between cases 1 and 2?    In the second case, the errors seem to suggest that cilkview is working, but the Cilk runtime library is not in the LD_LIBRARY_PATH.    I have sometimes seen a "killed by signal -3" error when I run cilkview on programs that never end up starting the Cilk runtime.

Also, does ./fib run correctly using the same setting of LD_LIBRARY_PATH?  What value of "n" is being calculated?
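One way to check the second case is to see whether any directory on LD_LIBRARY_PATH actually contains libcilkrts.so.5. The following is only a minimal sketch: `find_lib_in_path` is a hypothetical helper name, and it looks only at the colon-separated list it is given, not at the other places the dynamic loader may search (such as the ld.so cache or an RPATH in the binary).

```shell
#!/bin/sh
# Hypothetical helper (not part of cilkview or GCC): print the first
# directory in a colon-separated search path that contains the named
# library. Returns non-zero if no listed directory contains it.
find_lib_in_path() {
    lib_name=$1
    search_path=$2
    old_ifs=$IFS
    IFS=:
    # $search_path is left unquoted on purpose so it is split on ':'.
    for dir in $search_path; do
        if [ -n "$dir" ] && [ -e "$dir/$lib_name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Calling `find_lib_in_path libcilkrts.so.5 "$LD_LIBRARY_PATH"` in the same shell that launches cilkview should print the directory that provides the runtime. If it prints nothing, the "cannot open shared object file" error would be expected.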

Jim, hi, thanks. Yes, all problems I'm getting are always on 32-bits... I'm calculating fib for 40, which gives me good parallelism in my 64-bit machines (measured with /bin/time). I have no errors running gcc-cilk (compiling). I have a "cilk script" exporting LD... and PATH and then "source cilk.sh" to set up the environment, so they are both (gcc-cilk and cilkview) supposedly running on the same env variables. 

Re. More information: sure, whatever you need. I do myself think that there is a problem with the LD_ env variable, but I haven't been able to debug it yet. Will keep you posted.

Thanks for this.

When you get the error "cilkview error: process killed by signal -10" is there a .log file around?  PIN creates them when it errors out.  It may be named pin.log or something like that.

    - Barry

Hi, Barry. No, there's no pin.log when it gives me the "process killed by signal -10" error. However, if for some reason it tells me the old "OS prevents parent injection mode", it does generate the pin.log file with that error.

For cilkview to work, where should the LD_L... path point to? I have this:

export LD_LIBRARY_PATH=/opt/gcc-cilk/lib:/opt/isl/lib:$LD_LIBRARY_PATH
export PATH=/opt/gcc-cilk/bin:/opt/cilkutil/bin:$PATH
export LIBRARY_PATH=/opt/gcc-cilk/lib

where gcc-cilk is my gcc-cilk installation, and /opt/cilkutil is the cilk{view|screen} installation. That is all in a cilk.sh file that I load with "source cilk.sh". Now, if I compile the regular fib example with 

gcc -Wall fib.c -o fib -fcilkplus -lcilkrts

It compiles fine (not even a warning) and gives me the expected speedup:

$ CILK_NWORKERS=1 /usr/bin/time ./fib 40
102334155
18.14user 0.00system 0:18.15elapsed 99%CPU (0avgtext+0avgdata 4496maxresident)k
0inputs+0outputs (0major+332minor)pagefaults 0swaps

$ CILK_NWORKERS=2 /usr/bin/time ./fib 40
102334155
18.16user 0.00system 0:09.18elapsed 197%CPU (0avgtext+0avgdata 5072maxresident)k
0inputs+0outputs (0major+375minor)pagefaults 0swaps

I'm sure I'm doing something silly... but I feel, in my heart of hearts, that this may have to do with my way of setting LD_LIBRARY_PATH and the 32-bit libs... HELP! :)

My understanding is that "cilkview" and "cilkscreen" do their own manipulation of LD_LIBRARY_PATH to calculate the location of the libraries they need, relative to the location of the binary file.    I am able to run cilkscreen on fib with only the path to the Cilk Plus runtime, i.e., the equivalent of the "/opt/gcc-cilk/lib" folder, in LD_LIBRARY_PATH.

Still working on trying to reproduce the behavior you are observing...

Cheers,

Jim

It looks like you are using Ubuntu?  Do you know which version you are using?    What is the output of:  
     lsb_release -a

What is the output of:
     ldd -v   cilkview

Do you know which version of GCC you are using?  Which revision did you end up building from? 
I'm not quite sure that all this information is necessarily relevant, but perhaps it may help in finding a platform that duplicates the issue.
Thanks,

Jim

One other thing that might be worth checking is what the ptrace_scope is.  In Ubuntu 10.10 and later, some of the online documentation I found suggests that by default, users can not ptrace processes that are not a descendant of the debugger.    On my Ubuntu 11.10 machine, if "cat /proc/sys/kernel/yama/ptrace_scope" returns 0, cilkview works for me.   If this value is set to "1", then I get an error message about "injection mode" similar to the one you described earlier.
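The ptrace_scope check can be wrapped in a small script. This is a sketch under assumptions: `describe_ptrace_scope` is a hypothetical helper name, and the optional argument exists only so that the function can be pointed at a different file for testing. It treats 0 as the value that worked here and any other value as restricted.

```shell
#!/bin/sh
# Hypothetical helper: report the Yama ptrace_scope setting.
# With no argument it reads /proc/sys/kernel/yama/ptrace_scope.
# Exit status: 0 if the value is 0, 1 if it is anything else,
# 2 if the file cannot be read (for example, Yama is not enabled).
describe_ptrace_scope() {
    scope_file=${1:-/proc/sys/kernel/yama/ptrace_scope}
    if [ ! -r "$scope_file" ]; then
        echo "unknown: $scope_file is not readable"
        return 2
    fi
    scope=$(cat "$scope_file")
    case $scope in
        0) echo "ptrace_scope=0: unrestricted attach" ;;
        *) echo "ptrace_scope=$scope: restricted attach"
           return 1 ;;
    esac
}
```

If the function reports a restricted setting, writing 0 to the same file as root (for example with `echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope`) changes it until the next reboot.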

There is an old post about problems using Cilkscreen with Cilk++ that describes the same issue.    Cilk++ is a precursor of Cilk Plus, but the information should still be relevant.

http://software.intel.com/en-us/forums/topic/281029

Does that change any of the behavior you are seeing?

Jim

Here's the info you requested:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
Codename: precise

and for the ldd -v

$ ldd -v /opt/cilkutil/bin/cilkview
linux-gate.so.1 => (0x00b11000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0x0066f000)
libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0x00110000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0x00115000)
/lib/ld-linux.so.2 (0x00608000)

Version information:
/opt/cilkutil/bin/cilkview:
libdl.so.2 (GLIBC_2.1) => /lib/i386-linux-gnu/libdl.so.2
libdl.so.2 (GLIBC_2.0) => /lib/i386-linux-gnu/libdl.so.2
libc.so.6 (GLIBC_2.3) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.1.2) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.4) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.2) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.3.4) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.1) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/i386-linux-gnu/libc.so.6
libm.so.6 (GLIBC_2.0) => /lib/i386-linux-gnu/libm.so.6
/lib/i386-linux-gnu/libm.so.6:
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
libc.so.6 (GLIBC_2.1.3) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_PRIVATE) => /lib/i386-linux-gnu/libc.so.6
/lib/i386-linux-gnu/libdl.so.2:
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
libc.so.6 (GLIBC_PRIVATE) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.1.3) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.1) => /lib/i386-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/i386-linux-gnu/libc.so.6
/lib/i386-linux-gnu/libc.so.6:
ld-linux.so.2 (GLIBC_2.3) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.1) => /lib/ld-linux.so.2

Can't see anything wrong with it. Can you?

I have a script that initializes everything and gives me info about the environment I have just set up. It gives me this for development:

Setting up CilkPlus support...

gcc (GCC) 4.8.0 20130109 (experimental)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

cilkview v2.0.2516.0
Copyright (c) 2010 Intel Corp. All rights reserved

cilkscreen version 2.0.0, build 2516, built May 7 2012 09:59:08
using PIN 2.10, Build 43611

Done.

But I used 'gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3' to build GCC (after a lot... A LOT... of troubles with isl and cloog, e.g). Maybe, although it's a lot more work, we might be able to fix the gcc version (say 4.8) and only work on the cilk libs. I know we would have to port it again from time to time, but at least we could have a stable build. Just a suggestion...

I will check ptrace and the link you mentioned and get back to you. EDIT: Just went to the link you mentioned. We have three suspects now: 64-bit vs 32-bit (something about this...), libs somehow being wrongly picked up (our LD problem) and now, back on the lineup, we have the pintools. I don't think ptrace is the problem now: when this happens on my 64-bit machine, I do the "echo" thing and everything just works.

Jim, visit the IRC channel from time to time, just to say hi! :)

Thanks for your time.

Ok.  I have been able to reproduce a crash.  I believe we also know what the issue is.

The problem seems to be that GCC appears to be generating metadata entries of the wrong size on 32-bit platforms (or at least the ones that you and I are running on).    On fib compiled with icc, or on 64-bit platforms with gcc and icc, the correct metadata seems to be generated, which partially explains why I was having trouble reproducing the issue.
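Since the issue appears only with 32-bit builds, it can help to confirm which kind of executable is being passed to cilkview. The sketch below reads the EI_CLASS byte of the ELF header (1 means 32-bit, 2 means 64-bit). `elf_class` is a hypothetical helper name, and this check says nothing about the Cilk metadata itself.

```shell
#!/bin/sh
# Hypothetical helper: print 32 or 64 depending on the ELF class of a file.
# Prints "not-elf" and returns 1 if the file lacks the ELF magic number.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # The byte at offset 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown"
           return 1 ;;
    esac
}
```

For example, `elf_class ~/fib` would be expected to print 32 for a binary built on the i686 machine described above.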

I'm not sure what the fix will involve yet though.

Thanks for your patience. 
Cheers,

Jim

Oh, I see. Thanks, Jim. Let me know if I can help in any way.

L.
