Intel Cilk Plus SDK - Cilkscreen and Cilkview for Intel Cilk Plus

The Intel Cilk Plus SDK (Software Development Kit) is now available as a WhatIf kit. It supplies the Cilkscreen race detection and Cilkview scalability tools for Intel Cilk Plus developers working on the Microsoft Windows* and Linux* operating systems.

More information can be found at the Intel Cilk Plus Download page. Support for the Intel Cilk Plus SDK will be provided through the Intel Cilk Plus forum.

Note: You must have SP1 or later of the Intel Parallel Composer 2011 or Intel C++ Composer XE 2011 products.

- Barry

Here's how to know if you're running Update 1 or an earlier version:

Linux

Look at the directory that contains the Cilk Plus runtime. The shared object is named libcilkrts.so.5, so the path should look something like:

/intel/composerxe-2011.1.107/compiler/lib/intel64/libcilkrts.so.5

The text after "composerxe" indicates that this is 2011, Update 1, package (build) 107.
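If you want to check this from a script, here is a minimal Python sketch that pulls the year, update, and build numbers out of such a path. The helper name composerxe_version is hypothetical, and it assumes the install directory follows the composerxe-YEAR.UPDATE.BUILD pattern shown in the example path.

```python
import re

# Matches the "composerxe-YEAR.UPDATE.BUILD" component of the install path.
_COMPOSERXE_RE = re.compile(r"composerxe-(\d{4})\.(\d+)\.(\d+)")


def composerxe_version(path):
    """Return (year, update, build) parsed from a Composer XE path, or None."""
    m = _COMPOSERXE_RE.search(path)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


year, update, build = composerxe_version(
    "/intel/composerxe-2011.1.107/compiler/lib/intel64/libcilkrts.so.5"
)
print(f"{year} Update {update}, build {build}")  # prints: 2011 Update 1, build 107
```

An update number of 1 or higher means Update 1 or later is installed.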

Windows

Look at your installed programs using the Control Panel. For Windows XP, use the "Add or Remove Programs" applet. For Windows Vista or later, use the "Programs and Features" applet.

In either case, you should see an entry for "Intel Parallel Composer 2011 Update 1".

Where to find Update 1

Update 1 can be downloaded from http://registrationcenter.intel.com. You must have valid support services. Update 1 is a full package, not a patch.

- Barry

Hi,

Is there a similar program that I can use with GCC? I do not have an Intel compiler.

If you're using the "cilkplus" branch of GCC 4.8, you can use the same versions of Cilkscreen and Cilkview that are available for the Intel compiler. You can download the Cilk Plus SDK (which contains Cilkscreen and Cilkview) from http://software.intel.com/en-us/articles/intel-cilk-plus-software-development-kit/.

Cilk Tools Build 3229 has been posted to http://cilkplus.org/download and fixes crashes when running Cilkscreen on Windows with programs that start non-Cilk threads. In addition, there have been major improvements to stack traces:

  • Stack traces should now go back to main(), not the start of the parallel region.
  • Spawn helper functions are now suppressed from the stack trace.
  • Stack traces on Linux now include non-Cilk functions (this has been true on Windows for a while).

    - Barry

Do Cilkview and Cilkprof incur the startup overhead of PIN rewriting all the basic blocks just to activate the __notify calls? Ideally we would like to get to a point where we can use the low-overhead annotations without the full cost of PIN intercepting and rewriting all basic blocks. Is that possible?

Cilkscreen, Cilkview and Cilkprof are all based on PIN.  I believe that PIN will rewrite only basic blocks that a tool has expressed an interest in.

The low-overhead annotations are not specific to PIN. You're welcome to use the annotation decoding in libzca with whatever instrumentation technology you wish. But be aware that we've made decisions based on the fact that our tools are PIN-based. The low-overhead annotations can insert NOPs so that you can easily patch in JMPs, but since none of our tools needed this, the annotations generated by the compiler (and most of the ones in the runtime) do not do this. We made this decision because even if the NOPs are discarded early in instruction decode, they still make the image larger, so there's a penalty on instruction fetch as well as additional paging.

If you don't want to use PIN, you'll need to come up with an alternative technology to modify the executing image. I know there are commercially available tools that have the ability to generate "dynamic hooks."  The tool determines the instructions that would be overwritten by a JMP, and uses them to generate a "thunk."  The thunk contains the code necessary to:

  • Preserve processor state
  • Call into a common entry point to execute the analysis function
  • Restore the processor state
  • Execute the instructions that were replaced by the JMP to the thunk
  • JMP to the first full instruction after the inserted JMP.
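The thunk steps above can be sketched as byte layout. This is a minimal Python illustration under stated assumptions, not how any particular hooking tool works: jmp_rel32, build_thunk and patch_bytes are hypothetical names, the save/call/restore code is passed in as opaque placeholder bytes (it is architecture-specific), and the relocated instructions are assumed to be position-independent, since moving a relative branch or a RIP-relative operand would require rewriting its displacement, which this sketch doesn't do.

```python
import struct


def jmp_rel32(from_addr, to_addr):
    """Encode a 5-byte JMP rel32 (opcode E9) placed at from_addr that lands on to_addr."""
    rel = to_addr - (from_addr + 5)  # displacement is relative to the next instruction
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of rel32 range")
    return struct.pack("<Bi", 0xE9, rel)


def build_thunk(thunk_addr, patch_addr, stolen, save_call_restore):
    """Lay out the thunk that will live at thunk_addr.

    save_call_restore: placeholder bytes that preserve processor state, call the
        analysis entry point and restore state (hypothetical, architecture-specific).
    stolen: the original whole instructions overwritten by the JMP; assumed to be
        position-independent (no relative branches or RIP-relative operands).
    """
    body = bytearray(save_call_restore)
    body += stolen  # re-execute the instructions replaced by the JMP
    # Return to the first full instruction after the patched span.
    body += jmp_rel32(thunk_addr + len(body), patch_addr + len(stolen))
    return bytes(body)


def patch_bytes(thunk_addr, patch_addr, stolen):
    """Bytes written at patch_addr: JMP to the thunk, then NOP (0x90) filler."""
    if len(stolen) < 5:
        raise ValueError("need at least 5 bytes of whole instructions")
    return jmp_rel32(patch_addr, thunk_addr) + b"\x90" * (len(stolen) - 5)


# push rbp; mov rbp, rsp; sub rsp, 0x20  (8 bytes of Intel 64 code)
prologue = bytes.fromhex("554889e54883ec20")
# 0xCC (int3) stands in for the real save/call/restore code.
thunk = build_thunk(0x5000, 0x1000, prologue, b"\xcc")
patch = patch_bytes(0x5000, 0x1000, prologue)
print(thunk.hex(), patch.hex())
```

The filler after the JMP is never executed on the normal path, because the thunk returns past it to the next whole instruction; NOPs are just a harmless choice for those bytes.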

Since IA-32 and Intel 64 instructions are variable-sized, the instructions have to be decoded until you have enough space to hold the JMP (5 bytes on an IA-32 system; I don't recall how much on Intel 64). Once the thunk is generated, the executing image is modified to overwrite 5 bytes with the JMP to the thunk.
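To make the decoding step concrete: on both IA-32 and Intel 64 the near relative JMP (opcode E9 followed by a 32-bit displacement) is 5 bytes, although on Intel 64 it only reaches targets within ±2 GB, so a thunk placed farther away needs a longer indirect jump sequence. Because only whole instructions can be relocated, you keep adding decoded instruction lengths until the total covers the JMP. In this minimal Python sketch, bytes_to_steal is a hypothetical name and the instruction lengths are assumed to come from some x86 decoder, which isn't included.

```python
JMP_REL32_LEN = 5  # E9 + 32-bit displacement, same size on IA-32 and Intel 64


def bytes_to_steal(insn_lengths, needed=JMP_REL32_LEN):
    """Return the number of bytes of whole instructions to relocate for the JMP.

    insn_lengths: lengths of the consecutive instructions at the patch site,
        as reported by an x86 decoder (hypothetical input; no decoder here).
    """
    total = 0
    for length in insn_lengths:
        total += length
        if total >= needed:
            return total
    raise ValueError("not enough instructions to hold the JMP")


# push rbp (1 byte), mov rbp, rsp (3 bytes), sub rsp, 0x20 (4 bytes)
print(bytes_to_steal([1, 3, 4]))  # prints 8
```

In this example the first two instructions only cover 4 bytes, so the third one has to be moved into the thunk as well, and 3 bytes beyond the JMP end up overwritten.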

Of course, this would have to happen before anything other than the main thread can access the code, and you'll have to be careful that no code JMPs into the middle of the instructions you've overwritten.
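One way to act on the second caution is to collect known branch targets and reject a patch site if any of them falls inside the overwritten span. This is only a partial check, sketched in Python: jumps_into_patch is a hypothetical helper, the branch targets are assumed to come from your own static analysis, and indirect jumps whose targets can't be determined statically are not covered. It also does nothing about the threading concern; the patch still has to be applied before other threads can run the code.

```python
def jumps_into_patch(patch_addr, stolen_len, branch_targets):
    """Return the branch targets that land strictly inside the overwritten bytes.

    A target equal to patch_addr is harmless (it reaches the new JMP); a target in
    (patch_addr, patch_addr + stolen_len) would start executing in the middle of
    the JMP or its filler. branch_targets: addresses found by some static analysis
    (hypothetical input).
    """
    end = patch_addr + stolen_len
    return sorted(t for t in set(branch_targets) if patch_addr < t < end)


# 0x1004 falls inside an 8-byte patch at 0x1000; 0x1008 is the next instruction.
print([hex(t) for t in jumps_into_patch(0x1000, 8, [0x1000, 0x1004, 0x1008, 0x2000])])
```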

    - Barry
