Intel® Cilk™ Plus

No barriers in cilkplus?

I was nesting parallelism with Cilk Plus. At the top level I invoked several cilk_spawn calls; inside each of those I then used cilk_for. I created a reducer, but there doesn't seem to be any way to reduce across the spawned tasks without going back up to the function that invoked the spawns and calling cilk_sync. This means I must invoke cilk_spawn with new entry points to continue.

Something like this:

cilk::reducer<cilk::op_add<int>> mysum(0);
. . . .

gcc 5.3 and Cilk dev tools: cilkscreen and cilkview

I see that the dev tools cilkview and cilkscreen were updated in late 2015 (build 4421). The web pages state that they support the gcc Cilk branch; does this imply that they don't support the latest gcc compilers, i.e. gcc 5.3? gcc 5.3 compiles and runs Cilk code such as the traditional fib.cpp test program, but when run under the latest cilkscreen and cilkview, these tools complain about a lack of Cilk code; see below:

-bash-4.2$ cilkview ./fib

Cilkview: Generating scalability data

Cilkview Scalability Analyzer V2.0.0, Build 4421

Using Cilk Plus in cross-platform R packages (GCC 4.9.3)

I need to be able to compile my application on Linux, Windows, and Mac using compiler tools supported by R. For Windows this means that I must use Mingw-W64 with GCC 4.9.3.

1. Linux: I understand that GCC supports Cilk Plus since GCC 4.9. Is Cilk Plus already a part of GCC 4.9.3 or does it need to be installed separately?

2. On Windows, is Cilk Plus available for the Mingw-W64 toolchain? If not, can it be added and used as a separate library?
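On question 1: Cilk Plus has been part of mainline GCC since 4.9, so no separate installation is needed on Linux; it is enabled per compilation with a flag and linked against the Cilk runtime. A minimal sketch of the build commands (file name is an assumption):

```shell
# Assuming gcc >= 4.9 on Linux: Cilk Plus is built in, enabled with
# -fcilkplus, and linked against the bundled runtime with -lcilkrts.
gcc -fcilkplus -O2 hello_cilk.c -o hello_cilk -lcilkrts
./hello_cilk
```

Whether a given Mingw-W64 build ships libcilkrts is toolchain-specific, which is the crux of question 2.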

reducers under nested iterations

Hello, I am doing some SpMV-related work and exploring the use of Cilk Plus. I have a question about reducers that I could not answer from the documentation. In short: is there a simple and performant way of declaring a logical set of reducers, or a reducer 'holder', such that an inner cilk_for uses its own reducer hyperobject without the outer cilk_for having to share the same hyperobject across all of its strands?

Consider the following C99 Cilk Plus loop code, which calculates a sparse binary matrix-vector multiplication for eight vectors simultaneously:

Putting Your Data and Code in Order: Data and Layout, Part 2

This pair of articles on performance and memory covers basic concepts that provide guidance to developers seeking to improve software performance. This paper expands on the concepts discussed in Part 1 to consider parallelism: both vectorization (single instruction, multiple data, or SIMD) and shared-memory parallelism (threading), as well as distributed-memory computing.
  • Students
  • Server
  • Windows*
  • Modern Code
  • C/C++
  • Fortran
  • Intermediate
  • Intel® Advisor
  • Intel® Cilk™ Plus
  • Intel® Threading Building Blocks
  • Intel® Advanced Vector Extensions
  • OpenMP*
  • Intel® Many Integrated Core Architecture
  • Optimization
  • Parallel Computing
  • Threading
  • Vectorization