Оптимизация? Конечно, каждый сталкивался с данной задачей при разработке своих, сколь-нибудь значительных, требующих определённых вычислений, приложений. При этом способов оптимизировать код существует огромное множество, и, как следствие, различных путей сделать это в автоматическом режиме с помощью опций компилятора. Вот здесь и возникает проблема – как выбрать то, что нужно нам и не запутаться?
First I would like to thank you all for the awesome cilk plus tools you have open source in GCC and LLVM.
I am trying to study the runtime library and finding it a bit difficult to follow the execution in a sample application.
Are there any developer documents available? A wiki perhaps.
Specifically, I am trying to trace the execution path for cilk_spawn which is a key word. Any helpful links to get me started would be really great!
What I understood about steal-continuation is, that every idle thread does not actually steal work, but the continuation which generates a new working item.
Does that mean, that inter-spawn execution time is crucial? If 2 threads are idle at the same time, from what I understand only one can steal the continuation and create its working unit, the other thread stays idle during that time?!
As a debugging artefact, I had a global counter incremented on every function call of a function used within every working item.
I'm new to cilk, and i wanted to ask if it has an implicit threshold for the task creation, in recursive computations like fib?
If so, is it based on the number of tasks created, or in the depth of the computation?
The latest Intel® Xeon® processor E7 v2 family includes a feature called Intel® Advanced Vector Extensions (Intel® AVX), which can potentially improve application performance. Here we will explain the context, and provide an example of how using Intel® AVX improved performance for a commonly known benchmark.
For existing vectorized code that uses floating point operations, you can gain a potential performance boost when running on newer platforms such as the Intel® Xeon® processor E7 v2 family, by doing one of the following:
I have code that is structured like this:
I'm having difficulty running a simple test case using cilk_spawn. I'm compiling under gcc 4.9.0 20130520.
The following fib2010.cpp example, executes in 0.028s without cilk and takes 0.376s with cilk as long as I set the number of workers to 1. If I change the number of workers to any number greater than one, I get a segmentation fault.
I'm having difficulty comparing cilk_for with cilk_spawn. The following cilk_spawn code executes as I expect for command line arguments like 1000000 30
I have the loop, inside its body running the function with array member (dependent on loop index) as an argument, and returning one value. I can parallelized this loop by using cilk_for() operator instead of regular for() - and it is simple and works well. This is explicit parallelization. Instead of explicit loop instruction I can use Array Notation contruction (as shown below) - it is implicit loop. My routine is relatively long and complecs, and has Array Notation constructions inside, so it cannot be declared as a vector (elemental) one.
- Page 1