A deadlock can occur when a program uses two or more locks and different strands acquire those locks in different orders. Two or more strands become deadlocked when each one holds a mutex that another is waiting to acquire, so none of them can proceed.
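One common way to avoid this kind of deadlock is to make every strand acquire the locks in the same order. The sketch below illustrates the idea with standard C++ threads and `std::mutex` rather than Intel® Cilk™ Plus keywords, so it is an analogy and not the Cilk Plus API. The `Account` type and the `transfer` and `run_opposing_transfers` functions are hypothetical names invented for this illustration. `std::lock` acquires both mutexes using a deadlock-avoidance algorithm, so the two threads can name the accounts in opposite orders without deadlocking.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type, used only for this illustration.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Moves `amount` between two accounts. std::lock acquires both mutexes with a
// deadlock-avoidance algorithm, so the argument order does not matter.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same mutex twice is not allowed
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> hold_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> hold_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Runs transfers in opposite directions on two threads and returns the
// combined balance, which the transfers should leave unchanged.
long run_opposing_transfers(int iterations) {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If `transfer` instead locked `from.m` and then `to.m` with two separate lock calls, the two threads could each hold one mutex while waiting for the other, which is the situation described above.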

Lock Contention

Strands cannot run in parallel while they contend for the same lock: only one strand can hold the lock at a time, and the others must wait. In some programs, locks can eliminate virtually all of the performance benefit of parallelism. In extreme cases, such programs can even run significantly slower than the corresponding single-processor serial program. Where possible, consider using a reducer instead of a lock.
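The effect of contention can be reduced by keeping most of the work outside the lock. The following sketch, written with standard C++ threads as a stand-in, lets each thread accumulate into a private variable and take the shared lock only once to merge its partial result. This imitates the idea behind a reducer but is not the Cilk Plus reducer API; `sum_with_local_partials` is a hypothetical name chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Sums `data` on `num_threads` threads. Each thread accumulates into a private
// local variable and takes the lock only once, to merge its partial result.
long long sum_with_local_partials(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    long long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> workers;
    // Ceiling division so that every element falls into exactly one chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &total, &total_mutex, begin, end] {
            long long local = 0;  // private to this thread: no lock needed
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            std::lock_guard<std::mutex> guard(total_mutex);
            total += local;       // one short critical section per thread
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

Locking around every single `total += data[i]` would produce the same result, but the threads would spend most of their time waiting for one another instead of computing.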

Performance Considerations for Intel® Cilk™ Plus Programs

Compared with serial programs, parallel programs have numerous additional performance considerations and opportunities for tuning.

In general, the Intel® Cilk™ Plus runtime uses processor resources efficiently through a scheduling algorithm called work stealing. The work-stealing algorithm is designed to minimize the number of times that work is moved from one processor to another.

Timing Programs and Program Segments

Measure performance to find and understand bottlenecks. Even small changes in a program can lead to large and sometimes surprising performance differences. The only reliable way to tune performance is to measure frequently, preferably on a mix of different systems. Use any tool or technique at your disposal, but only actual measurements can tell you whether your optimizations are effective.
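As one possible technique, a program segment can be timed with the standard C++ monotonic clock. The sketch below is a minimal example and not a tool provided by Intel® Cilk™ Plus; `time_best_of` is a hypothetical helper name. It runs the segment several times and reports the fastest run, which helps filter out one-off disturbances such as cold caches or other activity on the system.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `repeats` times and returns the fastest wall-clock time in
// seconds. steady_clock is monotonic, so clock adjustments do not affect it.
template <typename F>
double time_best_of(F&& work, int repeats) {
    assert(repeats > 0);
    double best = -1.0;  // negative means "no measurement yet"
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double seconds = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || seconds < best) best = seconds;
    }
    return best;
}
```

A call such as `time_best_of([&] { run_segment(); }, 5)`, where `run_segment` stands for whatever code is under study, returns the best of five runs. Comparing this number before and after a change, ideally on more than one system, shows whether the change actually helped.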
