I have implemented an application that uses the Intel TBB
pipeline pattern for parallel processing on an Intel Xeon CPU E5420 @ 2.50GHz running
The application basically consists of 8 pipelines. Each
pipeline has one token (so effectively one thread per pipeline). Each pipeline receives
data from an endpoint and processes it to completion. I ran this application
and collected General Exploration analysis data using VTune Amplifier. The
profiler reported a high CPI in the finish_task_switch function of the vmlinux module, which
suggests that the kernel is spending a lot of time on context switching,
adversely affecting the performance of the application.
What I would like to understand is why the kernel is
performing so much context switching. Will each pipeline be scheduled on the same
CPU? Is there a way to assign CPU affinity to each pipeline? How can I reduce
this performance-degrading behavior? Please provide some optimization tips.
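
To make the affinity question concrete, here is a minimal sketch of the kind of per-thread pinning I have in mind. It assumes Linux with glibc (the vmlinux module in the profile suggests Linux) and uses the plain sched_getaffinity/sched_setaffinity calls rather than anything from TBB. The helper names first_allowed_cpu and pin_current_thread_to_cpu are hypothetical, made up for this example. What I do not know is where such a call should go so that each pipeline's thread stays on its own core.

```cpp
// Sketch only. Assumption: Linux + glibc; not part of TBB.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for cpu_set_t and the CPU_* macros
#endif
#include <sched.h>    // sched_getaffinity, sched_setaffinity, cpu_set_t

// Hypothetical helper: lowest-numbered CPU the calling thread is
// currently allowed to run on, or -1 if the mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) return cpu;
    }
    return -1;
}

// Hypothetical helper: restrict the calling thread to a single CPU.
// Returns true on success, false if the CPU index is out of range
// or the kernel rejects the request.
bool pin_current_thread_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

Each worker thread would call pin_current_thread_to_cpu with a different core index (0 to 7 on this machine, assuming all 8 cores are available to the process) once, before it starts processing data.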