Why doesn't my TBB program speed up?

Posted by zlw

I have this part of code in my program:

tbb::parallel_for(tbb::blocked_range<int>(0, NumberY, NumberY/6), [&](const tbb::blocked_range<int> &r) -> void {
    for (int iy = r.begin(); iy < r.end(); iy++) {
        int x_loc = x_left;
        for (int ix = 0; ix < NumberX; ix++) {
            MyFunction(x_loc, intensity_value);
            pfDensity[iy*NumberX + ix] += intensity_value * mvp_idx;
            x_loc += delta_x;
        }
    }
}); // parallel_for

NumberY and NumberX are both around 8000. The single-threaded version runs about 2 times faster than the multithreaded TBB version. I have tried adjusting the grain size and initializing TBB first, but neither helps.

The loop applies a function to a matrix; MyFunction is an interpolation function. I suspect it doesn't speed up because the work per iteration is too small. But this is as finely as I can divide the work, and I would still like something that runs faster than the single-threaded code.
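For reference, the per-row work can be sketched as a standalone function. This is only a minimal sketch under assumptions: the real MyFunction is not shown, so the version below is a hypothetical linear stand-in, and the types (float for the density values and mvp_idx, int for x_left and delta_x) and the name ProcessRows are guesses made for illustration. Here intensity_value is declared inside the loop, so each call has its own copy and calls on disjoint row ranges write only to different elements of pfDensity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real interpolation routine: it writes a
// value derived from x_loc into intensity_value. The real MyFunction is
// not shown in the post, so this signature is an assumption.
inline void MyFunction(int x_loc, float &intensity_value) {
    intensity_value = 0.5f * static_cast<float>(x_loc);
}

// Processes rows [row_begin, row_end) of a NumberX-wide, row-major matrix.
// All scratch variables are local to the call, so calls on disjoint row
// ranges touch only distinct elements of pfDensity.
void ProcessRows(int row_begin, int row_end, int NumberX,
                 int x_left, int delta_x, float mvp_idx,
                 std::vector<float> &pfDensity) {
    assert(row_begin <= row_end);
    for (int iy = row_begin; iy < row_end; ++iy) {
        int x_loc = x_left;  // every row restarts at the left edge
        for (int ix = 0; ix < NumberX; ++ix) {
            float intensity_value = 0.0f;  // local to this iteration (assumed type)
            MyFunction(x_loc, intensity_value);
            pfDensity[static_cast<std::size_t>(iy) * NumberX + ix] +=
                intensity_value * mvp_idx;
            x_loc += delta_x;
        }
    }
}
```

The single-threaded version would call ProcessRows(0, NumberY, ...) once, while the parallel_for body could call it with r.begin() and r.end() in place of the explicit loops.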

How can I improve the performance here?

