Hi everybody,

I want to use a parallel_for to run some heavy calculations on a 2D matrix. The calculations are expensive because I am implementing the SIFT algorithm for image processing.

I divided the matrix into large blocks (e.g. 120 x 120), and within each block I apply my calculation pixel by pixel using a parallel_for. At the end of each per-pixel calculation, I have to decide whether that pixel is important. If it is, I want to store that pixel (with some other data related to it) in a vector. To do so, I use a "global" vector, shared by all threads, that accumulates those special points.

For now, I am using a mutex that protects the vector every time a point is added to it. However, locking a mutex for every insertion could hurt performance, because the threads have to wait for each other whenever they find an important point. What is the best algorithm or strategy for this situation?
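To make the setup concrete, here is a minimal sketch of the mutex version. It is not my real code: it uses plain std::thread as a stand-in for parallel_for so that it compiles without any library, and the names KeyPoint, is_important and collect_with_mutex are hypothetical placeholders (the real per-pixel test is the expensive SIFT computation).

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a SIFT keypoint; the real struct carries more data.
struct KeyPoint { int x; int y; };

// Hypothetical per-pixel test; the real one is the expensive SIFT computation.
inline bool is_important(int x, int y) { return (x + y) % 7 == 0; }

// Process one block with several workers (std::thread stands in for parallel_for).
// Every important pixel locks the mutex before touching the shared vector.
std::vector<KeyPoint> collect_with_mutex(int width, int height, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    std::vector<KeyPoint> global_points;  // the shared "global" vector
    std::mutex guard;                     // protects global_points
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Rows are interleaved across workers: t, t + n, t + 2n, ...
            for (int y = static_cast<int>(t); y < height;
                 y += static_cast<int>(num_threads)) {
                for (int x = 0; x < width; ++x) {
                    if (is_important(x, y)) {
                        std::lock_guard<std::mutex> lock(guard);
                        global_points.push_back({x, y});
                    }
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return global_points;
}
```

The order of the points in the result depends on thread scheduling, so only the set of points is deterministic, not their sequence.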

I tried to use parallel_reduce so that each block has its own STL vector, and then to join the vectors after each block has finished its computation, but this is slower than the version with the mutex!
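The idea behind that attempt, giving each worker a private vector and merging only once at the end, can be sketched as follows. This is only an illustration under assumptions, not my parallel_reduce code: std::thread again stands in for the TBB algorithm, and KeyPoint, is_important and collect_with_local_buffers are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a SIFT keypoint; the real struct carries more data.
struct KeyPoint { int x; int y; };

// Hypothetical per-pixel test; the real one is the expensive SIFT computation.
inline bool is_important(int x, int y) { return (x + y) % 7 == 0; }

// Each worker appends to its own vector, so no lock is needed while scanning.
// The private vectors are concatenated once, after all workers have finished.
std::vector<KeyPoint> collect_with_local_buffers(int width, int height,
                                                 unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    std::vector<std::vector<KeyPoint>> local(num_threads);  // one buffer per worker
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&local, width, height, num_threads, t] {
            std::vector<KeyPoint>& mine = local[t];  // touched by this worker only
            for (int y = static_cast<int>(t); y < height;
                 y += static_cast<int>(num_threads)) {
                for (int x = 0; x < width; ++x) {
                    if (is_important(x, y)) mine.push_back({x, y});
                }
            }
        });
    }
    for (auto& w : workers) w.join();

    // Single merge step: reserve once, then append every private buffer.
    std::size_t total = 0;
    for (const auto& buf : local) total += buf.size();
    std::vector<KeyPoint> merged;
    merged.reserve(total);
    for (const auto& buf : local) merged.insert(merged.end(), buf.begin(), buf.end());
    return merged;
}
```

In this sketch the vectors are copied exactly once, at the very end. If the join step of a reduction copies whole vectors at every level, that copying might explain part of the slowdown, but I have not measured it.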

I think this is quite a common problem. Can you help me?

Thanks a lot for your time,

Riccardo