I'm testing out TBB right now. One of the algorithms I want to parallelize works on a row-major 2-D matrix by processing a column at a time.
How can I code this in TBB such that my algorithm can work on a column at a time?
Assume the matrix is M rows by N columns and stored contiguously, and that I have a float* pointing to element (0,0).
Originally, I hand-coded this so that, for each column j, I walk down the rows with a stride of N: element (i, j) sits at offset i*N + j from the base pointer.
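To make the indexing concrete, here is a minimal sketch of what the serial, hand-coded version looks like. The per-column operation (scaling by a factor) and the names `scale_column` and `scale_all_columns` are just placeholders for illustration; my real per-column work is different.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-column operation: multiply every element of column j by `factor`.
// `data` points to element (0,0) of a contiguous, row-major M x N matrix.
void scale_column(float* data, std::size_t M, std::size_t N,
                  std::size_t j, float factor) {
    assert(data != nullptr && j < N);
    for (std::size_t i = 0; i < M; ++i) {
        // Element (i, j) is at offset i*N + j, so consecutive rows are N floats apart.
        data[i * N + j] *= factor;
    }
}

// Serial driver: visits the columns one at a time.
void scale_all_columns(float* data, std::size_t M, std::size_t N, float factor) {
    for (std::size_t j = 0; j < N; ++j) {
        scale_column(data, M, N, j, factor);
    }
}
```

In this sketch each column touches a disjoint set of elements, so the outer loop over j is the part I would like to hand over to TBB.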