I was wondering if anybody had suggestions on how to implement a summed area table with Intel TBB. The general idea of the algorithm:
1. Given an input, do an independent (inclusive) prefix scan on every row. Call this Intermediate.
2. Transpose Intermediate, call this IntermediateTranspose.
3. Repeat step (1) on IntermediateTranspose: do an inclusive prefix scan on every row. Call this OutputTranspose.
4. Transpose OutputTranspose from step (3) to get Output.
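To pin down what the four steps compute, here is a minimal serial sketch using only the standard library. The `Image` struct and the function names are hypothetical, made up for illustration, and this is a reference for checking results rather than a TBB solution. The loop over rows in `scan_rows` is the independent part, so it is the piece one would presumably hand to a parallel loop.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical row-major image type, used only for this sketch.
struct Image {
    std::size_t rows;
    std::size_t cols;
    std::vector<long long> data;  // rows * cols elements, row-major
};

// Steps (1) and (3): inclusive prefix scan of every row, in place.
// Each iteration touches only its own row, so the iterations are independent.
void scan_rows(Image& img) {
    assert(img.data.size() == img.rows * img.cols);
    for (std::size_t r = 0; r < img.rows; ++r) {
        auto first = img.data.begin() + static_cast<std::ptrdiff_t>(r * img.cols);
        auto last = first + static_cast<std::ptrdiff_t>(img.cols);
        std::partial_sum(first, last, first);
    }
}

// Steps (2) and (4): out-of-place transpose.
Image transpose(const Image& in) {
    Image out{in.cols, in.rows, std::vector<long long>(in.data.size())};
    for (std::size_t r = 0; r < in.rows; ++r) {
        for (std::size_t c = 0; c < in.cols; ++c) {
            out.data[c * in.rows + r] = in.data[r * in.cols + c];
        }
    }
    return out;
}

// The four steps from the list above, in order.
Image summed_area_table(Image input) {
    scan_rows(input);                  // (1) Intermediate
    Image flipped = transpose(input);  // (2) IntermediateTranspose
    scan_rows(flipped);                // (3) OutputTranspose
    return transpose(flipped);         // (4) Output
}
```

The input does not need to be square: the transpose simply swaps `rows` and `cols`.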
To be clear, I'm *NOT* asking for anybody to code it for me! I'm very new to TBB and am struggling to find a way to do a prefix scan on independent rows. I am curious if there are any kinds of fancy iterators that I have not found yet. For example, a really dumb way to implement step (1):
for (auto& row : input) tbb::parallel_scan(/* just this row */);
Are there any ways to basically unfold the outer "for each row in the input" into a prefix_scan? I can't really seem to figure out how to approach the 2D case here...
In some sense you could think of step (1) on its own as a matrix multiplication, so maybe doing that instead of a prefix scan would be better? I don't know how I would approach that with TBB either, but the concept is multiplying on the right by an (excuse my bad math nomenclature) "upper right triangular matrix of 1s".
# +-       -+   +-       -+   +-         -+
# | 1  2  3 |   | 1  1  1 |   | 1   3   6 |
# | 4  5  6 | * | 0  1  1 | = | 4   9  15 |
# | 7  8  9 |   | 0  0  1 |   | 7  15  24 |
# +-       -+   +-       -+   +-         -+
I wouldn't want to explicitly store the matrix though, and don't think doing an implicit matrix multiply is something that would work with TBB? Also, I'm working with images (non-square), so I don't think I can use this approach anyway...
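To sanity-check the matrix view, here is a small sketch (plain C++, no TBB; all names are hypothetical) that multiplies by the triangular matrix implicitly, computing each entry on demand instead of storing it, and compares the result with a plain running sum per row. As far as I can tell, the triangular matrix only has to be cols x cols, so the input itself does not have to be square. The naive product does O(cols) work per output element, versus O(1) for the running sum.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Entry (k, j) of the implicit "upper right triangular matrix of 1s".
// It is computed on demand, so the matrix is never stored.
inline long long upper_ones(std::size_t k, std::size_t j) {
    return k <= j ? 1 : 0;
}

// Naive product A * U, where U is the implicit cols x cols triangular matrix.
Matrix multiply_by_upper_ones(const Matrix& a) {
    Matrix out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::size_t cols = a[i].size();
        out[i].assign(cols, 0);
        for (std::size_t j = 0; j < cols; ++j) {
            for (std::size_t k = 0; k < cols; ++k) {
                out[i][j] += a[i][k] * upper_ones(k, j);
            }
        }
    }
    return out;
}

// Inclusive prefix scan of every row, written as a running sum.
Matrix scan_each_row(const Matrix& a) {
    Matrix out = a;
    for (auto& row : out) {
        long long running = 0;
        for (auto& value : row) {
            running += value;
            value = running;
        }
    }
    return out;
}
```

For any input, `multiply_by_upper_ones(a)` and `scan_each_row(a)` should agree, which is the equivalence described above.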
Thank you for any suggestions!