add_merge()

add_merge()

tuinenga's picture

What is the utility,the purpose,of add_merge()? I understand what it does, but not why it exists. It's not anything like scatter() ... an "add scatter" would make more sense, to me.
- paul

17 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.
Hans Pabst (Intel)'s picture

Hi Paul,

The terminology "merge" might result from combining independently collected information into a single entity, i.e. "resolving collisions". One application of the add_merge operator is a histogram algorithm, e.g. result = add_merge(1, image, 256) in case of 8-bit single-channel images.

Hans

tuinenga's picture

I may be slow, but that example doesn't make sense to me: the result is the same size as the source (according to the documentation). How does that make a histogram?
- paul

Zhang Z (Intel)'s picture

What Hans described is what add_merge() is supposed to do. However, the current design of add_merge() actually makes it difficult to use. This is a known issue. And as far as I know, we are going to refine the interface of add_merge() in a near future release.

Hans Pabst (Intel)'s picture

Yes, my description is based on what we will see in the next release of Intel ArBB, i.e. the issue is addressed already.

tuinenga's picture

Will it take a data container and an index container, add merge, and produce (optionally) a smaller container? Producing an inefficiently larger result is my concern. Yes, it can be trimmed afterwards, but isn't the idea of "merge" to create a smaller thing from a larger thing?
- paul

Hans Pabst (Intel)'s picture

Hi Paul

Sure, the example result = add_merge(1, image, 256) would take a 8-bit gray scale image of any size and return a dense container with 256 elements of type u32. Each of these 256 elements is accumulating the number of pixels into these 256 buckets (there are 256 shades of gray). Collisions are resolved ("merge") by incrementing the existing count, the increment is one (the first argument in the given example).

Hans

tuinenga's picture

Hi Hans,

Sounds great. In the example, the image data becomes the merge index, and ... at this point I'm lost. You explained that the "1" increments the bin, and yet the merge index is making out-of-bounds references into the data (except for indices of zero) unless constants are employed differently.

I'm unhappy to wait for the next beta,tosee how wonderful this function has become. This operation is the missing link in my"dense vector as sparse matrix" and vector multiply. (Yes,a dreaded one-liner of ArBB code.) Usingsection(add_merge(...),length) worksbutcan'tparallelize, so performance is less than other methods. Will you cut-n-paste the newinfo here on this?

- paul

Hans Pabst (Intel)'s picture

Hi Paul,

you can now have a look at samples/math/histogram (Intel ArBB Beta 5) to find an example which is using the add_merge operator.

Hans

tuinenga's picture

Hi Hans,

Yes, add_merge() is potent asan actorin SpMV so I am looking at your latest version with great interest.

- paul

tuinenga's picture

Using beta 5,this one eluded me for a while because it happens only when add_merge() is called e.g. multiple times. It seems by design, from the ref. man., to continue to sum into the result container i.e. "...and result elements set to 0 unless otherwise initialized." However, the debug version of add_merge() does not seem to do that.

Also, I'm confused by the dense container constructor. Does it initialize e.g. =T(), or not? Here are two similar routines that behave differently, Debug versus Release, and unexpectedly which raises a few issues:

void bar(dense& Out, const dense& Arg, const dense& Idx, const usize& Len)
{
Out = add_merge(Arg, Idx, Len);
}

void bar(dense& Out, const dense& Arg, const dense& Idx, const usize& Len)
{
dense tmp(Len);
tmp = add_merge(Arg, Idx, Len);
Out = tmp;
}

...when exercised by this code snip:

dense out(4);
dense arg = dense::parse("{0.,1.,2.,3.}");
dense idx = dense::parse("{ 0, 1, 2, 3}");
#ifdef _DEBUG
bar(out, arg, idx, usize(4));
bar(out, arg, idx, usize(4)); // YES, another call
#else
closure&, const dense&, const dense&, const usize&)> clo;
if (clo.empty())
clo = capture(bar);
clo(out, arg, idx, usize(4));
clo(out, arg, idx, usize(4)); // YES, another call
#endif
cout << "out:" << endl;
const_range r = out.read_only_range();
for (const_range::const_iterator i = r.begin(); i != r.end(); ++i)
cout << *i << " ";
cout << endl;

Now that I'm aware of it, my opinion is the continue-to-sum feature of add_merge is desirable ... just a quite unexpectedly advanced outcomefor the LHS container!

Best regards,
- paul

tuinenga's picture

Gentelmen,

Help! How do I nullify the cont.-to-sum feature of add_merge()? It is as if the ArBB compiler ignores anything I do to the LHS container, prior,because ArBB thinks it will be overwritten and so efficiently removes that code. I've tried fill(), replace(), and subtractingLHS from itself...

- paul

Stefanus Du Toit (Intel)'s picture
Quoting tuinenga Gentelmen,

Help! How do I nullify the cont.-to-sum feature of add_merge()? It is as if the ArBB compiler ignores anything I do the LHS container, prior,because ArBB thinks it will be overwritten and so efficietly removes that code. I've tried fill(), replace(), and subtractingLHS from itself...

- paul

The semantics of add_merge are intended to be independent of the destination container. If you find that the destination container's previous values are reused somehow then that is a bug - please provide as short a code snippet as possible and we'll fix it. If you wanted to keep merging in, e.g. in a loop, you would write something like:

X = ...;
for (...) {
  X += add_merge(...);
}

Note the "+=" usage here. Thanks, Stefanus

tuinenga's picture

Hi Stefanus,

Please seemsg. #10, above, for the bug example.

The ways I have managed to thwart the bug are:

Out= replace(res, usize(0), Len, usize(1), add_merge(Arg, Idx, Len));

Out= repeat(add_merge(Arg, Idx, Len), usize(1));

Out= section(add_merge(Arg, Idx, Len), usize(0), Len);

Out= rotate(add_merge(Arg, Idx, Len), isize(0));

Out= shift(add_merge(Arg, Idx, Len), isize(0));

Needless to say, these are not high-performance solutions.

- paul

tuinenga's picture

Yet another issue, even with a muzzled add_merge(), seems to be a memory leak. Using a closure similar to previous messages:

void bar(dense& Out, const dense& Arg, const dense& Idx, const usize& Len)
{
Out = shift(add_merge(Arg, Idx, Len), isize(0));
}

... with this code snip:

#define NB 10000
dense out(4*NB);
dense arg = repeat(dense::parse("{0.,1.,2.,3.}"), usize(NB));
dense idx = repeat(dense::parse("{ 0, 1, 2, 3}"), usize(NB));
#ifdef _DEBUG
for (int i = 1000; i > 0; --i)
bar(out, arg, idx, usize(4*NB));
#else
arbb::closure&, const dense&, const dense&, const usize&)> clo;
if (clo.empty())
clo = capture(bar);
for (int i = 1000; i > 0; --i)
clo(out, arg, idx, usize(4*NB));
#endif

... the MSVC Release executable halts with ArBB saying it's run out of memory. Reducing "NB" or the loop count will fix that.

Best regards,
- paul

Hans Pabst (Intel)'s picture

Hi Paul,

I know you would like to play, and try the add_merge operator for an SpMV implementation. To have another starting point, you may have a look at samples/math/sparse_matrix_vector (Intel ArBB Beta 5). In my opinion a "natural implementation" of SpMV is still related to a nested container although our given example is also lean and clean. To approach a good implementation using the current Beta, one can think about extending the given example code to detect contiguous row regions.

Hans

tuinenga's picture

Hi Hans,

As I have come to appreciate, add_merge() is flexible but that flexibility comes withthe cost of hard-to-parallelize tests/decisionsso Amdahl's Law looms. I agree with your take on nested containers. As of late, I'm experimenting with a "lazy evaluation" approach to this calculation; for dense matices either"eager evaluation" (or "supply push" as I like to think of it) or "lazy evaluation" ("demand pull") arrives at the same code. This is not so for sparse matrices.

Best regards,
- paul

Login to leave a comment.