Intel® Array Building Blocks (Archived)

No performance boost on reduction using ArBB add_reduce: Looking for reasons?

Hi,

I was trying out some of the ArBB code samples and thought of writing a simple application that uses the arbb add_reduce function. I'm running Windows 7 on an Intel Core 2 Duo E7500 CPU at 2.93 GHz.

With ARBB_OPT_LEVEL=O3 set, running my sample gives me a speedup of just 1.6x on 2 cores.

Here is the code:

Map fusion error, by design?

With this example...

void step1(f32 &val, f32 &arg)
{
    // Updates arg in place from itself and its right-hand neighbor.
    arg = (arg + neighbor(arg, 1))*0.5;
};
void step2(f32 &val, f32 &arg)
{
    val = (arg + neighbor(arg, 1))*0.5;
};
void steps(f32 &val, f32 &arg)
{
    // step1 and step2 merged by hand into a single map function.
    arg = (arg + neighbor(arg, 1))*0.5;
    val = (arg + neighbor(arg, 1))*0.5;
};
void do_steps(dense<f32> &val, dense<f32> &arg)
{
#if 1
    arbb::map(step1)(val, arg);
    arbb::map(step2)(val, arg);
#else
    arbb::map(steps)(val, arg);
#endif
};
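
For comparison, here is a plain C++ sketch (no ArBB involved) of why two separate passes and one hand-fused pass can legitimately give different results. It rests on two assumptions that are not confirmed by the post: each map pass reads its inputs as they were before the pass started, and a neighbor read past the end of the container yields 0. The helper names right_neighbor, two_passes and one_pass are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for neighbor(arg, 1): the element at i + 1,
// or 0 past the end (assumed boundary rule).
static float right_neighbor(const std::vector<float>& a, std::size_t i) {
    return (i + 1 < a.size()) ? a[i + 1] : 0.0f;
}

// Two separate passes: the second pass sees every element
// already updated by the first pass.
std::vector<float> two_passes(std::vector<float> arg) {
    const std::vector<float> in = arg;  // snapshot read by pass 1
    for (std::size_t i = 0; i < arg.size(); ++i) {
        arg[i] = (in[i] + right_neighbor(in, i)) * 0.5f;
    }
    std::vector<float> val(arg.size());
    for (std::size_t i = 0; i < arg.size(); ++i) {
        val[i] = (arg[i] + right_neighbor(arg, i)) * 0.5f;
    }
    return val;
}

// One fused pass: the local element is updated first, but the
// neighbor read in the second statement still sees the original input.
std::vector<float> one_pass(const std::vector<float>& in) {
    std::vector<float> val(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float a = (in[i] + right_neighbor(in, i)) * 0.5f;
        val[i] = (a + right_neighbor(in, i)) * 0.5f;
    }
    return val;
}
```

With the input {0, 4, 8}, two_passes gives {4, 5, 2} while one_pass gives {3, 7, 2}. Under these assumptions the two variants of do_steps are not equivalent, so a difference between them would not by itself point to a fusion bug.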

Overhead insight

This may be an FAQ: we see significant timing differences between ArBB's verbose report of execution time (presumably the actual work being done) and an external timing with scoped_timer (presumably including all data-copying overheads, but what else?). In detail, what accounts for the greater "external" time?

Trivial example:

{
const closure &, dense &)> clo = capture(do_it);
clo(vbar, vfoo); // once, to process any set-ups
const scoped_timer timer(ptime, scoped_timer::unit_us);
clo(vbar, vfoo); // time this execution
}

Iterating over pages

Hello all,
I am moving forward with my first test application. I finally got it running; however, my test benchmark is still "too slow".
The current code runs in 150 [ms] (measured using arbb::scoped_timer, with an arbb::auto_closure created outside the timed scope) and I expected it to run at least three times faster (~40 [ms]), based on other "high efficiency" implementations.
My guess is that I am still using ArBB wrong.

My current test code looks like this:

_for VS for

Hi,

I recently read the article 'When, and when not, to use the Intel ArBB _for loops'.

This article says to use _for loops 'to express serially dependent iterative computation. This is the case where a computation must be done incrementally, with the current step depending on the result of the previous step.'

Now I have a bunch of code which seems to just work fine as follows (using a regular for loop):

Using arbb::array inside arbb::dense

Hello, I am trying to get my first ArBB program running. My first trials ended up in weird segmentation faults, so I decided to step back, start from the "simplest code ever" and make it grow into the desired application. Right now I am stuck with using non-scalar values inside arbb::dense. My test code with dense<f32> works fine, but when I change f32 to arbb::array the code fails to compile, and I do not understand why or how I should work around it. Could someone explain to me what is wrong in the following code, and what the correct way of doing this is?
