This may well be an FAQ: we see significant timing differences between the execution time in ArBB's verbose report (presumably the actual work being done) and an external timing taken with scoped_timer (presumably including all data-copying overheads, but what else?). In detail, what accounts for the larger "external" time?
const closure<void (dense<f32>&, dense<f32>&)> clo = capture(do_it); // element type f32 assumed
clo(vbar, vfoo); // untimed warm-up call, to absorb any one-time setup
{
    const scoped_timer timer(ptime, scoped_timer::unit_us);
    clo(vbar, vfoo); // time this execution
} // timer goes out of scope here and stores the elapsed time in ptime
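To make explicit what an "external" scoped timing captures, here is a minimal sketch of an RAII timer in the spirit of scoped_timer. It is not ArBB's class: `wall_timer` and `time_after_warmup_us` are hypothetical names, and only standard `<chrono>` is used. The timer records wall-clock time from construction to destruction, so everything executed inside the timed scope is counted, whether that is the work itself or any copying, dispatch, or synchronization the call performs. A report generated inside the runtime sees only part of that interval.

```cpp
#include <chrono>

// Hypothetical RAII timer (not ArBB's scoped_timer): writes the wall-clock
// time elapsed between construction and destruction, in microseconds,
// into a caller-supplied double.
class wall_timer {
public:
    explicit wall_timer(double& out_us)
        : out_us_(out_us), start_(std::chrono::steady_clock::now()) {}

    ~wall_timer() {
        const auto stop = std::chrono::steady_clock::now();
        out_us_ = std::chrono::duration<double, std::micro>(stop - start_).count();
    }

    wall_timer(const wall_timer&) = delete;
    wall_timer& operator=(const wall_timer&) = delete;

private:
    double& out_us_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical helper mirroring the "call once, then time the second call"
// pattern: the first call is untimed, the second is timed in full.
template <typename F>
double time_after_warmup_us(F&& f) {
    f();  // warm-up: absorbs one-time costs outside the timed scope
    double elapsed_us = 0.0;
    {
        const wall_timer timer(elapsed_us);
        f();  // everything this call does is included in elapsed_us
    }  // destructor runs here and writes elapsed_us
    return elapsed_us;
}
```

For example, `time_after_warmup_us([&] { clo(vbar, vfoo); })` would time the second closure call the same way the snippet above does.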