Thank you very much for figuring this out! This is really useful, especially your comment on "both parts". Within this Beta program we are taking steps to keep making Intel ArBB a productive, portable, and performant programming model. We will definitely address experiences like this one first.
pack() seems relatively expensive



Compared to
pack( source, mask )
... I get better timings for
split( source, select( mask, isize(1), isize(0) )).segment( usize(1) )
... i.e. it is faster to sort "in place" and keep only that which you need. And much faster if you want both parts! For example
t1 = pack( source, mask );
t2 = pack( source, !mask);
... versus
t0 = split( source, select( mask, isize(1), isize(0) ));
t1 = t0.segment( usize(1) );
t2 = t0.segment( usize(0) );
Just FYI,
- paul
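
For readers without ArBB at hand, here is a minimal plain C++ sketch of why the split approach can win when both parts are needed. It does not use the ArBB API: `pack_model` and `split_model` are hypothetical stand-ins that only model the semantics described above, and the sketch assumes that both operations preserve the relative order of elements. Two pack() calls walk the source twice, while one split() routes every element in a single pass.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of pack(source, mask): keep the elements whose
// mask entry is true. Each call makes one full pass over the source.
std::vector<int> pack_model(const std::vector<int>& source,
                            const std::vector<bool>& mask) {
    std::vector<int> out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (mask[i]) out.push_back(source[i]);
    }
    return out;
}

// Hypothetical model of split(source, select(mask, 1, 0)) followed by
// segment(0) and segment(1): a single pass sends each element to
// segment 1 (mask true) or segment 0 (mask false).
// Assumption: order within each segment follows the source order.
std::pair<std::vector<int>, std::vector<int>>
split_model(const std::vector<int>& source, const std::vector<bool>& mask) {
    std::vector<int> seg0;
    std::vector<int> seg1;
    for (std::size_t i = 0; i < source.size(); ++i) {
        (mask[i] ? seg1 : seg0).push_back(source[i]);
    }
    return {std::move(seg0), std::move(seg1)};
}
```

In this model, `split_model(source, mask).second` plays the role of t1 and `.first` the role of t2. Whether the real timings follow the same pattern depends on the ArBB runtime, so treat this as an illustration of the access pattern only.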