pack() seems relatively expensive

Compared to

pack( source, mask )

... I get better timings for

split( source, select( mask, isize(1), isize(0) )).segment( usize(1) )

... i.e. it is faster to sort "in place" and keep only that which you need. And much faster if you want both parts! For example

t1 = pack( source, mask );
t2 = pack( source, !mask);

... versus

t0 = split( source, select( mask, isize(1), isize(0) ));
t1 = t0.segment( usize(1) );
t2 = t0.segment( usize(0) );
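
For anyone who wants to check the equivalence without ArBB at hand, here is a minimal sketch in plain standard C++. It is only an analogy, not ArBB code and not how the runtime implements either operation: `pack_like` and `split_like` are hypothetical helper names, the element type is fixed to `int`, the mask is assumed to have the same length as the source, and the split is assumed to preserve element order within each segment. The point it illustrates is that one pass can yield both parts, while the pack() approach needs one pass per part.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for pack(source, mask): keep the elements whose
// mask entry is true. Assumes mask.size() == source.size().
std::vector<int> pack_like(const std::vector<int>& source,
                           const std::vector<bool>& mask) {
    std::vector<int> out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (mask[i]) out.push_back(source[i]);
    }
    return out;
}

// Hypothetical stand-in for split(source, select(mask, 1, 0)) followed by
// segment(0) and segment(1). A single pass fills both parts:
//   .first  plays the role of segment(0) (mask false),
//   .second plays the role of segment(1) (mask true).
std::pair<std::vector<int>, std::vector<int>>
split_like(const std::vector<int>& source, const std::vector<bool>& mask) {
    std::pair<std::vector<int>, std::vector<int>> parts;
    for (std::size_t i = 0; i < source.size(); ++i) {
        (mask[i] ? parts.second : parts.first).push_back(source[i]);
    }
    return parts;
}
```

With this sketch, `split_like(source, mask).second` matches `pack_like(source, mask)`, and `.first` matches packing with the inverted mask. It says nothing about the actual ArBB timings, which are the measurements reported above.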

Just FYI,
- paul


Thank you very much for figuring this out! This is really useful, especially your comment on "both parts". We are taking steps within this Beta program to continue making Intel ArBB a productive, portable, and performant programming model. Experiences like this are among the first things we address.

Unexpectedly, I'm seeing some commonality or analogy between ArBB and IBM punch card (aka Hollerith) computing from the 1950s, '60s, and '70s.

I find it poetic (and/or ironic) that leading-edge, scientific, efficient & relatively fine-grained, parallel processing (ArBB) should share many of the same qualities with card sorters and tabulators. Is it time to review algorithms published in JACM?

- paul
