pack() seems relatively expensive


tuinenga

Compared to

pack( source, mask )

... I get better timings for

split( source, select( mask, isize(1), isize(0) )).segment( usize(1) )

... i.e. it is faster to sort "in place" and keep only that which you need. And much faster if you want both parts! For example

t1 = pack( source, mask );
t2 = pack( source, !mask);

... versus

t0 = split( source, select( mask, isize(1), isize(0) ));
t1 = t0.segment( usize(1) );
t2 = t0.segment( usize(0) );
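
For readers without ArBB at hand, the difference between the two variants can be sketched in plain standard C++. This is only an illustration of the semantics, not ArBB code and not a statement about how ArBB implements pack() or split(); the names pack_like and split_like are hypothetical. It shows that two pack-style calls walk the source twice, while a split-style call produces both parts in a single pass.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical analogue of pack(source, mask):
// keep only the elements whose mask entry is true.
std::vector<int> pack_like(const std::vector<int>& source,
                           const std::vector<bool>& mask) {
    std::vector<int> out;
    for (std::size_t i = 0; i < source.size() && i < mask.size(); ++i) {
        if (mask[i]) out.push_back(source[i]);
    }
    return out;
}

// Hypothetical analogue of split(...) followed by segment(1) and segment(0):
// one pass over the source yields both parts, each in original order.
// first  = elements where mask is true  (t1 in the post above)
// second = elements where mask is false (t2 in the post above)
std::pair<std::vector<int>, std::vector<int>>
split_like(const std::vector<int>& source, const std::vector<bool>& mask) {
    std::pair<std::vector<int>, std::vector<int>> parts;
    for (std::size_t i = 0; i < source.size() && i < mask.size(); ++i) {
        (mask[i] ? parts.first : parts.second).push_back(source[i]);
    }
    return parts;
}
```

Calling split_like(source, mask) once gives the same two vectors as calling pack_like(source, mask) and pack_like with the negated mask, but reads the source only once.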

Just FYI,
- paul

Hans Pabst (Intel)

Thank you very much for figuring this out! This is really useful, especially your comment on "both parts". We are taking steps within this Beta program to continue making Intel ArBB a productive, portable, and performant programming model. We will definitely address these kinds of experiences first.

tuinenga

Unexpectedly, I'm seeing some commonality or analogy between ArBB and IBM punch card (aka Hollerith) computing from the 1950s, '60s, and '70s.
See: http://en.wikipedia.org/wiki/Unit_record_equipment

I find it poetic (and/or ironic) that leading-edge, scientific, efficient & relatively fine-grained, parallel processing (ArBB) should share many of the same qualities with card sorters and tabulators. Is it time to review algorithms published in JACM?

- paul
