pack() seems relatively expensive

Compared to

pack( source, mask )

... I get better timings for

split( source, select( mask, isize(1), isize(0) )).segment( usize(1) )

... i.e. it is faster to sort "in place" and keep only the part you need. And much faster if you want both parts! For example

t1 = pack( source, mask );
t2 = pack( source, !mask);

... versus

t0 = split( source, select( mask, isize(1), isize(0) ));
t1 = t0.segment( usize(1) );
t2 = t0.segment( usize(0) );
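
To make the equivalence concrete, here is a minimal sketch in plain standard C++ (not ArBB code) that models what the two approaches compute. The names pack_model and split_model are hypothetical helpers invented for this illustration, and the assumption is that segment 1 of the split holds the elements whose mask is true and segment 0 holds the rest, both in their original order. It only shows that a single split pass yields the same two results as two pack() calls; it says nothing about ArBB's actual implementation or timings.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of pack(source, mask): keep the elements whose
// mask entry is true, preserving their original order.
template <typename T>
std::vector<T> pack_model(const std::vector<T>& source,
                          const std::vector<bool>& mask) {
    std::vector<T> out;
    for (std::size_t i = 0; i < source.size() && i < mask.size(); ++i) {
        if (mask[i]) out.push_back(source[i]);
    }
    return out;
}

// Hypothetical model of split(source, select(mask, 1, 0)): one pass over
// the data that routes every element into one of two segments.
// result.first  plays the role of segment 0 (mask false),
// result.second plays the role of segment 1 (mask true).
template <typename T>
std::pair<std::vector<T>, std::vector<T>>
split_model(const std::vector<T>& source, const std::vector<bool>& mask) {
    std::pair<std::vector<T>, std::vector<T>> segments;
    for (std::size_t i = 0; i < source.size() && i < mask.size(); ++i) {
        if (mask[i]) {
            segments.second.push_back(source[i]);
        } else {
            segments.first.push_back(source[i]);
        }
    }
    return segments;
}
```

In this model, split_model(source, mask).second matches pack_model(source, mask), and .first matches pack_model applied with the inverted mask, so the "both parts" case reads the source once instead of twice.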

Just FYI,
- paul


Thank you very much for figuring this out! This is really useful, especially your comment on "both parts". We are taking steps within this Beta program to continue making Intel ArBB a productive, portable, and performant programming model. Findings like this one are among the first things we address.

Unexpectedly, I'm seeing some commonality or analogy between ArBB and IBM punch card (aka Hollerith) computing from the 1950s, '60s, and '70s.

I find it poetic (and/or ironic) that leading-edge, scientific, efficient and relatively fine-grained parallel processing (ArBB) should share so many qualities with card sorters and tabulators. Is it time to review algorithms published in JACM?

- paul
