SSE2 latency


I have an app that sorts three 8-bit values from highest to lowest (pixel grayscale values, actually). I'm optimizing it using SSE2 integer instructions and removing branches by using bitmasks. For example, the original was something like:

if(greater than)
{
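
For comparison, here is a minimal sketch of the kind of branch-free SSE2 version described above. It is not the code from the app: the function names `cmp_swap_desc` and `sort3_desc_sse2` are made up for illustration, and it assumes the three values come from three separate byte arrays processed 16 pixels at a time. SSE2 has no unsigned byte compare, so the mask is built from `_mm_max_epu8` plus `_mm_cmpeq_epi8`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: compare-exchange so that, in every byte lane,
 * *hi >= *lo afterwards (unsigned comparison), with no branches. */
static void cmp_swap_desc(__m128i *hi, __m128i *lo)
{
    __m128i a = *hi;
    __m128i b = *lo;

    /* mask is 0xFF in lanes where a >= b, 0x00 elsewhere */
    __m128i mask = _mm_cmpeq_epi8(_mm_max_epu8(a, b), a);

    /* Bitmask select: hi = mask ? a : b, lo = mask ? b : a.
     * Note _mm_andnot_si128(m, x) computes (~m) & x. */
    *hi = _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
    *lo = _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}

/* Hypothetical entry point: sorts a[i], b[i], c[i] from highest to lowest
 * for i = 0..15, writing the results back in place. */
static void sort3_desc_sse2(uint8_t *a, uint8_t *b, uint8_t *c)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_loadu_si128((const __m128i *)c);

    /* Three-element sorting network */
    cmp_swap_desc(&va, &vb);
    cmp_swap_desc(&vb, &vc);
    cmp_swap_desc(&va, &vb);

    _mm_storeu_si128((__m128i *)a, va);
    _mm_storeu_si128((__m128i *)b, vb);
    _mm_storeu_si128((__m128i *)c, vc);
}
```

The same compare-exchange could also be written directly as `_mm_max_epu8` / `_mm_min_epu8` without an explicit mask; the mask form is shown because it matches the bitmask approach mentioned in the post.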
