Niggling optimization problem

Niggling optimization problem

Hello all,

I have some string and vector classes with an overloaded operator= that accepts negative indices which mean "index from the end of the string", ala Python:

inline char String::operator = (int pos) const

{

if (pos < 0)

pos += len;

return data[pos];

}

I was hoping that when it was used in loops like this:

for (int i = 0; i < 1000; ++i)

if (s[i] == 'X')

s[i] = 'x';

the compiler would work out that 'i' could never be negative and eliminate the negative check in the inlined version of the operator function. Unfortunately, no matter what I do the check is always generated. I know this is a small problem but it's niggling at me Can anyone give any insight into why this happens and if it's possible to get rid of the check?

14 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

did you use -fast option or -O2 -QxT -Qipo?

Yes, compiling with /O3 and everything cranked up to the max. I suppose you can reduce the problem to something like:

for (int i = 0; i < 1000; ++i) {
if (i < 0)
printf("impossible");
}

On my setup code is generated for this loop and 'i' is compared with 0 in each iteration. Interestingly if '1000' is changed so that the loop has at most 12 iterations it's eliminated entirely. I think think this corresponds to it being unrolled as it seems to be possible to make the same happen for larger loops using #pragma unroll(255).

"printf()" is not a good example. it turns off the optimization.

I don't see the problem with following code with "icl -O3 -FA -c t.cpp". Maybe you're using an old compiler. The latest is 10.1.013.

class mystr
{
public:
mystr()
{
len = 10;
}
virtual ~mystr() {}
inline char operator = (int pos) const
{
if (pos < 0)
pos += len;
return data[pos];
}
inline char operator [] (int pos) const
{
return data[pos];
}
inline char & operator [] (int pos)
{
return data[pos];
}
private:
char data[2000];
int len;
};

void bar()
{
mystr s;

for (int i = 0; i < 1000; ++i)
if (s[i] == 'X')
s[i] = 'x';
}

int foo()
{
int ret = 0;
for (int i = 0; i < 1000; ++i)
{
if (i < 0)
ret = 0;
else
ret += i;
}
return i;
}

Thanks for taking the time to test this. After rereading my initial post I realize I wrote operator= when I meant operator[]. Sorry Nevertheless the problem is still there.

Below is some test code in terms of "mystr". With the check inside operator [], the compiler emits the tests inside the bar() loop on my system. They show up as "test reg, reg" followed by a conditional jump - easier to see with unrolling turned off.

class mystr
{
public:
mystr()
{
len = 10;
}

inline char operator [] (int pos) const
{
if (pos < 0)
pos += len;
return data[pos];
}

inline char &operator [] (int pos)
{
if (pos < 0)
pos += len;
return data[pos];
}

private:

char data[2000];
int len;
};

void bar()
{
mystr s;
for (int i = 0; i < 1000; ++i)
if (s[i] == 'X')
s[i] = 'x';
}

Yes, with this testcase, it still checks for "pos < 0".

I'll file a tracker on this. When there's a fix, I'll post the news here.

Thank you. I suppose it's worth noting that both gcc4 and vc9 do this too - that's why I wondered whether there might be some subtle reason the compiler can't assume 'i' is always positive.

TJ

Have you considered using a computational method to produce the index and thus eliminate the if test?

Assume int is 32 bits and len is never more than 16 bits

return data[pos+((pos>>16)&len)];

Use a slightly different statemtent for 64-bit int.

This eliminates the branch around the negative indexing. Note, if the compiler optimization is smart enough to know that (pos>>16) == 0 then it should remove the +((pos>>16)&len) from the expression.

If len is permitted to exceed 16 bits then there are other ways to quicklyflood the & mask with 1's when pos is negative (e.g. >>31).

Jim Dempsey

www.quickthreadprogramming.com

Thanks for your suggestion. I tried removing the branch with

pos += len & (pos >> 31);

and found that it was slower than the version with the branch, perhaps because the branch is correctly predicted 100% of the time when the sign of the index doesn't change. In fact, the overhead of the branch being there is probably about zero, but it does prevent any loop using this overloaded operator to access characters from being vectorized.

TJ,

You do not want the pos+=. Just the simple expression inside the []

inline char operator = (int pos) const
{
return data[pos+((pos>>16)&len)];
}
or something like:
inline char operator = (int pos) const
{
return data[pos+((((short*)(&pos))[1])&len)];
}
But that won't work if pos is an expression
The above will require than len < 16 bits
else use the following
inline char operator = (int pos) const
{
return data[pos+((pos>>31)&len)];
}
Jim
www.quickthreadprogramming.com

I forgot to mention. Sometimes the triglyph produces good code

return data[pos<0 ? len-pos : pos];

The higher level processors have the CMOVcc instruction and often the triglyph will use that now.

Jim

www.quickthreadprogramming.com

the problem related to "if (post <0)" extra-checking is fixed. It will be in the next major release.

Let me check all the other similar issues posted later to see if it's fixed as well.

Just a note to say I re-tested the above code on 10.1.013 Linux and the test is still generated -- though I don't know if by major release you mean 11.0.

Yes, next major release is 11.0. I should have been more clear on this.

Leave a Comment

Please sign in to add a comment. Not a member? Join today