Inefficient code for simple string comparison

Inefficient code for simple string comparison

The following reproducer exposes an inefficiency with string comparisons. The compiler generates a call to for_cpstr() that consumes substantial compute time for a case I have. Tried with versions 11.1 and 13.1 and get the same assembly code. What is that call good for ? The compiler should be able to do much better with such a simple piece of code.

subroutine scmp(a,b,l, flag)

  character(*), intent(in) :: a
  character(*), intent(in) :: b
  integer, intent(in)  :: l
  logical, intent(out) :: flag

  if (a(1:l) == b(1:l)) then
    flag = .true.
    flag = .false.
  end if

end subroutine scmp

8 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

for_cpstr()  compares two Fortran CHARACTER objects dealing with "strings" of different length and the different relational operators (EQ here). Is this the call itself you complain about (the overhead of the call) or the efficiency of the code implementing for_cpstr() ? I don't see anything,  the compiler could do better.

O.k. I remember that the world of multi-byte-characters and multi-language strings can be rather complex. (Fortran programmers don't do such stuff.)

In my case the string comparison deals with keywords of less than a dozen plain ASCII characters, so the subroutine call overhead gets significant. It seems like for_cpstr() then calls intel_memcmp() so the overhead is even worse. One workaround I found is to start comparing just the first character and immediately return in case of inequality. That single character operation does not invoke for_cpstr().

Specific code generation for equality / inequality would be desired here. I think it could be done with less than 10 assembly instructions.

I think the suggestion is that the compiler special-case comparison of short strings of known same constant length and just do integer compares. That seems like a worthwhile idea to me, as such comparisons are frequently found in Fortran code.

Retired 12/31/2016

yes - in general it might be helpful to have special implementations of short strings. But here the length is not a compile-time constant and intel_memcmp() does the optimizations like returning as soon as the first character pair is different, using 4-byte/8-byte integer compares and even using the vector-extensions.

the only optimization I see here would be a special version for strings of equal lenght ( saves a very, very few cycles) or in-lining  - but in-lining is not free either ( code size increase, additional register pressure).

It would be interesting to see benchmark data   on how many cycles could be gained really.

See the Oprofile data to show the relevance of this issue in the complete program context. The workaround I mentioned cuts 5.7% of total runtime.

Before workaround
samples  %        symbol name

522549   20.2889  m_tdt512_mp_psi_sweep_cycl_sf_
264716   10.2781  afpktr_
129155    5.0147  __svml_floorf4.R
 . . .
121155    4.7041  for_cpstr
78103     3.0325  _intel_fast_memcmp

After workaround
samples  %        symbol name

519767   25.7614  m_tdt512_mp_psi_sweep_cycl_sf_
162412    8.0497  afpktr_
128690    6.3783  __svml_floorf4.R
 . . .
21856     1.2935  for_cpstr
 . . .
11758     0.6959  _intel_fast_memcmp

Oh, my eyes were not good enough to see that the end position was a variable and not a constant. I agree with Heinz that an optimization here is more difficult.

Retired 12/31/2016

In the above case, the segment (1:l) is the same, in this situation (lengths equal and == operator) you could treat the "arrays" as integer(sizeof character), dimension(l). I do not think the optimization check would be hard to perform and impliment. N.B. this may introduce issues with respect to "l" indexing out of bounds.

Jim Dempsey

Leave a Comment

Please sign in to add a comment. Not a member? Join today