38986 – comparing lengths of 2 strings reads through both strings completely

Bug 38986 - comparing lengths of 2 strings reads through both strings completely

Summary: comparing lengths of 2 strings reads through both strings completely

Status:	RESOLVED WONTFIX

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	c++ (show other bugs)
Version:	4.3.2

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-01-27 12:07 UTC by esigra
Modified:	2009-02-01 10:50 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description esigra 2009-01-27 12:07:53 UTC

Suppose that someone wants to see if a string is shorter than another and writes:
   #include <string.h>
   bool f(char const * const a, char const * const b) {return strlen(a) <= strlen(b);}

Then g++ generates this:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $4, %esp
        movl    8(%ebp), %eax
        movl    %eax, (%esp)
        call    strlen
        movl    %eax, %ebx
        movl    12(%ebp), %eax
        movl    %eax, (%esp)
        call    strlen
        cmpl    %eax, %ebx
        setbe   %al
        addl    $4, %esp
        popl    %ebx
        popl    %ebp
        ret

But it looks like this code reads through both strings completely! This is obviously very efficient if one of the strings is much longer than the other. It should of course only read as far into each string as the shortest of them is long. The generated code should be similar to what this would give:
bool g(char const * a, char const * b) {
   for (;; ++a, ++b)
      if      (not *a)
         return true;
      else if (not *b)
         return false;
}

Comment 1 Wolfgang Bangerth 2009-01-30 15:13:12 UTC

Yes, I think this would be an optimizing compiler could potentially perform.
At the same time I think you are expecting too much from the compiler: it
would have to have a semantic understanding of what the strlen function
does and that the result can be obtained by merging the two calls to strlen,
then merging the two loops, and then truncating them.

I don't think that this sort of analysis is going to be available in compilers
anytime soon. Since you already have the solution, maybe this is the way to
go!

Best
 W.

Comment 2 esigra 2009-01-30 16:12:54 UTC

GCC already understands the semantics of strlen. If one of the operands to "<" is a constant and the other is strlen, it is optimized (such as "strlen(str) >= 1). It just seems like the case with strlen on both sides is missing.

Comment 3 Falk Hueffner 2009-02-01 10:50:59 UTC

The main problem is that just replacing the code by the below loop won't necessarily give a speedup. strlen is usually implemented in highly efficient and platform-specific assembly that treats several bytes at a time. I don't think the vectorizer will give equally efficient code for the loop. Also, in normal applications almost all strings are pretty short (and not of very unequal length). Probably the best way to motivate somebody to look into this is if you gave benchmark results that show a noticeable improvement for some real-world application.