[v3] libstdc++/44413 (for ext/vstring)

Doug Semler dougsemler@gmail.com
Wed Jun 9 17:57:00 GMT 2010


On Wed, Jun 9, 2010 at 11:02 AM, Paolo Carlini <paolo.carlini@oracle.com> wrote:
> On 06/09/2010 04:32 PM, Doug Semler wrote:
>> I was not the original poster/requester/suggester/whatever and just
>> jumped in the middle of the conversation :)
>>
>> Sorry if I gave the wrong impression --- the comment "I don't see it"
>> was directed at the optimization attempt itself...I don't have any
>> ideas myself ;-)
>>
> I see ;-)
>> That being said, my interest in the compiler is on the 32 bit *and* 64
>> bit windows side (iow x86 family), and seeing something that hurts one
>> side with only minor benefits on the other side makes me want to at
>> least say something.
>>
> You did the right thing, I overlooked the 32-bit case, in my desperate
> attempt to improve - very marginally - the 64-bit case.
>
> To be honest, having considered the issue a bit more, I have no idea how
> to substantively improve the 64-bit case, considering that the length of
> a string can be much larger than 2^32, and we have to return an int in
> that case too. If you could get back to the original
> poster/requester/suggester/whatever, and report back to us when you have
> additional information, it would be great.
>

(Caveat: the following applies to linux gcc 4.5 branch --- i don't
have a good 4.6 build at this time)...

I don't know the OP (actually I think it's a bugzilla submission).
Maybe there is/could be a missed optimization -- I don't know.  But:

For example changing the original code to:

return (__n1 >= __n2) ? (__n1 - __n2) : -(__n2 - __n1);

Doesn't change the behavior at *all* for 32 bit while reducing the 64
bit code to what the 32 bit code generates (TOTALLY not runtime
tested, of course).   The caveat, however, is that  that particular
line of code is taking advantage of gcc's implementation defined
behavior for the integral conversions of the unsigned subtraction into
the 32 bit int (section 4.7 i think??).  On the other hand, it's
exactly what the 32 bit version decays into *for gcc*.

In addition, maintaining the (standard's defined) behavior for
string.compare (IOW, returning any value less-than, zero, or
greater-than), the 64 bit can be reduced to something like the
following (can be changed around a bit by switching the order of
comparisons...IOW the best code to represent this isn't necessarily
selected)

return (__n2 != __n1) ? (__n1 < __n2 ? -1 : 1) : 0;

        xorl    %eax, %eax
        cmpq    %rsi, %rdi
        je      .L24
        sbbl    %eax, %eax
        orl     $1, %eax
.L24:
        rep
        ret


But again, the 32 bit *suffers* because it emits very similar code
with an additional compare and branch in place of the original
subtract (and cmove).

Anyway, my $0.02 for to ponder...(and it seems I've spent quite a bit
of time on something that really doesn't seem to add much in the grand
scheme of things ;0))  The only code that would see much in the way of
speed up is code that does an AWFUL lot of string comparisons of
strings that are substrings of each other.  And I mean *a lot*.  And
to tell the truth I don't think those 4 or 5 extra compares are going
to be the true bottleneck in those cases...I would think the potential
for cache pollution would be worse.



More information about the Libstdc++ mailing list