[v3] libstdc++/44413 (for ext/vstring)

Doug Semler dougsemler@gmail.com
Thu Jun 10 14:36:00 GMT 2010


OK, two questions about 64-bit code generation. This is on the gcc
4.5.1 branch, by the way, so these questions may no longer apply to 4.6.

First --- the current code generates:

        .cfi_startproc
        subq    %rsi, %rdi
        movl    $2147483647, %eax
        cmpq    $2147483647, %rdi
        jle     .L6
        rep
        ret
        .p2align 4,,10
        .p2align 3
.L6:
        cmpq    $-2147483648, %rdi
        movl    $-2147483648, %eax
        cmovge  %edi, %eax
        ret
        .cfi_endproc

The question is this: is the following better? Note the lack of a
branch; in exchange, every call executes all of the instructions.
This is about all you can do on 64 bit, modulo some performance-neutral
rearrangements, I think...

        .cfi_startproc
        subq    %rsi, %rdi
        movq    $-2147483648, %rax
        cmpq    $-2147483648, %rdi
        cmovl   %rax, %rdi
        movl    $2147483647, %eax
        cmpq    $2147483647, %rdi
        cmovle  %rdi, %rax
        ret
        .cfi_endproc

The first is the current code; the second is the result of:
{
  const ptrdiff_t __d = __n1 - __n2;
  const ptrdiff_t __themin = __gnu_cxx::__numeric_traits<int>::__min;
  const ptrdiff_t __themax = __gnu_cxx::__numeric_traits<int>::__max;

  return std::max(std::min(__d, __themax), __themin);
}
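
For anyone who wants to compare the two forms outside the library, here
is a minimal stand-alone sketch. The function names are hypothetical,
std::numeric_limits stands in for __gnu_cxx::__numeric_traits, and the
branch-based version is only my assumption about the shape of the
current code, not a copy of it.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Hypothetical name. Branch-based clamp: an assumed shape for the
// current code, not taken from the library sources.
inline int clamp_diff_branchy(std::size_t n1, std::size_t n2)
{
  // The unsigned subtraction wraps; converting the result to ptrdiff_t
  // gives the signed difference on the usual two's-complement targets.
  const std::ptrdiff_t d = static_cast<std::ptrdiff_t>(n1 - n2);
  if (d > std::numeric_limits<int>::max())
    return std::numeric_limits<int>::max();
  else if (d < std::numeric_limits<int>::min())
    return std::numeric_limits<int>::min();
  else
    return static_cast<int>(d);
}

// Hypothetical name. Same clamp written with std::min/std::max, as in
// the snippet above.
inline int clamp_diff_minmax(std::size_t n1, std::size_t n2)
{
  const std::ptrdiff_t d = static_cast<std::ptrdiff_t>(n1 - n2);
  const std::ptrdiff_t lo = std::numeric_limits<int>::min();
  const std::ptrdiff_t hi = std::numeric_limits<int>::max();
  // Cap at hi first, then raise to lo; the result always fits in int.
  return static_cast<int>(std::max(std::min(d, hi), lo));
}
```

Both functions should return the same value for every input, so
building them with -O2 -S is one way to look at the generated code side
by side. Whether a given compiler version reproduces the listings above
is something I have not checked.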

Second question: *if* the second code block is better, is this a
missed optimization in gcc overall? (I may be under the mistaken
impression that the branch could be bad; it seems to me to be
fairly unpredictable.)

Note that on 32 bit the second code block produces exactly the same
assembly output as the current (original) code.


