[v3] libstdc++/44413 (for ext/vstring)
Doug Semler
dougsemler@gmail.com
Thu Jun 10 14:36:00 GMT 2010
OK, two questions about the 64-bit case. I'm on the gcc 4.5.1 branch, by
the way, so these questions may no longer apply to 4.6.
First: the current code generates:
.cfi_startproc
subq %rsi, %rdi
movl $2147483647, %eax
cmpq $2147483647, %rdi
jle .L6
rep
ret
.p2align 4,,10
.p2align 3
.L6:
cmpq $-2147483648, %rdi
movl $-2147483648, %eax
cmovge %edi, %eax
ret
.cfi_endproc
The question is this: is the following better? Note the lack of a
branch, at the cost that every instruction is always executed.
I think this is about all you can do on 64-bit, modulo some
performance-neutral rearrangements.
.cfi_startproc
subq %rsi, %rdi
movq $-2147483648, %rax
cmpq $-2147483648, %rdi
cmovl %rax, %rdi
movl $2147483647, %eax
cmpq $2147483647, %rdi
cmovle %rdi, %rax
ret
.cfi_endproc
The first is the output for the current code; the second is the result of:
{
const ptrdiff_t __d = __n1 - __n2;
const ptrdiff_t __themin = __gnu_cxx::__numeric_traits<int>::__min;
const ptrdiff_t __themax = __gnu_cxx::__numeric_traits<int>::__max;
return std::max(std::min(__d, __themax),__themin);
}
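For anyone who wants to compare the two forms outside the library, here
is a minimal standalone sketch. The names clamp_diff_branchy and
clamp_diff_minmax are made up for illustration, the branchy version is
only an assumption of what an if/else formulation looks like (it is not
copied from the library source), and std::numeric_limits stands in for
__gnu_cxx::__numeric_traits.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Hypothetical helper names; these are not the libstdc++ internals.
// The size_t -> ptrdiff_t conversion relies on the usual two's
// complement wraparound, which is what gcc does.

// Branchy form: explicit early returns when the difference is out of
// int range.
inline int clamp_diff_branchy(std::size_t n1, std::size_t n2)
{
  const std::ptrdiff_t d = static_cast<std::ptrdiff_t>(n1 - n2);
  if (d > std::numeric_limits<int>::max())
    return std::numeric_limits<int>::max();
  if (d < std::numeric_limits<int>::min())
    return std::numeric_limits<int>::min();
  return static_cast<int>(d);
}

// min/max form: same clamp written without explicit control flow,
// which gives the compiler an easy opening to emit conditional moves.
inline int clamp_diff_minmax(std::size_t n1, std::size_t n2)
{
  const std::ptrdiff_t d = static_cast<std::ptrdiff_t>(n1 - n2);
  const std::ptrdiff_t lo = std::numeric_limits<int>::min();
  const std::ptrdiff_t hi = std::numeric_limits<int>::max();
  return static_cast<int>(std::max(std::min(d, hi), lo));
}
```

Compiling both with -O2 -S should make it easy to diff the generated
code. Whether the branch-free form is actually faster depends on how
predictable the branch is in practice, so it would need measuring.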
Second question: *if* the second code block is better, is this a
missed optimization in gcc in general? (I may be under the mistaken
impression that the branch could be bad; it seems to me to be
fairly unpredictable.)
Note that on 32-bit the second code block produces exactly the same
assembly output as the current (original) code.