This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: Implementing normal algorithms using predicate versions
Paolo Carlini wrote:
Nathan Myers wrote:
Historical note... in the original STL, almost everything was marked
inline so that it could all be put in header files, without need for
all the bother and variation around .o files, object libraries, and
linking. Probably the inlines Gaby cites are leftovers from that.
Stepanov thought that the number of different uses the tree-based
types in any given program would be small enough that inlining
wouldn't cost
much; also, that the compiler could make the choice to un-inline a
function marked inline but "too big", but would not choose the reverse.
Thanks for the (interesting) note!
Which is your opinion about my point (i.e, some times, for various
reasons,
-O2 is not usable and perhaps we don't want to penalize too much from the
performance point of view unoptimized build)?!? I'm certainly up for
Gaby/
Benjamin/Chris proposal, but that issue worries me a little bit...
Paolo.
BTW, sorry, I've recently been posting as Chris Jefferson, Chris and
caj. I didn't notice as I tend to disable names and just display e-mail
address on my mail client ¬_¬
I tried experimenting with some functions are various levels of
optimisation. Unsuprisingly at -O0 using a predicate causes quite a
speed hit as While at -O2 the output looks about identical, at -O1 while
it will inline the predicate, the optimiser doesn't seem able to
(doesn't want to?) totally "fold" the inlined function. For example the
function __unguarded_linear_insert in normal sort at -O1 is:
void std::__unguarded_linear_insert(_RandomAccessIterator, _Tp) [with
_RandomAccessIterator = int*, _Tp = int] (__last, __val)
{
int * __next;
int D.12360;
<bb 0>:
__next = __last;
__next = __next - 4B;
goto <bb 2> (<L1>);
<L0>:;
D.12360 = *__next;
*__last = D.12360;
__last = __next;
__next = __next - 4B;
<L1>:;
D.12360 = *__next;
if (D.12360 > __val) goto <L0>; else goto <L2>;
<L2>:;
*__last = __val;
return;
}
whereas at -O1 with a comparison object (which just calls operator <) is:
void std::__unguarded_linear_insert(_RandomAccessIterator, _Tp,
_Compare) [with _RandomAccessIterator = int*, _Tp = int, _Compare =
compare] (__last, __val, __comp)
{
const int D.12624;
const int D.12623;
int D.12622;
const int & a;
const int & b;
int * __next;
int __val.68;
int D.12453;
bool D.12451;
bool retval.67;
int D.12621;
<bb 0>:
__next = __last;
__next = __next - 4B;
goto <bb 2> (<L1>);
<L0>:;
D.12453 = *__next;
*__last = D.12453;
__last = __next;
__next = __next - 4B;
<L1>:;
a = &__val;
b = __next;
D.12623 = *a;
D.12624 = *b;
D.12622 = D.12623 < D.12624;
D.12621 = D.12622;
D.12451 = (bool) D.12621;
retval.67 = D.12451;
if (retval.67) goto <L0>; else goto <L3>;
<L3>:;
__val.68 = __val;
*__last = __val.68;
return;
}
However, it's is quite as bad as that when we turn into (for example)
x86 assembler. The first version using a comparison object becomes:
LFB684:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
pushl %ebx
.LCFI2:
movl 8(%ebp), %ecx
movl 12(%ebp), %ebx
leal -4(%ecx), %edx
cmpl -4(%ecx), %ebx
jge .L2
.L7:
movl (%edx), %eax
movl %eax, (%ecx)
leal -4(%edx), %eax
cmpl -4(%edx), %ebx
jge .L4
movl %edx, %ecx
movl %eax, %edx
jmp .L7
.L4:
movl %edx, %ecx
.L2:
movl %ebx, (%ecx)
popl %ebx
popl %ebp
ret
whereas the version without becomes:
.LFB684:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
pushl %esi
.LCFI2:
pushl %ebx
.LCFI3:
movl 8(%ebp), %ecx
movl 12(%ebp), %esi
leal -4(%ecx), %edx
movl -4(%ecx), %eax
cmpl %eax, %esi
jge .L2
.L7:
movl %eax, (%ecx)
leal -4(%edx), %ebx
movl -4(%edx), %eax
cmpl %esi, %eax
jle .L4
movl %edx, %ecx
movl %ebx, %edx
jmp .L7
.L4:
movl %edx, %ecx
.L2:
movl %esi, (%ecx)
popl %ebx
popl %esi
popl %ebp
ret
I'm not an expert on x86 assembler, so I'm not positive which of these
is better (the one with comparison operator seems to use one less
register but a few more instructions). However I'm willing to believe
that the difference between these is just "line noise" in the optimiser.
I've tried this on the entire of sort and the story appears to be that
-O1 is sufficent to remove any inefficency. -O0 gets hurt quite badly
however.
Chris