This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Implementing normal algorithms using predicate versions


Paolo Carlini wrote:

Nathan Myers wrote:

Historical note... in the original STL, almost everything was marked
inline so that it could all be put in header files, without need for
all the bother and variation around .o files, object libraries, and linking. Probably the inlines Gaby cites are leftovers from that.


Stepanov thought that the number of different uses the tree-based types in any given program would be small enough that inlining wouldn't cost
much; also, that the compiler could make the choice to un-inline a function marked inline but "too big", but would not choose the reverse.


Thanks for the (interesting) note!

Which is your opinion about my point (i.e, some times, for various reasons,
-O2 is not usable and perhaps we don't want to penalize too much from the
performance point of view unoptimized build)?!? I'm certainly up for Gaby/
Benjamin/Chris proposal, but that issue worries me a little bit...


Paolo.


BTW, sorry, I've recently been posting as Chris Jefferson, Chris and caj. I didn't notice as I tend to disable names and just display e-mail address on my mail client ¬_¬

I tried experimenting with some functions are various levels of optimisation. Unsuprisingly at -O0 using a predicate causes quite a speed hit as While at -O2 the output looks about identical, at -O1 while it will inline the predicate, the optimiser doesn't seem able to (doesn't want to?) totally "fold" the inlined function. For example the function __unguarded_linear_insert in normal sort at -O1 is:

void std::__unguarded_linear_insert(_RandomAccessIterator, _Tp) [with _RandomAccessIterator = int*, _Tp = int] (__last, __val)
{
int * __next;
int D.12360;


<bb 0>:
__next = __last;
__next = __next - 4B;
goto <bb 2> (<L1>);

<L0>:;
D.12360 = *__next;
*__last = D.12360;
__last = __next;
__next = __next - 4B;

<L1>:;
D.12360 = *__next;
if (D.12360 > __val) goto <L0>; else goto <L2>;

<L2>:;
*__last = __val;
return;

}


whereas at -O1 with a comparison object (which just calls operator <) is:


void std::__unguarded_linear_insert(_RandomAccessIterator, _Tp, _Compare) [with _RandomAccessIterator = int*, _Tp = int, _Compare = compare] (__last, __val, __comp)
{
const int D.12624;
const int D.12623;
int D.12622;
const int & a;
const int & b;
int * __next;
int __val.68;
int D.12453;
bool D.12451;
bool retval.67;
int D.12621;


<bb 0>:
__next = __last;
__next = __next - 4B;
goto <bb 2> (<L1>);

<L0>:;
D.12453 = *__next;
*__last = D.12453;
__last = __next;
__next = __next - 4B;

<L1>:;
a = &__val;
b = __next;
D.12623 = *a;
D.12624 = *b;
D.12622 = D.12623 < D.12624;
D.12621 = D.12622;
D.12451 = (bool) D.12621;
retval.67 = D.12451;
if (retval.67) goto <L0>; else goto <L3>;

<L3>:;
__val.68 = __val;
*__last = __val.68;
return;

}


However, it's is quite as bad as that when we turn into (for example) x86 assembler. The first version using a comparison object becomes:
LFB684:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
pushl %ebx
.LCFI2:
movl 8(%ebp), %ecx
movl 12(%ebp), %ebx
leal -4(%ecx), %edx
cmpl -4(%ecx), %ebx
jge .L2
.L7:
movl (%edx), %eax
movl %eax, (%ecx)
leal -4(%edx), %eax
cmpl -4(%edx), %ebx
jge .L4
movl %edx, %ecx
movl %eax, %edx
jmp .L7
.L4:
movl %edx, %ecx
.L2:
movl %ebx, (%ecx)
popl %ebx
popl %ebp
ret


whereas the version without becomes:

.LFB684:
  pushl    %ebp
.LCFI0:
  movl    %esp, %ebp
.LCFI1:
  pushl    %esi
.LCFI2:
  pushl    %ebx
.LCFI3:
  movl    8(%ebp), %ecx
  movl    12(%ebp), %esi
  leal    -4(%ecx), %edx
  movl    -4(%ecx), %eax
  cmpl    %eax, %esi
  jge    .L2
.L7:
  movl    %eax, (%ecx)
  leal    -4(%edx), %ebx
  movl    -4(%edx), %eax
  cmpl    %esi, %eax
  jle    .L4
  movl    %edx, %ecx
  movl    %ebx, %edx
  jmp    .L7
.L4:
  movl    %edx, %ecx
.L2:
  movl    %esi, (%ecx)
  popl    %ebx
  popl    %esi
  popl    %ebp
  ret

I'm not an expert on x86 assembler, so I'm not positive which of these is better (the one with comparison operator seems to use one less register but a few more instructions). However I'm willing to believe that the difference between these is just "line noise" in the optimiser.

I've tried this on the entire of sort and the story appears to be that -O1 is sufficent to remove any inefficency. -O0 gets hurt quite badly however.

Chris




Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]