optimization/8967: [3.4 regression] Making class data members `const' pessimizes code
bangerth@dealii.org
Fri Dec 20 20:20:00 GMT 2002
Old Synopsis: Making class data members `const' pessimizes code
New Synopsis: [3.4 regression] Making class data members `const' pessimizes code
State-Changed-From-To: open->analyzed
State-Changed-By: bangerth
State-Changed-When: Fri Dec 20 20:20:38 2002
State-Changed-Why:
Confirmed. For reference: this is what the second version
of the function ("Mutable") results in:
------------------
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl 8(%ebp), %eax
movl $9, (%eax)
movl $11, 4(%eax)
movl %ebp, %esp
popl %ebp
ret $4
----------------------
(This is not really optimal, since it never actually
uses the stack space it allocates. All the fiddling with
the stack pointer is plainly unnecessary. But it can be
worse, see below.)
The other function ("Const") yields different results
with different compilers. Here's 3.2.2pre:
---------------------
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -8(%ebp)
movl 8(%ebp), %eax
movl $8, -16(%ebp)
movl $2, -4(%ebp)
movl $9, -12(%ebp)
movl $9, (%eax)
movl $11, 4(%eax)
movl %ebp, %esp
popl %ebp
ret $4
----------------
Basically, the compiler _does_ compute the right result
(9,11), which is written by the last two stores. All the
other stores (of 1, 2, 8 and 9 into stack slots) are dead!
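
The source of the two functions is not quoted in this
message. As a rough sketch only, a test case consistent with
the values seen in the listings (intermediates 1, 2, 8, 9 and
result 9, 11) might look like the following. The struct
layout, operator+ and the names sum_mutable/sum_const are
assumptions made for illustration, not the PR's actual code:

```cpp
// Hypothetical reconstruction, NOT the test case from the PR.
// Two otherwise identical classes; only the const qualifier
// on the data members differs.
struct Mutable {
    int re, im;
    Mutable(int r, int i) : re(r), im(i) {}
};

struct Const {
    const int re, im;
    Const(int r, int i) : re(r), im(i) {}
};

inline Mutable operator+(const Mutable &a, const Mutable &b) {
    return Mutable(a.re + b.re, a.im + b.im);
}

inline Const operator+(const Const &a, const Const &b) {
    return Const(a.re + b.re, a.im + b.im);
}

// Both functions add (1,2) and (8,9); all operands are
// compile-time constants, so only the two stores of the
// result into the return slot are really needed.
Mutable sum_mutable() { return Mutable(1, 2) + Mutable(8, 9); }
Const   sum_const()   { return Const(1, 2) + Const(8, 9); }
```

Compiling such a file with g++ -O3 -S and comparing the
assembly of the two functions is one way to repeat the
comparison made here.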
Next, 3.3 right before the BIB merge, i.e. what will likely
be on the 3.3 branch: the generated code is the same as for
3.2.2.
Finally, mainline as of yesterday, i.e. 3.4pre:
-------------
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $9, -24(%ebp)
movl 8(%ebp), %eax
movl -24(%ebp), %edx
movl $11, -20(%ebp)
movl -20(%ebp), %ecx
movl %edx, (%eax)
movl $1, -8(%ebp)
movl %ecx, 4(%eax)
movl $2, -4(%ebp)
movl $8, -16(%ebp)
movl $9, -12(%ebp)
leave
ret $4
---------------
This is really disappointing, since most of the moves are
actually dead, or could otherwise be optimized away!
This is _worse_ than
what we previously had. I rate this as a regression, and
will mark the report accordingly. Note, however, that it
is a regression with respect to an already regrettable
state, and it would be nice if not only the previous state
were restored, but the two functions really compiled to the
same code! There is no reason why adding a more restrictive
cv-qualifier like "const" should pessimize code!
I made a couple more experiments, but things only get more
depressing: with present mainline (3.4pre), compiling the
second (better) function with -O3 -fomit-frame-pointer
yields the following code:
---------------------
subl $28, %esp
.LCFI1:
movl 32(%esp), %eax
movl $9, (%esp)
movl (%esp), %edx
movl $11, 4(%esp)
movl 4(%esp), %ecx
movl %edx, (%eax)
movl %ecx, 4(%eax)
addl $28, %esp
ret $4
----------------------
Reordering things a little tells us that the (correctly
computed) result is first stored into a temp stack slot,
then reloaded into ecx/edx, and finally stored into
the return value slot. Amazing!
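
To make that data flow easier to follow, here is a
hand-written C++ rendering of the listing next to the two
stores that would suffice. This is only an illustration:
Pair, store_via_temp and store_direct are hypothetical
names, and the hidden return-slot pointer (held in %eax in
the listing) is modelled as an explicit argument.

```cpp
// Hypothetical illustration of the generated code's data flow.
struct Pair { int re, im; };

// Mirrors the -O3 -fomit-frame-pointer listing: each value
// makes a round trip through a temporary stack slot.
void store_via_temp(Pair *ret) {
    Pair tmp;            // temp stack slot at (%esp)
    tmp.re = 9;          // movl $9, (%esp)
    int edx = tmp.re;    // movl (%esp), %edx
    tmp.im = 11;         // movl $11, 4(%esp)
    int ecx = tmp.im;    // movl 4(%esp), %ecx
    ret->re = edx;       // movl %edx, (%eax)
    ret->im = ecx;       // movl %ecx, 4(%eax)
}

// What would be enough: write the constants straight into
// the return value slot.
void store_direct(Pair *ret) {
    ret->re = 9;         // movl $9, (%eax)
    ret->im = 11;        // movl $11, 4(%eax)
}
```

Both functions leave the same values in the return slot; the
first one merely takes four extra memory accesses to get
there.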
The other function (the "Const" one) yields the same, but
in addition to the troubles of the above one, we have four
more dead stores of the intermediate values 1, 2, 8, 9
into temp stack slots, that are immediately afterwards
dead.
One final data point: here's what icc7 with -O3 does:
-------------------
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 8(%ebp), %eax
movl $9, -8(%ebp)
movl $11, -4(%ebp)
lea -8(%ebp), %ecx
movl (%ecx), %edx
movl %edx, (%eax)
movl 4(%ecx), %edx
movl %edx, 4(%eax)
movl %ebp, %esp
popl %ebp
ret $4
----------------------
Well, let's say this is at least not better than some of
the other stuff above...
Regards
Wolfgang
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8967