optimization/8967: [3.4 regression] Making class data members `const' pessimizes code

bangerth@dealii.org
Fri Dec 20 20:20:00 GMT 2002


Old Synopsis: Making class data members `const' pessimizes code
New Synopsis: [3.4 regression] Making class data members `const' pessimizes code

State-Changed-From-To: open->analyzed
State-Changed-By: bangerth
State-Changed-When: Fri Dec 20 20:20:38 2002
State-Changed-Why:
    Confirmed. For reference: this is what the second version
    of the function ("Mutable") results in:
    ------------------
    	pushl	%ebp
    	movl	%esp, %ebp
    	subl	$16, %esp
    	movl	8(%ebp), %eax
    	movl	$9, (%eax)
    	movl	$11, 4(%eax)
    	movl	%ebp, %esp
    	popl	%ebp
    	ret	$4
    ----------------------
    (This is not really optimal, since it never actually
    uses the stack space it allocates. All the fiddling with
    the stack pointer is plain unnecessary. But it can be
    worse, see below:)
    
    The other function ("Const") yields different results
    with different compilers. Here's 3.2.2pre:
    ---------------------
    	pushl	%ebp
    	movl	%esp, %ebp
    	subl	$16, %esp
    	movl	$1, -8(%ebp)
    	movl	8(%ebp), %eax
    	movl	$8, -16(%ebp)
    	movl	$2, -4(%ebp)
    	movl	$9, -12(%ebp)
    	movl	$9, (%eax)
    	movl	$11, 4(%eax)
    	movl	%ebp, %esp
    	popl	%ebp
    	ret	$4
    ----------------
    Basically, the compiler _does_ the right job in computing
    the result (9,11), which is what the last two stores write.
    All the other stores are dead!
    
    Next, 3.3 right before the BIB merge, i.e. what will likely
    be on the 3.3 branch: the generated code is the same as for
    3.2.2.
    
    Finally, mainline as of yesterday, i.e. 3.4pre:
    -------------
    	pushl	%ebp
    	movl	%esp, %ebp
    	subl	$24, %esp
    	movl	$9, -24(%ebp)
    	movl	8(%ebp), %eax
    	movl	-24(%ebp), %edx
    	movl	$11, -20(%ebp)
    	movl	-20(%ebp), %ecx
    	movl	%edx, (%eax)
    	movl	$1, -8(%ebp)
    	movl	%ecx, 4(%eax)
    	movl	$2, -4(%ebp)
    	movl	$8, -16(%ebp)
    	movl	$9, -12(%ebp)
    	leave
    	ret	$4
    ---------------
    This is really disappointing, since most of the moves are
    actually dead, or could be optimized away otherwise!
    
    This is _worse_ than what we previously had. I rate this as
    a regression, and will put it into the respective state.
    Note, however, that it is a regression with respect to an
    already regrettable state, and it would be nice if not only
    the previous state were restored, but the two functions
    really compiled to the same code! There is no reason why
    adding a more restrictive cv-qualifier like "const" should
    pessimize code!
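
    The testcase itself is not quoted in this message. A minimal
    sketch of the kind of code under discussion, assuming two
    otherwise identical classes that differ only in the
    const-ness of their data members, might look like the
    following. All names (Mutable, Const, make_mutable,
    make_const) and the use of operator+ are assumptions, chosen
    only so that the inputs (1,2) and (8,9) and the result (9,11)
    match the constants visible in the listings above.

```cpp
// Hypothetical reconstruction: the original PR testcase is not shown in
// this message, so the names and the exact shape here are assumptions.

// Variant with ordinary (non-const) data members.
struct Mutable {
    int a, b;
    Mutable(int x, int y) : a(x), b(y) {}
};

// Same class, except that the data members are const.
struct Const {
    const int a, b;
    Const(int x, int y) : a(x), b(y) {}
};

// Member-wise sum; small enough that the optimizer should inline it.
inline Mutable operator+(const Mutable &l, const Mutable &r) {
    return Mutable(l.a + r.a, l.b + r.b);
}

inline Const operator+(const Const &l, const Const &r) {
    return Const(l.a + r.a, l.b + r.b);
}

// Both functions return a small struct by value with the result (9,11).
// Ideally both would compile to just two stores of the constants 9 and 11
// into the caller-provided return slot.
Mutable make_mutable() { return Mutable(1, 2) + Mutable(8, 9); }
Const make_const() { return Const(1, 2) + Const(8, 9); }
```

    Compiling such a file with something like "g++ -O2 -S" and
    comparing the assembly for the two functions should show
    whether the const variant carries extra dead stores for the
    temporaries.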
    
    
    I made a couple more experiments, but things only get more
    depressing: with present mainline (3.4pre), compiling the
    second (better) function with -O3 -fomit-frame-pointer
    yields the following code:
    ---------------------
    	subl	$28, %esp
    .LCFI1:
    	movl	32(%esp), %eax
    	movl	$9, (%esp)
    	movl	(%esp), %edx
    	movl	$11, 4(%esp)
    	movl	4(%esp), %ecx
    	movl	%edx, (%eax)
    	movl	%ecx, 4(%eax)
    	addl	$28, %esp
    	ret	$4
    ----------------------
    Reordering things a little tells us that the (correctly
    computed) result is first stored into temp stack slots,
    then reloaded into edx/ecx, and finally written back into
    the return value slot. Amazing!
    
    The other function (the "Const" one) yields the same, but
    in addition to the troubles of the above one, we have four
    more stores of the intermediate values 1, 2, 8, 9 into
    temp stack slots, which are dead immediately afterwards.
    
    One final data point: here's what icc7 with -O3 does:
    -------------------
            pushl     %ebp
            movl      %esp, %ebp
            subl      $8, %esp 
            movl      8(%ebp), %eax
            movl      $9, -8(%ebp) 
            movl      $11, -4(%ebp) 
            lea       -8(%ebp), %ecx
            movl      (%ecx), %edx  
            movl      %edx, (%eax)  
            movl      4(%ecx), %edx 
            movl      %edx, 4(%eax) 
            movl      %ebp, %esp    
            popl      %ebp          
            ret       $4            
    ----------------------
    Well, let's say this is at least not better than some of
    the other stuff above...
    
    Regards
      Wolfgang

http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8967


