Bug 35507

Summary: [avr] 4.3.0: size of small funcion increases from 2 to 29 words
Product: gcc Reporter: Dmitry K. <dmixm>
Component: targetAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: abnikant.singh, BlanchardJ, eric.weddington, gcc-bugs
Priority: P3 Keywords: missed-optimization
Version: 4.3.0   
Target Milestone: 4.6.0   
Host: Target: avr-*-*
Build: Known to work:
Known to fail: Last reconfirmed: 2008-03-23 04:12:58

Description Dmitry K. 2008-03-08 09:12:39 UTC
For funcion:

  long mult (long x, long y)
  {
    return x * y;
  }

the avr-gcc 4.3.0 produces 29 words of code (-Os option):

  mult:
        push r14
        push r15
        push r16
        push r17
  /* prologue: function */
  /* frame size = 0 */
        mov r14,r18
        mov r15,r19
        mov r16,r20
        mov r17,r21
        mov r18,r22
        mov r19,r23
        mov r20,r24
        mov r21,r25
        mov r25,r17
        mov r24,r16
        mov r23,r15
        mov r22,r14
        rcall __mulsi3
        mov r18,r22
        mov r19,r23
        mov r20,r24
        mov r21,r25
        mov r23,r19
        mov r24,r20
        mov r25,r21
  /* epilogue start */
        pop r17
        pop r16
        pop r15
        pop r14
        ret

The result of avr-gcc 4.1.2 is 2 words:

  mult:
  /* prologue: frame size=0 */
  /* prologue end (size=0) */
        rcall __mulsi3
  /* epilogue: frame size=0 */
        ret
Comment 1 Andy Hutchinson 2008-03-09 04:35:02 UTC
I can confirms this regression.

There appears to be something strange in commutation of operands before RTL is created which may well explain why it used to work.

BThe  default expander are creating calls to commutative functions that have the opposite argument order from normal function parameters. Functionally this does not matter, but it twists up the data flow.

The argument order presented by the RTL is backwards from the ideal. This happens with both target AVR expander and the default expander (the original PR was using a default expander). Clearly this is avoidable.

It is not a function of the original code - gcc is purposely ordering the operands. Regardless of operand ordering (x*y or y*x), the RTL is always backwards. 


The other factor is the lack of forward propagation in the RTL stages. If this was effective, RTL ordering would not matter. The limited propagation that is present avoids hard registers - so never connects the arguments with the library function. Also combine can't exploit the commutation of the operands.
Comment 2 Andy Hutchinson 2008-03-09 12:23:49 UTC
Here is more info:

Testcase:

static long foo99(long b,long a)
{
	return b * a;
}

long foo2(long b, long a)
{
	return foo99(b,a);	
}

Looking at RTL, the USE of the respective libcalls are reversed. That is the RTL generated for call to MULSI3 is reversed from a normal C function that has same arguments and calling conventions.

(call_insn/u 9 8 10 920625-1.c:45 (set (reg:SI 22 r22)
        (call (mem:HI (symbol_ref:HI ("__mulsi3") [flags 0x41]) [0 S2 A8])
            (const_int 0 [0x0]))) -1 (expr_list:REG_EH_REGION (const_int -1 [0xffffffff])
        (nil))
    (expr_list:REG_DEP_TRUE (use (reg:SI 18 r18))
        (expr_list:REG_DEP_TRUE (use (reg:SI 22 r22))
            (nil))))

(call_insn/u 9 8 10 920625-1.c:51 (set (reg:SI 22 r22)
        (call (mem:HI (symbol_ref:HI ("foo99") [flags 0x3] <function_decl 0x7fdcf030 foo99>) [0 S2 A8])
            (const_int 0 [0x0]))) -1 (expr_list:REG_EH_REGION (const_int 0 [0x0])
        (nil))
    (expr_list:REG_DEP_TRUE (use (reg:SI 22 r22))
        (expr_list:REG_DEP_TRUE (use (reg:SI 18 r18))
            (nil))))



Comment 3 Richard Biener 2008-03-09 13:33:01 UTC
Hmm, we indeed cannot communicate to the backend (or register allocator) that
function arguments of a libcall are communtative.
Comment 4 Andy Hutchinson 2008-03-09 18:36:38 UTC
The problem is not commutation knowledge to the backend.

First - the use notes were a red herring. Reversing them did not help.

After much chasing thru call.c and optabs.c, it looks like neither creates nor correct the issue.

But I can fix (hide) the issue by commutating the back end expander to get  optimal code. (Of course that does not fix ALL binop libcalls!)

So I tried a non-commutative operator (% or /) - and that was optimal, with no expander changes.

So it would appear that the default "don't care" order presented to expand_binop() is wrong. Where it does critically matter (non-commutative functions), the order is ideal.

The effect on chained arithmetic or higher modes such as DImode is horrendous, and may explain other noted problems with optimizations.


FYI here is expander used to temporarily fix (hide) problem - NOTE operand numbering, relative to RTL order. 

(define_expand "mulsi3"
  [(set (reg:SI 18) (match_operand:SI 1 "register_operand" "r"))
 (set (reg:SI 22) (match_operand:SI 2 "register_operand" "r"))
   
  (parallel [(set (reg:SI 22) (mult:SI (reg:SI 22) (reg:SI 18)))
	      (clobber (reg:HI 26))
	      (clobber (reg:HI 30))])
   (set (match_operand:SI 0 "register_operand" "=&r") (reg:SI 22))]
  "AVR_HAVE_MUL"
  "")










Comment 5 abnikant 2010-09-03 06:45:50 UTC
gcc version 4.6.0 20100903 (experimental) (GCC) [Head] produces 2 words:

mult:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
        rcall __mulsi3
/* epilogue start */
        ret
        
Comment 6 Eric Weddington 2010-09-20 03:59:01 UTC
Closed as fixed in 4.6.0.