For funcion: long mult (long x, long y) { return x * y; } the avr-gcc 4.3.0 produces 29 words of code (-Os option): mult: push r14 push r15 push r16 push r17 /* prologue: function */ /* frame size = 0 */ mov r14,r18 mov r15,r19 mov r16,r20 mov r17,r21 mov r18,r22 mov r19,r23 mov r20,r24 mov r21,r25 mov r25,r17 mov r24,r16 mov r23,r15 mov r22,r14 rcall __mulsi3 mov r18,r22 mov r19,r23 mov r20,r24 mov r21,r25 mov r23,r19 mov r24,r20 mov r25,r21 /* epilogue start */ pop r17 pop r16 pop r15 pop r14 ret The result of avr-gcc 4.1.2 is 2 words: mult: /* prologue: frame size=0 */ /* prologue end (size=0) */ rcall __mulsi3 /* epilogue: frame size=0 */ ret
I can confirms this regression. There appears to be something strange in commutation of operands before RTL is created which may well explain why it used to work. BThe default expander are creating calls to commutative functions that have the opposite argument order from normal function parameters. Functionally this does not matter, but it twists up the data flow. The argument order presented by the RTL is backwards from the ideal. This happens with both target AVR expander and the default expander (the original PR was using a default expander). Clearly this is avoidable. It is not a function of the original code - gcc is purposely ordering the operands. Regardless of operand ordering (x*y or y*x), the RTL is always backwards. The other factor is the lack of forward propagation in the RTL stages. If this was effective, RTL ordering would not matter. The limited propagation that is present avoids hard registers - so never connects the arguments with the library function. Also combine can't exploit the commutation of the operands.
Here is more info: Testcase: static long foo99(long b,long a) { return b * a; } long foo2(long b, long a) { return foo99(b,a); } Looking at RTL, the USE of the respective libcalls are reversed. That is the RTL generated for call to MULSI3 is reversed from a normal C function that has same arguments and calling conventions. (call_insn/u 9 8 10 920625-1.c:45 (set (reg:SI 22 r22) (call (mem:HI (symbol_ref:HI ("__mulsi3") [flags 0x41]) [0 S2 A8]) (const_int 0 [0x0]))) -1 (expr_list:REG_EH_REGION (const_int -1 [0xffffffff]) (nil)) (expr_list:REG_DEP_TRUE (use (reg:SI 18 r18)) (expr_list:REG_DEP_TRUE (use (reg:SI 22 r22)) (nil)))) (call_insn/u 9 8 10 920625-1.c:51 (set (reg:SI 22 r22) (call (mem:HI (symbol_ref:HI ("foo99") [flags 0x3] <function_decl 0x7fdcf030 foo99>) [0 S2 A8]) (const_int 0 [0x0]))) -1 (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil)) (expr_list:REG_DEP_TRUE (use (reg:SI 22 r22)) (expr_list:REG_DEP_TRUE (use (reg:SI 18 r18)) (nil))))
Hmm, we indeed cannot communicate to the backend (or register allocator) that function arguments of a libcall are communtative.
The problem is not commutation knowledge to the backend. First - the use notes were a red herring. Reversing them did not help. After much chasing thru call.c and optabs.c, it looks like neither creates nor correct the issue. But I can fix (hide) the issue by commutating the back end expander to get optimal code. (Of course that does not fix ALL binop libcalls!) So I tried a non-commutative operator (% or /) - and that was optimal, with no expander changes. So it would appear that the default "don't care" order presented to expand_binop() is wrong. Where it does critically matter (non-commutative functions), the order is ideal. The effect on chained arithmetic or higher modes such as DImode is horrendous, and may explain other noted problems with optimizations. FYI here is expander used to temporarily fix (hide) problem - NOTE operand numbering, relative to RTL order. (define_expand "mulsi3" [(set (reg:SI 18) (match_operand:SI 1 "register_operand" "r")) (set (reg:SI 22) (match_operand:SI 2 "register_operand" "r")) (parallel [(set (reg:SI 22) (mult:SI (reg:SI 22) (reg:SI 18))) (clobber (reg:HI 26)) (clobber (reg:HI 30))]) (set (match_operand:SI 0 "register_operand" "=&r") (reg:SI 22))] "AVR_HAVE_MUL" "")
gcc version 4.6.0 20100903 (experimental) (GCC) [Head] produces 2 words: mult: /* prologue: function */ /* frame size = 0 */ /* stack size = 0 */ .L__stack_usage = 0 rcall __mulsi3 /* epilogue start */ ret
Closed as fixed in 4.6.0.