GCC 4.x tree optimization decides to put int values into long long int temporaries. By the time RTL expansion comes around, the expander sees only a DImode multiply and so generates three SImode multiplies for it. GCC 3.x sees that the source values are SImode and uses mulsidi3 to generate 32x32->64 multiplies, which are much more efficient. It also picks up the accumulation.

With -O3 for all compilations:
GCC 3.4 has an 84-byte stack frame and a body of 372 instructions.
GCC 4.1 has a 1416-byte stack frame and a body of 1668 instructions.
GCC 4.2 has a 1320-byte stack frame and a body of 1565 instructions.
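To make the cost difference concrete, here is a minimal C sketch (function names are hypothetical, not from the attached testcase), assuming a 32-bit target with 32-bit int. A full 64x64->64 multiply built from 32-bit pieces needs one 32x32->64 multiply for the low halves plus two 32x32->32 multiplies for the cross terms, which matches the three SImode multiplies described above. If both operands are known to be sign-extended ints, a single signed 32x32->64 multiply (mulsidi3; SMULL on ARM) produces the same result.

```c
#include <stdint.h>

/* 64x64->64 multiply built from 32-bit pieces: roughly what a 32-bit
   target has to emit when all it sees is a DImode MULT. */
static uint64_t mul64_three_parts(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Multiply 1: unsigned 32x32->64 on the low halves. */
    uint64_t lo_lo = (uint64_t)a_lo * b_lo;
    /* Multiplies 2 and 3: 32x32->32 cross terms. Only their low 32 bits
       can reach the 64-bit result, so wrapping is fine (assumes 32-bit
       unsigned int). */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    /* The a_hi * b_hi term would land at bit 64 and is dropped. */
    return lo_lo + ((uint64_t)cross << 32);
}

/* When both operands are sign-extended ints, one signed 32x32->64
   multiply (mulsidi3, e.g. SMULL on ARM) gives the full result. */
static int64_t mul_widening(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

To compare the two forms on signed inputs, convert both results to uint64_t and compare the bit patterns. This avoids relying on the implementation-defined conversion from unsigned to signed.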
Created attachment 12351: Test case showing mulsidi problem

Multimedia processing. It's not self-running, but you can see the problem plainly in the assembly output. If you want a self-running test, I can modify it a bit.
Forgot to say: I'm seeing this on an ARM cross-compilation.
I can see this with trunk as well as the 4.3 branch today.
The issue is that TER does not build large trees for the multiplications, because the factors are used multiple times (and come from memory). Thus the expander does not see the widened multiplication, and apparently combine / lower-subreg is not able to optimize this. I think with expand-from-SSA we could enable this optimization during expand.

Simplified testcase:

  long long foo (int i, int j)
  {
    return (long long)i * (long long)j * (long long)i;
  }

Because we CSE (long long)i, we don't optimize this case. Disabling tree-level CSE re-enables the optimizations:

  -fno-tree-fre -fno-tree-pre -fno-tree-dominator-opts
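The following source-level sketch shows what CSE does to the simplified testcase. foo_after_cse is a hypothetical hand-written approximation of the GIMPLE after tree-level CSE, not an actual dump. Both functions compute the same value. In GIMPLE, however, the single 64-bit temporary for (long long)i has two uses, so TER does not forward the conversion into either multiply, and the expander is left with plain DImode multiplies.

```c
/* The simplified testcase from this comment. */
long long foo(int i, int j)
{
    return (long long)i * (long long)j * (long long)i;
}

/* Approximation of the post-CSE form: the conversion of i is computed
   once and reused, so it is no longer attached to either multiply. */
long long foo_after_cse(int i, int j)
{
    long long wide_i = (long long)i;   /* one conversion, two uses */
    long long t = wide_i * (long long)j;
    return t * wide_i;
}
```

The three-way product can overflow long long for large inputs, which is undefined behavior. Any checks of these functions should therefore use moderately sized values.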
CC'ing micha.
Closing 4.2 branch.
GCC 4.3.4 is being released, adjusting target milestone.
Even if the problems in expand are fixed, reassoc is still going to cause problems with the original testcase. From the dse1 dump:

  D.2474_14 = (long long int) vLo_11;
  D.2475_15 = (long long int) c1_6;
  D.2476_16 = D.2474_14 * D.2475_15;
  D.2477_19 = (long long int) c2_8;
  D.2478_20 = D.2474_14 * D.2477_19;

From the reassoc1 dump right after:

  D.2474_14 = (long long int) vLo_11;
  D.2475_15 = (long long int) c1_6;
  D.2477_19 = (long long int) c2_8;
  D.2495_16 = D.2477_19 + D.2475_15;
  D.2495_20 = D.2495_16 * D.2474_14;

So we've traded a multiplication for an addition. But because the remaining multiply now takes a 64-bit sum rather than a sign-extended int, we've also made it difficult to see that we could have used mulsidi3.
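The reassoc effect can be shown in plain C with a small sketch. The names v, c1 and c2 stand in for vLo_11, c1_6 and c2_8 from the dumps, and the function names are hypothetical. Before reassoc, each multiply takes two sign-extended 32-bit values. After factoring, one operand is c1 + c2 computed in 64 bits, and that sum can need 33 bits. It is therefore no longer the sign extension of an int, so a 32x32->64 multiply does not directly apply.

```c
#include <stdint.h>

/* Before reassoc: two products of sign-extended 32-bit values, each a
   candidate for a 32x32->64 widening multiply. */
int64_t sum_of_products(int32_t v, int32_t c1, int32_t c2)
{
    int64_t wv = v;              /* D.2474_14 = (long long int) vLo_11 */
    return wv * c1 + wv * c2;    /* D.2476_16 and D.2478_20, then the add */
}

/* After reassoc: a single multiply, but its first operand is a 64-bit
   sum rather than a conversion from int. */
int64_t factored(int32_t v, int32_t c1, int32_t c2)
{
    int64_t sum = (int64_t)c2 + (int64_t)c1;   /* D.2495_16 */
    return sum * v;                             /* D.2495_20 */
}

/* Could X be the sign extension of a 32-bit value? */
int fits_in_int32(int64_t x)
{
    return x >= INT32_MIN && x <= INT32_MAX;
}
```

For example, with c1 = c2 = INT32_MAX the sum does not fit in int32_t, even though both original products were valid 32x32->64 multiplies.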
Note that even for (long long)i * (long long)j * (long long)i we can only use one mulsidi3, so turning parts of the multiplications into additions should still be beneficial. I suppose we should detect widening multiplies at the tree level, before reassociation, where that is beneficial (if the target doesn't have a native wide multiplication).
*** Bug 42498 has been marked as a duplicate of this bug. ***
Subject: Bug 29274

Author: bernds
Date: Thu Apr 22 09:30:27 2010
New Revision: 158633

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158633

Log:
gcc/
    PR middle-end/29274
    * optabs.h (expand_widening_mult): Declare.
    * tree-pass.h (pass_optimize_widening_mul): Declare.
    * tree-ssa-math-opts.c (execute_optimize_widening_mul,
    gate_optimize_widening_mul): New static functions.
    (pass_optimize_widening_mul): New.
    * expr.c (expand_expr_real_2) <case WIDEN_MULT_EXPR>: New case.
    <case MULT_EXPR>: Remove support for widening multiplies.
    * tree.def (WIDEN_MULT_EXPR): Tweak comment.
    * cfgexpand.c (expand_debug_expr) <case WIDEN_MULT_EXPR>: Use
    simplify_gen_unary rather than directly building extensions.
    * tree-cfg.c (verify_gimple_assign_binary): Add tests for
    WIDEN_MULT_EXPR.
    * expmed.c (expand_widening_mult): New function.
    * passes.c (init_optimization_passes): Add pass_optimize_widening_mul.

gcc/testsuite/
    PR middle-end/29274
    * gcc.target/i386/wmul-1.c: New test.
    * gcc.target/i386/wmul-2.c: New test.
    * gcc.target/bfin/wmul-1.c: New test.
    * gcc.target/bfin/wmul-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/bfin/wmul-1.c
    trunk/gcc/testsuite/gcc.target/bfin/wmul-2.c
    trunk/gcc/testsuite/gcc.target/i386/wmul-1.c
    trunk/gcc/testsuite/gcc.target/i386/wmul-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/expmed.c
    trunk/gcc/expr.c
    trunk/gcc/passes.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-pass.h
    trunk/gcc/tree-ssa-math-opts.c
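Here is a rough, hedged model of the idea behind the new widening-multiply pass: rewrite a MULT_EXPR whose operands are both extensions from a narrower type into a WIDEN_MULT_EXPR on the narrow operands. The struct expr / enum op types are a hypothetical toy IR, not GCC's GIMPLE. The matching rule is a simplification: both operands must be extended from the same precision, and the result must be exactly twice as wide. A real implementation would also need to handle signedness and check that the target has a suitable widening-multiply pattern. This toy does neither. In this model, the rewrite inspects each operand's defining conversion directly. A conversion shared by several multiplies, as after CSE of (long long)i, is therefore still recognized.

```c
#include <stddef.h>

/* Hypothetical toy IR, not GCC's data structures. */
enum op { OP_VAR, OP_CONVERT, OP_MULT, OP_WIDEN_MULT };

struct expr {
    enum op op;
    int precision;        /* bit width of the value this node produces */
    struct expr *a, *b;   /* operands; b unused for OP_CONVERT, both NULL for OP_VAR */
};

/* If E extends a narrower value, return that narrower operand, else NULL.
   Signedness is ignored in this toy. */
static struct expr *strip_widening_convert(struct expr *e)
{
    if (e->op == OP_CONVERT && e->a->precision < e->precision)
        return e->a;
    return NULL;
}

/* Rewrite M in place into a widening multiply if both operands are
   extensions from the same precision and M is exactly twice as wide.
   Returns 1 if M was rewritten, 0 otherwise. */
static int convert_mult_to_widen(struct expr *m)
{
    struct expr *na, *nb;

    if (m->op != OP_MULT)
        return 0;
    na = strip_widening_convert(m->a);
    nb = strip_widening_convert(m->b);
    if (na == NULL || nb == NULL || na->precision != nb->precision
        || m->precision != 2 * na->precision)
        return 0;

    m->op = OP_WIDEN_MULT;
    m->a = na;
    m->b = nb;
    return 1;
}
```

In foo's shape, the inner (long long)i * (long long)j becomes a widening multiply even though the conversion of i is shared. The outer multiply by (long long)i is left alone, because its other operand is a full-width product. This is consistent with the earlier observation that only one mulsidi3 is usable there.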
Fixed for 4.6.
Subject: Bug 29274

Author: bernds
Date: Thu Apr 22 11:25:44 2010
New Revision: 158642

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158642

Log:
    PR middle-end/29274
    * gcc.target/arm/wmul-1.c: New test.
    * gcc.target/arm/wmul-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/arm/wmul-1.c
    trunk/gcc/testsuite/gcc.target/arm/wmul-2.c
Modified:
    trunk/gcc/testsuite/ChangeLog
GCC 4.3.5 is being released, adjusting target milestone.
4.3 branch is being closed, moving to 4.4.7 target.
Fixed in trunk, wontfix for the old trees, I'd think.