Bug 29274 - [4.4/4.5 Regression] not using mulsidi3
Summary: [4.4/4.5 Regression] not using mulsidi3
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.1.1
Importance: P2 normal
Target Milestone: 4.6.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Duplicates: 42498
Depends on:
Blocks: 16996
Reported: 2006-09-28 16:42 UTC by Erich Plondke
Modified: 2011-06-28 11:23 UTC (History)

See Also:
Host: x86_64-suse-linux
Target: arm-unknown-elf, i?86-*-*
Build:
Known to work: 3.4.2, 4.6.0
Known to fail: 4.1.1, 4.2.0
Last reconfirmed: 2009-03-27 17:12:32


Attachments
Test case showing mulsidi problem (349 bytes, text/plain)
2006-09-28 16:45 UTC, Erich Plondke
Description Erich Plondke 2006-09-28 16:42:55 UTC
GCC 4.x tree optimization decides to put int values into long long int temporaries.  When RTL expansion comes around, the expander sees only a DImode multiply and so generates three SImode multiplies to deal with the problem.

GCC 3.x sees that the source values are SImode and uses mulsidi3 to generate 32x32->64 multiplies, which are much more efficient.  It also picks up the accumulation.
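
To make the difference concrete, here is a minimal C sketch. The function names `widening_mul` and `generic_mul64` are made up for this illustration and do not come from the attached test case. The first function has the shape that a mulsidi3 pattern can match. The second spells out, roughly, the work a 32-bit target has to do when it only sees a full DImode multiply: one 32x32->64 product plus two cross products that only affect the high word.

```c
#include <stdint.h>

/* Both operands are 32-bit values extended to 64 bits, so a target
   with a mulsidi3 pattern can compute this with a single widening
   multiply. */
int64_t widening_mul(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Generic 64x64->64 multiply written out in 32-bit pieces.  This is an
   illustration of the arithmetic, not the code GCC actually emits. */
uint64_t generic_mul64(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Multiply 1: full 32x32->64 product of the low words. */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* Multiplies 2 and 3: cross products, only needed modulo 2^32
       because they land in the high word of the result. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    return low + ((uint64_t)cross << 32);
}
```

When the operands are known to be sign-extended 32-bit values, the two cross products are redundant work, which is where the extra instructions in the 4.x output come from.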

(using -O3 for all compilations)

GCC 3.4 has an 84-byte stack frame, and a body of 372 instructions.
GCC 4.1 has a 1416-byte stack frame, and a body of 1668 instructions.
GCC 4.2 has a 1320-byte stack frame, and a body of 1565 instructions.
Comment 1 Erich Plondke 2006-09-28 16:45:53 UTC
Created attachment 12351
Test case showing mulsidi problem

Multimedia processing

It's not self-running, but you can see the problem plainly in the assembly output.

If you want a self-running test I can modify it a bit.
Comment 2 Erich Plondke 2006-09-28 16:46:42 UTC
Forgot to say: I'm seeing this on an ARM cross-compilation.
Comment 3 Ramana Radhakrishnan 2009-03-27 17:12:32 UTC
I can see this with trunk as well as 4.3 branch today.
 
Comment 4 Richard Biener 2009-03-27 17:26:12 UTC
The issue is that TER does not build large trees for the multiplications as
the factors are used multiple times (and come from memory).  Thus the expander
does not see the widened multiplication and apparently combine / lower-subreg
are not able to optimize this.

I think with expand-from-SSA we could enable this optimization during expand.

Simplified testcase:

long long foo (int i, int j)
{
   return (long long)i * (long long)j * (long long)i;
}

Because we CSE (long long)i, we don't optimize this case.  Disabling tree-level
CSE re-enables the optimization: -fno-tree-fre -fno-tree-pre
-fno-tree-dominator-opts
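
As a rough source-level picture of what the expander is left with after CSE, the sketch below writes the shared conversion out by hand. `foo_shared` is a hypothetical name for this illustration, and the real GIMPLE will differ in detail. The point is only that once (long long)i has two uses, the multiply no longer appears as "extend times extend" in a single expression.

```c
/* The simplified testcase as written in the source. */
long long foo(int i, int j)
{
    return (long long)i * (long long)j * (long long)i;
}

/* Hand-written approximation of the form after tree-level CSE:
   the extension of i is computed once and used twice. */
long long foo_shared(int i, int j)
{
    long long wi = (long long)i;   /* shared by both multiplies */
    long long wj = (long long)j;
    long long t  = wi * wj;        /* operands are plain 64-bit temporaries */
    return t * wi;
}
```

Both functions compute the same value; only the shape presented to later passes changes.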
Comment 5 Richard Biener 2009-03-27 17:27:35 UTC
CC'ing micha.
Comment 6 Joseph S. Myers 2009-03-31 19:46:44 UTC
Closing 4.2 branch.
Comment 7 Richard Biener 2009-08-04 12:27:59 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 8 Nathan Froyd 2009-08-26 15:50:21 UTC
Even if the problems in expand are fixed, reassoc is still going to cause problems with the original testcase.  From the dse1 dump:

  D.2474_14 = (long long int) vLo_11;
  D.2475_15 = (long long int) c1_6;
  D.2476_16 = D.2474_14 * D.2475_15;
  D.2477_19 = (long long int) c2_8;
  D.2478_20 = D.2474_14 * D.2477_19;

From the reassoc1 dump right after:

  D.2474_14 = (long long int) vLo_11;
  D.2475_15 = (long long int) c1_6;
  D.2477_19 = (long long int) c2_8;
  D.2495_16 = D.2477_19 + D.2475_15;
  D.2495_20 = D.2495_16 * D.2474_14;

So we've traded a multiplication for an addition, but we've also made it difficult to see that we could have used mulsidi3.
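
In C terms the transformation looks roughly like the sketch below. It assumes the two products in the dse1 dump are subsequently added, which the excerpt does not show but the reassociated form implies. The function names are made up for this illustration.

```c
#include <stdint.h>

/* Before reassoc: two products, each of two extended 32-bit values,
   so each one is a candidate for mulsidi3. */
int64_t before_reassoc(int32_t v, int32_t c1, int32_t c2)
{
    return (int64_t)v * (int64_t)c1 + (int64_t)v * (int64_t)c2;
}

/* After reassoc: one multiply, but its first operand is a 64-bit sum
   that need not fit in 32 bits, so it is no longer a plain
   32x32->64 widening multiply. */
int64_t after_reassoc(int32_t v, int32_t c1, int32_t c2)
{
    int64_t sum = (int64_t)c2 + (int64_t)c1;
    return sum * (int64_t)v;
}
```

The two forms agree whenever the result fits in 64 bits; the difference is purely in which machine multiply patterns can match.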

Comment 9 Richard Biener 2009-08-26 18:35:33 UTC
Note that even for

  (long long)i * (long long)j * (long long)i

we can only use one mulsidi3, so promoting parts of the multiplications to
additions should still be beneficial.

I suppose we should detect widening multiply on the tree level before
re-association where it is beneficial (if the target doesn't have a native
wide multiplication).
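
A toy sketch of what such tree-level detection could look like follows. Everything here (the `toy_expr` type, the `TOY_*` codes, and `toy_convert_widening_mult`) is hypothetical and is not GCC's tree or gimple API. It only shows the idea of matching on the definitions of the operands, which keeps working when an extension has several uses.

```c
#include <stddef.h>

/* Toy expression node for illustration only. */
enum toy_code { TOY_VAR32, TOY_EXTEND, TOY_MULT, TOY_WIDEN_MULT };

struct toy_expr {
    enum toy_code code;
    struct toy_expr *op0;   /* operand of EXTEND, left operand of MULT */
    struct toy_expr *op1;   /* right operand of MULT, otherwise NULL */
};

/* If E is a multiply whose operands are both extensions of 32-bit
   variables, rewrite E in place into a widening multiply of the
   unextended operands and return 1.  Otherwise leave E unchanged and
   return 0.  The extension nodes themselves are not modified, so
   sharing them with other users is harmless. */
int toy_convert_widening_mult(struct toy_expr *e)
{
    if (e == NULL || e->code != TOY_MULT)
        return 0;
    if (e->op0 == NULL || e->op1 == NULL)
        return 0;
    if (e->op0->code != TOY_EXTEND || e->op1->code != TOY_EXTEND)
        return 0;
    if (e->op0->op0 == NULL || e->op1->op0 == NULL)
        return 0;
    if (e->op0->op0->code != TOY_VAR32 || e->op1->op0->code != TOY_VAR32)
        return 0;

    e->code = TOY_WIDEN_MULT;
    e->op0 = e->op0->op0;
    e->op1 = e->op1->op0;
    return 1;
}
```

Applied to the i * j * i example, the inner multiply matches and the outer one does not, consistent with the observation above that only one mulsidi3 can be used.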
Comment 10 Ramana Radhakrishnan 2009-12-29 23:21:40 UTC
*** Bug 42498 has been marked as a duplicate of this bug. ***
Comment 11 Bernd Schmidt 2010-04-22 09:31:25 UTC
Subject: Bug 29274

Author: bernds
Date: Thu Apr 22 09:30:27 2010
New Revision: 158633

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158633
Log:
gcc/
	PR middle-end/29274
	* optabs.h (expand_widening_mult): Declare.
	* tree-pass.h (pass_optimize_widening_mul): Declare.
	* tree-ssa-math-opts.c (execute_optimize_widening_mul,
	gate_optimize_widening_mul): New static functions.
	(pass_optimize_widening_mul): New.
	* expr.c (expand_expr_real_2) <case WIDEN_MULT_EXPR>: New
	case.
	<case MULT_EXPR>: Remove support for widening multiplies.
	* tree.def (WIDEN_MULT_EXPR): Tweak comment.
	* cfgexpand.c (expand_debug_expr) <case WIDEN_MULT_EXPR>: Use
	simplify_gen_unary rather than directly building extensions.
	* tree-cfg.c (verify_gimple_assign_binary): Add tests for
	WIDEN_MULT_EXPR.
	* expmed.c (expand_widening_mult): New function.
	* passes.c (init_optimization_passes): Add pass_optimize_widening_mul.

gcc/testsuite/
	PR middle-end/29274
	* gcc.target/i386/wmul-1.c: New test.
	* gcc.target/i386/wmul-2.c: New test.
	* gcc.target/bfin/wmul-1.c: New test.
	* gcc.target/bfin/wmul-2.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/bfin/wmul-1.c
    trunk/gcc/testsuite/gcc.target/bfin/wmul-2.c
    trunk/gcc/testsuite/gcc.target/i386/wmul-1.c
    trunk/gcc/testsuite/gcc.target/i386/wmul-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/expmed.c
    trunk/gcc/expr.c
    trunk/gcc/passes.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-pass.h
    trunk/gcc/tree-ssa-math-opts.c

Comment 12 Richard Biener 2010-04-22 10:11:52 UTC
Fixed for 4.6.
Comment 13 Bernd Schmidt 2010-04-22 11:26:29 UTC
Subject: Bug 29274

Author: bernds
Date: Thu Apr 22 11:25:44 2010
New Revision: 158642

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158642
Log:
	PR middle-end/29274
	* gcc.target/arm/wmul-1.c: New test.
	* gcc.target/arm/wmul-2.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/arm/wmul-1.c
    trunk/gcc/testsuite/gcc.target/arm/wmul-2.c
Modified:
    trunk/gcc/testsuite/ChangeLog

Comment 14 Richard Biener 2010-05-22 18:11:19 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 15 Richard Biener 2011-06-27 12:11:59 UTC
4.3 branch is being closed, moving to 4.4.7 target.
Comment 16 Bernd Schmidt 2011-06-27 12:53:20 UTC
Fixed in trunk, wontfix for the old trees, I'd think.