Bug 18589 - could optimize FP multiplies better
Summary: could optimize FP multiplies better
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 3.4.2
: P2 enhancement
Target Milestone: 4.8.0
Assignee: Bill Schmidt
URL:
Keywords: missed-optimization, TREE
Depends on: 22312
Blocks:
  Show dependency treegraph
 
Reported: 2004-11-21 09:28 UTC by Debian GCC Maintainers
Modified: 2012-04-12 16:16 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-01-04 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Debian GCC Maintainers 2004-11-21 09:28:01 UTC
[forwarded from http://bugs.debian.org/268115]

  Matthias

The bug submitter writes:


compiling this function:
double baz(double foo, double bar)
{
   return foo*foo*foo*foo*bar*bar*bar*bar;
}

 on amd64 with -O6 -ffast-math, gcc emits this code:

foo.o:     file format elf64-x86-64

Disassembly of section .text:

... (some similar functions that I was messing around with) ...
0000000000000050 <ddbar>:
  50:	f2 0f 59 c0          	mulsd  %xmm0,%xmm0
  54:	f2 0f 59 c0          	mulsd  %xmm0,%xmm0
  58:	f2 0f 59 c1          	mulsd  %xmm1,%xmm0
  5c:	f2 0f 59 c1          	mulsd  %xmm1,%xmm0
  60:	f2 0f 59 c1          	mulsd  %xmm1,%xmm0
  64:	f2 0f 59 c1          	mulsd  %xmm1,%xmm0
  68:	c3                   	retq   


 So, it notices that it can do foo*foo*foo*foo with two mulsd instructions,
but it misses the same optimization for bar*bar*bar*bar.  It would save one
FP multiply overall to do:
mulsd %xmm0, %xmm0
mulsd %xmm1, %xmm1
mulsd %xmm0, %xmm0
mulsd %xmm1, %xmm1
mulsd %xmm1, %xmm0
retq
 Also, the two non-dependent muls could run in parallel.
 
 Without -ffast-math, of course, gcc can't take advantage of the laws of
arithmetic like that and has to do all the multiplies the straightforward
way.
Comment 1 Andrew Pinski 2004-11-21 14:24:07 UTC
Confirmed.
Actually the most optimial code would be:
        mulsd   %xmm1, %xmm0
        mulsd   %xmm0, %xmm0
        mulsd   %xmm0, %xmm0
aka
(foo*bar)^4
Comment 2 Andrew Pinski 2005-01-12 06:50:30 UTC
This is actually not a target issue, it can be shown on ppc also and other targets including x86.
doing (f1*f2)^2^2 will be the best every where as it is only three instructions and it would take the 
same time as what is proposed if there are two FPU units  but what I said is the smallest and fastest 
version.
Comment 3 Andrew Pinski 2005-07-05 19:37:08 UTC
I think PR 22312 mentions what the current problem with reassoc is (well once I submit the patch to 
introduce reassociation for fp).
Comment 4 Andrew Pinski 2006-03-05 17:43:28 UTC
In 4.2.0 and above we get:
baz:
.LFB2:
        mulsd   %xmm1, %xmm0
        mulsd   %xmm0, %xmm0
        mulsd   %xmm0, %xmm0
        ret

Which is what I recommend but we don't get that on the tree level:
  return bar * bar * __builtin_pow (foo, 4.0e+0) * bar * bar;

So this is just a tree level missed optimization.
Comment 5 Andrew Pinski 2012-01-05 23:06:56 UTC
The tree level on the trunk we get:
  powmult.2_10 = foo_1(D) * foo_1(D);
  D.1709_4 = bar_3(D) * bar_3(D);
  D.1710_5 = D.1709_4 * bar_3(D);
  D.1711_6 = D.1710_5 * bar_3(D);
  powmult.2_9 = D.1711_6 * powmult.2_10;
  D.1707_7 = powmult.2_9 * powmult.2_10;
--- CUT ---
which is just as bad.
Comment 6 Richard Biener 2012-01-09 12:20:31 UTC
Yeah, reassoc does not canonicalize to pow () so the tree level optimal
expansion does not trigger [in reality reassoc should probably do both
on-the-fly - linearly expand existing pow()s to expose them to multiply
chains and sorting, combine them back and then emit them in optimal form].

Bill, something you want to tackle?
Comment 7 Bill Schmidt 2012-01-09 13:06:34 UTC
Sure, I'll at least have a look at it when I get some time.
Comment 8 Bill Schmidt 2012-01-13 22:27:48 UTC
I've started to look at this -- I'll plan to get a patch in place for 4.8.
Comment 9 Bill Schmidt 2012-04-12 16:15:24 UTC
Author: wschmidt
Date: Thu Apr 12 16:15:13 2012
New Revision: 186384

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186384
Log:
gcc:

2012-04-12  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/18589
	* tree-ssa-reassoc.c (reassociate_stats): Add two fields.
	(operand_entry): Add count field.
	(add_repeat_to_ops_vec): New function.
	(completely_remove_stmt): Likewise.
	(remove_def_if_absorbed_call): Likewise.
	(remove_visited_stmt_chain): Remove feeding builtin pow/powi calls.
	(acceptable_pow_call): New function.
	(linearize_expr_tree): Look for builtin pow/powi calls and add operand
	entries with repeat counts when found.
	(repeat_factor_d): New struct and associated typedefs.
	(repeat_factor_vec): New static vector variable.
	(compare_repeat_factors): New function.
	(get_reassoc_pow_ssa_name): Likewise.
	(attempt_builtin_powi): Likewise.
	(reassociate_bb): Call attempt_builtin_powi.
	(fini_reassoc): Two new calls to statistics_counter_event.

gcc/testsuite:

2012-04-12  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/18589
	* gcc.dg/tree-ssa/pr18589-1.c: New test.
	* gcc.dg/tree-ssa/pr18589-2.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-3.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-4.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-5.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-6.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-7.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-8.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-9.c: Likewise.
	* gcc.dg/tree-ssa/pr18589-10.c: Likewise.



Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-reassoc.c
Comment 10 Bill Schmidt 2012-04-12 16:16:50 UTC
Fixed.