Bug 41574

Summary: Distribute floating point expressions causes bad code [4.4 only]
Product: gcc Reporter: Doug Kwan <dougkwan>
Component: rtl-optimizationAssignee: Doug Kwan <dougkwan>
Status: RESOLVED FIXED    
Severity: normal CC: gcc-bugs
Priority: P3 Keywords: wrong-code
Version: 4.4.0   
Target Milestone: 4.4.3   
Host: x86_64-unknown-linux-gnu Target: arm-none-eabi
Build: x86_64-unknown-linux-gnu Known to work: 4.5.0
Known to fail: 4.4.0 Last reconfirmed: 2009-10-15 10:44:31

Description Doug Kwan 2009-10-05 08:01:11 UTC
We found this problem on ARM but I believe the problem affect other platforms as well.  In the function distribute_and_simplify() of combine.c, there is not check for floating point expressions.  Sometimes it incorrectly optimizes a floating point RTL.

-----bug.c----
static const double one=1.0;

double
f(double x)
{
  /* This is incorrectly transformed to x + x*x */
  return x*(one+x);
}
---------

$ arm-eabi-gcc -O2 -S -march=armv7-a -mfloat-abi=hard -mfpu=neon -fno-unsafe-math-optimizations -g0 bug.c
$ cat bug.s
	.arch armv7-a
	.eabi_attribute 27, 3
	.eabi_attribute 28, 1
	.fpu neon
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 2
	.eabi_attribute 18, 4
	.file	"bug.c"
	.text
	.align	2
	.global	f
	.type	f, %function
f:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	fmacd	d0, d0, d0
	bx	lr
	.size	f, .-f
	.ident	"GCC: (GNU) 4.5.0 20091005 (experimental)"

The expression x*(1.0 + x) above is optmized into x + x * x using the fmacd instruction, which multiplies and accumlates.  The following is part combine pass dump, note how instruction 13 is modified.:

---

;; Function f (f)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
insn_cost 2: 4
insn_cost 6: 4
insn_cost 7: 4
insn_cost 8: 4
insn_cost 13: 4
insn_cost 16: 0
deferring deletion of insn with uid = 2.
modifying insn i2     7 r138:DF=s0:DF+r139:DF
      REG_DEAD: r139:DF
deferring rescan insn with uid = 7.
modifying insn i3     8 r137:DF=r138:DF*s0:DF
      REG_DEAD: r138:DF
deferring rescan insn with uid = 8.
deferring deletion of insn with uid = 7.
deferring deletion of insn with uid = 6.
modifying insn i3     8 r137:DF=s0:DF*s0:DF+s0:DF
deferring rescan insn with uid = 8.
deferring deletion of insn with uid = 8.
modifying insn i3    13 s0:DF=s0:DF*s0:DF+s0:DF
deferring rescan insn with uid = 13.
(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 4 3 2 NOTE_INSN_DELETED)

(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)

(note 6 3 7 2 NOTE_INSN_DELETED)

(note 7 6 8 2 NOTE_INSN_DELETED)

(note 8 7 13 2 NOTE_INSN_DELETED)

(insn 13 8 16 2 bug.c:8 (set (reg/i:DF 63 s0)
        (plus:DF (mult:DF (reg:DF 63 s0 [ x ])
                (reg:DF 63 s0 [ x ]))
            (reg:DF 63 s0 [ x ]))) 610 {*muldf3adddf_vfp} (nil))

(insn 16 13 0 2 bug.c:8 (use (reg/i:DF 63 s0)) -1 (nil))
starting the processing of deferred insns
deleting insn with uid = 2.
deleting insn with uid = 6.
deleting insn with uid = 7.
deleting insn with uid = 8.
rescanning insn with uid = 13.
deleting insn with uid = 13.
ending the processing of deferred insns

;; Combiner totals: 10 attempts, 10 substitutions (3 requiring new space),
;; 3 successes.

---

The problem happens in this part of distribute_and_simplify_rtx ():


  tmp = apply_distributive_law (simplify_gen_binary (inner_code, mode,
                                                     new_op0, new_op1));
  if (GET_CODE (tmp) != outer_code
      && rtx_cost (tmp, SET, optimize_this_for_speed_p)
         < rtx_cost (x, SET, optimize_this_for_speed_p))
    return tmp;

It synthesizes a new expression by distributing one of the sub-expressions and then call apply_distribute_law.  In the test-case above, apply_distribute_law detects a floating point RTL expression and returns immediately but the simplified expression generated has a lower RTL-cost than the original expression.  Hence distribute_and_simplify_rtx returns the simplified expression, eventhough it should not unless -funsafe-math-optimizations is given.

I think the fix is to add this test at the entrance of the function distribute_and_simplify_rtx

  /* Distributivity is not true for floating point as it can change the
     value.  So we don't do it unless -funsafe-math-optimizations.  */
  if (FLOAT_MODE_P (GET_MODE (x))
      && ! flag_unsafe_math_optimizations)
    return NULL_RTX;
Comment 1 dougkwan 2009-10-05 09:09:02 UTC
Subject: Bug 41574

Author: dougkwan
Date: Mon Oct  5 09:08:46 2009
New Revision: 152443

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152443
Log:
2009-10-05  Doug Kwan  <dougkwan@google.com>

	PR rtl-optimization/41574
	Index: combine.c (distribute_and_simplify_rtx): Quit if RTX mode is
	floating point and we are not doing unsafe math optimizations.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c

Comment 2 Ramana Radhakrishnan 2009-10-05 09:32:19 UTC
Fixed.
Comment 3 Jakub Jelinek 2009-10-05 09:48:28 UTC
The ChangeLog entry is wrong.
Comment 4 Eric Botcazou 2009-10-05 10:00:20 UTC
> The ChangeLog entry is wrong.

And folks from Google shouldn't feel entitled to break a freeze imposed by other folks from Google even if, yes, it is annoyingly long. :-)
Comment 5 Doug Kwan 2009-10-05 15:44:09 UTC
Subject: Re:  Distribute floating point 
	expressions causes bad code.

I am aware of the fact the stage one has ended but this is a bug fix,
not an experimental new feature.  Did I break a code freeze?  If so, I
am sorry and can back out the fix until the tree is reopen.

2009/10/5 ebotcazou at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org>:
>
>
> ------- Comment #4 from ebotcazou at gcc dot gnu dot org  2009-10-05 10:00 -------
>> The ChangeLog entry is wrong.
>
> And folks from Google shouldn't feel entitled to break a freeze imposed by
> other folks from Google even if, yes, it is annoyingly long. :-)
>
>
> --
>
> ebotcazou at gcc dot gnu dot org changed:
>
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                 CC|                            |ebotcazou at gcc dot gnu dot
>                   |                            |org
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41574
>
> ------- You are receiving this mail because: -------
> You reported the bug, or are watching the reporter.
>
Comment 6 Doug Kwan 2009-10-05 15:48:55 UTC
Subject: Re:  Distribute floating point 
	expressions causes bad code.

Just saw Diego's e-mail about the me breaking the freeze.  Sorry I
should have checked that before checking thing in.  It was just my
bad.

2009/10/5 Doug Kwan (關振德) <dougkwan@google.com>:
> I am aware of the fact the stage one has ended but this is a bug fix,
> not an experimental new feature.  Did I break a code freeze?  If so, I
> am sorry and can back out the fix until the tree is reopen.
>
> 2009/10/5 ebotcazou at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org>:
>>
>>
>> ------- Comment #4 from ebotcazou at gcc dot gnu dot org  2009-10-05 10:00 -------
>>> The ChangeLog entry is wrong.
>>
>> And folks from Google shouldn't feel entitled to break a freeze imposed by
>> other folks from Google even if, yes, it is annoyingly long. :-)
>>
>>
>> --
>>
>> ebotcazou at gcc dot gnu dot org changed:
>>
>>           What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>                 CC|                            |ebotcazou at gcc dot gnu dot
>>                   |                            |org
>>
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41574
>>
>> ------- You are receiving this mail because: -------
>> You reported the bug, or are watching the reporter.
>>
>
Comment 7 dougkwan 2009-10-08 22:17:10 UTC
Subject: Bug 41574

Author: dougkwan
Date: Thu Oct  8 22:16:58 2009
New Revision: 152580

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=152580
Log:
2009-10-08  Doug Kwan  <dougkwan@google.com>

	PR rtl-optimization/41574
	* gcc.dg/pr41574.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr41574.c
Modified:
    trunk/gcc/testsuite/ChangeLog

Comment 8 Doug Kwan 2009-10-08 22:20:28 UTC
This is fixed in trunk but at least gcc-4.4.0, where this bug was found, is still broken.
Comment 9 Ramana Radhakrishnan 2009-10-15 10:44:31 UTC
(In reply to comment #8)
> This is fixed in trunk but at least gcc-4.4.0, where this bug was found, is
> still broken.
> 

I have no approval rights but can you test & ask to backport this to 4.4 branch after the freeze for the 4.4.2 release is lifted ? 

Ramana
Comment 10 Doug Kwan 2009-10-20 06:22:20 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > This is fixed in trunk but at least gcc-4.4.0, where this bug was found, is
> > still broken.
> > 
> 
> I have no approval rights but can you test & ask to backport this to 4.4 branch
> after the freeze for the 4.4.2 release is lifted ? 

Sorry about the late reply. Yes, I can prepare a fix for 4.4.2

-Doug 

Comment 11 Ramana Radhakrishnan 2009-12-11 11:21:49 UTC
Subject: Bug 41574

Author: ramana
Date: Fri Dec 11 11:21:33 2009
New Revision: 155157

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155157
Log:
Fix PR41574 on 4.4 branch.

2009-12-11  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

	2009-10-05  Doug Kwan  <dougkwan@google.com>

	PR rtl-optimization/41574
	* combine.c (distribute_and_simplify_rtx): Quit if RTX mode is
	floating point and we are not doing unsafe math optimizations.


2009-12-11  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

	2009-10-08  Doug Kwan  <dougkwan@google.com>

	PR rtl-optimization/41574
	* gcc.dg/pr41574.c: New test.



Added:
    branches/gcc-4_4-branch/gcc/testsuite/gcc.dg/pr41574.c
      - copied unchanged from r152580, trunk/gcc/testsuite/gcc.dg/pr41574.c
Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/combine.c
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog

Comment 12 Ramana Radhakrishnan 2009-12-11 17:46:59 UTC
Fixed .