Bug 65307 - [4.9 Regression] Incorrect optimization breaks basic arithmetic
Summary: [4.9 Regression] Incorrect optimization breaks basic arithmetic
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.2
: P2 normal
Target Milestone: 5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2015-03-04 08:18 UTC by madars+gccbug
Modified: 2016-08-05 10:45 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.8.3, 5.0
Known to fail: 4.9.2
Last reconfirmed: 2015-03-04 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description madars+gccbug 2015-03-04 08:18:17 UTC
The following code sample exhibits a bug in a gcc -O2 optimization pass. Namely, having defined two() and six() with the obvious return values, the value of two() * 2 + six() * 5 gets an assembly of 1 shl 5 (i.e. 32, instead of the correct 34).

Code: http://web.mit.edu/madars/Public/gcc-basic-arithmetic-bug.c

gcc 4.9.1 and 4.9.2 with -O2 are both buggy, -O1 makes the bug go away; the bug does not seem to be present in gcc 4.8 and earlier.
Comment 1 madars+gccbug 2015-03-04 08:33:50 UTC
I should clarify that the same behavior can be observed on Debian, Ubuntu and Fedora:

gcc (Ubuntu 4.9.1-16ubuntu6) 4.9.1
gcc (Debian 4.9.1-19) 4.9.1
gcc (GCC) 4.9.2 20141101 (Red Hat 4.9.2-1)

Moreover, Alex Chernyakhovsky reports that switching two() * 2 to two() + two() or two() << 1 produces correct results, so this seems to have to do with the handling of multiplication.
Comment 2 Anders Kaseorg 2015-03-04 09:20:39 UTC
Confirmed, except that changing two() * 2 to two() + two() or two() << 1 does not make a difference for me.  This looks more related to inlining:

-O1: works
-O2: fails
-O1 -finline-small-functions: fails
-O2 -fno-inline-small-functions: works

And if I mark two() and six() as inline, then it fails even at -O1.

(GCC 4.9.2-10ubuntu7 on Ubuntu vivid amd64)
Comment 3 Marek Polacek 2015-03-04 10:20:44 UTC
Confirmed for 4.9.  It's been fixed on trunk with r216728.
Comment 4 Kai Tietz 2015-03-04 12:30:26 UTC
This is a duplicate of PR/65216.  Underlying issue is in reassoc1-pass.  This issue is fixed for 5.0

*** This bug has been marked as a duplicate of bug 65216 ***
Comment 5 Marek Polacek 2015-03-04 12:51:09 UTC
Why do you think it is a duplicate?  PR65216 was a 5 Regression, something that worked in 4.9; this one is an issue with 4.9 and not 5.  PR65216 was also about -O3 only, this one fails even with -O2.  PR65216 started after this one got fixed.
Comment 6 Manuel López-Ibáñez 2015-03-04 12:52:47 UTC
(In reply to Kai Tietz from comment #4)
> This is a duplicate of PR/65216.  Underlying issue is in reassoc1-pass. 
> This issue is fixed for 5.0
> 
> *** This bug has been marked as a duplicate of bug 65216 ***

Is it fixed in 4.9.x? Because that PR 65216 is marked as fixed and does not mention 4.9.x, only 5.0.
Comment 7 Kai Tietz 2015-03-04 13:06:52 UTC
Well, it looked like the same issue by inspection dumps, as folding issue happens in reassoc-pass.  Of course it might be that forward-prop patch is the actual issue.

I noticed for -O3 on 4.9.x that valid computation (dse1):

...
g (const unsigned int from_f)
{
  const unsigned int thirty_four;
  unsigned int _3;
  unsigned int _4;
  unsigned int global.0_8;
  unsigned int _9;
  unsigned int _10;

  <bb 2>:
  global.0_8 = global;
  _9 = global.0_8 * 2;
  _3 = _9 * 2;
  _10 = _9 * 3;
  _4 = _10 * 5;
  thirty_four_5 = _3 + _4;
...

Shows after reassoc1 pass:
...
g (const unsigned int from_f)
{
  const unsigned int thirty_four;
  unsigned int _3;
  unsigned int _4;
  unsigned int global.0_8;
  unsigned int _9;
  unsigned int _10;

  <bb 2>:
  global.0_8 = global;
  _9 = global.0_8 * 2;
  _4 = 15;
  _3 = _4 + 2;
  _10 = _9 * _3;
  thirty_four_5 = _10;
...

so it seems to be the forward-propagate patch.

Sorry for the noise
Comment 8 Mikhail Maltsev 2015-03-04 13:44:28 UTC
A simpler testcase:

static inline unsigned cp(unsigned x)                                                                                                                                                                                                        
{                                                                                                                                                                                                                                            
    return x;                                                                                                                                                                                                                                
}                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                             
unsigned f(unsigned x)                                                                                                                                                                                                                       
{
    return cp(x * 2) * 2 + cp(cp(x * 2) * 3) * 5;
}

$ cat ./test1.c.085t.phiopt2

;; Function f (f, funcdef_no=1, decl_uid=1393, symbol_order=1)

f (unsigned int x)
{
  unsigned int _2;
  unsigned int _3;
  unsigned int _4;
  unsigned int _5;
  unsigned int _6;

  <bb 2>:
  _2 = x_1(D) * 2;
  _5 = 15;
  _3 = _5 + 2;
  _4 = _2 * _3;
  _6 = _4;
  return _6;

}

$ cat ./test1.c.087t.ccp3

;; Function f (f, funcdef_no=1, decl_uid=1393, symbol_order=1)


Pass statistics:
----------------

Immediate_uses: 

x_1(D) : --> single use.
_2 = x_1(D) * 2;

_2 : --> single use.
_4 = _2 * _3;

_3 : --> single use.
_4 = _2 * _3;

_4 : --> single use.
_6 = _4;

_5 : --> single use.
_3 = _5 + 2;

_6 : --> single use.
return _6;

.MEM_7(D) : --> single use.
# VUSE <.MEM_7(D)>
return _6;

Adding Destination of edge (0 -> 2) to worklist


Simulating block 2

Visiting statement:
# RANGE [0, 4294967295] NONZERO 0x000000000fffffffe
_2 = x_1(D) * 2;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x00000000000000000 (0x000000000fffffffe).  Adding SSA edges to worklist.

Visiting statement:
# RANGE [0, 4294967295] NONZERO 0x000000000fffffffe
_5 = 15;
which is likely CONSTANT
Lattice value changed to CONSTANT 14.  Adding SSA edges to worklist.

Visiting statement:
_3 = _5 + 2;
which is likely CONSTANT
Lattice value changed to CONSTANT 16.  Adding SSA edges to worklist.

Visiting statement:
_4 = _2 * _3;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x00000000000000000 (0x000000000ffffffe0).  Adding SSA edges to worklist.

Visiting statement:
# RANGE [0, 4294967295] NONZERO 0x000000000fffffffe
_6 = _4;
Lattice value changed to CONSTANT 0x00000000000000000 (0x000000000ffffffe0).  Adding SSA edges to worklist.

Visiting statement:
# VUSE <.MEM_7(D)>
return _6;
No interesting values produced.  Marked VARYING.

Substituting values and folding statements

Folding statement: return _6;
Not folded
Folding statement: _6 = _4;
Not folded
Folding statement: _4 = _2 * _3;
Folded into: _4 = _2 * 16;

Removing dead stmt _3 = 16;

Removing dead stmt _5 = 14;

Folding statement: _2 = x_1(D) * 2;
Not folded

Pass statistics:
----------------
Constants propagated: 1
Statements deleted: 2

f (unsigned intD.4 xD.1392)
{
  unsigned intD.4 _2;
  unsigned intD.4 _4;
  unsigned intD.4 _6;

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 1, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  # RANGE [0, 4294967295] NONZERO 0x000000000fffffffe
  _2 = x_1(D) * 2;
  # RANGE [0, 4294967295] NONZERO 0x000000000ffffffe0
  _4 = _2 * 16;
  # RANGE [0, 4294967295] NONZERO 0x000000000ffffffe0
  _6 = _4;
  # VUSE <.MEM_7(D)>
  return _6;
;;    succ:       EXIT [100.0%] 

}
Comment 9 Richard Biener 2015-03-04 14:21:41 UTC
Visiting statement:
# RANGE [0, 4294967295] NONZERO 0x000000000fffffffe
_5 = 15;
which is likely CONSTANT
Lattice value changed to CONSTANT 14.  Adding SSA edges to worklist.

is of course overly optimistic ;)  (but yes, nonzero bits info is bogus)

I'd say we should eventually assert instead of miscompiling things.

That is,

static prop_value_t
evaluate_stmt (gimple stmt)
{
...
  if (flag_tree_bit_ccp
      && ((is_constant && TREE_CODE (val.value) == INTEGER_CST)
          || (!is_constant && likelyvalue != UNDEFINED))
      && gimple_get_lhs (stmt)
      && TREE_CODE (gimple_get_lhs (stmt)) == SSA_NAME)
    {
          if (!is_constant)
            {
...
            }
          else
            {
              double_int valv = tree_to_double_int (val.value);

here assert

     gcc_assert ((valv & ~val.mask & ~nonzero_bits).is_zero ());

that is, known bits in valv should be consistent with nonzero_bits info.

              if (!(valv & ~nonzero_bits & mask).is_zero ())
                val.value = double_int_to_tree (TREE_TYPE (lhs),
                                                valv & nonzero_bits);
              if (nonzero_bits.is_zero ())
                val.mask = double_int_zero;
              else
                val.mask = val.mask & (nonzero_bits | ~mask);
            }
Comment 10 Richard Biener 2015-03-04 14:44:23 UTC
The simpler testcase reproduces on trunk for me.  Trunk assert:

Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c  (revision 221174)
+++ gcc/tree-ssa-ccp.c  (working copy)
@@ -1901,9 +1922,13 @@ evaluate_stmt (gimple stmt)
            }
          else
            {
-             if (wi::bit_and_not (val.value, nonzero_bits) != 0)
-               val.value = wide_int_to_tree (TREE_TYPE (lhs),
-                                             nonzero_bits & val.value);
+             wide_int tem = wi::bit_and_not (val.value, nonzero_bits);
+             if (tem != 0)
+               {
+                 gcc_assert (extend_mask (tem).and_not (val.mask) == 0);
+                 val.value = wide_int_to_tree (TREE_TYPE (lhs),
+                                               nonzero_bits & val.value);
+               }
              if (nonzero_bits == 0)
                val.mask = 0;
              else
Comment 11 Richard Biener 2015-03-04 14:56:25 UTC
Fixed 4.9 assert:

              gcc_assert ((valv & ~val.mask 
                           & ~nonzero_bits & mask).is_zero ());

fixed trunk assert:

@@ -1901,9 +1922,14 @@ evaluate_stmt (gimple stmt)
            }
          else
            {
-             if (wi::bit_and_not (val.value, nonzero_bits) != 0)
-               val.value = wide_int_to_tree (TREE_TYPE (lhs),
-                                             nonzero_bits & val.value);
+             widest_int tem = wi::bit_and_not (wi::to_widest (val.value),
+                                               extend_mask (nonzero_bits));
+             if (tem != 0)
+               {
+                 gcc_assert (tem.and_not (val.mask) == 0);
+                 val.value = wide_int_to_tree (TREE_TYPE (lhs),
+                                               nonzero_bits & val.value);
+               }
              if (nonzero_bits == 0)
                val.mask = 0;
              else


That is, when CCP computes a constant it verifies that nonzero bits info
is consistent with that.

Such assert might not be ready for prime-time as for example in unreachable
code-regions any inconsistency can happen, like in

  if ((i & 1) == 0) { if (i == 3) { .... } }
Comment 12 madars+gccbug 2015-03-05 01:30:35 UTC
By the way, in g++ the bug can be triggered even with -O1 and without marking any functions inline (implicitly or explicitly):

http://web.mit.edu/madars/Public/gcc-basic-arithmetic-bug-O1.cpp
Comment 13 Jakub Jelinek 2015-06-26 19:54:00 UTC
GCC 4.9.3 has been released.
Comment 14 Richard Biener 2016-08-03 11:17:29 UTC
Fixed.
Comment 15 Ozkan Sezer 2016-08-05 10:45:56 UTC
(In reply to Richard Biener from comment #14)
> Fixed.

By which commit was this fixed?  Is the fix applicable to the now-closed
4.9 branch at all?