[Bug tree-optimization/108164] [12/13 Regression] wrong code with "-O3 -fno-tree-dce" on x86_64-linux-gnu since r12-5267-g540d92ae9b629eb4

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Dec 19 13:53:27 GMT 2022


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108164

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, it's correct.

short __attribute__((noipa))
foo(short f)
{
  while (f >= -1)
    f++;
  return f;
}

int main ()
{
  if (foo (-1) != -32768)
    __builtin_abort ();
  return 0;
}

shows exactly the same vectorization (-O3 -fno-vect-cost-model --param
vect-epilogues-nomask=0).
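The expected value -32768 depends on how `f++` behaves at the top of the `short` range. The increment is done in `int` after promotion. Converting 32768 back to `short` is implementation-defined in C, and GCC documents it as reduction modulo 2^16, so 32767 wraps to -32768 and the loop exits. Below is a minimal scalar reference model that spells out that wrap; `foo_reference` is a hypothetical name and not part of the testcase. It could be used to cross-check the vectorized result.

```c
#include <stdint.h>

/* Hypothetical reference model of foo(): the same walk as the testcase,
   but the increment is done in uint16_t so the wrap from 32767 to -32768
   is explicit.  The casts to int16_t assume GCC's documented
   modulo-2^16 conversion of out-of-range values.  */
static int16_t foo_reference (int16_t f)
{
  uint16_t u = (uint16_t) f;
  while ((int16_t) u >= -1)
    u = (uint16_t) (u + 1u);   /* well-defined unsigned wrap */
  return (int16_t) u;
}
```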

With the testcase in the description, thread2 performs some threading that
isn't performed on this testcase, and that's a trigger.

-fdbg-cnt=registered_jump_thread:3-4 triggers it (3-3 and 4-4 are broken as
well).

The difference between -fdbg-cnt=registered_jump_thread:3-3 (broken) and
-fdisable-tree-thread2 (OK) is

--- a/a-t.c.254t.optimized      2022-12-19 13:43:00.654410480 +0100
+++ b/a-t.c.254t.optimized      2022-12-19 13:43:08.818523519 +0100
@@ -125,7 +125,7 @@

   <bb 4> [local count: 118111600]:
   # RANGE [irange] short int [-INF, -2]
-  # f_34 = PHI <-32768(3), f_36(5)>
+  # f_34 = PHI <-32767(3), f_36(5)>
   # RANGE [irange] int [-2147483647, 1]
   _4 = c.3_31 + 1;
   if (_4 != 1)

This difference appears in a-t.c.196t.dom3, which follows thread2.  We enter
dom3 with

  <bb 15> [local count: 105119324]:
  # f_71 = PHI <f_87(14), _50(6), f_26(8), f_40(9), f_6(10), f_46(11),
f_104(12), f_108(13)> 

  <bb 16> [local count: 118111600]:
  # RANGE [irange] short int [-INF, -2]
  # f_34 = PHI <f_71(15), f_36(17)>

and the dom3 dump has things like

Optimizing block #9

LKUP STMT f.1_96 = PHI <f.1_60, 32767>
2>>> STMT f.1_96 = PHI <f.1_60, 32767> 
<<<< STMT f.1_96 = PHI <f.1_60, 32767>
Optimizing statement _9 = f.1_96 + 2;
  Replaced 'f.1_96' with constant '32767'
gimple_simplified to _9 = 32769;
  Folded to: _9 = 32769;
_9 : global value re-evaluated to [irange] UNDEFINED
LKUP STMT _9 = 32769
==== ASGN _9 = 32769
Optimizing statement f_40 = (short int) _9;
  Replaced '_9' with constant '32769'
gimple_simplified to f_40 = -32767;
  Folded to: f_40 = -32767;
f_40 : global value re-evaluated to [irange] UNDEFINED
LKUP STMT f_40 = -32767
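The two folds in this dump are plain 16-bit arithmetic. With `f.1_96` (an `unsigned short`) replaced by 32767, `_9 = f.1_96 + 2` folds to 32769, and converting that to `short` gives 32769 - 65536 = -32767. The hypothetical helpers below redo those two steps in C. The conversion to `int16_t` again assumes GCC's modulo-2^16 behavior.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the folds in the dom3 dump:
   _9 = f.1_96 + 2    (unsigned short arithmetic, wraps modulo 2^16)
   f_40 = (short) _9  (GCC reduces out-of-range values modulo 2^16)  */
static uint16_t fold_add2 (uint16_t f1)
{
  return (uint16_t) (f1 + 2u);
}

static int16_t fold_to_short (uint16_t v)
{
  return (int16_t) v;
}
```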

Something goes wrong here.  For example, for

  _9 = 32769;

we have [irange] unsigned short [1, 32768] as the global range, and
gimple_ranger::update_stmt will update that to UNDEFINED.
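Presumably UNDEFINED appears because the recorded global range [1, 32768] for `_9` is intersected with the singleton [32769, 32769] of the folded constant. That intersection is empty, and ranger represents an empty range as UNDEFINED. The toy model below illustrates this intersection on one closed interval. The `toy_range` type and its helpers are hypothetical and much simpler than GCC's `irange`, which can hold several sub-ranges.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-interval range; an empty range is "undefined".  */
struct toy_range
{
  bool undefined;
  uint32_t lo, hi;   /* inclusive bounds, meaningful only if !undefined */
};

static struct toy_range toy_make (uint32_t lo, uint32_t hi)
{
  struct toy_range r = { false, lo, hi };
  return r;
}

/* Intersect two ranges; an empty result is reported as undefined.  */
static struct toy_range toy_intersect (struct toy_range a, struct toy_range b)
{
  struct toy_range r = { true, 0, 0 };
  if (a.undefined || b.undefined)
    return r;
  uint32_t lo = a.lo > b.lo ? a.lo : b.lo;
  uint32_t hi = a.hi < b.hi ? a.hi : b.hi;
  if (lo <= hi)
    r = toy_make (lo, hi);
  return r;
}
```

With the values from the dump, `toy_intersect (toy_make (1, 32768), toy_make (32769, 32769))` comes back undefined, while a constant inside the range such as 32767 would survive.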


That bogus value comes from cprop_into_successor_phis, where we have
an SSA_NAME_VALUE of -32767 recorded for f_71.  The only place I see it recorded is

0>>> COPY f_71 = -32767
0>>> COPY f_34 = -32767
LKUP STMT _4 = c.3_31 plus_expr 1
LKUP STMT _4 ne_expr 1
 Registering killing_def (path_oracle) _4
 Registering value_relation (path_oracle) (_4 > c.3_31) (root: bb9)
<<<< COPY f_34 = -32767
<<<< COPY f_71 = -32767

but as you can see we revert that again.
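The `0>>> COPY` / `<<<< COPY` pairs show DOM recording a temporary equivalence while it walks a path, then unwinding it on the way out. The sketch below is a minimal scoped value table with an unwind stack. It assumes each SSA name maps to one integer value. The arrays and the functions `equiv_push` and `equiv_pop_to` are hypothetical illustrations and not GCC's const_and_copies API.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NAMES 128
#define MAX_UNDO 256

/* Hypothetical table: value[i] is the known constant for SSA name i
   when has_value[i] is true.  */
static int value[MAX_NAMES];
static bool has_value[MAX_NAMES];

/* Each undo entry remembers the previous state of one name.  */
struct undo { int name; bool had; int old; };
static struct undo undo_stack[MAX_UNDO];
static size_t undo_depth;

/* Record NAME = VAL and remember how to revert it (no bounds checks
   in this sketch).  Returns the new stack depth.  */
static size_t equiv_push (int name, int val)
{
  undo_stack[undo_depth].name = name;
  undo_stack[undo_depth].had = has_value[name];
  undo_stack[undo_depth].old = value[name];
  undo_depth++;
  has_value[name] = true;
  value[name] = val;
  return undo_depth;
}

/* Revert every record made after MARKER.  Returns the new depth.  */
static size_t equiv_pop_to (size_t marker)
{
  while (undo_depth > marker)
    {
      undo_depth--;
      has_value[undo_stack[undo_depth].name] = undo_stack[undo_depth].had;
      value[undo_stack[undo_depth].name] = undo_stack[undo_depth].old;
    }
  return undo_depth;
}
```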

The value pops up again from record_equivalences_from_phis when visiting BB 15,
via

      /* If we managed to iterate through each PHI alternative without
         breaking out of the loop, then we have a PHI which may create
         a useful equivalence.  We do not need to record unwind data for
         this, since this is a true assignment and not an equivalence
         inferred from a comparison.  All uses of this ssa name are dominated
         by this assignment, so unwinding just costs time and space.  */
      if (i == gimple_phi_num_args (phi))
        {
          if (may_propagate_copy (lhs, rhs)) 
            set_ssa_name_value (lhs, rhs);

because just one edge is marked EDGE_EXECUTABLE (9 -> 15).  That means
the value computed in BB9 is wrong, and that is exactly the statement with
the UNDEFINED global range result.
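In simplified form, the check quoted above walks the PHI arguments and skips those on edges not marked executable. If every remaining argument has the same value, the PHI result is treated as that value. With only 9 -> 15 executable, whatever BB9 computed becomes the value of `f_71`. The `phi_arg` struct and `phi_common_value` below are hypothetical. GCC's real code works on trees, edge flags and may_propagate_copy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PHI argument: incoming constant and whether the edge it
   arrives on is currently marked executable.  */
struct phi_arg
{
  int value;
  bool executable;
};

/* Return true and store the common value in *OUT if all arguments on
   executable edges agree.  Non-executable edges are ignored, so a single
   executable edge is enough to produce an equivalence.  */
static bool phi_common_value (const struct phi_arg *args, size_t n, int *out)
{
  bool found = false;
  int common = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!args[i].executable)
        continue;
      if (!found)
        {
          common = args[i].value;
          found = true;
        }
      else if (args[i].value != common)
        return false;
    }
  if (found)
    *out = common;
  return found;
}
```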

I _think_ what may go wrong is that we emit

  <bb 6> [local count: 94607391]:
  _50 = BIT_FIELD_REF <vect_f_27.24_52, 16, 112>;
  niters_vector_mult_vf.19_74 = bnd.18_75 << 3;
  _72 = (short int) niters_vector_mult_vf.19_74;
  tmp.20_73 = f_36 + _72;
  if (niters_vector_mult_vf.19_74 == niters.17_100)
    goto <bb 15>; [12.50%]
  else
    goto <bb 7>; [87.50%]

  <bb 7> [local count: 82781467]:
  # f_92 = PHI <tmp.20_73(6)>
  # RANGE [irange] unsigned short [0, 32767][+INF, +INF]
  f.1_98 = (unsigned short) f_92;

see how we replace the final value with something computed in signed
arithmetic (tmp.20_73 = f_36 + _72 in short int).  This is also visible
in my shorter testcase.
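The scalar loop increments `f` as `unsigned short` (`f.1_98`), where wrap-around is well defined. The epilogue's `tmp.20_73 = f_36 + _72` instead adds in `short int`. In GIMPLE, signed overflow is undefined, so later passes may assume that sum never wraps. The sketch below advances a 16-bit induction variable without signed overflow: it adds in the unsigned type and converts back at the end. `advance_iv` is hypothetical. This only illustrates the hazard and is not necessarily what the patch does.

```c
#include <stdint.h>

/* Hypothetical: advance a 16-bit induction variable F by NITERS steps
   of 1.  The sum is formed in uint16_t, where wrap-around is well
   defined, and only then converted to int16_t (GCC reduces modulo
   2^16).  */
static int16_t advance_iv (int16_t f, uint16_t niters)
{
  uint16_t u = (uint16_t) ((uint16_t) f + niters);
  return (int16_t) u;
}
```

For the testcase, starting at -1 and taking 32769 steps ends at -32768, matching the scalar loop.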

I have a patch.

