Bug 59303 - [4.9 Regression] uninit-pred-8_b.c and uninit-pred-9_b.c fail after better optimizations
Summary: [4.9 Regression] uninit-pred-8_b.c and uninit-pred-9_b.c fail after better op...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.0
: P1 normal
Target Milestone: 4.9.0
Assignee: Not yet assigned to anyone
URL:
Keywords: diagnostic
Depends on:
Blocks:
 
Reported: 2013-11-26 15:53 UTC by christophe.lyon
Modified: 2014-01-03 11:00 UTC (History)
3 users (show)

See Also:
Host:
Target: arm, x86_64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-11-27 00:00:00


Attachments
Patch file : cleanup + more predicate simplification rules (11.88 KB, patch)
2013-12-21 08:21 UTC, davidxl
Details | Diff
cleanups (3.16 KB, patch)
2013-12-21 09:45 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description christophe.lyon 2013-11-26 15:53:21 UTC
Since commit 204194 from Andrew Pinski:
    2013-10-29  Andrew Pinski <apinski@cavium.com>

            * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
            (ifcombine_ifandif): Handle cases where
            maybe_fold_and_comparisons fails, combining the branches
            anyways.
            (tree_ssa_ifcombine): Inverse the order of
            the basic block walk, increases the number of combinings.
            * gimple.h (gsi_start_nondebug_after_labels_bb): New function.

    2013-10-29  Andrew Pinski <apinski@cavium.com>
                Zhenqiang Chen  <zhenqiang.chen@linaro.org>

            * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
            * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
            * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
            * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
            * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
            * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
            * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
            conditional move to be used.
            * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove.


I have noticed regressions on ARM targets. The 2 mentioned testcases now FAIL (used to PASS):
  gcc.dg/uninit-pred-8_b.c bogus warning (test for bogus messages, line 20)
  gcc.dg/uninit-pred-9_b.c bogus warning (test for bogus messages, line 24)

It is the case in the following configurations:
targets: arm-none-linux-gnueabi, arm-none-linux-gnueabihf, armeb-none-linux-gnueabihf
mode: arm,thumb
cpu: cortex-a9
fpu: default, vfpv3, neon-fp16

targets: aarch64-none-elf, aarch64-none-linux

These 2 regressions do not appear when configuring --target arm-none-linux-gnueabihf --with-cpu=cortex-a5 --with-fpu=vfpv3-d16-fp16


However, when configuring --target arm-none-linux-gnueabihf --with-cpu=cortex-a5 --with-fpu=vfpv3-d16-fp16, some of the newly introduced tests fail:
  gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c scan-tree-dump optimized "&"
  gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c scan-tree-dump optimized "&"
  gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c scan-tree-dump-times optimized "&" 2
  gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c scan-tree-dump-times optimized "\\|" 2
Comment 1 Andrew Pinski 2013-11-26 19:49:35 UTC
> However, when configuring --target arm-none-linux-gnueabihf --with-cpu=cortex-a5
> --with-fpu=vfpv3-d16-fp16, some of the newly introduced tests fail:

That is a testsuite issue only and should be reported separately. BRANCH_COST is low for cortex-a5.


The other testcases are really a buggy predicated unitialized warning pass, see PR 49498 for another instance of the same testcase failing.  I had looked into fixing that warning pass but it was complex to understand what was going wrong.
Comment 2 Andrew Pinski 2013-11-26 19:50:38 UTC
One more comment, it now fails on x86_64 also.
Comment 3 christophe.lyon 2013-11-27 08:30:45 UTC
I have filed PR 59308 about the ssa-ifcombine-ccmp* tests failing on cortex-a5.

Should I mark this bug report (59303) as a duplicate of 49498?
Comment 4 Richard Biener 2013-11-27 08:59:24 UTC
Confirmed.  They fail for quite some while.  The uninit pass needs to cope
with combined predicates.
Comment 5 davidxl 2013-12-19 18:30:04 UTC
Fixing this requires more powerful predicate analysis with the help of value equivalent classes.

From the dump:

      "Use in stmt blah (v_1);
      is guarded by :
      if (_23 != 0)"


_23 = pephitmp_22 | _8, so the use guard is

        prephitmp_22 != 0 || _8 != 0

prephitmp_22 = PHI <0(13), prephitmp_21(3), so the guard can be translated to

        prephitmp_21 != 0 || _8 != 0

prephitmp_21 = r_9(D) <= 19, so the predicate becomes

         r_9(D) <= 19 || _8 != 0


For the def:

[CHECK] Found def edge 1 in v_1 = PHI <v_14(D)(13), r_9(D)(3)>
Operand defs of phi v_1 = PHI <v_14(D)(13), r_9(D)(3)>
is guarded by :
       if (_8 != 0)
      (.OR.)
        (.NOT.) if (_8 != 0)
        (.AND.)
         if (_13 != 0)

the predicate is
       _8 != 0 || _13 != 0

where _13 = _10 | _12, 10_ = (r_9(D) <= 19), _12 = l_11(D) != 0

so the predicate is
       _8 != 0 || r_9(D) <= 19 || 1_11(D) != 0

which is the superset for the use.


Perhaps we need to do around of predicate normalization before comparison.

David
Comment 6 davidxl 2013-12-19 23:18:38 UTC
I am working on a solution (and some cleanups).

David
Comment 7 davidxl 2013-12-20 06:50:07 UTC
Testing a fix that addresses the problem. With the fix, the dump shows:

Use in stmt: blah (v_1);
is guarded by :
m_6(D) > 100
(.OR.)
n_4(D) <= 9
(.OR.)
r_9(D) <= 19

Operand defs of phi: v_1 = PHI <v_14(D)(13), r_9(D)(3)>
is guarded by :
m_6(D) > 100
(.OR.)
n_4(D) <= 9
(.OR.)
l_11(D) != 0
(.OR.)
r_9(D) <= 19

which is correct.
Comment 8 davidxl 2013-12-21 08:21:26 UTC
Created attachment 31495 [details]
Patch file : cleanup + more predicate simplification rules
Comment 9 Jakub Jelinek 2013-12-21 09:45:59 UTC
Created attachment 31496 [details]
cleanups

I had also a brief look at this recently, but haven't made progress beyond attached formatting/cleanup patch so far (plus only dumping details with *-details, otherwise the dumps are pretty much inconsistent).
My conclusion has also been that to fix this up the predicates would need to be normalized before comparison, and simplified, at least if the initial subset check fails.  On uninit-pred-8_b.c there has been additional complication because PRE decides to change the IL and we end up with:
  <bb 3>:
  pretmp_24 = r_10(D) <= 19;
  goto <bb 5>;

  <bb 4>:
  _11 = r_10(D) <= 19;
  _13 = l_12(D) != 0;
  _14 = _11 | _13;
  if (_14 != 0)
    goto <bb 14>;
  else
    goto <bb 15>;

  <bb 14>:
  goto <bb 3>;
  
  <bb 15>:

  <bb 5>:
  # v_1 = PHI <v_15(D)(15), r_10(D)(3)>
  # prephitmp_30 = PHI <0(15), pretmp_24(3)>
  if (m_7(D) != 0)
    goto <bb 6>;
  else
    goto <bb 7>;

  <bb 6>:
  g.0_17 = g;
  g.1_18 = g.0_17 + 1;
  g = g.1_18;
  goto <bb 8>;
  
  <bb 7>:
  bar ();

  <bb 8>:
  _3 = prephitmp_30 | _9;
  if (_3 != 0)
so the prephitmp_30 would need to be canonicalized into _14 != 0 && r_10(D) <= 19 aka (r_10(D) <= 19 || l_12(D) != 0) && r_10(D) <= 19 and from there to r_10(D) <= 19.
Comment 10 davidxl 2013-12-21 18:25:52 UTC
My patch does this.
1) it first does aggressive simplification iteratively (more rules can be added later). 
2) it then does normalization by building up larger predicate trees by following ud chain and bitwise or/and operations. It handles simple PHIs including also degenerated phis.

The new dump reports:

in foo,

predicate for def:

m_7(D) > 100
(.OR.)
n_5(D) <= 9
(.OR.)
l_12(D) != 0
(.OR.)
r_10(D) <= 19

use1:

m_7(D) > 100
(.OR.)
n_5(D) <= 9
(.OR.)
r_10(D) <= 9

use2:

m_7(D) > 100
(.OR.)
n_5(D) <= 9
(.OR.)
r_10(D) <= 19


For foo2, the predicates are properly built, but the patch has a bug in predicate set inclusion testing leading to a false negative.

David

(In reply to Jakub Jelinek from comment #9)
> Created attachment 31496 [details]
> cleanups
> 
> I had also a brief look at this recently, but haven't made progress beyond
> attached formatting/cleanup patch so far (plus only dumping details with
> *-details, otherwise the dumps are pretty much inconsistent).
> My conclusion has also been that to fix this up the predicates would need to
> be normalized before comparison, and simplified, at least if the initial
> subset check fails.  On uninit-pred-8_b.c there has been additional
> complication because PRE decides to change the IL and we end up with:
>   <bb 3>:
>   pretmp_24 = r_10(D) <= 19;
>   goto <bb 5>;
> 
>   <bb 4>:
>   _11 = r_10(D) <= 19;
>   _13 = l_12(D) != 0;
>   _14 = _11 | _13;
>   if (_14 != 0)
>     goto <bb 14>;
>   else
>     goto <bb 15>;
> 
>   <bb 14>:
>   goto <bb 3>;
>   
>   <bb 15>:
> 
>   <bb 5>:
>   # v_1 = PHI <v_15(D)(15), r_10(D)(3)>
>   # prephitmp_30 = PHI <0(15), pretmp_24(3)>
>   if (m_7(D) != 0)
>     goto <bb 6>;
>   else
>     goto <bb 7>;
> 
>   <bb 6>:
>   g.0_17 = g;
>   g.1_18 = g.0_17 + 1;
>   g = g.1_18;
>   goto <bb 8>;
>   
>   <bb 7>:
>   bar ();
> 
>   <bb 8>:
>   _3 = prephitmp_30 | _9;
>   if (_3 != 0)
> so the prephitmp_30 would need to be canonicalized into _14 != 0 && r_10(D)
> <= 19 aka (r_10(D) <= 19 || l_12(D) != 0) && r_10(D) <= 19 and from there to
> r_10(D) <= 19.
Comment 11 davidxl 2013-12-21 19:57:56 UTC
The false negative bug introduced in the patch is fixed. Will submit the patch for review soon.

(In reply to davidxl from comment #10)
> My patch does this.
> 1) it first does aggressive simplification iteratively (more rules can be
> added later). 
> 2) it then does normalization by building up larger predicate trees by
> following ud chain and bitwise or/and operations. It handles simple PHIs
> including also degenerated phis.
> 
> The new dump reports:
> 
> in foo,
> 
> predicate for def:
> 
> m_7(D) > 100
> (.OR.)
> n_5(D) <= 9
> (.OR.)
> l_12(D) != 0
> (.OR.)
> r_10(D) <= 19
> 
> use1:
> 
> m_7(D) > 100
> (.OR.)
> n_5(D) <= 9
> (.OR.)
> r_10(D) <= 9
> 
> use2:
> 
> m_7(D) > 100
> (.OR.)
> n_5(D) <= 9
> (.OR.)
> r_10(D) <= 19
> 
> 
> For foo2, the predicates are properly built, but the patch has a bug in
> predicate set inclusion testing leading to a false negative.
> 
> David
> 
> (In reply to Jakub Jelinek from comment #9)
> > Created attachment 31496 [details]
> > cleanups
> > 
> > I had also a brief look at this recently, but haven't made progress beyond
> > attached formatting/cleanup patch so far (plus only dumping details with
> > *-details, otherwise the dumps are pretty much inconsistent).
> > My conclusion has also been that to fix this up the predicates would need to
> > be normalized before comparison, and simplified, at least if the initial
> > subset check fails.  On uninit-pred-8_b.c there has been additional
> > complication because PRE decides to change the IL and we end up with:
> >   <bb 3>:
> >   pretmp_24 = r_10(D) <= 19;
> >   goto <bb 5>;
> > 
> >   <bb 4>:
> >   _11 = r_10(D) <= 19;
> >   _13 = l_12(D) != 0;
> >   _14 = _11 | _13;
> >   if (_14 != 0)
> >     goto <bb 14>;
> >   else
> >     goto <bb 15>;
> > 
> >   <bb 14>:
> >   goto <bb 3>;
> >   
> >   <bb 15>:
> > 
> >   <bb 5>:
> >   # v_1 = PHI <v_15(D)(15), r_10(D)(3)>
> >   # prephitmp_30 = PHI <0(15), pretmp_24(3)>
> >   if (m_7(D) != 0)
> >     goto <bb 6>;
> >   else
> >     goto <bb 7>;
> > 
> >   <bb 6>:
> >   g.0_17 = g;
> >   g.1_18 = g.0_17 + 1;
> >   g = g.1_18;
> >   goto <bb 8>;
> >   
> >   <bb 7>:
> >   bar ();
> > 
> >   <bb 8>:
> >   _3 = prephitmp_30 | _9;
> >   if (_3 != 0)
> > so the prephitmp_30 would need to be canonicalized into _14 != 0 && r_10(D)
> > <= 19 aka (r_10(D) <= 19 || l_12(D) != 0) && r_10(D) <= 19 and from there to
> > r_10(D) <= 19.
Comment 12 Jakub Jelinek 2014-01-03 11:00:30 UTC
Author: davidxl
Date: Fri Jan  3 00:40:57 2014
New Revision: 206309

URL: http://gcc.gnu.org/viewcvs?rev=206309&root=gcc&view=rev
Log:
Fix PR/59303 -- uninit analysis enhancement

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-uninit.c