85180 – Infinite loop in RTL DSE optimizer

Bug 85180 - Infinite loop in RTL DSE optimizer

Summary: Infinite loop in RTL DSE optimizer

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	8.0.1

Importance:	P3 normal
Target Milestone:	8.0
Assignee:	Richard Biener

URL:
Keywords:	compile-time-hog

Duplicates (1):	85862 (view as bug list)
Depends on:
Blocks:

Reported:	2018-04-03 15:53 UTC by Ulrich Weigand
Modified:	2023-07-22 02:57 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:	s390x-ibm-linux
Build:
Known to work:	8.0
Known to fail:	7.3.1
Last reconfirmed:	2018-04-04 00:00:00

Attachments
Test case - run with "cc1plus -O" (774 bytes, text/plain) 2018-04-03 15:53 UTC, Ulrich Weigand	Details
patch (1.69 KB, patch) 2018-04-05 08:26 UTC, Richard Biener	Details \| Diff
alternative patch (1.58 KB, patch) 2018-04-05 09:50 UTC, Richard Biener	Details \| Diff
other alternative (1.72 KB, patch) 2018-04-05 10:03 UTC, Richard Biener	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Ulrich Weigand 2018-04-03 15:53:16 UTC

Created attachment 43828 [details]
Test case - run with "cc1plus -O"

When attempting to compile the attached testcase simply with "cc1plus -O" on a s390x-ibm-linux target, the compilation process never terminates.  The problem appear to originate in the dse.c pass; building with -fno-dse makes the problem go away.

I'm not completely sure that this is really an infinite loop, strictly speaking, or just some exponential time behavior somewhere.  In any case, at the time the compiler hangs, it sits in a long chain of find_base_term calls ultimately originating at the canon_output_dependence in dse.c:1593 (record_store).

Comment 1 Richard Biener 2018-04-04 08:48:59 UTC

find_base_term is known to be quadratic... (there's some dups for this I bet).  I suppose you tested trunk.

I guess since this is present for such a long time now we should simply limit
the with and/or depth of the find_base_term recursion given it's safe to
return 0.

Comment 2 Richard Biener 2018-04-04 09:41:11 UTC

Patched find_base_term that tracks width/depth shows

#0  rhs_regno (x=0x7f3e30 <_start>)
    at /space/rguenther/src/svn/early-lto-debug/gcc/rtl.h:1895
#1  0x0000000000bb724d in find_base_term (x=0x7ffff6a84f18, width=118, 
    depth=176) at /space/rguenther/src/svn/early-lto-debug/gcc/alias.c:1894

where one major offender of width is the CSELIB loc iteration.  Adding a
hash-set to see if we recurse into already visited refs shows:

#1  0x0000000000bb7252 in find_base_term (x=0x2805d60, width=61, depth=121, 
    set=...) at /space/rguenther/src/svn/early-lto-debug/gcc/alias.c:1886
1886      gcc_assert (!set.add (x));

so indeed this is an endless recursion, not just one that is very deep.

(gdb) p debug_rtx (x)
(value:DI 11:4356 @0x2805d60/0x27f5e40)

coming via

(gdb) p debug_rtx (x)
(minus:DI (value:DI 20:20 @0x2805e38/0x27f5ff0)
    (value:DI 11:4356 @0x2805d60/0x27f5e40))

(gdb) p debug_rtx (x)
(value:DI 21:4465 @0x2805e50/0x27f6020)

(plus:DI (value:DI 21:4465 @0x2805e50/0x27f6020)
    (const_int 1 [0x1]))
...

not sure if values should ever form a cycle this way?

Anyhow - limiting search width/depth would of course catch this.

Comment 3 Jakub Jelinek 2018-04-04 15:22:50 UTC

Simplified testcase:
struct words
{
#define V(n) short word_##n;
#define W(n) V(n)
#define X(n) W(n##0) W(n##1) W(n##2) W(n##3) W(n##4) W(n##5) W(n##6) W(n##7) W(n##8) W(n##9)
#define Y X(0) X(1) X(2) X(3) X(4) X(5)
  Y
};

static char *pbuf, **pbuf_ptrs;

void
foo (struct words *w)
{
  char *aa = pbuf, **cptr = pbuf_ptrs;

  *cptr = aa;
#undef V
#define V(n) \
  __builtin_sprintf (*(cptr++), "word_" #n " %d ", w->word_##n); \
  *cptr = (aa += __builtin_strlen (aa) + 1);
  Y

  __builtin_sprintf (*(cptr++), " ");
  *cptr = (aa += __builtin_strlen (aa) + 1);
}

void
bar (char *x, char **y)
{
  pbuf = x;
  pbuf_ptrs = y;
}

Infinite recursion is not possible and is not what's going on here, that is prevented through:
      f = val->locs;
      /* Temporarily reset val->locs to avoid infinite recursion.  */
      val->locs = NULL;
...
      val->locs = f;
in find_base_term.

I think the problem is rather that for PLUS and MINUS we recurse up to twice (once for each argument) and for VALUEs we can recurse even more times if it has long locs list, so if we are unlucky it can be exponential, one single toplevel find_base_term call recursing transitively on a particular VALUE many times.

Now, I guess caching find_base_term return value for each VALUE (or perhaps delayed, look up say 20 VALUEs without caching and only then build the cache and start caching) could be a way out of this, but the val->locs = NULL; hack above will not be very good for that, because then we might cache a negative answer even for VALUEs that would yield non-NULL answer if we called find_base_term for it from the toplevel.  Dunno if that is acceptable or not.

I also wonder if we couldn't (either in new struct cselib_val field or on the side table) just record find_base_term value (and find_base_value?) of each VALUE we create when we process the instruction that created the VALUE.  Or is the find_base_term not meant to be sticky for each VALUE forever and is supposed to change as further insns are processed (invalidate locs from VALUEs etc.)?

Comment 4 Jakub Jelinek 2018-04-04 15:30:37 UTC

Actually, find_base_value is probably ok, it doesn't handle VALUEs and for PLUS/MINUS it just guesses one operand on which to recurse, rather than both.

Another possibility is to add some counter and count how many VALUEs we've processed for one toplevel call (or how many recursions we've done) and limit that by some constant or param, and just return NULL after we hit that limit.

Comment 5 Jakub Jelinek 2018-04-04 16:55:04 UTC

Another testcase running into this.

char *bar (void);
__INTPTR_TYPE__ baz (void);

void
foo (__INTPTR_TYPE__ *q)
{
  char *p = bar ();
  __INTPTR_TYPE__ a = baz ();
  __INTPTR_TYPE__ b = baz ();
  int i = 0;
#define X q[i++] = a; q[i++] = b; a = a + b; b = b + a;
#define Y X X X X X X X X X X
#define Z Y Y Y Y Y Y Y Y Y Y
  Z Z Z Z Z Z Z Z Z Z
  p[a] = 1;
  p[b] = 2;
}

With Y X X X instead of the 10 Zs I see with -O1 -ftime-report:
 dead store elim1                   :   0.43 ( 81%)   0.00 (  0%)   0.43 ( 81%)       6 kB (  0%)
With Y X X X X:
 dead store elim1                   :   1.15 ( 88%)   0.00 (  0%)   1.15 ( 88%)       7 kB (  0%)
With Y X X X X X:
 dead store elim1                   :   3.29 ( 92%)   0.00 (  0%)   3.29 ( 91%)       7 kB (  0%)
With Y X X X X X X:
 dead store elim1                   :   9.18 ( 93%)   0.00 (  0%)   9.19 ( 93%)       8 kB (  0%)
With Y X X X X X X X:
 dead store elim1                   :  24.71 ( 94%)   0.00 (  0%)  24.73 ( 94%)       8 kB (  1%)
With Y X X X X X X X X:
 dead store elim1                   :  68.89 ( 94%)   0.00 (  0%)  68.94 ( 94%)       9 kB (  1%)
With Y X X X X X X X X X:
 dead store elim1                   : 190.42 ( 95%)   0.00 (  0%) 190.56 ( 95%)       9 kB (  1%)
With Y Y:
 dead store elim1                   : 522.59 ( 95%)   0.00 (  0%) 522.96 ( 95%)      10 kB (  1%)

Comment 6 Richard Biener 2018-04-05 08:19:03 UTC

(In reply to Jakub Jelinek from comment #4)
> Actually, find_base_value is probably ok, it doesn't handle VALUEs and for
> PLUS/MINUS it just guesses one operand on which to recurse, rather than both.

find_base_value is ok in that it walks the full expression once but the
linear complexity breaks down once the expression forms a tree (thus we
get sharing, only possible via VALUEs I guess).

> Another possibility is to add some counter and count how many VALUEs we've
> processed for one toplevel call (or how many recursions we've done) and
> limit that by some constant or param, and just return NULL after we hit that
> limit.

Hmm, so the val->locs = NULL trick would need to be extended to only "pop"
the NULLs before the toplevel call returns.

The following fixes the testcase.  On the testcase most of the time the
stack remains empty, once we have 1 entry and 4 times we have 236
(the problematic one I guess).  On your last testcase (which is also
fixed) I get 3 times 2000 and 2001 times 2001 while in the majority
of calls 0, 1 or 2 (added printf of visited_vals.length ()).

> ./cc1 -quiet t.c -O -I include 2>&1 | sort | uniq -c
 795295 0
 130751 1
   1488 2
      3 2000
   2001 2001

Index: gcc/alias.c
===================================================================
--- gcc/alias.c (revision 259082)
+++ gcc/alias.c (working copy)
@@ -1876,7 +1876,8 @@ rtx_equal_for_memref_p (const_rtx x, con
 }
 
 static rtx
-find_base_term (rtx x)
+find_base_term (rtx x, vec<std::pair<cselib_val *,
+                                    struct elt_loc_list *> > &visited_vals)
 {
   cselib_val *val;
   struct elt_loc_list *l, *f;
@@ -1910,7 +1911,7 @@ find_base_term (rtx x)
     case POST_DEC:
     case PRE_MODIFY:
     case POST_MODIFY:
-      return find_base_term (XEXP (x, 0));
+      return find_base_term (XEXP (x, 0), visited_vals);
 
     case ZERO_EXTEND:
     case SIGN_EXTEND:  /* Used for Alpha/NT pointers */
@@ -1921,7 +1922,7 @@ find_base_term (rtx x)
        return 0;
 
       {
-       rtx temp = find_base_term (XEXP (x, 0));
+       rtx temp = find_base_term (XEXP (x, 0), visited_vals);
 
        if (temp != 0 && CONSTANT_P (temp))
          temp = convert_memory_address (Pmode, temp);
@@ -1940,7 +1941,9 @@ find_base_term (rtx x)
        return static_reg_base_value[STACK_POINTER_REGNUM];
 
       f = val->locs;
-      /* Temporarily reset val->locs to avoid infinite recursion.  */
+      /* Reset val->locs to avoid infinite recursion.  */
+      if (f)
+       visited_vals.safe_push (std::make_pair (val, f));
       val->locs = NULL;
 
       for (l = f; l; l = l->next)
@@ -1949,16 +1952,15 @@ find_base_term (rtx x)
            && !CSELIB_VAL_PTR (l->loc)->locs->next
            && CSELIB_VAL_PTR (l->loc)->locs->loc == x)
          continue;
-       else if ((ret = find_base_term (l->loc)) != 0)
+       else if ((ret = find_base_term (l->loc, visited_vals)) != 0)
          break;
 
-      val->locs = f;
       return ret;
 
     case LO_SUM:
       /* The standard form is (lo_sum reg sym) so look only at the
          second operand.  */
-      return find_base_term (XEXP (x, 1));
+      return find_base_term (XEXP (x, 1), visited_vals);
 
     case CONST:
       x = XEXP (x, 0);
@@ -1984,7 +1986,7 @@ find_base_term (rtx x)
           other operand is the base register.  */
 
        if (tmp1 == pic_offset_table_rtx && CONSTANT_P (tmp2))
-         return find_base_term (tmp2);
+         return find_base_term (tmp2, visited_vals);
 
        /* If either operand is known to be a pointer, then prefer it
           to determine the base term.  */
@@ -2001,12 +2003,12 @@ find_base_term (rtx x)
           term is from a pointer or is a named object or a special address
           (like an argument or stack reference), then use it for the
           base term.  */
-       rtx base = find_base_term (tmp1);
+       rtx base = find_base_term (tmp1, visited_vals);
        if (base != NULL_RTX
            && ((REG_P (tmp1) && REG_POINTER (tmp1))
                 || known_base_value_p (base)))
          return base;
-       base = find_base_term (tmp2);
+       base = find_base_term (tmp2, visited_vals);
        if (base != NULL_RTX
            && ((REG_P (tmp2) && REG_POINTER (tmp2))
                 || known_base_value_p (base)))
@@ -2020,7 +2022,7 @@ find_base_term (rtx x)
 
     case AND:
       if (CONST_INT_P (XEXP (x, 1)) && INTVAL (XEXP (x, 1)) != 0)
-       return find_base_term (XEXP (x, 0));
+       return find_base_term (XEXP (x, 0), visited_vals);
       return 0;
 
     case SYMBOL_REF:
@@ -2032,6 +2034,16 @@ find_base_term (rtx x)
     }
 }
 
+static rtx
+find_base_term (rtx x)
+{
+  auto_vec<std::pair<cselib_val *, struct elt_loc_list *>, 32> visited_vals;
+  rtx res = find_base_term (x, visited_vals);
+  for (unsigned i = 0; i < visited_vals.length (); ++i)
+    visited_vals[i].first->locs = visited_vals[i].second;
+  return res;
+}
+
 /* Return true if accesses to address X may alias accesses based
    on the stack pointer.  */

Comment 7 Richard Biener 2018-04-05 08:26:06 UTC

Created attachment 43849 [details]
patch

Patch I am testing.

Comment 8 Jakub Jelinek 2018-04-05 08:45:45 UTC

If find_base_term always returned whatever first returned non-NULL up to the ultimate caller, then I think the above would work fine.  Sadly, that is the case only in most of the spots, not all of them.
The PLUS/MINUS case is the major exception (and *_EXTEND too, but that doesn't matter):
        rtx base = find_base_term (tmp1);
        if (base != NULL_RTX
            && ((REG_P (tmp1) && REG_POINTER (tmp1))
                 || known_base_value_p (base)))
          return base;
        base = find_base_term (tmp2);
        if (base != NULL_RTX
            && ((REG_P (tmp2) && REG_POINTER (tmp2))
                 || known_base_value_p (base)))
          return base;
If tmp1 or tmp2 aren't REGs with REG_POINTER (the REG with REG_POINTER case isn't that interesting, because the find_base_term recursion is one level only in that case and no VALUEs are involved), then we only return whatever has been returned if known_base_value_p (base).  So other bases (that is group (1), incoming arguments, right?) might be returned before your patch and not after it, e.g. if
we at toplevel call find_base_term on a VALUE that has (plus (value 123) (value 345)) and (value 456) locs, value 123 has (minus (value 456) (value 345)), value 345 doesn't have a base term and value 456 is incoming argument.  When recursing on the plus and minus, we see that find_base_term on 456 returned non-NULL base that is !known_base_value_p (base) and return NULL, then we walk value 456 again and at this point without your patch we would return the incoming ADDRESS, but with the patch don't, because we've already walked it.

Perhaps those cases are sufficiently rare that we don't care, still, it would be nice to gather at least some statistics on this on multiple targets (say x86_64 and powerpc64) bootstraps/regtests, tweak your patch so that the 2 argument find_base_term can have the second argument NULL and in that case it behaves the old way and call it twice, once with NULL argument and then with non-NULL one and compare the result, gather statistics on how many toplevel find_base_term calls there were and how many out of them resulted in different result.

Comment 9 rguenther@suse.de 2018-04-05 09:04:25 UTC

On Thu, 5 Apr 2018, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180
> 
> --- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> If find_base_term always returned whatever first returned non-NULL up to the
> ultimate caller, then I think the above would work fine.  Sadly, that is the
> case only in most of the spots, not all of them.
> The PLUS/MINUS case is the major exception (and *_EXTEND too, but that doesn't
> matter):
>         rtx base = find_base_term (tmp1);
>         if (base != NULL_RTX
>             && ((REG_P (tmp1) && REG_POINTER (tmp1))
>                  || known_base_value_p (base)))
>           return base;
>         base = find_base_term (tmp2);
>         if (base != NULL_RTX
>             && ((REG_P (tmp2) && REG_POINTER (tmp2))
>                  || known_base_value_p (base)))
>           return base;
> If tmp1 or tmp2 aren't REGs with REG_POINTER (the REG with REG_POINTER case
> isn't that interesting, because the find_base_term recursion is one level only
> in that case and no VALUEs are involved), then we only return whatever has been
> returned if known_base_value_p (base).  So other bases (that is group (1),
> incoming arguments, right?) might be returned before your patch and not after
> it, e.g. if
> we at toplevel call find_base_term on a VALUE that has (plus (value 123) (value
> 345)) and (value 456) locs, value 123 has (minus (value 456) (value 345)),
> value 345 doesn't have a base term and value 456 is incoming argument.  When
> recursing on the plus and minus, we see that find_base_term on 456 returned
> non-NULL base that is !known_base_value_p (base) and return NULL, then we walk
> value 456 again and at this point without your patch we would return the
> incoming ADDRESS, but with the patch don't, because we've already walked it.
> 
> Perhaps those cases are sufficiently rare that we don't care, still, it would
> be nice to gather at least some statistics on this on multiple targets (say
> x86_64 and powerpc64) bootstraps/regtests, tweak your patch so that the 2
> argument find_base_term can have the second argument NULL and in that case it
> behaves the old way and call it twice, once with NULL argument and then with
> non-NULL one and compare the result, gather statistics on how many toplevel
> find_base_term calls there were and how many out of them resulted in different
> result.

So we could conservatively address this by

  unsigned saved_stackpos = visited_vals.length ();
  rtx base = find_base_term (tmp1, visited_vals);
  if (base != NULL_RTX
            && ((REG_P (tmp1) && REG_POINTER (tmp1))
                 || known_base_value_p (base)))
          return base;
  if (base != NULL_RTX)
    unwind-visited_vals-to-saved-stackpos;

and the same for the tmp2 case.  But I'm not sure whether that's a good
idea since it looks like we may end up re-introducing the exponential
complexity if we never return any of the found bases.

Fixing this case with keeping the current behavior would instead ask
for caching the outcome of find_base_term on VALUEs where we then
can remove the val->locs frobbing.

The question is whether that is worth it (as you say) and what
the cost of doing so is.  It looks like we may be able to
change cselib_val in a way to make a direct-mapped cache
possible without too much overhead?  find_base_term is quite
performance sensitive IIRC.  OTOH cselib_val might be size-sensitive...
Using a bit in the RTX and re-using cselib_val->val_rtx and keeping
an unwinding stack for that might be possible if the more trivial
implementation with a hash-map is too expensive.

Comment 10 Richard Biener 2018-04-05 09:50:41 UTC

Created attachment 43850 [details]
alternative patch

The most simple hash_map variant is the attached.  Comparing (-O0 optimized cc1)
compile-time for your testcase shows

hash-map:
> /usr/bin/time ./cc1 -quiet t.c -O -I include 
13.80user 0.01system 0:13.83elapsed 99%CPU (0avgtext+0avgdata 43556maxresident)k
0inputs+144outputs (0major+7967minor)pagefaults 0swaps

vector-unwind:
> /usr/bin/time ./cc1 -quiet t.c -O -I include 
12.10user 0.01system 0:12.12elapsed 99%CPU (0avgtext+0avgdata 43656maxresident)k
0inputs+144outputs (0major+7949minor)pagefaults 0swaps

just building alias.c with -O2:

hash-map:
11.28user 0.01system 0:11.30elapsed 99%CPU (0avgtext+0avgdata 43256maxresident)k
0inputs+144outputs (0major+7855minor)pagefaults 0swaps

vector-unwind:
11.03user 0.01system 0:11.04elapsed 100%CPU (0avgtext+0avgdata 43168maxresident)k
0inputs+144outputs (0major+7872minor)pagefaults 0swaps

caching directly in an enlarged cselib_val yields in

11.03user 0.02system 0:11.05elapsed 99%CPU (0avgtext+0avgdata 43148maxresident)k
0inputs+144outputs (0major+7852minor)pagefaults 0swaps

Index: gcc/alias.c
===================================================================
--- gcc/alias.c (revision 259082)
+++ gcc/alias.c (working copy)
@@ -1939,6 +1939,9 @@ find_base_term (rtx x)
       if (cselib_sp_based_value_p (val))
        return static_reg_base_value[STACK_POINTER_REGNUM];
 
+      if (val->base_term != (rtx)-1)
+       return val->base_term;
+      val->base_term = NULL_RTX;
       f = val->locs;
       /* Temporarily reset val->locs to avoid infinite recursion.  */
       val->locs = NULL;
@@ -1953,6 +1956,7 @@ find_base_term (rtx x)
          break;
 
       val->locs = f;
+      val->base_term = ret;
       return ret;
 
     case LO_SUM:

Comment 11 Richard Biener 2018-04-05 09:51:51 UTC

So it looks like caching in cselib_val->rtx and unwinding that via a stack might be the fastest variant (if we care)?

Comment 12 Jakub Jelinek 2018-04-05 09:56:31 UTC

True, but might eat more compile time memory.
Further, the question stands, is what find_base_term returns for a particular VALUE cacheable for the whole duration between cselib_init and cselib_finish in each pass, or not?  Would be nice to again do instrumented bootstrap/regtest to gather data whether it ever differs and how often if it does.

Comment 13 Richard Biener 2018-04-05 10:03:31 UTC

Created attachment 43851 [details]
other alternative

Like this.

11.00user 0.01system 0:11.01elapsed 99%CPU (0avgtext+0avgdata 43204maxresident)k
0inputs+144outputs (0major+7848minor)pagefaults 0swaps

needed to avoid calling cselib_sp_based_value_p because that didn't like the
"messed up" val_rtx.  Now the question is whether that's too dangerous given
we don't control FIND_BASE_TERM as defined by targets... (I could guard that
with !VALUE and re-do that below in the VALUE case after the caching, but
eventually that function may recurse itself and wreck the whole thing
anyways...  it also looks like it is invented to do some RTX massaging
before finding the base value, not find it itself?)

Comment 14 Richard Biener 2018-04-06 08:31:24 UTC

Author: rguenth
Date: Fri Apr  6 08:30:52 2018
New Revision: 259166

URL: https://gcc.gnu.org/viewcvs?rev=259166&root=gcc&view=rev
Log:
2018-04-06  Richard Biener  <rguenther@suse.de>

	PR middle-end/85180
	* alias.c (find_base_term): New wrapper around find_base_term
	unwinding CSELIB_VAL_PTR changes.
	(find_base_term): Do not restore CSELIB_VAL_PTR during the
	recursion.

	* gcc.dg/pr85180.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/pr85180.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/alias.c
    trunk/gcc/testsuite/ChangeLog

Comment 15 Richard Biener 2018-04-06 08:46:42 UTC

Fixed for GCC 8.

Comment 16 romain.naour 2018-04-30 08:21:19 UTC

Hi,

gcc 7.3.0 is affected by this bug but only on microblaze architecture, see [1].
Do you plan to backport this patch on gcc 7.x?
It is safe to do so without take the risk to break something with other
architecture or optimization level?

Best regards,
Romain

[1] http://lists.busybox.net/pipermail/buildroot/2018-April/220156.html

Comment 17 Richard Biener 2018-04-30 08:24:31 UTC

(In reply to romain.naour from comment #16)
> Hi,
> 
> gcc 7.3.0 is affected by this bug but only on microblaze architecture, see
> [1].
> Do you plan to backport this patch on gcc 7.x?
> It is safe to do so without take the risk to break something with other
> architecture or optimization level?
> 
> Best regards,
> Romain
> 
> [1] http://lists.busybox.net/pipermail/buildroot/2018-April/220156.html

The bug isn't a regression so technically it doesn't qualify.  OTOH it
looks reasonably safe to backport and the bug is annoying.  Jakub, would
you be ok with a backport?

Comment 18 Romain Naour 2018-05-09 10:19:58 UTC

(In reply to Richard Biener from comment #17)
> (In reply to romain.naour from comment #16)
> > Hi,
> > 
> > gcc 7.3.0 is affected by this bug but only on microblaze architecture, see
> > [1].
> > Do you plan to backport this patch on gcc 7.x?
> > It is safe to do so without take the risk to break something with other
> > architecture or optimization level?
> > 
> > Best regards,
> > Romain
> > 
> > [1] http://lists.busybox.net/pipermail/buildroot/2018-April/220156.html
> 
> The bug isn't a regression so technically it doesn't qualify.  OTOH it
> looks reasonably safe to backport and the bug is annoying.  Jakub, would
> you be ok with a backport?

Ping?

Comment 19 rguenther@suse.de 2018-05-09 10:35:44 UTC

On Wed, 9 May 2018, romain.naour at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180
> 
> Romain Naour <romain.naour at gmail dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |romain.naour at gmail dot com
> 
> --- Comment #18 from Romain Naour <romain.naour at gmail dot com> ---
> (In reply to Richard Biener from comment #17)
> > (In reply to romain.naour from comment #16)
> > > Hi,
> > > 
> > > gcc 7.3.0 is affected by this bug but only on microblaze architecture, see
> > > [1].
> > > Do you plan to backport this patch on gcc 7.x?
> > > It is safe to do so without take the risk to break something with other
> > > architecture or optimization level?
> > > 
> > > Best regards,
> > > Romain
> > > 
> > > [1] http://lists.busybox.net/pipermail/buildroot/2018-April/220156.html
> > 
> > The bug isn't a regression so technically it doesn't qualify.  OTOH it
> > looks reasonably safe to backport and the bug is annoying.  Jakub, would
> > you be ok with a backport?
> 
> Ping?

Jakub said it doesn't qualify give it's too risky.

Comment 20 Matt Weber 2018-05-10 12:56:32 UTC

Another datapoint that libnss 3.35 fails to build on the microblaze arch (uclibc) with any of the 6.x series and this bug's patch resolves that (https://github.com/gcc-mirror/gcc/commit/df03ebc3574a0d7893127e3b9754a01abf2d8b70).  

microblazeel-buildroot-linux-uclibc-gcc -o Linux2.6_microblazeel_microblazeel-buildroot-linux-uclibc-gcc.br_real_glibc_PTH_OPT.OBJ/sslgrp.o -c -O2 -fPIC   -pipe -ffunction-sections -fdata-sections -DHAVE_STRERROR -DLINUX -Dlinux -Wall -Werror -DXP_UNIX -UDEBUG -DNDEBUG -D_REENTRANT -DNSS_NO_INIT_SUPPORT -DUSE_UTIL_DIRECTLY -DNO_NSPR_10_SUPPORT -DSSL_DISABLE_DEPRECATED_CIPHER_SUITE_NAMES -Isysroot/usr/include/nspr -Ilibnss-3.35/dist/include -I../../../dist/public/nss -I../../../dist/private/nss  sslgrp.c


However the libnss build works fine with the 7.x branch and master.  I've been bisecting all 3 branches for a few days and figured I should just try this patch before debugging the 6.x branch further.  I even went back and tried to find common ancestors between the 6.x and 7.x and couldn't get a point where the good/bad builds matched up.  Guessing backports on 6.x from master introduced the libnss bug variant as the gcc/cse* and gcc/alias.x files had a lot of updates from 5.x to 7.x.

Here's the stack trace on cc1 when its hung building a libnss sslgrp.c file.......
( A lot more find_base_terms() before this one)
#71 0x00000000005ae67d in find_base_term(rtx_def*) ()
#72 0x00000000005ae532 in find_base_term(rtx_def*) ()
#73 0x00000000005ae67d in find_base_term(rtx_def*) ()
#74 0x00000000005ae532 in find_base_term(rtx_def*) ()
#75 0x00000000005ae67d in find_base_term(rtx_def*) ()
#76 0x00000000005ae568 in find_base_term(rtx_def*) ()
#77 0x00000000005b137d in write_dependence_p(rtx_def const*, rtx_def const*, machine_mode, rtx_def*, bool, bool, bool) ()
#78 0x00000000005b1585 in canon_anti_dependence(rtx_def const*, bool, rtx_def const*, machine_mode, rtx_def*) ()
#79 0x000000000061c5e3 in cselib_invalidate_mem(rtx_def*) ()
#80 0x000000000061d32d in cselib_record_sets(rtx_insn*) ()
#81 0x000000000061e5c8 in cselib_process_insn(rtx_insn*) ()
#82 0x00000000008774b8 in reload_cse_regs_1() ()
#83 0x00000000008777dc in (anonymous namespace)::pass_postreload_cse::execute(function*) ()
#84 0x000000000086714d in execute_one_pass(opt_pass*) ()

Comment 21 Richard Biener 2018-05-22 13:52:56 UTC

*** Bug 85862 has been marked as a duplicate of this bug. ***