Bug 61964

Summary: [4.8/4.9 regression] krb5 database propagation enters infinite loop; reduced test case
Product: gcc Reporter: Anders Kaseorg <andersk>
Component: tree-optimizationAssignee: Richard Biener <rguenth>
Status: RESOLVED FIXED    
Severity: major CC: ghudson, mikpelinux, tom, vries
Priority: P3 Keywords: wrong-code
Version: 4.8.3   
Target Milestone: 4.8.4   
See Also: https://launchpad.net/bugs/1347147
Host: Target:
Build: Known to work: 4.10.0, 4.8.4, 4.9.2
Known to fail: 4.8.3 Last reconfirmed: 2014-07-31 00:00:00
Bug Depends on: 62030    
Bug Blocks:    
Attachments: Alternative patch

Description Anders Kaseorg 2014-07-30 13:21:26 UTC
Kerberos is miscompiled by gcc-4.8.  The impact is detailed at https://bugs.launchpad.net/bugs/1347147, but here is a reduced test case.  The expected return is 0, but when compiled with gcc-4.8 -O2, it returns 1.

$ cat bug.c

struct node { struct node *next, *prev; } node;
struct head { struct node *first; } heads[5];
int k = 2;
struct head *head = &heads[2];

int main()
{
  node.prev = (void *)head;
  head->first = &node;

  struct node *n = head->first;
  struct head *h = &heads[k];

  if (n->prev == (void *)h)
    h->first = n->next;
  else
    n->prev->next = n->next;

  n->next = h->first;
  return n->next == &node;
}

$ gcc-4.7 -Wall -O2 bug.c -o bug; ./bug; echo $?
0
$ gcc-4.8 -Wall -O2 bug.c -o bug; ./bug; echo $?
1
$ gcc-4.9 -Wall -O2 bug.c -o bug; ./bug; echo $?
0
$ dpkg -l gcc-4.7 gcc-4.8 gcc-4.9
[…]
ii  gcc-4.7  4.7.4-2ubuntu1  amd64  GNU C compiler
ii  gcc-4.8  4.8.3-6ubuntu1  amd64  GNU C compiler
ii  gcc-4.9  4.9.1-3ubuntu2  amd64  GNU C compiler

I bisected the point where the problem disappeared between 4.8 and 4.9 at r202525.  However, I don’t understand why.  I’m scared by the fact that r202525 was intended to fix a “missed-optimization” bug (bug 58404).
Comment 1 Richard Biener 2014-07-30 13:39:40 UTC
The testcase is violating strict-aliasing rules as you access a struct head
as struct node here:

  if (n->prev == (void *)h)
    h->first = n->next;
  else
    n->prev->next = n->next;

as n->prev points to &heads[0] while h is &heads[2] (an out-of-bound pointer).
So n->prev is a struct head and you access a next field of a struct node of it.

Changing k to 0 makes the testcase pass (now you don't run into the bogus
path).
Comment 2 ghudson 2014-07-30 14:31:43 UTC
How do you conclude that n->prev points to &heads[0]?  node.prev receives the value (void *)head, where head is initialized to &heads[2].  I cannot see any uses of &heads[0] in the test program.
Comment 3 Anders Kaseorg 2014-07-30 20:21:04 UTC
(In reply to Richard Biener from comment #1)
> The testcase is violating strict-aliasing rules as you access a struct head
> as struct node here:

Agree with ghudson here: n->prev points to &heads[2], not &heads[0].

Although assigning a casted struct head * to a struct node * is arguably sloppy, the standard does not prohibit it, as long as it is never dereferenced in that form.
Comment 4 Anders Kaseorg 2014-07-31 04:19:53 UTC
Another bisect between 4.7 and 4.8 shows that the bug appeared with r189321 (bug 52009).

My test case has triggers the bug in more versions than Kerberos does: as far as I can tell, Kerberos was unaffected until r192604.
Comment 5 Mikael Pettersson 2014-07-31 09:29:28 UTC
I've been staring as this test case, and I cannot find any dereference of a wrong-typed pointer value.  The only oddity I can find is that at

  if (n->prev == (void *)h)

n == &node, n->prev == (struct node *)&heads[2] (so wrong-typed), h == &heads[2], so there is a '==' being applied to a wrong-typed pointer.  Is that undefined behaviour?  I'll note that changing the test to

  if ((void *)n->prev == (void *)h)

still reproduces the wrong-code while looking technically Ok.

Also, there is no out-of-bounds error.
Comment 6 Andreas Schwab 2014-07-31 09:48:33 UTC
Equality test against pointer to void is explicitly allowed by the standard and implicitly converts the other pointer to void*.
Comment 7 Richard Biener 2014-07-31 10:05:28 UTC
(In reply to Anders Kaseorg from comment #4)
> Another bisect between 4.7 and 4.8 shows that the bug appeared with r189321
> (bug 52009).
> 
> My test case has triggers the bug in more versions than Kerberos does: as
> far as I can tell, Kerberos was unaffected until r192604.

Thanks - that pin-points it.  tail-merging concludes that

  <bb 3>:
  _11 = n_7->next;
  MEM[(struct head *)_10].first = _11;
  goto <bb 5>;

and

  <bb 4>:
  _13 = n_7->next;
  _10->next = _13;

are equivalent.  But they are not - the stores are performed using
different alias sets.

Note that the actual miscompile happens during RTL instruction scheduling.

In 4.9 and trunk tail-merging is faced with

  <bb 3>:
  _11 = n_7->next;
  MEM[(struct head *)&heads][k.1_8].first = _11;
  goto <bb 5>;

  <bb 4>:
  _13 = n_7->next;
  _10->next = _13;

instead but I bet the issue is still there.

So it simply does (on the 4.8 branch):

    case GIMPLE_ASSIGN:
      lhs1 = gimple_get_lhs (s1);
      lhs2 = gimple_get_lhs (s2);
      if (TREE_CODE (lhs1) != SSA_NAME
          && TREE_CODE (lhs2) != SSA_NAME)
        return (vn_valueize (gimple_vdef (s1))
                == vn_valueize (gimple_vdef (s2)));

which shows that we value-number the VDEFs the same.

IMHO VDEF value-numbering is dangerous here.

I have a patch.
Comment 8 Tom de Vries 2014-07-31 12:19:46 UTC
Using this patch on the example from the description field, I can modify the example on the command line:
...
$ diff -u bug-orig.c bug-mod.c 
--- bug-orig.c	2014-07-31 14:00:50.663275717 +0200
+++ bug-mod.c	2014-07-31 14:01:49.403276412 +0200
@@ -11,7 +11,7 @@
   struct node *n = head->first;
   struct head *h = &heads[k];
 
-  if (n->prev == (void *)h)
+  if (FORCE n->prev == (void *)h)
     h->first = n->next;
   else
     n->prev->next = n->next;
...

1. -DFORCE="" gives the original
2. -DFORCE="1 ||" forces the condition to true
3. -DFORCE="0 &&" forces the confition to false

In this experiment, we don't use tree-tail-merge:
...
$ gcc -DFORCE="1 ||" bug-mod.c -O2 -fno-strict-aliasing -fno-tree-tail-merge && ./a.out ; echo $?
0
$ gcc -DFORCE="1 ||" bug-mod.c -O2 -fstrict-aliasing -fno-tree-tail-merge && ./a.out ; echo $?
0
$ gcc -DFORCE="0 &&" bug-mod.c -O2 -fno-strict-aliasing -fno-tree-tail-merge && ./a.out ; echo $?
0
$ gcc -DFORCE="0 &&" bug-mod.c -O2 -fstrict-aliasing -fno-tree-tail-merge && ./a.out ; echo $?
1
...

The last result seems to suggest that the example is not type-safe.

My understanding is that the problem is in the line:
  n->prev->next = n->next;
where we effectively do:
  /* ((struct node*)&heads[2])->next = node.next */
which is type-unsafe.
Comment 9 rguenther@suse.de 2014-07-31 12:24:06 UTC
On Thu, 31 Jul 2014, vries at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61964
> 
> vries at gcc dot gnu.org changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |vries at gcc dot gnu.org
> 
> --- Comment #8 from vries at gcc dot gnu.org ---
> Using this patch on the example from the description field, I can modify the
> example on the command line:
> ...
> $ diff -u bug-orig.c bug-mod.c 
> --- bug-orig.c    2014-07-31 14:00:50.663275717 +0200
> +++ bug-mod.c    2014-07-31 14:01:49.403276412 +0200
> @@ -11,7 +11,7 @@
>    struct node *n = head->first;
>    struct head *h = &heads[k];
> 
> -  if (n->prev == (void *)h)
> +  if (FORCE n->prev == (void *)h)
>      h->first = n->next;
>    else
>      n->prev->next = n->next;
> ...
> 
> 1. -DFORCE="" gives the original
> 2. -DFORCE="1 ||" forces the condition to true
> 3. -DFORCE="0 &&" forces the confition to false
> 
> In this experiment, we don't use tree-tail-merge:
> ...
> $ gcc -DFORCE="1 ||" bug-mod.c -O2 -fno-strict-aliasing -fno-tree-tail-merge &&
> ./a.out ; echo $?
> 0
> $ gcc -DFORCE="1 ||" bug-mod.c -O2 -fstrict-aliasing -fno-tree-tail-merge &&
> ./a.out ; echo $?
> 0
> $ gcc -DFORCE="0 &&" bug-mod.c -O2 -fno-strict-aliasing -fno-tree-tail-merge &&
> ./a.out ; echo $?
> 0
> $ gcc -DFORCE="0 &&" bug-mod.c -O2 -fstrict-aliasing -fno-tree-tail-merge &&
> ./a.out ; echo $?
> 1
> ...
> 
> The last result seems to suggest that the example is not type-safe.
> 
> My understanding is that the problem is in the line:
>   n->prev->next = n->next;
> where we effectively do:
>   /* ((struct node*)&heads[2])->next = node.next */
> which is type-unsafe.

But that line is never executed at runtime (well, unless tail
merging comes along and makes it the only version present).

Richard.
Comment 10 Richard Biener 2014-07-31 14:07:31 UTC
Author: rguenth
Date: Thu Jul 31 14:06:59 2014
New Revision: 213375

URL: https://gcc.gnu.org/viewcvs?rev=213375&root=gcc&view=rev
Log:
2014-07-31  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/61964
	* tree-ssa-tail-merge.c (gimple_equal_p): Handle non-SSA LHS solely
	by structural equality.

	* gcc.dg/torture/pr61964.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr61964.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-tail-merge.c
Comment 11 Richard Biener 2014-07-31 14:10:24 UTC
Fixed on trunk sofar.
Comment 12 Tom de Vries 2014-07-31 17:09:47 UTC
Created attachment 33220 [details]
Alternative patch

> But that line is never executed at runtime (well, unless tail
> merging comes along and makes it the only version present).

Ah, right, we consider a program with dead type-unsafe code valid.

This follow-up patch attempts to fix things less conservatively on trunk. Shall I put this through testing or do you see a problem with this approach?

Furthermore, I suspect that a similar issue exists for loads, I'll look into that.
Comment 13 Richard Biener 2014-08-01 07:32:07 UTC
(In reply to vries from comment #12)
> Created attachment 33220 [details]
> Alternative patch
> 
> > But that line is never executed at runtime (well, unless tail
> > merging comes along and makes it the only version present).
> 
> Ah, right, we consider a program with dead type-unsafe code valid.
> 
> This follow-up patch attempts to fix things less conservatively on trunk.
> Shall I put this through testing or do you see a problem with this approach?

Hum.  I don't like guarding optimizations with !flag_strict_aliasing, that is,
-fno-strict-aliasing shouldn't get us more optimization.

Also on trunk I'd like to rip out the use of the SCCVN lattice from
tail-merging as there FRE/PRE value-replace every SSA name which means
we don't need it.  The tight entanglement between PRE and tail-merge has
given me more headaches recently.

> Furthermore, I suspect that a similar issue exists for loads, I'll look into
> that.

I don't think so.
Comment 14 Richard Biener 2014-08-01 07:36:48 UTC
Author: rguenth
Date: Fri Aug  1 07:36:16 2014
New Revision: 213404

URL: https://gcc.gnu.org/viewcvs?rev=213404&root=gcc&view=rev
Log:
2014-08-01  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/61964
	* tree-ssa-tail-merge.c (gimple_equal_p): Handle non-SSA LHS solely
        by structural equality.

	* gcc.dg/torture/pr61964.c: New testcase.
	* gcc.dg/pr51879-18.c: XFAIL.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/torture/pr61964.c
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/pr51879-18.c
    branches/gcc-4_9-branch/gcc/tree-ssa-tail-merge.c
Comment 15 Richard Biener 2014-08-01 07:40:33 UTC
Author: rguenth
Date: Fri Aug  1 07:40:01 2014
New Revision: 213405

URL: https://gcc.gnu.org/viewcvs?rev=213405&root=gcc&view=rev
Log:
2014-08-01  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/61964
	* tree-ssa-tail-merge.c (gimple_operand_equal_value_p): New
	function merged from trunk.
	(gimple_equal_p): Handle non-SSA LHS solely by structural
	equality.

	* gcc.dg/torture/pr61964.c: New testcase.
	* gcc.dg/pr51879-18.c: XFAIL.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/torture/pr61964.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/pr51879-18.c
    branches/gcc-4_8-branch/gcc/tree-ssa-tail-merge.c
Comment 16 Richard Biener 2014-08-01 08:17:18 UTC
Fixed.
Comment 17 Anders Kaseorg 2014-08-01 08:18:37 UTC
Thanks.  I verified that GCC 4.8 r213405 fixes my test case and the original Kerberos problem.
Comment 18 Tom de Vries 2014-08-04 00:24:57 UTC
(In reply to Richard Biener from comment #13)
> (In reply to vries from comment #12)
> > Furthermore, I suspect that a similar issue exists for loads, I'll look into
> > that.
> 
> I don't think so.

How about PR62004 ?