Bug 51933 - [4.7 regression] wrong code due to -free
Summary: [4.7 regression] wrong code due to -free
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.7.0
: P1 normal
Target Milestone: 4.7.0
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2012-01-21 14:37 UTC by Jakub Jelinek
Modified: 2012-01-23 09:28 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.6.3
Known to fail: 4.7.0
Last reconfirmed: 2012-01-21 00:00:00


Attachments
gcc47-pr51933.patch (1.86 KB, patch)
2012-01-22 09:42 UTC, Jakub Jelinek
Details | Diff
gcc47-pr51933.patch (4.34 KB, patch)
2012-01-22 18:59 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2012-01-21 14:37:04 UTC
static signed char v1;
static unsigned char v2[256], v3[256];

__attribute__((noclone, noinline)) void
foo (void)
{
  asm volatile ("" : : "r" (&v1), "r" (v2), "r" (v3) : "memory");
}

__attribute__((noclone, noinline)) int
bar (const int x, const unsigned short *y, char *z)
{
  int i;
  unsigned short u;
  if (!v1)
    foo ();
  for (i = 0; i < x; i++)
    {
      u = y[i];
      z[i] = u < 0x0100 ? v2[u] : v3[u & 0xff];
    }
  z[x] = '\0';
  return x;
}

int
main (void)
{
  char buf[18];
  unsigned short s[18];
  unsigned char c[18] = "abcdefghijklmnopq";
  int i;
  for (i = 0; i < 256; i++)
    {
      v2[i] = i;
      v3[i] = i + 1;
    }
  for (i = 0; i < 18; i++)
    s[i] = c[i];
  s[5] |= 0x600;
  s[6] |= 0x500;
  s[11] |= 0x2000;
  s[15] |= 0x500;
  foo ();
  if (bar (17, s, buf) != 17
      || __builtin_memcmp (buf, "abcdeghhijkmmnoqq", 18) != 0)
    __builtin_abort ();
  return 0;
}

is miscompiled on x86_64 at -O2, works with -O2 -fno-ree.
Introduced with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182574
Comment 1 Jakub Jelinek 2012-01-21 15:24:26 UTC
Consider following 3 routines:

unsigned long
foo (unsigned short *s, int i)
{
  unsigned short x = s[i];
  if (x < 0x211)
    return (unsigned char) x;
  asm volatile ("");
  return 6;
}
unsigned long
bar (unsigned short *s, int i)
{
  unsigned short x = s[i];
  if (x < 0x211)
    return (unsigned char) x;
  asm volatile ("");
  return x;
}
unsigned long
baz (unsigned short *s, int i)
{
  unsigned short x = s[i];
  if (x < 0x211)
    return x;
  asm volatile ("");
  return (unsigned char) x;
}

In foo ree doesn't optimize away the QImode -> DImode zero extension, because
merge_def_and_ext checks:
  && GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode
and in this case ext_src_mode is QImode, but SET_DEST has HImode.
In bar ree doesn't optimize it away either, first merge_def_and_ext_checks is called with the QImode ext_src_mode (which doesn't match) and only afterwards
on the HImode -> DImode extension, which is optimized (that is correct).
In baz (which is the same as bar in #c0 testcase) unfortunately merge_def_and_ext is first called on the HImode -> DImode extension, which is
optimized (that is fine) by turning the HImode load into a DImode zero_extend load from HImode and removing the HImode -> DImode extension.
But then when make_defs_and_copies_lists is called on the QImode -> DImode extension, we reach:
	  /* Initialize the work list.  */
	  if (!get_defs (extend_insn, src_reg, &work_list))
	    {
	      VEC_free (rtx, heap, work_list);
	      /* The number of defs being equal to zero can only mean that all the
	         definitions have been previously merged.  */
	      return 2;
	    }
and because the definition has been changed already.  But nothing performs the ext_src_mode check that used to be performed otherwise.
So we either need to do the src mode checking earlier when we haven't started modifying insns (in add_removable_extension?), or we'd need to look even at the newly added defs and see what mode they extend from.
Comment 2 Eric Botcazou 2012-01-21 15:44:46 UTC
> But then when make_defs_and_copies_lists is called on the QImode -> DImode
> extension, we reach:
>       /* Initialize the work list.  */
>       if (!get_defs (extend_insn, src_reg, &work_list))
>         {
>           VEC_free (rtx, heap, work_list);
>           /* The number of defs being equal to zero can only mean that all the
>              definitions have been previously merged.  */
>           return 2;
>         }
> and because the definition has been changed already.

Yes, this particular return embodies some implicit assumptions...  Will fix.
Comment 3 Jakub Jelinek 2012-01-21 15:57:23 UTC
Completely untested fix:
--- gcc/ree.c.jj	2011-12-28 10:52:44.000000000 +0100
+++ gcc/ree.c	2012-01-21 16:55:27.290731139 +0100
@@ -776,6 +776,23 @@ add_removable_extension (rtx x ATTRIBUTE
 	      }
 	    return;
 	  }
+	else
+	  {
+	    rtx set = single_set (DF_REF_INSN (def->ref));
+	    if (set == NULL_RTX
+		|| !REG_P (SET_DEST (set))
+		|| GET_MODE (SET_DEST (set)) != GET_MODE (XEXP (src, 0)))
+	      {
+		if (dump_file)
+		  {
+		    fprintf (dump_file, "Cannot eliminate extension:\n");
+		    print_rtl_single (dump_file, rei->insn);
+		    fprintf (dump_file,
+			     " because def insn has different mode\n");
+		  }
+		return;
+	      }
+	  }
 
       /* Then add the candidate to the list and insert the reaching definitions
          into the definition map.  */
Comment 4 Jakub Jelinek 2012-01-21 18:40:23 UTC
That unfortunately slightly pessimizes it.  E.g. on i686-linux bootstrap
on:
../../gcc/config/i386/i386.c ix86_memory_move_cost
../../gcc/config/i386/i386.c ix86_register_move_cost
../../../../libgfortran/io/write.c write_float
../../../libgfortran/io/write.c write_float
s-regpat.adb system__regpat__dump_until
/usr/src/gcc/gcc/testsuite/gcc.target/i386/avx-vpackuswb-1.c do_test
/usr/src/gcc/gcc/testsuite/gcc.target/i386/sse2-packuswb-1.c do_test
/usr/src/gcc/libstdc++-v3/testsuite/ext/rope/1.cc _S_tree_concat
/usr/src/gcc/libstdc++-v3/testsuite/ext/rope/2.cc _S_tree_concat
/usr/src/gcc/libstdc++-v3/testsuite/ext/rope/3.cc _S_tree_concat
/usr/src/gcc/libstdc++-v3/testsuite/ext/rope/44963.cc _S_tree_concat
/usr/src/gcc/libstdc++-v3/testsuite/ext/rope/4.cc _S_tree_concat
/usr/src/gcc/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc _S_tree_concat
and on x86_64-linux bootstrap sans regtest:
../../gcc/ada/par.adb par__prag__process_restrictions_or_restriction_warnings
../../gcc/config/i386/i386.c ix86_memory_move_cost
../../gcc/config/i386/i386.c ix86_register_move_cost
../../gcc/java/jcf-parse.c rewrite_reflection_indexes
../../../libgfortran/io/write.c write_float
s-regpat.adb system__regpat__dump_until

The problem is if the def insn satisfies is_cond_copy_insn, then it can be in a wider mode, yet all definitions could still be in a narrower mode.
So, perhaps we'd need to have a recursive function that for is_cond_copy_insn defs calls get_defs on both args and recurses, checking mode of real definitions only.  Not sure if we'd need also this is_insn_visited array for it to avoid recursing infinitely.

BTW, for 4.8, I guess a couple of ree.c cleanups will be needed, e.g. several functions allocate and free way too many heap allocated vectors, I guess if we allocate them in the caller (or caller's caller) and just pass addresses of those vectors around, we could just VEC_truncate the vectors and avoid the permanent malloc/free.
Comment 5 Eric Botcazou 2012-01-21 21:27:40 UTC
> But then when make_defs_and_copies_lists is called on the QImode -> DImode
> extension, we reach:
>       /* Initialize the work list.  */
>       if (!get_defs (extend_insn, src_reg, &work_list))
>         {
>           VEC_free (rtx, heap, work_list);
>           /* The number of defs being equal to zero can only mean that all the
>              definitions have been previously merged.  */
>           return 2;
>         }
> and because the definition has been changed already.  But nothing performs the
> ext_src_mode check that used to be performed otherwise.
> So we either need to do the src mode checking earlier when we haven't started
> modifying insns (in add_removable_extension?), or we'd need to look even at the
> newly added defs and see what mode they extend from.

OK, I didn't realize that there could be an implicit truncation hidden within an extension.  And I think that your change to add_removable_extension is fine, as it still leaves the new flavor of the pass more powerful than the old one.

The other PR is a different problem.
Comment 6 Jakub Jelinek 2012-01-22 09:42:05 UTC
Created attachment 26410 [details]
gcc47-pr51933.patch

Patch I wrote last night.  Passed bootstrap/regtest, the only cases where it prevented something was in the new testcase and in jcf-parse.c (rewrite_reflection_indexes) with -m64, where it fixed an miscompilation.  What happened there is that some earlier bb (but in a loop) was doing a SImode->DImode zero extension, and later in the loop there was a SImode ior, followed by HImode->SImode zero extension of that, which fed the earlier SImode->DImode zero extension.  The SImode->DImode insn was processed first, removed and changed the HImode->SImode zero extension into HImode->DImode zero extension.  But as the HImode->SImode extension was changed, its df links were gone.  When the HImode-> zero extension is later processed as candidate, get_defs returns NULL and we assume all the dependencies have been adjusted.

Which leads me to thinking that this patch is a wrong approach, and perhaps we should instead DF_DEFER_INSN_RESCAN in ree.c and adjust, so that we can cope
with the adjusted insns.
Comment 7 Jakub Jelinek 2012-01-22 18:59:43 UTC
Created attachment 26414 [details]
gcc47-pr51933.patch

Untested fix with deferred insn rescanning.  Reverts the PR51924 find_and_remove_re fix as it is no longer needed.
Comment 8 Jakub Jelinek 2012-01-23 09:25:55 UTC
Author: jakub
Date: Mon Jan 23 09:25:52 2012
New Revision: 183416

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183416
Log:
	PR rtl-optimization/51933
	* ree.c (transform_ifelse): Return true right away if dstreg is
	already wider or equal to cand->mode.
	(enum ext_modified_kind, struct ext_modified, ext_state): New types.
	(make_defs_and_copies_lists): Remove defs_list and copies_list
	arguments, add state argument, just truncate state->work_list
	instead of always allocating and freeing the vector.  Assert that
	get_defs succeeds instead of returning 2.  Changed return type to
	bool.
	(merge_def_and_ext): Add state argument.  If SET_DEST doesn't
	have ext_src_mode, see if it has been modified already with the
	right kind of extension and has been extended before from the
	ext_src_mode.  If SET_DEST is already wider or equal to cand->mode,
	just return true.  Remember the original mode in state->modified
	array.
	(combine_reaching_defs): Add state argument.  Don't allocate and
	free here def_list, copied_list and vec vectors, instead just
	VEC_truncate the vectors in *state.  Don't handle outcome == 2
	here.
	(find_and_remove_re): Set DF_DEFER_INSN_RESCAN df flag.
	Add state variable, clear vectors in it, initialize state.modified
	if needed.  Free all the vectors at the end and state.modified too.
	Don't skip a candidate if the extension expression has been modified.

	* gcc.c-torture/execute/pr51933.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr51933.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ree.c
    trunk/gcc/testsuite/ChangeLog
Comment 9 Jakub Jelinek 2012-01-23 09:28:26 UTC
Fixed.