static signed char v1; static unsigned char v2[256], v3[256]; __attribute__((noclone, noinline)) void foo (void) { asm volatile ("" : : "r" (&v1), "r" (v2), "r" (v3) : "memory"); } __attribute__((noclone, noinline)) int bar (const int x, const unsigned short *y, char *z) { int i; unsigned short u; if (!v1) foo (); for (i = 0; i < x; i++) { u = y[i]; z[i] = u < 0x0100 ? v2[u] : v3[u & 0xff]; } z[x] = '\0'; return x; } int main (void) { char buf[18]; unsigned short s[18]; unsigned char c[18] = "abcdefghijklmnopq"; int i; for (i = 0; i < 256; i++) { v2[i] = i; v3[i] = i + 1; } for (i = 0; i < 18; i++) s[i] = c[i]; s[5] |= 0x600; s[6] |= 0x500; s[11] |= 0x2000; s[15] |= 0x500; foo (); if (bar (17, s, buf) != 17 || __builtin_memcmp (buf, "abcdeghhijkmmnoqq", 18) != 0) __builtin_abort (); return 0; } is miscompiled on x86_64 at -O2, works with -O2 -fno-ree. Introduced with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182574
Consider following 3 routines: unsigned long foo (unsigned short *s, int i) { unsigned short x = s[i]; if (x < 0x211) return (unsigned char) x; asm volatile (""); return 6; } unsigned long bar (unsigned short *s, int i) { unsigned short x = s[i]; if (x < 0x211) return (unsigned char) x; asm volatile (""); return x; } unsigned long baz (unsigned short *s, int i) { unsigned short x = s[i]; if (x < 0x211) return x; asm volatile (""); return (unsigned char) x; } In foo ree doesn't optimize away the QImode -> DImode zero extension, because merge_def_and_ext checks: && GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode and in this case ext_src_mode is QImode, but SET_DEST has HImode. In bar ree doesn't optimize it away either, first merge_def_and_ext_checks is called with the QImode ext_src_mode (which doesn't match) and only afterwards on the HImode -> DImode extension, which is optimized (that is correct). In baz (which is the same as bar in #c0 testcase) unfortunately merge_def_and_ext is first called on the HImode -> DImode extension, which is optimized (that is fine) by turning the HImode load into a DImode zero_extend load from HImode and removing the HImode -> DImode extension. But then when make_defs_and_copies_lists is called on the QImode -> DImode extension, we reach: /* Initialize the work list. */ if (!get_defs (extend_insn, src_reg, &work_list)) { VEC_free (rtx, heap, work_list); /* The number of defs being equal to zero can only mean that all the definitions have been previously merged. */ return 2; } and because the definition has been changed already. But nothing performs the ext_src_mode check that used to be performed otherwise. So we either need to do the src mode checking earlier when we haven't started modifying insns (in add_removable_extension?), or we'd need to look even at the newly added defs and see what mode they extend from.
> But then when make_defs_and_copies_lists is called on the QImode -> DImode > extension, we reach: > /* Initialize the work list. */ > if (!get_defs (extend_insn, src_reg, &work_list)) > { > VEC_free (rtx, heap, work_list); > /* The number of defs being equal to zero can only mean that all the > definitions have been previously merged. */ > return 2; > } > and because the definition has been changed already. Yes, this particular return embodies some implicit assumptions... Will fix.
Completely untested fix: --- gcc/ree.c.jj 2011-12-28 10:52:44.000000000 +0100 +++ gcc/ree.c 2012-01-21 16:55:27.290731139 +0100 @@ -776,6 +776,23 @@ add_removable_extension (rtx x ATTRIBUTE } return; } + else + { + rtx set = single_set (DF_REF_INSN (def->ref)); + if (set == NULL_RTX + || !REG_P (SET_DEST (set)) + || GET_MODE (SET_DEST (set)) != GET_MODE (XEXP (src, 0))) + { + if (dump_file) + { + fprintf (dump_file, "Cannot eliminate extension:\n"); + print_rtl_single (dump_file, rei->insn); + fprintf (dump_file, + " because def insn has different mode\n"); + } + return; + } + } /* Then add the candidate to the list and insert the reaching definitions into the definition map. */
That unfortunately slightly pessimizes it. E.g. on i686-linux bootstrap on: ../../gcc/config/i386/i386.c ix86_memory_move_cost ../../gcc/config/i386/i386.c ix86_register_move_cost ../../../../libgfortran/io/write.c write_float ../../../libgfortran/io/write.c write_float s-regpat.adb system__regpat__dump_until /usr/src/gcc/gcc/testsuite/gcc.target/i386/avx-vpackuswb-1.c do_test /usr/src/gcc/gcc/testsuite/gcc.target/i386/sse2-packuswb-1.c do_test /usr/src/gcc/libstdc++-v3/testsuite/ext/rope/1.cc _S_tree_concat /usr/src/gcc/libstdc++-v3/testsuite/ext/rope/2.cc _S_tree_concat /usr/src/gcc/libstdc++-v3/testsuite/ext/rope/3.cc _S_tree_concat /usr/src/gcc/libstdc++-v3/testsuite/ext/rope/44963.cc _S_tree_concat /usr/src/gcc/libstdc++-v3/testsuite/ext/rope/4.cc _S_tree_concat /usr/src/gcc/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc _S_tree_concat and on x86_64-linux bootstrap sans regtest: ../../gcc/ada/par.adb par__prag__process_restrictions_or_restriction_warnings ../../gcc/config/i386/i386.c ix86_memory_move_cost ../../gcc/config/i386/i386.c ix86_register_move_cost ../../gcc/java/jcf-parse.c rewrite_reflection_indexes ../../../libgfortran/io/write.c write_float s-regpat.adb system__regpat__dump_until The problem is if the def insn satisfies is_cond_copy_insn, then it can be in a wider mode, yet all definitions could still be in a narrower mode. So, perhaps we'd need to have a recursive function that for is_cond_copy_insn defs calls get_defs on both args and recurses, checking mode of real definitions only. Not sure if we'd need also this is_insn_visited array for it to avoid recursing infinitely. BTW, for 4.8, I guess a couple of ree.c cleanups will be needed, e.g. several functions allocate and free way too many heap allocated vectors, I guess if we allocate them in the caller (or caller's caller) and just pass addresses of those vectors around, we could just VEC_truncate the vectors and avoid the permanent malloc/free.
> But then when make_defs_and_copies_lists is called on the QImode -> DImode > extension, we reach: > /* Initialize the work list. */ > if (!get_defs (extend_insn, src_reg, &work_list)) > { > VEC_free (rtx, heap, work_list); > /* The number of defs being equal to zero can only mean that all the > definitions have been previously merged. */ > return 2; > } > and because the definition has been changed already. But nothing performs the > ext_src_mode check that used to be performed otherwise. > So we either need to do the src mode checking earlier when we haven't started > modifying insns (in add_removable_extension?), or we'd need to look even at the > newly added defs and see what mode they extend from. OK, I didn't realize that there could be an implicit truncation hidden within an extension. And I think that your change to add_removable_extension is fine, as it still leaves the new flavor of the pass more powerful than the old one. The other PR is a different problem.
Created attachment 26410 [details] gcc47-pr51933.patch Patch I wrote last night. Passed bootstrap/regtest, the only cases where it prevented something was in the new testcase and in jcf-parse.c (rewrite_reflection_indexes) with -m64, where it fixed an miscompilation. What happened there is that some earlier bb (but in a loop) was doing a SImode->DImode zero extension, and later in the loop there was a SImode ior, followed by HImode->SImode zero extension of that, which fed the earlier SImode->DImode zero extension. The SImode->DImode insn was processed first, removed and changed the HImode->SImode zero extension into HImode->DImode zero extension. But as the HImode->SImode extension was changed, its df links were gone. When the HImode-> zero extension is later processed as candidate, get_defs returns NULL and we assume all the dependencies have been adjusted. Which leads me to thinking that this patch is a wrong approach, and perhaps we should instead DF_DEFER_INSN_RESCAN in ree.c and adjust, so that we can cope with the adjusted insns.
Created attachment 26414 [details] gcc47-pr51933.patch Untested fix with deferred insn rescanning. Reverts the PR51924 find_and_remove_re fix as it is no longer needed.
Author: jakub Date: Mon Jan 23 09:25:52 2012 New Revision: 183416 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183416 Log: PR rtl-optimization/51933 * ree.c (transform_ifelse): Return true right away if dstreg is already wider or equal to cand->mode. (enum ext_modified_kind, struct ext_modified, ext_state): New types. (make_defs_and_copies_lists): Remove defs_list and copies_list arguments, add state argument, just truncate state->work_list instead of always allocating and freeing the vector. Assert that get_defs succeeds instead of returning 2. Changed return type to bool. (merge_def_and_ext): Add state argument. If SET_DEST doesn't have ext_src_mode, see if it has been modified already with the right kind of extension and has been extended before from the ext_src_mode. If SET_DEST is already wider or equal to cand->mode, just return true. Remember the original mode in state->modified array. (combine_reaching_defs): Add state argument. Don't allocate and free here def_list, copied_list and vec vectors, instead just VEC_truncate the vectors in *state. Don't handle outcome == 2 here. (find_and_remove_re): Set DF_DEFER_INSN_RESCAN df flag. Add state variable, clear vectors in it, initialize state.modified if needed. Free all the vectors at the end and state.modified too. Don't skip a candidate if the extension expression has been modified. * gcc.c-torture/execute/pr51933.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr51933.c Modified: trunk/gcc/ChangeLog trunk/gcc/ree.c trunk/gcc/testsuite/ChangeLog
Fixed.