For gcc.target/i386/pad-10.c copyrename transforms foo2 (int z, int x) { int D.1754; <bb 2>: if (x_2(D) == 1) goto <bb 3>; else goto <bb 4>; <bb 3>: bar (); D.1754_4 = z_3(D); goto <bb 5>; <bb 4>: D.1754_5 = x_2(D) + z_3(D); <bb 5>: # D.1754_1 = PHI <D.1754_4(3), D.1754_5(4)> return D.1754_1; } to foo2 (int z, int x) { <bb 2>: if (x_2(D) == 1) goto <bb 3>; else goto <bb 4>; <bb 3>: bar (); z_4 = z_3(D); goto <bb 5>; <bb 4>: z_5 = x_2(D) + z_3(D); <bb 5>: # z_1 = PHI <z_4(3), z_5(4)> return z_1; } note the bogus stmts z_5 = x_2(D) + z_3(D); <bb 5>: # z_1 = PHI <z_4(3), z_5(4)> return z_1; which assign to z values different from the incoming parameter value. [the question is if copyrename is still useful now that we have VTA and will insert debug stmts for assignments to names we'd loose over a copyprop/dce combo]
Apart from generic brokeness of copyrename (as it leaks DECL_NAMEs relevant to debug info to places where another same DECL_NAME could be still live) this specific issue is here: for (i = 0; i < gimple_phi_num_args (phi); i++) { tree arg = gimple_phi_arg (phi, i)->def; if (TREE_CODE (arg) == SSA_NAME) updated |= copy_rename_partition_coalesce (map, res, arg, debug); } copy_rename_partition_coalesce is symmetric, but we only can assign arg the partition of res this way and at the end, if all args then have the same partition assign that to res.
Created attachment 27962 [details] patch Patch which breaks gcc.target/i386/pad-10.c for real (and other testcases due to dump differences). We now get the correct <bb 4>: D.1759_7 = x_3(D) + z_6(D); <bb 5>: # D.1759_1 = PHI <z_6(D)(3), D.1759_7(4)> return D.1759_1;
Alex, any opinion on whether copyrename is still useful for debug info purposes with VTA enabled? There is still the theoretical benefit of out-of-SSA coalescing, but we could do this renaming at out-of-SSA time instead of repeatedly and focus entirely on coalescing opportunities.
The other thing which copyrename helps is register allocation as it allows out of ssa to coalesce some ssa names which could not do before.
On Wed, 8 Aug 2012, pinskia at gcc dot gnu.org wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54200 > > --- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-08-08 15:31:09 UTC --- > The other thing which copyrename helps is register allocation as it allows out > of ssa to coalesce some ssa names which could not do before. Yes, though that's a matter of adjusting what coalescing can coalesce (which is only limited to the sets it computes conflicts for)
Guality test that fails: /* PR tree-optimization/54200 */ /* { dg-do run } */ /* { dg-options "-g -fno-var-tracking-assignments" } */ int o __attribute__((used)); void bar (void) { o = 2; } int __attribute__((noinline,noclone)) foo (int z, int x, int b) { if (x == 1) { bar (); return z; } else { int a = (x + z) + b; return a; /* { dg-final { gdb-test 20 "z" "3" } } */ } } int main () { foo (3, 2, 1); return 0; } the patch fixes it, but not at -Os (need to investigate that).
Author: rguenth Date: Mon Aug 13 09:29:28 2012 New Revision: 190339 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190339 Log: 2012-08-13 Richard Guenther <rguenther@suse.de> PR tree-optimization/54200 * tree-ssa-copyrename.c (rename_ssa_copies): Do not add PHI results to another partition if not all PHI arguments have the same partition. * gcc.dg/guality/pr54200.c: New testcase. * gcc.dg/tree-ssa/slsr-8.c: Adjust. Added: trunk/gcc/testsuite/gcc.dg/guality/pr54200.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c trunk/gcc/tree-ssa-copyrename.c
Fixed as far as I am concerned.
I see following in report for x86: FAIL: gcc.dg/guality/pr54200.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects line 20 z == 3
(In reply to comment #9) > I see following in report for x86: > > FAIL: gcc.dg/guality/pr54200.c -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects line 20 z == 3 That's what I said in the commit mail.
Right! Sorry for the noise...
This patch regressed code quality on find_vma function in Linux kernel on s390x: struct rb_node { unsigned long __rb_parent_color; struct rb_node *rb_right; struct rb_node *rb_left; } __attribute__ ((aligned (sizeof (long)))); struct rb_root { struct rb_node *rb_node; }; struct vm_area_struct { unsigned long vm_start; unsigned long vm_end; struct vm_area_struct *vm_next, *vm_prev; struct rb_node vm_rb; }; struct mm_struct { struct vm_area_struct *mmap; struct rb_root mm_rb; struct vm_area_struct *mmap_cache; }; struct vm_area_struct * find_vma (struct mm_struct *mm, unsigned long addr) { struct vm_area_struct *vma = ((void *) 0); static _Bool __attribute__ ((__section__ (".data.unlikely"))) __warned; int __ret_warn_once = !!(!mm); if (__builtin_expect (!!(__ret_warn_once), 0)) { int __ret_warn_on = !!(!__warned); if (__builtin_expect (!!(__ret_warn_on), 0)) asm volatile ("": :"i" (1920), "i" (((1 << 0) | ((9) << 8))), "i" (64)); if (__builtin_expect (!!(__ret_warn_on), 0)) __warned = 1; } if (__builtin_expect (!!(__ret_warn_once), 0)) return ((void *) 0); vma = mm->mmap_cache; if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) { struct rb_node *rb_node; rb_node = mm->mm_rb.rb_node; vma = ((void *) 0); while (rb_node) { struct vm_area_struct *vma_tmp; const typeof (((struct vm_area_struct *)0)->vm_rb) *__mptr = rb_node; vma_tmp = (struct vm_area_struct *) ((char *) __mptr - __builtin_offsetof (struct vm_area_struct, vm_rb)); if (vma_tmp->vm_end > addr) { vma = vma_tmp; if (vma_tmp->vm_start <= addr) break; rb_node = rb_node->rb_left; } else rb_node = rb_node->rb_right; } if (vma) mm->mmap_cache = vma; } return vma; } with: -fno-strict-aliasing -fno-common -fno-delete-null-pointer-checks -O2 -m64 -mbackchain -msoft-float -march=z9-109 -mpacked-stack -mstack-size=16384 -fno-strength-reduce -fno-stack-protector -fomit-frame-pointer -fno-inline-functions-called-once -fconserve-stack The difference in *.optimized is just: - # vma_5 = PHI <0B(18), vma_2(15), vma_20(6), vma_2(16), 0B(7)> - return vma_5; + # _5 = PHI <0B(18), vma_2(15), vma_20(6), vma_2(16), 0B(7)> + return _5; but later on CSE2 decides to use REG_EQUAL somewhere for some reason, and we end up reading mm->mmap_cache twice, first into a register to do the comparisons of it, and then when we know it is the value we actually want to return, we just forget we have it in a pseudo and read it again from memory.
On Fri, 29 Mar 2013, jakub at gcc dot gnu.org wrote: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54200 > > Jakub Jelinek <jakub at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |jakub at gcc dot gnu.org > > --- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-03-29 15:36:15 UTC --- > This patch regressed code quality on find_vma function in Linux kernel on > s390x: > struct rb_node > { > unsigned long __rb_parent_color; > struct rb_node *rb_right; > struct rb_node *rb_left; > } __attribute__ ((aligned (sizeof (long)))); > struct rb_root { > struct rb_node *rb_node; > }; > struct vm_area_struct > { > unsigned long vm_start; > unsigned long vm_end; > struct vm_area_struct *vm_next, *vm_prev; > struct rb_node vm_rb; > }; > struct mm_struct > { > struct vm_area_struct *mmap; > struct rb_root mm_rb; > struct vm_area_struct *mmap_cache; > }; > struct vm_area_struct * > find_vma (struct mm_struct *mm, unsigned long addr) > { > struct vm_area_struct *vma = ((void *) 0); > static _Bool __attribute__ ((__section__ (".data.unlikely"))) __warned; > int __ret_warn_once = !!(!mm); > if (__builtin_expect (!!(__ret_warn_once), 0)) > { > int __ret_warn_on = !!(!__warned); > if (__builtin_expect (!!(__ret_warn_on), 0)) > asm volatile ("": :"i" (1920), "i" (((1 << 0) | ((9) << 8))), "i" (64)); > if (__builtin_expect (!!(__ret_warn_on), 0)) > __warned = 1; > } > if (__builtin_expect (!!(__ret_warn_once), 0)) > return ((void *) 0); > vma = mm->mmap_cache; > if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) > { > struct rb_node *rb_node; > rb_node = mm->mm_rb.rb_node; > vma = ((void *) 0); > while (rb_node) > { > struct vm_area_struct *vma_tmp; > const typeof (((struct vm_area_struct *)0)->vm_rb) *__mptr = rb_node; > vma_tmp = (struct vm_area_struct *) ((char *) __mptr - __builtin_offsetof > (struct vm_area_struct, vm_rb)); > if (vma_tmp->vm_end > addr) > { > vma = vma_tmp; > if (vma_tmp->vm_start <= addr) > break; > rb_node = rb_node->rb_left; > } > else > rb_node = rb_node->rb_right; > } > if (vma) > mm->mmap_cache = vma; > } > return vma; > } > > with: > -fno-strict-aliasing -fno-common -fno-delete-null-pointer-checks -O2 -m64 > -mbackchain -msoft-float -march=z9-109 -mpacked-stack -mstack-size=16384 > -fno-strength-reduce -fno-stack-protector -fomit-frame-pointer > -fno-inline-functions-called-once -fconserve-stack > > The difference in *.optimized is just: > - # vma_5 = PHI <0B(18), vma_2(15), vma_20(6), vma_2(16), 0B(7)> > - return vma_5; > + # _5 = PHI <0B(18), vma_2(15), vma_20(6), vma_2(16), 0B(7)> > + return _5; > but later on CSE2 decides to use REG_EQUAL somewhere for some reason, and we > end up reading mm->mmap_cache twice, first into a register to do the > comparisons of it, and then when we know it is the value we actually want to > return, we just forget we have it in a pseudo and read it again from memory. Surely not a bug with the patch but a bug in CSE. What the patch can change is how pseudos are coalesced. A "fix" may be in the coalescing code, not restrict coalescing to SSA names with the same underlying decl (with the cost issue of a bigger conflict map).
(In reply to Richard Biener from comment #10) > (In reply to comment #9) > > I see following in report for x86: > > > > FAIL: gcc.dg/guality/pr54200.c -O2 -flto -fuse-linker-plugin > > -fno-fat-lto-objects line 20 z == 3 > > That's what I said in the commit mail. Should this be XFAIL then, if it is working as intended in your mind?
On Sun, 10 Jul 2016, nightstrike at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54200 > > nightstrike <nightstrike at gmail dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |nightstrike at gmail dot com > > --- Comment #14 from nightstrike <nightstrike at gmail dot com> --- > (In reply to Richard Biener from comment #10) > > (In reply to comment #9) > > > I see following in report for x86: > > > > > > FAIL: gcc.dg/guality/pr54200.c -O2 -flto -fuse-linker-plugin > > > -fno-fat-lto-objects line 20 z == 3 > > > > That's what I said in the commit mail. > > Should this be XFAIL then, if it is working as intended in your mind? Not sure if my analysis is still up-to-date given copyrename is no more on GCC 6+. On trunk I see -Os fail on x86_64 and -O2 -flto on i?86. The testcase was to be a regression test for copyrename which was removed on GCC 6+. Note that guality XFAILs tend to be inherently target specific. Looking at the -Os RTL, the REG attrs seem bogus: (insn:TI 16 15 17 4 (parallel [ (set (reg:SI 0 ax [92]) (plus:SI (reg/v:SI 0 ax [orig:89 z ] [89]) (reg/v:SI 4 si [orig:90 x ] [90]))) (clobber (reg:CC 17 flags)) ]) /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.dg/guality/pr54200.c:19 211 {*addsi_1} (expr_list:REG_DEAD (reg/v:SI 4 si [orig:90 x ] [90]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (ok) (insn:TI 17 16 25 4 (parallel [ (set (reg/v:SI 0 ax [orig:89 z ] [89]) (plus:SI (reg:SI 0 ax [92]) (reg/v:SI 1 dx [orig:91 b ] [91]))) (clobber (reg:CC 17 flags)) ]) /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.dg/guality/pr54200.c:19 211 {*addsi_1} (expr_list:REG_DEAD (reg/v:SI 1 dx [orig:91 b ] [91]) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) the destination regdecl is bogus, should be a, not z. (insn 25 17 38 4 (use (reg/i:SI 0 ax)) /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.dg/guality/pr54200.c:22 -1 (nil)) (note 38 25 33 4 NOTE_INSN_EPILOGUE_BEG) (jump_insn:TI 33 38 34 4 (simple_return) /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.dg/guality/pr54200.c:22 697 {simple_return_internal} When single-stepping through foo locals show various bougs debug-info values but I guess that is kind-of expected without VTA. The above shows that while copyrename was fixed to not assign a bogus decl to the PHI result later RTL expansion will do the same mistake again (by means of out-of-SSA coalescing), which -fno-tree-coalesce-vars fixes, but then later when the single return is split somehow bogus information is constructed for debugging. I suggest to open a new bugreport for this.