Summary: | [4.3 Regression] a+=2; a+=2 not simplified to a+=4; with -O3 (ok with gcc-4.2.1) | ||
---|---|---|---|
Product: | gcc | Reporter: | Wouter Vermaelen <vermaelen.wouter> |
Component: | tree-optimization | Assignee: | Richard Biener <rguenth> |
Status: | RESOLVED FIXED | ||
Severity: | minor | CC: | gcc-bugs, rguenth |
Priority: | P3 | Keywords: | alias, missed-optimization, TREE |
Version: | 4.3.0 | ||
Target Milestone: | 4.3.0 | ||
Host: | | Target: | |
Build: | | Known to work: | 4.2.1 |
Known to fail: | 4.3.0 | Last reconfirmed: | 2007-09-03 14:36:17 |
Description
Wouter Vermaelen
2007-09-03 11:23:52 UTC
    # VUSE <tab_56, SMT.9_58, SMT.10_60>
    D.2581_35 = this_2(D)->D.2503.a;
    D.2582_36 = (unsigned int) D.2581_35;
    D.2583_37 = D.2582_36 + 2;
    D.2584_38 = (int) D.2583_37;
    # tab_76 = VDEF <tab_56>
    # SMT.9_77 = VDEF <SMT.9_58>
    D.2529_3->a = D.2584_38;
    # VUSE <tab_76, SMT.9_77, SMT.10_60>
    D.2586_40 = this_2(D)->D.2503.a;
    D.2587_41 = (unsigned int) D.2586_40;
    D.2588_42 = D.2587_41 + 2;
    D.2589_43 = (int) D.2588_42;
    # tab_78 = VDEF <tab_76>
    # SMT.9_79 = VDEF <SMT.9_77>
    D.2529_3->a = D.2589_43;

hmmm, aliasing

The problem is that forwprop doesn't propagate addr_exprs to memory reference stmts in early optimization anymore (due to the volatile issues) and value numbering cannot deal with the different (but same) load/store addresses:

    D.2605_34 = &this_2(D)->D.2527;
    # VUSE <tab_57, SMT.9_59, SMT.11_63>
    D.2606_35 = this_2(D)->D.2527.a;
    ...
    # tab_77 = VDEF <tab_57>
    # SMT.11_78 = VDEF <SMT.11_63>
    D.2605_34->a = D.2609_38;

(also aliasing computes different answers here, for whatever reason).

With scheduling an extra forwprop pass before FRE the second _load_ of a is eliminated, but DSE still cannot figure the dead store:

    # VUSE <tab_57, SMT.9_59, SMT.11_63>
    D.2606_35 = this_2(D)->D.2527.a;
    D.2607_36 = (unsigned int) D.2606_35;
    D.2608_37 = D.2607_36 + 2;
    D.2609_38 = (int) D.2608_37;
    # tab_77 = VDEF <tab_57>
    # SMT.9_93 = VDEF <SMT.9_59>
    # SMT.11_78 = VDEF <SMT.11_63>
    this_2(D)->D.2527.a = D.2609_38;
    D.2612_41 = (unsigned int) D.2609_38;
    D.2613_42 = D.2612_41 + 2;
    D.2614_43 = (int) D.2613_42;
    # tab_79 = VDEF <tab_77>
    # SMT.9_94 = VDEF <SMT.9_93>
    # SMT.11_80 = VDEF <SMT.11_78>
    this_2(D)->D.2527.a = D.2614_43;

That is, rtl level DSE removes the dead store:

    _ZN3CPU7executeEv:
    .LFB5:
            pushq %rbx
    .LCFI0:
            movq %rdi, %rbx
            leaq 8(%rdi), %rdi
            call _ZN5Clock1fEv
            .p2align 4,,10
            .p2align 3
    .L6:
            movl 2064(%rbx), %eax
            shrl $8, %eax
            mov %eax, %eax
            movq 16(%rbx,%rax,8), %rdx
            testq %rdx, %rdx
            je .L2
            movzbl 2064(%rbx), %eax
            addl $4, 8(%rbx)
            movzbl (%rdx,%rax), %eax
    .L3:
            movzbl %al, %eax
            addl $2, 8(%rbx)
            salq $4, %rax
            movq _ZN3CPU3tabE(%rax), %rdx
            testb $1, %dl
            jne .L4
            movq %rbx, %rdi
            addq _ZN3CPU3tabE+8(%rax), %rdi
            call *%rdx
            jmp .L6
            .p2align 4,,10
            .p2align 3
    .L4:
            movq %rbx, %rdi
            addq _ZN3CPU3tabE+8(%rax), %rdi
            movq (%rdi), %rax
            movq -1(%rdx,%rax), %rdx
            call *%rdx
            jmp .L6
    .L2:
            movq %rbx, %rdi
            call _ZN3CPU8readSlowEv
            .p2align 4,,6
            .p2align 3
            jmp .L3
    .LFE5:

I have a patch that makes it work apart from the tree level DSE issue.

Subject: Bug 33291

Author: rguenth
Date: Tue Sep 4 08:38:56 2007
New Revision: 128068

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128068
Log:
2007-09-04 Richard Guenther <rguenther@suse.de>

    PR tree-optimization/33291
    * tree-pretty-print.c (dump_generic_node): Dump all qualifiers
      for pointer types, not only first. Dump qualifiers for
      aggregate types as well.
    * tree-ssa-ccp.c (maybe_fold_offset_to_array_ref): Always
      use the canonical type for building ARRAY_REFs.
    * gimplify.c (canonicalize_addr_expr): Clean up. The
      correct validness check is compatibility of the pointer
      types. Always use the canonical type for building
      ARRAY_REFs and ADDR_EXPRs.
    * tree-ssa-forwprop.c (forward_propagate_addr_expr): Revert
      change that disabled propagation of ADDR_EXPRs into
      statements with volatile ops.

    * gcc.dg/volatile2.c: New testcase.
    * gcc.dg/pr32721.c: Adjust volatile reference pattern.
    * gcc.dg/tree-ssa/forwprop-1.c: Remove xfail.
    * gcc.dg/tree-ssa/forwprop-2.c: Likewise.
    * gcc.dg/tree-ssa/pr17141-1.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/volatile2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimplify.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/pr32721.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr17141-1.c
    trunk/gcc/tree-pretty-print.c
    trunk/gcc/tree-ssa-ccp.c
    trunk/gcc/tree-ssa-forwprop.c

Fixed.

Thanks for looking into this so quickly! I confirm the problem is solved for the reduced testcase. However in my original code the dead-store is not eliminated. Do you want me to file a separate bug report for that?

    ....
    mov (%rbx),%edx
    movzbl %cl,%edi
    lea 0x3(%rdx),%r8d
    add $0x5,%edx
    mov %r8d,(%rbx)
    movzbl (%rsi,%rdi,1),%eax
    mov %edx,(%rbx)
    ....

Yes please.
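
The reduced testcase is not included in this thread. As a rough illustration of the pattern named in the summary (two adjacent `a += 2` updates of a base-class member that should fold into one `a += 4`), here is a minimal sketch. It is not the reporter's testcase: the `Clock` and `CPU` names are borrowed from the mangled symbols in the assembly above, while the members `add`, `step`, and the helpers `demo` and `self_check` are hypothetical, and whether this exact code triggers the regression on 4.3.0 has not been checked.

```cpp
#include <cassert>

// Hypothetical stand-in for the classes named in the report; the real
// Clock and CPU definitions are not shown in this thread.
struct Clock {
    int a = 0;
    // Small inline helper: once inlined, each call becomes a load of
    // `a`, an add, and a store back to `a`.
    void add(int n) { a += n; }
};

struct CPU : Clock {
    // Two back-to-back increments of the same base-class member.
    // Ideally the optimizer folds these into a single `a += 4`.
    void step() {
        add(2);
        add(2);
    }
};

// Runs one step on a fresh CPU and returns the resulting counter.
int demo() {
    CPU cpu;
    cpu.step();
    return cpu.a;
}

// Checks the observable behaviour, which must hold at any -O level.
void self_check() { assert(demo() == 4); }
```

Compiling a file like this with `g++ -O3 -S` and inspecting the code for `CPU::step` should show whether the two updates were merged into one add of 4 or left as two separate load/add/store sequences.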