This is GCC Bugzilla
This is GCC Bugzilla Version 2.20+
View Bug Activity | Format For Printing | Clone This Bug
Load PRE is scheduled for 4.2, I'm creating this bug because load PRE is currently split between CSE and GCSE (this bug blocks the "optimizations caught by CSE" meta-bug, PR19721). Given this code, unsigned outcnt; extern void flush_outbuf(void); void bi_windup(unsigned char *outbuf, unsigned char bi_buf) { outbuf[outcnt] = bi_buf; if (outcnt == 16384) flush_outbuf(); outbuf[outcnt] = bi_buf; } we'd want the code to become void bi_windup(unsigned char *outbuf, unsigned char bi_buf) { int t1 = outcnt; outbuf[t1] = bi_buf; int t2 = outcnt, t3; if (t2 == 16384) { flush_outbuf(); t3 = outcnt; } else t3 = t2; outbuf[t3] = bi_buf; } currently this optimization is split between CSE (with path following) and GCSE. Actually, GCSE is able to do it alone if the number of GCSE passes is increased.
Subject: Re: New: load PRE is missing On Thu, 2005-08-18 at 07:55 +0000, bonzini at gcc dot gnu dot org wrote: > Load PRE is scheduled for 4.2, I'm creating this bug because load PRE is > currently split between CSE and GCSE (this bug blocks the "optimizations caught > by CSE" meta-bug, PR19721). Given this code, > > unsigned outcnt; > extern void flush_outbuf(void); > > void > bi_windup(unsigned char *outbuf, unsigned char bi_buf) > { > outbuf[outcnt] = bi_buf; > if (outcnt == 16384) > flush_outbuf(); > outbuf[outcnt] = bi_buf; > } > > we'd want the code to become > > void > bi_windup(unsigned char *outbuf, unsigned char bi_buf) > { > int t1 = outcnt; > outbuf[t1] = bi_buf; > int t2 = outcnt, t3; > if (t2 == 16384) { > flush_outbuf(); > t3 = outcnt; > } else > t3 = t2; > outbuf[t3] = bi_buf; > } > This doesn't remove a single load. In fact, i'm not sure what you think is better about this code. However, it certainly won't be caught by load PRE.
outcnt aliases with outbuf
Subject: Re: load PRE is missing On Thu, 2005-08-18 at 13:44 +0000, bonzini at gcc dot gnu dot org wrote: > ------- Additional Comments From bonzini at gcc dot gnu dot org 2005-08-18 13:44 ------- > outcnt aliases with outbuf > Then there is nothing to optimize in this testcase. If you look at the code you want, you have exactly the same number of loads from the global along the main path, and have actually *added* an extra load along the other path. This is *not* an optimization.
Sorry. outcnt aliases with outbuf, so the load of t2 cannot be removed. The GIMPLE code that is now emitted is something like: void bi_windup(unsigned char *outbuf, unsigned char bi_buf) { int t1 = outcnt; outbuf[t1] = bi_buf; int t2 = outcnt; if (t2 == 16384) flush_outbuf(); int t3 = outcnt; outbuf[t3] = bi_buf; } where t3 == t2 if flush_outbuf is not called. My code removes the load into t3 if flush_outbuf is not called.
Reconfirmed after the first load PRE patch went in, as it does not handle globals.
Ehm, wouldn't "unsigned char *" alias everything? GCSE doesn't do load PRE for me on the original test case, either. And, as long as "outbuf[outcnt] = bi_buf;" has a V_MAY_DEF for outcnt, you're not going to see any load PRE happening here. With a changed test case, GCSE _does_ perform load PRE: unsigned long outcnt; extern void flush_outbuf(void); void bi_windup(unsigned int *outbuf, unsigned int bi_buf) { unsigned long t1 = outcnt; outbuf[t1] = bi_buf; unsigned long t2 = outcnt; if (t2 == 16384) flush_outbuf(); unsigned long t3 = outcnt; outbuf[t3] = bi_buf; } This gives me the following asm on x86-64: bi_windup: movq %rbx, -16(%rsp) movq %rbp, -8(%rsp) subq $24, %rsp movq outcnt(%rip), %rax movq %rdi, %rbp movl %esi, %ebx cmpq $16384, %rax movl %esi, (%rbp,%rax,4) je .L6 .L2: movl %ebx, (%rbp,%rax,4) movq 8(%rsp), %rbx movq 16(%rsp), %rbp addq $24, %rsp ret .p2align 4,,7 .L6: call flush_outbuf movq outcnt(%rip), %rax jmp .L2 Looking at the .pre tree dump, there isn't anything obvious in the way of doing load PRE here AFAICT: bi_windup (outbufD.1866, bi_bufD.1867) { unsigned intD.3 storetmp.21D.1918; long unsigned intD.4 t3D.1872; long unsigned intD.4 t2D.1871; long unsigned intD.4 t1D.1870; unsigned intD.3 * D.1878; unsigned intD.3 * D.1877; long unsigned intD.4 D.1876; unsigned intD.3 * D.1875; unsigned intD.3 * D.1874; long unsigned intD.4 D.1873; # BLOCK 2 freq:10000 # PRED: ENTRY [100.0%] (fallthru,exec) # VUSE <outcntD.1863_1>; t1D.1870_2 = outcntD.1863; D.1873_3 = t1D.1870_2 * 4; D.1874_4 = (unsigned intD.3 *) D.1873_3; D.1875_6 = D.1874_4 + outbufD.1866_5; # TMT.4D.1900_16 = V_MAY_DEF <TMT.4D.1900_15>; *D.1875_6 = bi_bufD.1867_7; if (t1D.1870_2 == 16384) goto <L0>; else goto <L2>; # SUCC: 3 [26.2%] (true,exec) 5 [73.8%] (false,exec) # BLOCK 5 freq:7378 # PRED: 2 [73.8%] (false,exec) <L2>:; goto <bb 4> (<L1>); # SUCC: 4 [100.0%] (fallthru) # BLOCK 3 freq:2622 # PRED: 2 [26.2%] (true,exec) <L0>:; # outcntD.1863_18 = V_MAY_DEF <outcntD.1863_1>; # TMT.4D.1900_19 = V_MAY_DEF <TMT.4D.1900_16>; flush_outbuf (); # SUCC: 4 [100.0%] (fallthru,exec) # BLOCK 4 freq:10000 # PRED: 5 [100.0%] (fallthru) 3 [100.0%] (fallthru,exec) # TMT.4D.1900_14 = PHI <TMT.4D.1900_16(5), TMT.4D.1900_19(3)>; # outcntD.1863_13 = PHI <outcntD.1863_1(5), outcntD.1863_18(3)>; <L1>:; # VUSE <outcntD.1863_13>; t3D.1872_9 = outcntD.1863; D.1876_10 = t3D.1872_9 * 4; D.1877_11 = (unsigned intD.3 *) D.1876_10; D.1878_12 = D.1877_11 + outbufD.1866_5; # TMT.4D.1900_17 = V_MAY_DEF <TMT.4D.1900_14>; *D.1878_12 = bi_bufD.1867_7; return; # SUCC: EXIT [100.0%] }
With an even more modified test case, load PRE does happen: unsigned long outcnt; extern void flush_outbuf(void); void bi_windup(unsigned int *outbuf, unsigned int bi_buf, unsigned long *outcnt) { unsigned long t1 = *outcnt; outbuf[t1] = bi_buf; unsigned long t2 = *outcnt; if (t2 == 16384) flush_outbuf(); unsigned long t3 = *outcnt; outbuf[t3] = bi_buf; } --> .pre tree dump bi_windup (outbufD.1866, bi_bufD.1867, outcntD.1868) { unsigned intD.3 * prephitmp.26D.1925; unsigned intD.3 * pretmp.25D.1924; long unsigned intD.4 prephitmp.24D.1923; long unsigned intD.4 pretmp.23D.1922; unsigned intD.3 storetmp.22D.1921; long unsigned intD.4 t3D.1873; long unsigned intD.4 t2D.1872; long unsigned intD.4 t1D.1871; unsigned intD.3 * D.1879; unsigned intD.3 * D.1878; long unsigned intD.4 D.1877; unsigned intD.3 * D.1876; unsigned intD.3 * D.1875; long unsigned intD.4 D.1874; # BLOCK 2 freq:10000 # PRED: ENTRY [100.0%] (fallthru,exec) # VUSE <TMT.4D.1902_15>; t1D.1871_2 = *outcntD.1868_1; D.1874_3 = t1D.1871_2 * 4; D.1875_4 = (unsigned intD.3 *) D.1874_3; D.1876_6 = D.1875_4 + outbufD.1866_5; # TMT.5D.1903_17 = V_MAY_DEF <TMT.5D.1903_16>; *D.1876_6 = bi_bufD.1867_7; if (t1D.1871_2 == 16384) goto <L0>; else goto <L2>; # SUCC: 3 [26.2%] (true,exec) 5 [73.8%] (false,exec) # BLOCK 5 freq:7378 # PRED: 2 [73.8%] (false,exec) <L2>:; goto <bb 4> (<L1>); # SUCC: 4 [100.0%] (fallthru) # BLOCK 3 freq:2622 # PRED: 2 [26.2%] (true,exec) <L0>:; # TMT.4D.1902_19 = V_MAY_DEF <TMT.4D.1902_15>; # TMT.5D.1903_20 = V_MAY_DEF <TMT.5D.1903_17>; flush_outbuf (); # VUSE <TMT.4D.1902_19>; pretmp.23D.1922_23 = *outcntD.1868_1; pretmp.23D.1922_25 = pretmp.23D.1922_23 * 4; pretmp.25D.1924_27 = (unsigned intD.3 *) pretmp.23D.1922_25; pretmp.25D.1924_29 = outbufD.1866_5 + pretmp.25D.1924_27; # SUCC: 4 [100.0%] (fallthru,exec) # BLOCK 4 freq:10000 # PRED: 5 [100.0%] (fallthru) 3 [100.0%] (fallthru,exec) # prephitmp.26D.1925_30 = PHI <D.1876_6(5), pretmp.25D.1924_29(3)>; # prephitmp.26D.1925_28 = PHI <D.1875_4(5), pretmp.25D.1924_27(3)>; # prephitmp.24D.1923_26 = PHI <D.1874_3(5), pretmp.23D.1922_25(3)>; # prephitmp.24D.1923_24 = PHI <t1D.1871_2(5), pretmp.23D.1922_23(3)>; # TMT.5D.1903_14 = PHI <TMT.5D.1903_17(5), TMT.5D.1903_20(3)>; # TMT.4D.1902_13 = PHI <TMT.4D.1902_15(5), TMT.4D.1902_19(3)>; <L1>:; t3D.1873_9 = prephitmp.24D.1923_24; D.1877_10 = prephitmp.24D.1923_26; D.1878_11 = prephitmp.26D.1925_28; D.1879_12 = prephitmp.26D.1925_30; # TMT.5D.1903_18 = V_MAY_DEF <TMT.5D.1903_14>; *D.1879_12 = bi_bufD.1867_7; return; # SUCC: EXIT [100.0%] }
> Ehm, wouldn't "unsigned char *" alias everything? GCSE doesn't do load PRE for > me on the original test case, either. Yes, I was checking if load PRE removes the redundant load of t3 if flush_outbuf is not called. See comment #4.
Load PRE on globals is going to take a while, because we do some stupid things in hashing (like hashing tcc_declaration's by pointer, instead of DECL_UID).
Re. comment #9, I don't understand how the way we hash would make load PRE harder. Could you elaborate a bit on what is missing or done "wrong" for load PRE of globals??
The thread starting at http://gcc.gnu.org/ml/gcc-patches/2006-01/msg00373.html addresses my question in comment #10.
*** Bug 35286 has been marked as a duplicate of this bug. ***
*** Bug 25553 has been marked as a duplicate of this bug. ***
*** Bug 35287 has been marked as a duplicate of this bug. ***
Patch here: http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00430.html We should re-evaluate the need for gcse.c load-PRE after Daniel's patch goes in. The last time I looked at what loads gcse.c load-PRE would move, it only touched loads from globals, so it may be a redundant pass when tree-ssa-pre moves them.
Subject: Bug 23455 Author: dberlin Date: Tue Jul 8 16:11:06 2008 New Revision: 137631 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=137631 Log: 2008-07-05 Daniel Berlin <dberlin@dberlin.org> Fix PR tree-optimization/23455 Fix PR tree-optimization/35286 Fix PR tree-optimization/35287 * Makefile.in (OBJS-common): Remove tree-vn.o. tree-vn.o: Remove. * dbgcnt.def: Add treepre_insert debug counter. * gcc/tree-flow.h (add_to_value): Updated for other changes. (debug_value_expressions): Ditto. (print_value_expressions): Ditto. * tree-pretty-print.c (dump_generic_node): Updated for VALUE_HANDLE removal. * tree-ssa-dom.c (record_equality): Ditto. (cprop_operand): Ditto. (lookup_avail_expr): Ditto. * tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts_at_dest): Ditto. (simplify_control_stmt_condition): Ditto. * tree.c (tree_code_size): Ditto. (tree_node_structure): Ditto. (iterative_hash_expr): Ditto. * tree.def: Ditto. * tree.h (VALUE_HANDLE_ID): Ditto. (VALUE_HANDLE_EXPR_SET): Ditto. (struct tree_value_handle): Ditto. (union tree_node): Ditto. * treestruct.def: Ditto. * tree-vn.c: Removed. * tree-ssa-pre.c: Rewritten entirely. * tree-ssa-sccvn.c (constant_to_value_id): New hashtable. (constant_value_ids): Ditto. (vn_nary_op_t): Moved to header. (vn_phi_t): Ditto. (vn_reference_op_t): Ditto (vn_reference_t): Ditto. (next_value_id): New variable. (VN_INFO): Add an assert. (vn_constant_eq): New function. (vn_constant_hash): Ditto. (get_or_alloc_constant_value_id): Ditto. (value_id_constant_p): Ditto. (vn_reference_compute_hash): De-staticify. (copy_reference_ops_from_ref): Don't use get_callee_fndecl. Disable some code with a FIXME. Remove VALUE_HANDLE use. (valueize_refs): Update opcode if it changes from ssa name to constant. (vn_reference_lookup_1): Add new argument. (vn_reference_lookup): Ditto. (vn_reference_lookup_pieces): New function. (vn_reference_insert): Add return type. Modify to deal with value ids. (vn_reference_insert_pieces): New function. (vn_nary_op_compute_hash): De-staticify. (vn_nary_op_eq): Ditto. (vn_nary_op_lookup_pieces): New function. (vn_nary_op_lookup): Add new argument. (vn_nary_op_insert_pieces): New function. (vn_nary_op_insert): Add return type. Modify to deal with value ids. (vn_phi_insert): Ditto. (visit_unary_op): Update for callee changes. (visit_binary_op): Ditto. (visit_reference_op_load): Ditto. (visit_reference_op_store): Ditto. (init_scc_vn): Init next_value_id, constant_to_value_id and constant_value_ids. (free_scc_vn): Free them. (set_hashtable_value_ids): New function. (run_scc_vn): Use it. (get_max_value_id): New function. (get_next_value_id): Ditto. (expressions_equal_p): Moved from tree-vn.c (sort_vuses): Ditto. (sort_vuses_heap): Ditto. * tree-ssa-sccvn.h: Structures moved from tree-ssa-sccvn.c (noted above). * tree.c (iterative_hash_hashval_t): Made non-static * tree.h (iterative_hash_hashval_t): Declare it. Added: trunk/gcc/testsuite/gcc.c-torture/compile/20080704-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr23455.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr35287.c Removed: trunk/gcc/tree-vn.c Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/dbgcnt.def trunk/gcc/testsuite/gcc.dg/tree-ssa/loadpre24.c trunk/gcc/tree-flow.h trunk/gcc/tree-pretty-print.c trunk/gcc/tree-ssa-dom.c trunk/gcc/tree-ssa-pre.c trunk/gcc/tree-ssa-sccvn.c trunk/gcc/tree-ssa-sccvn.h trunk/gcc/tree-ssa-threadedge.c trunk/gcc/tree.c trunk/gcc/tree.def trunk/gcc/tree.h trunk/gcc/treestruct.def
Fixed.