typedef int nl_item; extern char *nl_langinfo (nl_item __item) __attribute__ ((__nothrow__)); char * xtermEnvEncoding(void) { static char *result; if (result == 0) { result = nl_langinfo(1); ; } return result; } Compile the above code with -march=i686 -O2 4.1 generates extra reads to the static result.1282. xtermEnvEncoding: pushl %ebp movl %esp, %ebp subl $8, %esp movl result.1282, %eax testl %eax, %eax je .L6 movl result.1282, %eax <--- this one leave ret .p2align 4,,7 .L6: movl $1, (%esp) call nl_langinfo movl %eax, result.1282 movl result.1282, %eax <-- and this one leave ret 4.0 does not generate those instructions. This is one of the reasons for the code size regression in PR23153.
Confirmed. There are two issues here. First the regression was caused by: 2005-07-30 Jan Hubicka <jh@suse.cz> * expr.c (expand_expr_real_1): Do not load mem targets into register. * i386.c (ix86_fixup_binary_operands): Likewise. (ix86_expand_unary_operator): Likewise. (ix86_expand_fp_absneg_operator): Likewise. * optabs.c (expand_vec_cond_expr): Validate dest. Just like all of your other regressions. Second this really should be done on the tree level. I think this is load PRE but I could be wrong.
Further reduced testcase: int xtermEnvEncoding(void) { static int result; if (result == 0) result = 2; return result; } Basicially the issue is that the if is emitted as cmp 0, result instead of what it was before as set regtmp, result cmp regtmp, 0 so GCSE/PRE does not see the load. And yes this is load PRE which means this is most likely be fixed for 4.2 on the tree level.
I am surprised that gcse.c load PRE can't already handle this situation. Is something broken, or is it really just not able to handle this? What does the RTL look like when we are in GCSE?
(In reply to comment #3) > I am surprised that gcse.c load PRE can't already handle this situation. Is > something broken, or is it really just not able to handle this? What does the > RTL look like when we are in GCSE? (insn 11 9 12 0 (set (reg:CCZ 17 flags) (compare:CCZ (mem/i:SI (symbol_ref:SI ("result.1279") [flags 0x2] <var_decl 0xb7cb0108 result>) [2 result+0 S4 A32]) (const_int 0 [0x0]))) 0 {*cmpsi_ccno_1} (nil) (nil)) Which is unlike what we got in 4.0.0: (insn 11 9 12 0 (set (reg:SI 59 [ result ]) (mem/i:SI (symbol_ref:SI ("result.1132") [flags 0x2] <var_decl 0xb7ce7a8c result>) [2 result+0 S4 A32])) 35 {*movsi_1} (nil) (nil)) (insn 12 11 13 0 (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 59 [ result ]) (const_int 0 [0x0]))) 0 {*cmpsi_ccno_1} (nil) (nil)) so either GCSE needs to be improved to recongized that or Honza's patch should be reverted. We are in stage3, I would say the latter one untill this gets improved on the tree level.
It's strange that the load(*) does not get optimized, given that it's in the same BB as the store that precedes it... movl %eax, result.1282 (*) movl result.1282, %eax
I'm going to look at this a bit more...
I don't have time to work on these (new job), so unassigning.
For 4.2, this should be fixed on the tree level by load PRE.
Changing the summary to give a better idea what is going on here.
Leaving as P2.
There is no size regression here with -Os. The only thing which will help here is having load PRE on the tree level. Which makes the trees look like: int f(void) { static int i; int i1; i1 = i; if (i1 == 0) i = i1 = 2; return i1; }
It seems to me that really only solution is working load PRE on TREEs. Since this is out of 4.1 reach we can either revert my patch or retarget this for 4.2. I must say I am inclined to the second - the patch has positive effect in overall SPEC testing and I tend to believe that this problem is not fully critical... Honza
Hmm, we actually produce better code on the mainline for my example in comment #2: _xtermEnvEncoding1: movl _result.1525, %edx movl $2, %eax pushl %ebp movl %esp, %ebp popl %ebp testl %edx, %edx cmovne _result.1525, %eax movl %eax, _result.1525 ret But that is does not fix the problem in comment #0. Load PRE on the tree level is not working because this is not an indirect reference to the variable. That is a known issue too.
Changing GVN PRE for adding decl as references is harder than I thought. :(.
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
*** Bug 26537 has been marked as a duplicate of this bug. ***
(In reply to comment #5) > It's strange that the load(*) does not get optimized, given that it's in the > same BB as the store that precedes it... > > movl %eax, result.1282 > (*) movl result.1282, %eax This is because the copying of the trace is happening at the very end of the optimization phase so it does not optimized at all.
(In reply to comment #17) > (In reply to comment #5) > > It's strange that the load(*) does not get optimized, given that it's in the > > same BB as the store that precedes it... > > > > movl %eax, result.1282 > > (*) movl result.1282, %eax > > This is because the copying of the trace is happening at the very end of the > optimization phase so it does not optimized at all. Right, the copying happens in .bbro (as shown in PR26537). gcc-4.0 did the same kind of copying in .bbro, but it did not generate the redundant mov.
(In reply to comment #18) > Right, the copying happens in .bbro (as shown in PR26537). > gcc-4.0 did the same kind of copying in .bbro, but it did not generate the > redundant mov. And that is because load PRE happened.
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Subject: Bug 23488 Author: dberlin Date: Sat Jul 7 03:25:29 2007 New Revision: 126434 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126434 Log: 2007-07-06 Daniel Berlin <dberlin@dberlin.org> Fix PR tree-optimization/23488 * tree-ssa-sccvn.c (expr_has_constants): Handle tcc_declaration. (try_to_simplify): Ditto. (visit_use): Ditto. * tree-vn.c (set_value_handle): Use decl_vh_map for decl value handles. * tree-flow-inline.h (get_value_handle): Ditto. * tree-ssa-pre.c (decl_vh_map): New. (decl_node_pool): New. (can_value_number_operation): Support DECL_P. (can_PRE_operation): Ditto. (create_expression_by_pieces): Ditto. (find_existing_value_expr): Modify to differnetiate between addressing and top level. (create_value_handle_for_expr): Handle DECL's. (poolify_tree): Ditto. (make_values_for_phi): Don't insert into PHI_GEN during FRE. (make_values_for_stmt): Handle DECL's properly. (init_pre): Reorg to not init useless things during FRE. (fini_pre): Ditto. * tree-flow.h: Include pointer-set.h. (decl_vh_map): Declare. * Makefile.in (TREE_FLOW_H): Add pointer-set.h Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-6.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-flow-inline.h trunk/gcc/tree-flow.h trunk/gcc/tree-ssa-pre.c trunk/gcc/tree-ssa-sccvn.c trunk/gcc/tree-vn.c
Subject: Bug 23488 Author: dberlin Date: Sat Jul 7 22:23:26 2007 New Revision: 126449 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126449 Log: 2007-07-07 Daniel Berlin <dberlin@dberlin.org> Revert (note the sccvn portions are *not* reverted) 2007-07-06 Daniel Berlin <dberlin@dberlin.org> Fix PR tree-optimization/23488 * tree-vn.c (set_value_handle): Use decl_vh_map for decl value handles. * tree-flow-inline.h (get_value_handle): Ditto. * tree-ssa-pre.c (decl_vh_map): New. (decl_node_pool): New. (can_value_number_operation): Support DECL_P. (can_PRE_operation): Ditto. (create_expression_by_pieces): Ditto. (find_existing_value_expr): Modify to differnetiate between addressing and top level. (create_value_handle_for_expr): Handle DECL's. (poolify_tree): Ditto. (make_values_for_phi): Don't insert into PHI_GEN during FRE. (make_values_for_stmt): Handle DECL's properly. (init_pre): Reorg to not init useless things during FRE. (fini_pre): Ditto. * tree-flow.h: Include pointer-set.h. (decl_vh_map): Declare. * Makefile.in (TREE_FLOW_H): Add pointer-set.h Removed: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-flow-inline.h trunk/gcc/tree-flow.h trunk/gcc/tree-ssa-pre.c trunk/gcc/tree-vn.c
Load PRE does still not optimize the test cases of comment #0 and comment #2.
Implementing this kind of load PRE for RTL again is more trouble than it is worth. It would require tracking of addresses in MEM rtx'en, hashing the MEM and the MEM address independently but solving the dataflow equations with some kind of fake dependency (e.g. to avoid hoisting the a load but not the address of the load), and validating that an address replacement is correct.