Bug 23488 - [4.1/4.2/4.3 Regression] GCSE load PRE does not work with non sets (or missing load PRE with plain decls)
Summary: [4.1/4.2/4.3 Regression] GCSE load PRE does not work with non sets (or missin...
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.1.0
: P2 minor
Target Milestone: 4.1.3
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 26537 (view as bug list)
Depends on:
Blocks: 23153
  Show dependency treegraph
 
Reported: 2005-08-19 21:39 UTC by Dan Nicolaescu
Modified: 2007-11-05 22:29 UTC (History)
3 users (show)

See Also:
Host:
Target: i686-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-09-08 19:37:41


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Nicolaescu 2005-08-19 21:39:51 UTC
typedef int nl_item;

extern char *nl_langinfo (nl_item __item) __attribute__ ((__nothrow__));

char *
xtermEnvEncoding(void)
{
  static char *result;

  if (result == 0) {

    result = nl_langinfo(1);
    ;
  }
  return result;
}

Compile the above code with -march=i686 -O2 
4.1 generates extra reads to the static result.1282.


xtermEnvEncoding:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    result.1282, %eax
        testl   %eax, %eax
        je      .L6
        movl    result.1282, %eax   <--- this one
        leave
        ret
        .p2align 4,,7
.L6:
        movl    $1, (%esp)
        call    nl_langinfo
        movl    %eax, result.1282   
        movl    result.1282, %eax  <-- and this one
        leave
        ret


4.0 does not generate those instructions.
This is one of the reasons for the code size regression in PR23153.
Comment 1 Andrew Pinski 2005-08-19 21:50:53 UTC
Confirmed.  There are two issues here.  First the regression was caused by:
2005-07-30  Jan Hubicka  <jh@suse.cz>

        * expr.c (expand_expr_real_1): Do not load mem targets into register.
        * i386.c (ix86_fixup_binary_operands): Likewise.
        (ix86_expand_unary_operator): Likewise.
        (ix86_expand_fp_absneg_operator): Likewise.
        * optabs.c (expand_vec_cond_expr): Validate dest.
Just like all of your other regressions.

Second this really should be done on the tree level.  I think this is load PRE but I could be wrong.
Comment 2 Andrew Pinski 2005-08-19 22:35:55 UTC
Further reduced testcase:
int
xtermEnvEncoding(void)
{
  static int result;
  if (result == 0)
    result = 2;
  return result;
}

Basicially the issue is that the if is emitted as cmp 0, result instead of what it was before as
set regtmp, result
cmp regtmp, 0
so GCSE/PRE does not see the load.
And yes this is load PRE which means this is most likely be fixed for 4.2 on the tree level.
Comment 3 Steven Bosscher 2005-08-19 23:26:56 UTC
I am surprised that gcse.c load PRE can't already handle this situation.  Is 
something broken, or is it really just not able to handle this?  What does the 
RTL look like when we are in GCSE? 
 
 
Comment 4 Andrew Pinski 2005-08-19 23:31:10 UTC
(In reply to comment #3)
> I am surprised that gcse.c load PRE can't already handle this situation.  Is 
> something broken, or is it really just not able to handle this?  What does the 
> RTL look like when we are in GCSE? 
(insn 11 9 12 0 (set (reg:CCZ 17 flags)
        (compare:CCZ (mem/i:SI (symbol_ref:SI ("result.1279") [flags 0x2] <var_decl 0xb7cb0108 result>) 
[2 result+0 S4 A32])
            (const_int 0 [0x0]))) 0 {*cmpsi_ccno_1} (nil)
    (nil))


Which is unlike what we got in 4.0.0:
(insn 11 9 12 0 (set (reg:SI 59 [ result ])
        (mem/i:SI (symbol_ref:SI ("result.1132") [flags 0x2] <var_decl 0xb7ce7a8c result>) [2 result+0 S4 
A32])) 35 {*movsi_1} (nil)
    (nil))

(insn 12 11 13 0 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 59 [ result ])
            (const_int 0 [0x0]))) 0 {*cmpsi_ccno_1} (nil)
    (nil))


so either GCSE needs to be improved to recongized that or Honza's patch should be reverted.  We are in 
stage3, I would say the latter one untill this gets improved on the tree level.
Comment 5 Dan Nicolaescu 2005-08-19 23:37:05 UTC
It's strange that the load(*) does not get optimized, given that it's in the
same BB as the store that precedes it... 

           movl    %eax, result.1282   
(*)        movl    result.1282, %eax
Comment 6 Steven Bosscher 2005-08-20 00:54:12 UTC
I'm going to look at this a bit more... 
Comment 7 Steven Bosscher 2005-10-07 21:21:06 UTC
I don't have time to work on these (new job), so unassigning.
Comment 8 Andrew Pinski 2005-10-27 00:50:09 UTC
For 4.2, this should be fixed on the tree level by load PRE.
Comment 9 Andrew Pinski 2005-10-27 18:30:11 UTC
Changing the summary to give a better idea what is going on here.
Comment 10 Mark Mitchell 2005-10-31 05:13:05 UTC
Leaving as P2.
Comment 11 Andrew Pinski 2005-11-03 01:11:29 UTC
There is no size regression here with -Os.  The only thing which will help here is having load PRE on the tree level.
Which makes the trees look like:
int f(void)
{
  static int i;
  int i1;
  i1 = i;
  if (i1 == 0)
    i = i1 = 2;
  return i1;
}
Comment 12 Jan Hubicka 2005-11-03 20:19:35 UTC
It seems to me that really only solution is working load PRE on TREEs.  Since this is out of 4.1
reach we can either revert my patch or retarget this for 4.2.  I must say I am inclined
to the second - the patch has positive effect in overall SPEC testing and I tend
to believe that this problem is not fully critical... 

Honza
Comment 13 Andrew Pinski 2006-01-18 20:26:34 UTC
Hmm, we actually produce better code on the mainline for my example in comment #2:
_xtermEnvEncoding1:
        movl    _result.1525, %edx
        movl    $2, %eax
        pushl   %ebp
        movl    %esp, %ebp
        popl    %ebp
        testl   %edx, %edx
        cmovne  _result.1525, %eax
        movl    %eax, _result.1525
        ret

But that is does not fix the problem in comment #0.  Load PRE on the tree level is not working because this is not an indirect reference to the variable.  That is a known issue too.
Comment 14 Andrew Pinski 2006-01-20 16:42:49 UTC
Changing GVN PRE for adding decl as references is harder than I thought. :(.
Comment 15 Mark Mitchell 2006-02-24 00:26:16 UTC
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Comment 16 Andrew Pinski 2006-03-03 02:05:14 UTC
*** Bug 26537 has been marked as a duplicate of this bug. ***
Comment 17 Andrew Pinski 2006-03-03 02:10:54 UTC
(In reply to comment #5)
> It's strange that the load(*) does not get optimized, given that it's in the
> same BB as the store that precedes it... 
> 
>            movl    %eax, result.1282   
> (*)        movl    result.1282, %eax

This is because the copying of the trace is happening at the very end of the optimization phase so it does not optimized at all.

Comment 18 Dan Nicolaescu 2006-03-03 02:14:32 UTC
(In reply to comment #17)
> (In reply to comment #5)
> > It's strange that the load(*) does not get optimized, given that it's in the
> > same BB as the store that precedes it... 
> > 
> >            movl    %eax, result.1282   
> > (*)        movl    result.1282, %eax
> 
> This is because the copying of the trace is happening at the very end of the
> optimization phase so it does not optimized at all.

Right, the copying happens in .bbro (as shown in PR26537). 
gcc-4.0 did the same kind of copying in .bbro, but it did not generate the redundant mov.
Comment 19 Andrew Pinski 2006-03-03 02:16:04 UTC
(In reply to comment #18)
> Right, the copying happens in .bbro (as shown in PR26537). 
> gcc-4.0 did the same kind of copying in .bbro, but it did not generate the
> redundant mov.
And that is because load PRE happened.
Comment 20 Mark Mitchell 2006-05-25 02:33:18 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 21 Daniel Berlin 2007-07-07 03:25:43 UTC
Subject: Bug 23488

Author: dberlin
Date: Sat Jul  7 03:25:29 2007
New Revision: 126434

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126434
Log:
2007-07-06  Daniel Berlin  <dberlin@dberlin.org>

	Fix PR tree-optimization/23488

	* tree-ssa-sccvn.c (expr_has_constants): Handle tcc_declaration.
	(try_to_simplify): Ditto.
	(visit_use): Ditto.
	* tree-vn.c (set_value_handle): Use decl_vh_map for decl value
	handles.
	* tree-flow-inline.h (get_value_handle): Ditto.
	* tree-ssa-pre.c (decl_vh_map): New.
	(decl_node_pool): New.
	(can_value_number_operation): Support DECL_P.
	(can_PRE_operation): Ditto.
	(create_expression_by_pieces): Ditto.
	(find_existing_value_expr): Modify to differnetiate between
	addressing and top level.
	(create_value_handle_for_expr): Handle DECL's.
	(poolify_tree): Ditto.
	(make_values_for_phi): Don't insert into PHI_GEN during FRE.
	(make_values_for_stmt): Handle DECL's properly.
	(init_pre): Reorg to not init useless things during FRE.
	(fini_pre): Ditto.
	* tree-flow.h: Include pointer-set.h.
	(decl_vh_map): Declare.
	* Makefile.in (TREE_FLOW_H): Add pointer-set.h


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-6.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-flow-inline.h
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-ssa-sccvn.c
    trunk/gcc/tree-vn.c

Comment 22 Daniel Berlin 2007-07-07 22:23:37 UTC
Subject: Bug 23488

Author: dberlin
Date: Sat Jul  7 22:23:26 2007
New Revision: 126449

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126449
Log:
2007-07-07  Daniel Berlin  <dberlin@dberlin.org>

	Revert (note the sccvn portions are *not* reverted)
	2007-07-06  Daniel Berlin  <dberlin@dberlin.org>

	Fix PR tree-optimization/23488

	* tree-vn.c (set_value_handle): Use decl_vh_map for decl value
	handles.
	* tree-flow-inline.h (get_value_handle): Ditto.
	* tree-ssa-pre.c (decl_vh_map): New.
	(decl_node_pool): New.
	(can_value_number_operation): Support DECL_P.
	(can_PRE_operation): Ditto.
	(create_expression_by_pieces): Ditto.
	(find_existing_value_expr): Modify to differnetiate between
	addressing and top level.
	(create_value_handle_for_expr): Handle DECL's.
	(poolify_tree): Ditto.
	(make_values_for_phi): Don't insert into PHI_GEN during FRE.
	(make_values_for_stmt): Handle DECL's properly.
	(init_pre): Reorg to not init useless things during FRE.
	(fini_pre): Ditto.
	* tree-flow.h: Include pointer-set.h.
	(decl_vh_map): Declare.
	* Makefile.in (TREE_FLOW_H): Add pointer-set.h


Removed:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-flow-inline.h
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-vn.c

Comment 23 Steven Bosscher 2007-09-08 19:37:40 UTC
Load PRE does still not optimize the test cases of comment #0 and comment #2.
Comment 24 Steven Bosscher 2007-11-05 22:29:35 UTC
Implementing this kind of load PRE for RTL again is more trouble than it is worth. It would require tracking of addresses in MEM rtx'en, hashing the MEM and the MEM address independently but solving the dataflow equations with some kind of fake dependency (e.g. to avoid hoisting the a load but not the address of the load), and validating that an address replacement is correct.