This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/60172] [4.9/4.10 Regression] ARM performance regression from trunk at 207239
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 15 May 2014 08:01:37 +0000
- Auto-submitted: auto-generated
- References: <bug-60172-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60172
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Thomas Preud'homme from comment #16)
> Hi Richard,
>
> could you expand on what you said in comment #13? I don't see how reassoc
> could help cse here. From what I understood, reassoc tries to group per
> rank. In our case, we have (view of the source with loop unrolling):
>
> Arr_2_Par_Ref [Int_Loc] [Int_Loc] = Int_Loc;
> /* some stmts */
> Arr_2_Par_Ref [Int_Loc+10] [Int_Loc] = Arr_1_Par_Ref [Int_Loc];
>
> If I'm not mistaken, in the first case you'd have:
>
> Int_Loc * 4
> Int_Loc * 100
> Arr_2_Par_Ref
>
> that would be added together with several statements. However in the second
> case you'd have:
>
> Int_Loc * 4
> Int_Loc * 100
> 1000
> Arr_2_Par_Ref
>
> that would be added together with several statements. I don't see how
> 1000 could be added first or last; it seems to me that it's always going to be
> in an intermediate statement, and thus not all redundancy would be eliminated
> by CSE.
>
> Please let me know if my reasoning is flawed so that I can progress toward
> a solution.
Citing myself:
On the GIMPLE level before expansion we have
_40 = Arr_2_Par_Ref_22(D) + (_41 + pretmp_20);
_51 = Arr_2_Par_Ref_22(D) + (_41 + (pretmp_20 + 1000));
so if _51 were Arr_2_Par_Ref_22(D) + ((_41 + pretmp_20) + 1000);
then _41 + pretmp_20 would be fully redundant with the expression needed
by _40.
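
To make the redundancy concrete, here is a minimal sketch in C of the two association orders. The function names and the parameters `base`, `a` and `b` are hypothetical stand-ins for Arr_2_Par_Ref_22(D), _41 and pretmp_20; they are not GCC internals. It uses `uintptr_t` so that the arithmetic wraps and reassociation trivially preserves the value, which is an assumption of the sketch and not a statement about what reassoc is allowed to do on the real GIMPLE.

```c
#include <assert.h>
#include <stdint.h>

/* Address computed for the first store (the _40 expression). */
uintptr_t addr_40(uintptr_t base, uintptr_t a, uintptr_t b)
{
    return base + (a + b);
}

/* _51 as currently associated: the constant is folded into b first,
   so no subexpression matches the (a + b) used by addr_40. */
uintptr_t addr_51_current(uintptr_t base, uintptr_t a, uintptr_t b)
{
    return base + (a + (b + 1000));
}

/* _51 reassociated: (a + b) is now literally the expression from
   addr_40, so a CSE-style pass could reuse it and only add 1000. */
uintptr_t addr_51_reassoc(uintptr_t base, uintptr_t a, uintptr_t b)
{
    uintptr_t sum = a + b;      /* common with addr_40 */
    return base + (sum + 1000);
}

/* Both forms of _51 yield the same address, 1000 bytes past _40. */
void check_equivalence(uintptr_t base, uintptr_t a, uintptr_t b)
{
    assert(addr_51_current(base, a, b) == addr_51_reassoc(base, a, b));
    assert(addr_51_reassoc(base, a, b) == addr_40(base, a, b) + 1000);
}
```

With the constant added last, the second address is just the first one plus 1000, which is what makes the inner sum fully redundant.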
Note that IIRC one issue with TER is that it no longer triggers here because
dead stmts are still around and confuse its has_single_use logic. Thus
placing a DCE pass right before expand would fix that and might be a good
idea anyway (see comment #3). Implementing a "proper" poor man's SSA-based
DCE would be a good way out (out-of-SSA already has one to remove dead
PHIs).
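
As a rough illustration of why dead stmts matter for a single-use check, here is a toy use-count-based DCE in C. Everything in it (`struct stmt`, `count_uses`, `poor_mans_dce`, `has_single_use_toy`) is hypothetical and only models the idea; it is not GCC's data structures or API. It assumes straight-line code where every operand is defined by an earlier statement, so there are no PHIs or cycles and one backward walk is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPS 2
#define NO_OP   (-1)

/* Toy SSA statement: statement i defines value i. */
struct stmt {
    int  ops[MAX_OPS];      /* indices of the values it uses, or NO_OP */
    bool has_side_effects;  /* e.g. a store or call: never removed */
    bool dead;              /* set once the statement is removed */
};

/* Count how many live statements use each value. */
void count_uses(const struct stmt *s, size_t n, unsigned *uses)
{
    for (size_t i = 0; i < n; i++)
        uses[i] = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].dead)
            continue;
        for (int k = 0; k < MAX_OPS; k++) {
            int op = s[i].ops[k];
            if (op == NO_OP)
                continue;
            assert(op >= 0 && (size_t)op < i); /* defs precede uses */
            uses[op]++;
        }
    }
}

/* Walk backwards; drop side-effect-free statements whose value is
   unused and release their operands, which may expose more dead code.
   Expects uses[] to have been filled by count_uses().
   Returns the number of statements removed. */
size_t poor_mans_dce(struct stmt *s, size_t n, unsigned *uses)
{
    size_t removed = 0;
    for (size_t i = n; i-- > 0; ) {
        if (s[i].dead || s[i].has_side_effects || uses[i] != 0)
            continue;
        s[i].dead = true;
        removed++;
        for (int k = 0; k < MAX_OPS; k++)
            if (s[i].ops[k] != NO_OP)
                uses[s[i].ops[k]]--;
    }
    return removed;
}

/* Toy analogue of a single-use test on value i. */
bool has_single_use_toy(const unsigned *uses, size_t i)
{
    return uses[i] == 1;
}
```

In this model a value that feeds one real consumer plus one dead statement has a use count of 2 and fails the single-use test; after the dead statement is removed the count drops to 1 and the test passes.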