This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/63537] [4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 15 Oct 2014 08:38:45 +0000
- Subject: [Bug tree-optimization/63537] [4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate
- Auto-submitted: auto-generated
- References: <bug-63537-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537
Richard Biener <rguenth at gcc dot gnu.org> changed:
              What|Removed                    |Added
----------------------------------------------------------------------------
          Keywords|                           |missed-optimization
            Status|UNCONFIRMED                |NEW
  Last reconfirmed|                           |2014-10-15
     Known to work|                           |4.7.3
  Target Milestone|---                        |4.9.2
           Summary|Missed optimization: Loop  |[4.9/5 Regression] Missed
                  |unrolling adds extra copy  |optimization: Loop
                  |when returning aggregate   |unrolling adds extra copy
                  |                           |when returning aggregate
    Ever confirmed|0                          |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is because the outer loop is unrolled only after SRA (scalar replacement
of aggregates) has had its chance to scalarize away the local aggregate. With
GCC 4.7 we unroll the loop during early unrolling, even at -O2.
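To make the ordering problem concrete, here is a minimal C sketch of the kind of function the dumps below come from. The original testcase is not quoted in this comment, so the names `struct vec3`, `scale_loop` and `scale_unrolled`, and the `float` element type, are assumptions; only the loop shape (three iterations of `ret.n[i] = lhs.n[i] * rhs`, then returning `ret`) is taken from the dumps. The second function is a source-level picture of the loop after complete unrolling: once every index is a constant, SRA can replace `ret.n[0]` to `ret.n[2]` with scalars, which it cannot do while the store is `ret.n[i_1]` with a variable index.

```c
/* Hypothetical reconstruction of the testcase shape; names and the
   float element type are assumptions, not taken from the bug report. */
struct vec3 {
  float n[3];
};

/* Loop form: ret.n[i] uses a variable index, so SRA cannot scalarize
   ret until the loop has been completely unrolled. */
struct vec3 scale_loop(struct vec3 lhs, float rhs)
{
  struct vec3 ret;
  for (int i = 0; i < 3; i++)   /* appears as "i_1 <= 2" in the dumps */
    ret.n[i] = lhs.n[i] * rhs;
  return ret;
}

/* Source-level picture of the loop after complete unrolling: every
   index is a constant, so each element of ret can become a scalar. */
struct vec3 scale_unrolled(struct vec3 lhs, float rhs)
{
  struct vec3 ret;
  ret.n[0] = lhs.n[0] * rhs;
  ret.n[1] = lhs.n[1] * rhs;
  ret.n[2] = lhs.n[2] * rhs;
  return ret;
}
```

Both functions compute the same result; the point is only which shape SRA gets to see. Compiling the loop form with `gcc -O2 -fdump-tree-cunrolli-details` is one way to look for an "Estimating sizes" dump like the ones quoted here, though exact dump contents vary between releases.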
With 4.9 we conclude:
Estimating sizes for loop 1
BB: 4, after_exit: 0
size: 2 if (i_1 <= 2)
Exit condition will be eliminated in peeled copies.
BB: 3, after_exit: 1
size: 1 _4 = lhs.n[i_1];
size: 1 _6 = _4 * rhs_5(D);
size: 1 ret.n[i_1] = _6;
size: 1 i_8 = i_1 + 1;
Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-0
Loop size: 6
Estimated size after unrolling: 7
Not unrolling loop 1: size would grow.
while 4.7 had:
Estimating sizes for loop 1
BB: 4, after_exit: 0
size: 2 if (i_1 <= 2)
Exit condition will be eliminated.
BB: 3, after_exit: 1
size: 1 D.1593_3 = lhs.n[i_1];
size: 1 D.1594_5 = D.1593_3 * rhs_4(D);
size: 1 ret.n[i_1] = D.1594_5;
size: 1 i_6 = i_1 + 1;
Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-2
Loop size: 6
Estimated size after unrolling: 6
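so the difference is in last_iteration handling: 4.9 expects the exit condition
to be eliminated only in the peeled copies, not in the last iteration
(last_iteration: 2-0 versus 2-2), which raises the estimate from 6 to 7 and
exceeds the loop size of 6.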
Honza?
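The two estimates can be reproduced with a little arithmetic. The C sketch below is a reading of how the complete unroller weighs the fields in these dumps; it corresponds roughly to `estimated_unrolled_size` in tree-ssa-loop-ivcanon.c, but `estimate_unrolled_size` and `would_grow` here are hypothetical stand-ins, not GCC's code. It assumes three unrolled copies of the body (`nunroll = 3`), treats the 2/3 factor as an assumed allowance for later simplification, and assumes early unrolling at -O2 refuses any estimate larger than the current loop size. Under those assumptions the only differing input, the eliminated part of the last iteration, gives 7 for 4.9 and 6 for 4.7, matching the dumps.

```c
#include <stdbool.h>

/* Hypothetical model of the unrolled-size estimate. Parameters mirror
   the dump fields:
     overall, eliminated         -> "size: 6-3"
     last_iter, last_iter_elim   -> "last_iteration: 2-0" (4.9) or "2-2" (4.7)
     nunroll                     -> number of body copies (assumed 3). */
long estimate_unrolled_size(long overall, long eliminated,
                            long last_iter, long last_iter_elim,
                            long nunroll)
{
  long insns = nunroll * (overall - eliminated);
  insns += last_iter - last_iter_elim;
  /* Assumed allowance: about a third of the copied code folds away. */
  insns = insns * 2 / 3;
  return insns > 0 ? insns : 1;
}

/* Assumed -O2 early-unrolling policy: unroll only without code growth. */
bool would_grow(long loop_size, long estimate)
{
  return estimate > loop_size;
}
```

With the 4.9 inputs the sum is 3 * 3 + 2 = 11, and 11 * 2 / 3 = 7 in integer arithmetic; with the 4.7 inputs it is 9 + 0 = 9, giving 6, which does not exceed the loop size of 6.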
Otherwise this is an optimization pass-ordering issue.
Possibly a simple pass could handle
<retval> = ret;
ret ={v} {CLOBBER};
return <retval>;
and back-propagate <retval> into all stores to and loads from ret.
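At the source level, the suggested back-propagation amounts to building the result directly in the return slot instead of in a local that is copied out. The C sketch below makes the return slot explicit as a pointer parameter; `struct vec3`, `scale_via_copy`, `scale_in_place` and the `float` element type are illustrative assumptions, and a real pass would rewrite GIMPLE rather than C source. The rewrite is only safe because nothing reads the return slot or observes `ret` between the stores and the final copy; here `lhs` is passed by value, so it cannot alias `*retval`.

```c
/* Illustrative names and types; not taken from the bug report. */
struct vec3 {
  float n[3];
};

/* Before: the result is built in a local and then copied out, mirroring
   "<retval> = ret; ret ={v} {CLOBBER}; return <retval>;". */
void scale_via_copy(struct vec3 *retval, struct vec3 lhs, float rhs)
{
  struct vec3 ret;
  for (int i = 0; i < 3; i++)
    ret.n[i] = lhs.n[i] * rhs;
  *retval = ret;   /* the aggregate copy the proposed pass would remove */
}

/* After: every store to ret is redirected to *retval, so both the local
   and the final copy disappear. */
void scale_in_place(struct vec3 *retval, struct vec3 lhs, float rhs)
{
  for (int i = 0; i < 3; i++)
    retval->n[i] = lhs.n[i] * rhs;
}
```

Both functions leave the same values in `*retval`; the second simply never materializes the intermediate aggregate.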