This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/63537] [4.9/5 Regression] Missed optimization: Loop unrolling adds extra copy when returning aggregate


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63537

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-10-15
      Known to work|                            |4.7.3
   Target Milestone|---                         |4.9.2
            Summary|Missed optimization: Loop   |[4.9/5 Regression] Missed
                   |unrolling adds extra copy   |optimization: Loop
                   |when returning aggregate    |unrolling adds extra copy
                   |                            |when returning aggregate
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is because the outer loop is unrolled only after SRA has already had its
chance to scalarize away the local aggregate.  GCC 4.7 unrolls the loop during
early unrolling, even at -O2.

With 4.9 we conclude:

Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated in peeled copies.
 BB: 3, after_exit: 1
  size:   1 _4 = lhs.n[i_1];
  size:   1 _6 = _4 * rhs_5(D);
  size:   1 ret.n[i_1] = _6;
  size:   1 i_8 = i_1 + 1;
   Induction variable computation will be folded away.

size: 6-3, last_iteration: 2-0
  Loop size: 6
  Estimated size after unrolling: 7
Not unrolling loop 1: size would grow.

while 4.7 had:

Estimating sizes for loop 1
 BB: 4, after_exit: 0
  size:   2 if (i_1 <= 2)
   Exit condition will be eliminated.
 BB: 3, after_exit: 1
  size:   1 D.1593_3 = lhs.n[i_1];
  size:   1 D.1594_5 = D.1593_3 * rhs_4(D);
  size:   1 ret.n[i_1] = D.1594_5;
  size:   1 i_6 = i_1 + 1;
   Induction variable computation will be folded away.
size: 6-3, last_iteration: 2-2
  Loop size: 6
  Estimated size after unrolling: 6

so the difference is in last_iteration handling.
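To see how the differing last_iteration pairs flip the decision, here is a
small C sketch of the size estimate.  The formula is an assumption, written to
resemble the complete-unrolling estimate in GCC's tree-ssa-loop-ivcanon.c.  The
struct, its field names and both helper names are hypothetical.  The unroll
factor of 3 is also assumed, because it is the value that reproduces both dumps.

```c
#include <assert.h>

/* Hypothetical mirror of the two pairs printed in the dump:
   "size: A-B" and "last_iteration: C-D". */
struct loop_size_est {
  long overall;                              /* A */
  long eliminated_by_peeling;                /* B */
  long last_iteration;                       /* C */
  long last_iteration_eliminated_by_peeling; /* D */
};

/* Assumed shape of the estimate: nunroll peeled copies of the body, plus
   the final (exit) iteration, scaled by 2/3 on the expectation that later
   passes clean up part of the copies.  Integer division truncates. */
static long estimated_unrolled_size(const struct loop_size_est *s, long nunroll)
{
  assert(nunroll >= 0);
  long insns = nunroll * (s->overall - s->eliminated_by_peeling);
  insns += s->last_iteration - s->last_iteration_eliminated_by_peeling;
  insns = insns * 2 / 3;
  return insns <= 0 ? 1 : insns;
}

/* Unroll only if the estimate does not exceed the original loop size;
   otherwise the dump reports "size would grow". */
static int would_unroll(const struct loop_size_est *s, long nunroll,
                        long loop_size)
{
  return estimated_unrolled_size(s, nunroll) <= loop_size;
}
```

With nunroll = 3, the 4.9 numbers give (3 * (6 - 3) + (2 - 0)) * 2 / 3 = 22 / 3,
which truncates to 7 > 6, hence "size would grow".  The 4.7 numbers give
(9 + (2 - 2)) * 2 / 3 = 6, which does not exceed the loop size of 6.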

Honza?

Otherwise this is an optimization pass ordering issue.

Possibly a simple pass could handle

  <retval> = ret;
  ret ={v} {CLOBBER};
  return <retval>;

and back-propagate <retval> into all stores/loads of ret.
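At the source level, the suggested back-propagation can be pictured roughly as
follows.  This is a hedged C sketch, not the testcase from the report: the type
vec3, its float element type and both function names are hypothetical.  The
return slot is modelled by an explicit out-pointer.  That is an assumption for
illustration only, since some ABIs return small aggregates in registers rather
than in memory.

```c
#include <assert.h>

/* Hypothetical stand-in for the aggregate (element type assumed). */
struct vec3 { float n[3]; };

/* Shape matching the dump: fill a local 'ret', then copy it out.
   In GIMPLE that copy shows up as "<retval> = ret;". */
struct vec3 scale_via_local(struct vec3 lhs, float rhs)
{
  struct vec3 ret;
  for (int i = 0; i <= 2; i++)
    ret.n[i] = lhs.n[i] * rhs;
  return ret;
}

/* Picture of the rewrite: every store that went to 'ret' goes straight
   to the return slot instead, so no separate aggregate copy remains. */
void scale_into_slot(struct vec3 *retval, const struct vec3 *lhs, float rhs)
{
  assert(retval && lhs);
  for (int i = 0; i <= 2; i++)
    retval->n[i] = lhs->n[i] * rhs;
}
```

Both functions compute the same values.  The second simply has no local
aggregate left for SRA or the unroller to deal with.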

