This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/67606] Missing optimization: load possible return value before early-out test


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67606

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-09-17
                 CC|                            |matz at gcc dot gnu.org
          Component|c                           |middle-end
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for the main part of this PR we actually expand from

  <bb 7>:
  # count_16 = PHI <count_1(6), 0(2)>
  return count_16;

so it is a matter of coalescing and where we put that copy from zero.

We coalesce the following way:

Coalesce list: (1)count_1 & (15)count_15 [map: 0, 7] : Success -> 0
Coalesce list: (3)ivtmp.6_3 & (13)ivtmp.6_13 [map: 2, 6] : Success -> 2
Coalesce list: (1)count_1 & (11)count_11 [map: 0, 5] : Success -> 0
Coalesce list: (1)count_1 & (16)count_16 [map: 0, 8] : Success -> 0
Coalesce list: (2)ivtmp.6_2 & (13)ivtmp.6_3 [map: 1, 2] : Success -> 2

so 'count' is fully coalesced but of course the constant is still there
and we insert a copy on the 2->7 edge.

Inserting a value copy on edge BB3->BB4 : PART.0 = 0
Inserting a value copy on edge BB2->BB7 : PART.0 = 0

which also looks good (we use the correct partition for this).  Note that
the zero init isn't partially redundant so GCSE isn't able to optimize
here and RTL code hoisting isn't very advanced.

I'm also sure the RA guys will say it's not the RA's job to do the
hoisting.

So with my usual GIMPLE hat on I'd say it would have been nice to help
things by placing the value copy more intelligently.  We have a pass
that is supposed to help here - uncprop.  We're faced with

  <bb 2>:
  if (length_4(D) > 0)
    goto <bb 3>;
  else
    goto <bb 7>;

  <bb 3>:
...

  <bb 4>:
  # count_15 = PHI <0(3), count_1(6)>
...

  <bb 6>:
  # count_1 = PHI <count_15(4), count_11(5)>
  ivtmp.6_3 = ivtmp.6_13 + 4;
  if (ivtmp.6_3 != _25)
    goto <bb 4>;
  else
    goto <bb 7>;

  <bb 7>:
  # count_16 = PHI <count_1(6), 0(2)>
  return count_16;

which might be a good enough pattern to detect (slight complication with
the forwarder BB3).  Note that we'd increase register pressure throughout
BB3 and that for the whole thing to work we'd still need to be able to
make sure we can coalesce all of count and the register we init with zero.
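
For readers who prefer source over dumps, the current copy placement can be
sketched roughly in C. This is only an illustration, not the testcase from
the PR: the name count_edge_copies, the parameter a and the a[i] < 0 test are
assumptions (the dump elides the bodies of BB3 to BB5), and part0 merely
stands in for PART.0.

```c
/* Hypothetical source-level picture of the out-of-SSA result: PART.0
   receives 0 on each edge that carries the constant into a PHI.  */
int
count_edge_copies (const int *a, int length)
{
  int part0;            /* stands in for PART.0 (all of 'count') */
  int i = 0;

  if (length > 0)       /* BB2 */
    goto bb3;
  part0 = 0;            /* value copy on edge BB2->BB7 */
  goto bb7;

bb3:
  part0 = 0;            /* value copy on edge BB3->BB4 */
bb4:
  if (a[i] < 0)         /* assumed loop body, elided in the dump */
    part0++;
  i++;
  if (i != length)      /* BB6 */
    goto bb4;
bb7:
  return part0;
}
```

The zero is materialized after the length test on both paths, which is the
placement the PR complains about.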

Given the interaction with coalescing I wonder whether it makes sense to
do "uncprop" together with coalescing or to make emitting and placing
value-copies work on GIMPLE, exposing the partitions explicitly somehow.

Well, trying to improve uncprop for this particular testcase would work
and shouldn't be too hard (you'd extend it from detecting edge equivalences
to existing vars to do PHI value hoisting).  There is also other code
trying to improve coalescing - rewrite_out_of_ssa has insert_backedge_copies
so we could also detect the pattern here and insert copies from the
common constant in the common dominator.
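
As a rough picture of what either approach would aim for, here is a C sketch
with a single copy from the common constant placed in the common dominator
(BB2).  It is an assumption-laden illustration rather than actual compiler
output: the name count_hoisted, the parameter a and the a[i] < 0 test are
hypothetical, and part0 stands in for PART.0.

```c
/* Hypothetical shape after hoisting: one copy from the common constant,
   placed in BB2, which dominates both BB4 and BB7.  */
int
count_hoisted (const int *a, int length)
{
  int part0 = 0;        /* single copy in the common dominator (BB2) */
  int i = 0;

  if (length <= 0)      /* early-out edge BB2->BB7 needs no copy now */
    return part0;

  do
    {
      if (a[i] < 0)     /* assumed loop body, elided in the dump */
        part0++;
      i++;
    }
  while (i != length);

  return part0;
}
```

The zero now lives in a register across the length test and the loop
preheader, which is the register pressure cost mentioned above.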

