[PATCH, PR 10474] Schedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue

Jeff Law law@redhat.com
Wed Apr 24 22:50:00 GMT 2013

On 04/24/2013 12:24 PM, Martin Jambor wrote:
> Here they are.  First, I simply looked at how many instructions would
> be changed by a second run of the pass in its current position during
> C and C++ bootstrap:
>      |                                     | Insns changed |      % |
>      |-------------------------------------+---------------+--------|
>      | Trunk - only pass in original place |        172608 | 100.00 |
>      | First pass before pro/epilogue      |        170322 |  98.68 |
>      | Second pass in the original place   |          8778 |   5.09 |
> 5% was worth investigating more.  The 20 source files with highest
> number of affected instructions by the second run were:
>        939 mine/src/libgcc/config/libbid/bid_binarydecimal.c
>        909 mine/src/libgcc/config/libbid/bid128_div.c
>        813 mine/src/libgcc/config/libbid/bid64_div.c
>        744 mine/src/libgcc/config/libbid/bid128_compare.c
>        615 mine/src/libgcc/config/libbid/bid128_to_int32.c
>        480 mine/src/libgcc/config/libbid/bid128_to_int64.c
>        450 mine/src/libgcc/config/libbid/bid128_to_uint32.c
>        408 mine/src/libgcc/config/libbid/bid128_fma.c
>        354 mine/src/libgcc/config/libbid/bid128_to_uint64.c
>        327 mine/src/libgcc/config/libbid/bid128_add.c
>        246 mine/src/libgcc/libgcc2.c
>        141 mine/src/libgcc/config/libbid/bid_round.c
>        129 mine/src/libgcc/config/libbid/bid64_mul.c
>        117 mine/src/libgcc/config/libbid/bid64_to_int64.c
>         96 mine/src/libsanitizer/tsan/tsan_interceptors.cc
>         96 mine/src/libgcc/config/libbid/bid64_compare.c
>         87 mine/src/libgcc/config/libbid/bid128_noncomp.c
>         84 mine/src/libgcc/config/libbid/bid64_to_bid128.c
>         81 mine/src/libgcc/config/libbid/bid64_to_uint64.c
>         63 mine/src/libgcc/config/libbid/bid64_to_int32.c
The first thing that jumps out at me here is there's probably some idiom 
used in the BID code that is triggering.

> I have manually examined some of the late opportunities for
> propagation in mine/src/libgcc/config/libbid/bid_binarydecimal.c and
> the majority of them were a result of peephole2.
I can pretty easily see how peep2 may expose opportunities for 
hard-cprop.  Of course, those opportunities may actually be undoing some 
of the benefit of the peep2 patterns.
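
As a toy illustration of the effect (a sketch only, not the code in GCC's
regcprop.c; the instruction tuples and the name cprop_block are made up for
this example), a block-local copy propagation over hard registers might
look like the following.  A copy left behind by an earlier transformation
lets a later use read the original register, until that register is
redefined:

```python
# Toy model of hard-register copy propagation within one basic block.
# Each instruction is (opcode, destination, sources); all operands are
# register names.  This is an illustrative assumption, not GCC's RTL.

def cprop_block(insns):
    """Return (rewritten instructions, number of instructions changed)."""
    copy_of = {}  # register -> older register known to hold the same value
    out, changed = [], 0
    for op, dst, srcs in insns:
        # Replace each source by the oldest register holding the same value.
        new_srcs = tuple(copy_of.get(s, s) for s in srcs)
        if new_srcs != srcs:
            changed += 1
        # Writing dst invalidates every equivalence that mentions it.
        copy_of = {r: s for r, s in copy_of.items() if dst not in (r, s)}
        if op == "mov" and new_srcs[0] != dst:
            copy_of[dst] = new_srcs[0]
        out.append((op, dst, new_srcs))
    return out, changed


block = [
    ("mov", "r10", ("rdi",)),         # copy exposed by an earlier pass
    ("add", "rax", ("rax", "r10")),   # can read rdi directly
    ("mov", "rdi", ("rbx",)),         # rdi redefined: r10 no longer equals rdi
    ("sub", "rcx", ("rcx", "r10")),   # must keep using r10
]
rewritten, changed = cprop_block(block)
```

Only the add is rewritten here.  Whether such a rewrite helps or undoes
what the peephole intended depends on the target pattern, which this toy
does not model.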

> So next time I measured only the number of instructions changed during
> make stage2-bubble with multilib disabled.  In order to find out where
> the new opportunities come from, I scheduled
> pass_cprop_hardreg after every pass between
> pass_branch_target_load_optimize1 and pass_fast_rtl_dce and counted
> how many instructions are modified (relative to just having the pass
> where it is now):
Thanks.  That's a real interesting hunk of data.  It's surprising that
so many come right after {pro,epi}logue generation: a full 33% of the
changed insns stem from there, and I can't think of why that should be
the case.  Perhaps there's some second-order effect that shows itself
after the first pass of cprop-hardreg.

I can see several ways jump2 could open new propagation possibilities. 
As I noted earlier in this message, the opportunities after peep2 may 
actually be doing more harm than good.

It's probably not worth the work involved, but a more sensible 
visitation order for reg-cprop would probably be good.  Similarly we 
could have the capability to mark interesting blocks and just reg-cprop 
the interesting blocks after threading the prologue/epilogue.
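
The "mark interesting blocks" idea could be sketched roughly as follows.
This is a toy model, not GCC code: the block dictionaries, the names
mark_changed and rerun_on_marked, and the stand-in pass drop_self_moves
are all hypothetical.  Blocks are marked by comparing snapshots taken
before and after the prologue/epilogue step, and the local pass is re-run
only on the marked ones:

```python
def mark_changed(before, after):
    """Names of blocks whose instruction list differs between two snapshots."""
    return {name for name, insns in after.items() if before.get(name) != insns}


def rerun_on_marked(blocks, marked, local_pass):
    """Apply local_pass only to marked blocks; return total insns changed."""
    total = 0
    for name in sorted(marked):
        blocks[name], changed = local_pass(blocks[name])
        total += changed
    return total


def drop_self_moves(insns):
    """Stand-in local pass: delete 'mov r, r' and count the deletions."""
    kept = [i for i in insns if not (i[0] == "mov" and i[2] == (i[1],))]
    return kept, len(insns) - len(kept)


before = {
    "entry": [("add", "rax", ("rax", "rdi"))],
    "body":  [("mov", "rcx", ("rcx",))],   # not touched by the prologue step
    "exit":  [("ret", None, ())],
}
# Pretend prologue/epilogue threading added instructions to entry and exit.
after = {
    "entry": [("push", None, ("rbx",)), ("mov", "rbx", ("rbx",))] + before["entry"],
    "body":  list(before["body"]),
    "exit":  [("pop", "rbx", ())] + before["exit"],
}
marked = mark_changed(before, after)
removed = rerun_on_marked(after, marked, drop_self_moves)
```

The unmarked block is never visited, so the cost of the second run scales
with the number of blocks the prologue/epilogue step touched rather than
with the size of the function.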

> I'm not sure what the conclusion is.  Probably that there are cases
> where doing propagation late can be a good thing but these do not
> occur that often.  And that more measurements should probably be done.
> Anyway, I'll look into alternatives (see below) before pushing this
> further.
Knowing more about those opportunities would be useful.  The most
interesting ones to me would be those right after the prologue/epilogue.
Having just run the cprop, then attached the prologue/epilogue, I
wouldn't expect there to be many propagation opportunities.

> I have looked at the patch Vlad suggested (most things are new to me
> in RTL land and so almost everything takes me ages) and I'm certainly
> willing to try and mimic some of it in order to (hopefully) get the
> same effect that propagating and shrink-wrapping preparation moves can
> do.  Yes, this is not enough to deal with parameters loaded from stack
> but unlike latest insertion, it could also work when the parameters
> are also used on the fast path, which is often the case.  In fact,
> propagation helps exactly because they are used in the entry BB.
> Hopefully they will end up in a caller-saved register on the fast path
> and we'll flip it over to the callee-saved problematic one only on
> (slow) paths going through calls.
> Of course, the two approaches are not mutually exclusive and load
> sinking might help too.
Note that Morgan's text formulates copy sinking as sinking one copy at
a time.  I'm not sure that's needed in this case, since we're just
sinking a few well-defined copies.

And I agree, the approaches are not mutually exclusive; sinking a load 
out of the prologue and out of a hot path has a lot of value.  But 
sinking the loads is much more constrained than just sinking the 
argument copies.
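
A minimal sketch of sinking a single argument copy out of the entry block,
under loud assumptions: this is neither Morgan's formulation nor GCC code,
every successor is assumed to have the entry block as its only
predecessor, liveness is assumed to be precomputed, and the name sink_copy
and the instruction tuples are hypothetical.

```python
def sink_copy(entry, idx, succs, live_in):
    """Try to move the copy entry[idx] to the start of successors needing it.

    Toy assumptions: each successor's only predecessor is the entry block,
    and live_in[name] is already known for every successor.
    """
    op, dst, srcs = entry[idx]
    if op != "mov":
        return False
    src = srcs[0]
    for _, d, later_srcs in entry[idx + 1:]:
        # The copy must stay if the rest of the block reads or writes dst,
        # or overwrites src before the block ends.
        if dst in later_srcs or d in (dst, src):
            return False
    del entry[idx]
    for name, insns in succs.items():
        if dst in live_in[name]:
            insns.insert(0, (op, dst, (src,)))
    return True


def make_succs():
    return {
        "fast": [("add", "rax", ("rax", "rdi"))],
        "slow": [("call", "rax", ()), ("add", "rax", ("rax", "rbx"))],
    }

live_in = {"fast": {"rax", "rdi"}, "slow": {"rbx"}}

# Entry block still reading the callee-saved copy: the copy cannot move.
blocked = [("mov", "rbx", ("rdi",)), ("cmp", "flags", ("rbx",))]
blocked_ok = sink_copy(blocked, 0, make_succs(), live_in)

# After propagation the entry block reads rdi, so the copy can be sunk.
entry = [("mov", "rbx", ("rdi",)), ("cmp", "flags", ("rdi",))]
succs = make_succs()
sunk_ok = sink_copy(entry, 0, succs, live_in)
```

In the second case the copy ends up only at the head of the slow path,
which is the situation described above: the fast path keeps using the
incoming register and the callee-saved one is written only on paths that
go through a call.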

