This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, PR 10474] Schedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue
- From: Martin Jambor <mjambor at suse dot cz>
- To: Jeff Law <law at redhat dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Fri, 19 Apr 2013 00:09:50 +0200
- Subject: Re: [PATCH, PR 10474] Schedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue
- References: <20130417154935 dot GC3656 at virgil dot suse> <516EED6F dot 9030007 at redhat dot com>
Hi,
On Wed, Apr 17, 2013 at 12:43:59PM -0600, Jeff Law wrote:
> On 04/17/2013 09:49 AM, Martin Jambor wrote:
> >
> >The reason why it helps so much is that before register allocation
> >there are instructions moving the value of each actual argument from
> >its "originally hard" register (e.g. SI, DI, etc.) to a pseudo at the
> >beginning of each function.  When the argument is live across a
> >function call, the pseudo is likely to be assigned to a callee-saved
> >register and then also accessed from that register, even in the first
> >BB, making it require a prologue, though it could be fetched from the
> >original one.  When we convert all uses (at least in the first BB) to
> >the original register, the preparatory stage of shrink wrapping is
> >often capable of moving the register moves to a later BB, thus
> >creating fast paths which do not require a prologue and epilogue.
> I noticed similar effects when looking at range splitting. Being
> able to move those calls into a deeper control level in the CFG
> would definitely be an improvement.
>
> >
> >We believe this change in the pipeline should not bring about any
> >negative effects. During gcc bootstrap, the number of instructions
> >changed by pass_cprop_hardreg dropped, but by only 1.2%.  We have also
> >run SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and all
> >run time differences could be attributed to noise. The changes in
> >binary sizes were also small:
> Did anyone ponder just doing the hard register propagation on
> argument registers prior to the prologue/epilogue handling, then the
> full-blown propagation pass in its current location in the pipeline?

I did not, because I did not think it would be substantially faster
than running the pass as-is twice.  I may be wrong, but it would still
have to look at all statements and examine them at a very similar level
of detail (to look for clobbers and manage value_data_entry chains),
so it would not really do much less work fiddling with its own
data structures.
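
To illustrate why restricting the propagation to argument registers
saves so little, here is a minimal sketch of forward copy propagation
over a single basic block.  It is a toy, not the regcprop.c code: the
insn representation, NREGS, copy_of[] and propagate_block are all made
up for this example, and copy_of[] only stands in very loosely for the
value_data_entry chains.  Even if we only cared about copies from
argument registers, the loop would still have to visit every insn and
process every clobber to stay correct.

```c
#define NREGS 8

/* Toy insn kinds (hypothetical, not GCC's RTL):
   COPY:    dst = src
   USE:     reads src
   CLOBBER: dst is overwritten with an unknown value (e.g. by a call) */
enum kind { COPY, USE, CLOBBER };

struct insn {
  enum kind kind;
  int dst;   /* written register (COPY, CLOBBER); ignored for USE */
  int src;   /* read register (COPY, USE); ignored for CLOBBER */
};

/* Forward-propagate register copies through one basic block.
   copy_of[r] is the oldest register known to hold the same value
   as r, or r itself if nothing is known.  Returns the number of
   operands that were rewritten. */
int propagate_block(struct insn *insns, int n)
{
  int copy_of[NREGS];
  int changed = 0;

  for (int r = 0; r < NREGS; r++)
    copy_of[r] = r;

  for (int i = 0; i < n; i++) {
    struct insn *p = &insns[i];

    /* Replace a read of a copy with a read of the original. */
    if ((p->kind == COPY || p->kind == USE) && copy_of[p->src] != p->src) {
      p->src = copy_of[p->src];
      changed++;
    }

    /* Any write kills what we knew about the written register and
       about every register recorded as a copy of it. */
    if (p->kind == COPY || p->kind == CLOBBER) {
      int d = p->dst;
      for (int r = 0; r < NREGS; r++)
        if (copy_of[r] == d)
          copy_of[r] = r;
      copy_of[d] = d;
      if (p->kind == COPY && p->src != d)
        copy_of[d] = p->src;   /* d now mirrors the (oldest) source */
    }
  }
  return changed;
}
```

With a block such as "r3 = r5; use r3; clobber r5; use r3", the first
use is rewritten to r5 and the second one is left alone, because the
clobber in between invalidates the copy.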

A very likely workable alternative for shrink-wrapping would be to
have the shrink-wrapping preparation invoke copyprop_hardreg_forward_1
on the first BB and on the few BBs it tries to move stuff across.  But
of course that would be a bit ugly, so I think we should do it only if
there is a reason not to move the pass (or schedule it twice).
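
For reference, the kind of function where the contents of the first BB
matter looks roughly like the sketch below.  The names f and slow_path
are made up, slow_path is only a stand-in so that the example is
self-contained, and whether a particular GCC version actually
shrink-wraps this depends on the target and the options used.

```c
/* Hypothetical stand-in for some out-of-line work; in a real case
   this would be an external function the compiler cannot inline. */
int slow_path(int x)
{
  return 2 * x;
}

int f(int x)
{
  /* First BB: the test only needs the incoming argument.  If it
     reads x from a callee-saved register instead of the register
     the argument arrived in, even this path needs the prologue. */
  if (x == 0)
    return 0;              /* fast path: no call is made */

  /* x is live across the call, so after register allocation it is
     likely to end up in a callee-saved register. */
  return slow_path(x) + x;
}
```

If the comparison in the first BB uses the original argument register,
the copy into the callee-saved register can sink below the branch and
the early return no longer needs a prologue or epilogue.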

I also have not tried scheduling the hard register copy propagation
pass twice and measuring the impact on compile times.  Any suggestions
as to what might be a good testcase for that?
Thanks,
Martin
>
> That would get you the benefit you're seeking and minimize other
> effects. Of course if you try that and get effectively the same
> results as moving the full propagation pass before prologue/epilogue
> handling then the complexity of only propagating argument registers
> early is clearly not needed and we'd probably want to go with your
> patch as-is.
>
>
> jeff
>