This is the mail archive of the
mailing list for the GCC project.
Re: Live range shrinkage in pre-reload scheduling
- From: Vladimir Makarov <vmakarov at redhat dot com>
- To: rdsandiford at googlemail dot com
- Cc: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>
- Date: Wed, 14 May 2014 23:13:21 -0400
- Subject: Re: Live range shrinkage in pre-reload scheduling
- References: <5371F395 dot 8050208 at arm dot com> <53736CFD dot 6030402 at redhat dot com> <87d2fgco2v dot fsf at talisman dot default>
On 2014-05-14, 12:38 PM, Richard Sandiford wrote:
Vladimir Makarov <vmakarov at redhat dot com> writes:
On 2014-05-13, 6:27 AM, Kyrill Tkachov wrote:
In haifa-sched.c (in rank_for_schedule) I notice that live range
shrinkage is not performed when SCHED_PRESSURE_MODEL is used and the
comment mentions that it results in much worse code.
Could anyone elaborate on this? Was it just empirically noticed on x86_64?
It was empirically noticed on SPEC2000. Practice is the only
criterion for heuristic optimizations. Sometimes a new heuristic
optimization might look promising but the reality might be quite different.
Hey, I resent that. You make it sound like I came up with SCHED_PRESSURE_MODEL
on a whim without any evidence to back it up. I implemented it because
it gave better EEMBC results on ARM, at least at the time that I wrote
it, and it didn't affect SPEC2000 for ARM much one way or the other.
It also produced better results for s390x on SPEC2006 at the time it
was tested, which is why it was turned on by default there too.
For anyone interested in the background and rationale, the original
posting was here: https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html
I'm not claiming it's a great heuristic or anything. There's bound to
be room for improvement. But it was based on "reality" and real results.
Of course, if it turns out not to be a win for ARM or s390x any more then it
should be disabled.
In this connection, I remember a story Bob Morgan told me about the
invention of the bin-packing RA. It was just a quick and simple first RA
implementation for a new compiler. After that, the DEC compiler team tried
many times to improve the RA by implementing more complicated optimizations,
but the first bin-packing RA was always better.
You make it sound like your original -fsched-pressure is unlikely
to be beaten, in the way that you think bin packing wasn't beaten.
Richard, I did not really mean it. Quite the opposite: I was glad that you
added your implementation, as I believed that the most important thing I
did was the infrastructure for implementing register-pressure scheduling
(more accurate register pressure evaluation). The more people use it,
the better it is for me.
That said, like you, I am not satisfied with how GCC resolves the conflict
between the 1st insn scheduler and the RA. Ideally, I'd like to see the 1st
insn scheduler (with some register pressure heuristics or better
communication with the RA) improve code for x86/x86-64. This goal is still
far away and I am not sure how to achieve it. Probably I'll finish my
active big development of the RA and insn scheduler and switch to something
else during this year.
But both versions of -fsched-pressure are off by default on most
targets for a reason. (AFAIK the only two targets that enable it by
default are the two that use SCHED_PRESSURE_MODEL: arm and s390x.)
I think this is still an area that could be improved. I don't mind
whether that's through improving one of the two existing heuristics
or doing something different, but it seems pessimistic to say that
scheduling based on register pressure is always going to be the optional
feature that it is now.
E.g. tracking pressure classes isn't always the right thing for
targets like PowerPC where only part of the vector register set
can be used for floating-point operations.