This is the mail archive of the
mailing list for the GCC project.
Re: Live range shrinkage in pre-reload scheduling
- From: Richard Sandiford <rdsandiford at googlemail dot com>
- To: ramrad01 at arm dot com
- Cc: Vladimir Makarov <vmakarov at redhat dot com>, Kyrill Tkachov <kyrylo dot tkachov at arm dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>
- Date: Thu, 15 May 2014 08:11:29 +0100
- Subject: Re: Live range shrinkage in pre-reload scheduling
- References: <5371F395 dot 8050208 at arm dot com> <53736CFD dot 6030402 at redhat dot com> <87d2fgco2v dot fsf at talisman dot default> <CAJA7tRZyXSamdb_jbnX+-HywZEajB=+Y-SzCyHzyTJDyfnhQJw at mail dot gmail dot com>
Ramana Radhakrishnan <firstname.lastname@example.org> writes:
> On Wed, May 14, 2014 at 5:38 PM, Richard Sandiford
> <email@example.com> wrote:
>> Hey, I resent that. You make it sound like I came up with SCHED_PRESSURE_MODEL
>> on a whim without any evidence to back it up. I implemented it because
>> it gave better EEMBC results on ARM, at least at the time that I wrote
>> it, and it didn't affect SPEC2000 for ARM much one way or the other.
>> It also produced better results for s390x on SPEC2006 at the time it
>> was tested, which is why it was turned on by default there too.
>> For anyone interested in the background and rationale, the original
>> posting was here: https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html
> I no longer have those results,
I don't have mine either unfortunately.
> the reason we turned them on was
> because IIRC there were significant improvements on A8 and A9 for this
> new weighted algorithm and it seemed to work well.
>> I'm not claiming it's a great heuristic or anything. There's bound to
>> be room for improvement. But it was based on "reality" and real results.
>> Of course, if it turns out not to be a win for ARM or s390x any more then it
>> should be disabled.
> The current situation that Kyrill is investigating is a case where we
> notice the first scheduler pass being a bit too aggressive with
> creating ILP opportunities with the A15 scheduler that causes
> performance differences with not turning on the first scheduler pass
> vs using the defaults.
Ah, OK. "model" was a deliberate attempt to be less conservative
than "weighted", but it sounds like it went too far in your A15 case.
Have you got a testcase you can share?
>>> In this relation I am remembering a story told me by Bob Morgan about
>>> bin packing RA invention. It was just a quick and simple first RA
>>> implementation for a new compiler. After that DEC compiler team tried
>>> many times to improve the RA implementing more complicated optimizations
>>> but the first bin packing RA was always better.
>> You make it sound like your original -fsched-pressure is unlikely
>> to be beaten, in the way that you think bin packing wasn't beaten.
>> But both versions of -fsched-pressure are off by default on most
>> targets for a reason. (AFAIK the only two targets that enable it by
>> default are the two that use SCHED_PRESSURE_MODEL: arm and s390x.)
>> I think this is still an area that could be improved. I don't mind
>> whether that's through improving one of the two existing heuristics
>> or doing something different, but it seems pessimistic to say that
>> scheduling based on register pressure is always going to be the optional
>> feature that it is now.
>> E.g. tracking pressure classes isn't always the right thing for
>> targets like PowerPC where only part of the vector register set
>> can be used for floating-point operations.
> Is there another post that deals with this particular case? I tried
> digging through the archives but couldn't find anything easily enough.
Not sure I ever sent it in the end. I got pulled off Linaro before I
could properly finish this stuff off. But IIRC the problem was that
both algorithms use IRA pressure classes to calculate the pressure.
If VSX is enabled then the pressure class for both scalar floating-point
and vector operations will be VSX_REGS. But the scalar operations can
only use half of those registers.
So in float-heavy code, the heuristics would assume that there are
twice as many registers available as there really are. This hurt
SCHED_PRESSURE_MODEL much more than SCHED_PRESSURE_WEIGHTED because
*_MODEL was supposed to be less conservative. I.e. *_MODEL would
encourage ILP to the point of using twice the number of actual registers.
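
To make the mismatch concrete, here is a minimal sketch of the
arithmetic, not GCC code. The register counts and the function names
are assumptions chosen for illustration; the only thing taken from the
description above is that scalar code can use half of the class.

```python
# Illustrative numbers only (assumption): a pressure class of 64
# registers, of which scalar floating-point code can use half.
CLASS_REGS = 64
SCALAR_FP_REGS = CLASS_REGS // 2


def spills(live_values, usable_regs):
    """Number of live values that do not fit in the usable registers."""
    return max(0, live_values - usable_regs)


def heuristic_view(live_values):
    # Hypothetical helper: the budget a scheduler sees if it sizes the
    # budget by the whole pressure class.
    return spills(live_values, CLASS_REGS)


def actual_view(live_values):
    # Hypothetical helper: the budget that scalar FP code really has.
    return spills(live_values, SCALAR_FP_REGS)


if __name__ == "__main__":
    for live in (24, 48, 64):
        print(live, heuristic_view(live), actual_view(live))
```

With 48 scalar values live at once, the class-sized budget reports no
excess while the real budget is exceeded by 16. A less conservative
heuristic fills more of the budget it believes it has, so it runs into
this gap sooner.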
IIRC the biggest reason that this didn't affect *_WEIGHTED as much
was that it deliberately treated a register as being live indefinitely
if the register is used more than once. This is the thing I mentioned
in the write-up that Vlad said had also improved x86_64.
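
The effect of that rule can be shown with a toy model. This is a sketch
under stated assumptions, not the scheduler's implementation: the
instruction representation, the function max_pressure and the flag
pin_multi_use are all hypothetical names introduced here.

```python
def max_pressure(schedule, pin_multi_use):
    """Peak number of live registers over a straight-line schedule.

    schedule is a list of (dest, uses) pairs in issue order, where dest
    is a register name or None and uses is a list of register names.
    With pin_multi_use set, a register used more than once is never
    treated as dead (the conservative rule described above).
    """
    # Count how many times each register is used in total.
    use_count = {}
    for _, uses in schedule:
        for reg in uses:
            use_count[reg] = use_count.get(reg, 0) + 1

    remaining = dict(use_count)
    live = set()
    peak = 0
    for dest, uses in schedule:
        for reg in uses:
            remaining[reg] -= 1
            dies = remaining[reg] == 0
            if pin_multi_use and use_count[reg] > 1:
                dies = False  # pretend the value stays live indefinitely
            if dies:
                live.discard(reg)
        if dest is not None:
            live.add(dest)
        peak = max(peak, len(live))
    return peak


# a is used twice; b, c and d are used once each.
example = [
    ("a", []),
    ("b", ["a"]),
    ("c", ["a"]),
    ("d", ["b", "c"]),
    (None, ["d"]),
]

if __name__ == "__main__":
    print(max_pressure(example, pin_multi_use=False))
    print(max_pressure(example, pin_multi_use=True))
```

In this model the pinned variant never reports lower pressure than the
exact one, so a heuristic built on it overestimates pressure and backs
off earlier. That pessimism would partly offset a register budget that
is too generous.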