This is the mail archive of the
mailing list for the GCC project.
Re: Live range shrinkage in pre-reload scheduling
- From: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- To: Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>
- Cc: Ramana Radhakrishnan <ramana dot gcc at googlemail dot com>, Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>, Charles Baylis <charles dot baylis at linaro dot org>, Vladimir Makarov <vmakarov at redhat dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Richard Sandiford <rdsandiford at googlemail dot com>
- Date: Fri, 16 May 2014 09:35:23 +0100
- Subject: Re: Live range shrinkage in pre-reload scheduling
- Authentication-results: sourceware.org; auth=none
- References: <5371F395 dot 8050208 at arm dot com> <53736CFD dot 6030402 at redhat dot com> <87d2fgco2v dot fsf at talisman dot default> <CAJA7tRZyXSamdb_jbnX+-HywZEajB=+Y-SzCyHzyTJDyfnhQJw at mail dot gmail dot com> <D06FF0B8-30A3-4D9B-BDAF-7EDBA2C01B4E at linaro dot org> <CAJA7tRZZCjAp-KSyyNk0qMkn9qjmiE6RX2t4yugj_+xjZ_=B_A at mail dot gmail dot com>
On 15/05/14 09:52, Ramana Radhakrishnan wrote:
On Thu, May 15, 2014 at 8:36 AM, Maxim Kuvyrkov wrote:
On May 15, 2014, at 6:46 PM, Ramana Radhakrishnan <email@example.com> wrote:
I'm not claiming it's a great heuristic or anything. There's bound to
be room for improvement. But it was based on "reality" and real results.
Of course, if it turns out not to be a win for ARM or s390x any more then it
should be disabled.
Kyrill is currently investigating a case where the first scheduler
pass, with the A15 scheduler, is a bit too aggressive in creating ILP
opportunities. As a result we see performance differences between
turning off the first scheduler pass and using the defaults.
Charles has a work-in-progress patch that fixes a bug in SCHED_PRESSURE_MODEL that causes the above symptoms. The bug causes the first scheduler pass to unnecessarily increase the live ranges of pseudo registers when there are a lot of instructions in the ready list.
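The effect described above (a schedule that exposes more ILP but stretches live ranges) can be illustrated with a toy sketch. This is not GCC's SCHED_PRESSURE_MODEL code and not the bug itself. The instruction format, the `max_pressure` helper, and the two example schedules are all hypothetical, and the model assumes a register is freed at the last use of its value.

```python
# Toy model only: each instruction is (dest, sources). All names here are
# made up for illustration and do not come from GCC.

def max_pressure(schedule):
    """Return the peak number of simultaneously live values."""
    # Record the last position at which each value is read.
    last_use = {}
    for pos, (_dest, srcs) in enumerate(schedule):
        for s in srcs:
            last_use[s] = pos

    live = set()
    peak = 0
    for pos, (dest, srcs) in enumerate(schedule):
        # Assumption: a source dying here frees its register before the
        # destination is allocated.
        for s in srcs:
            if last_use[s] == pos:
                live.discard(s)
        live.add(dest)
        peak = max(peak, len(live))
    return peak

# Same computation, two orderings: s3 = (a + b) + (c + d).
# Issue every independent load first (more ILP, longer live ranges).
ilp_first = [
    ("a", ()), ("b", ()), ("c", ()), ("d", ()),
    ("s1", ("a", "b")), ("s2", ("c", "d")), ("s3", ("s1", "s2")),
]
# Consume each pair of loads as soon as it is available.
pressure_aware = [
    ("a", ()), ("b", ()), ("s1", ("a", "b")),
    ("c", ()), ("d", ()), ("s2", ("c", "d")), ("s3", ("s1", "s2")),
]

print(max_pressure(ilp_first), max_pressure(pressure_aware))
```

In this toy model the loads-first ordering keeps all four loaded values live at once, while the second ordering never needs more than three. With a larger ready list the gap grows, and that is when a real allocator would start spilling.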
Is this something that you've seen show up in general integer code as
well? Do you or Charles have an example for us to look at? I'm not
sure what "lot of instructions in the ready list" really means here.
The specific case Kyrill's been looking into is Dhrystone Proc_8 when
tuned for a Cortex-A15 with NEON and float-abi=hard, but I am not sure
if that has "too many instructions" :).
Kyrill, could you also look into the other cases we have from SPEC2k
where we see this as well, and come back with any specific testcases
that Charles / Richard could also take a look at.
From what I can see, the most significant regression from this pre-regalloc
scheduling on SPEC2k is in 171.swim. It seems to suffer from symptoms similar
to Proc_8 (lots of extra spills on the stack).
Looking forward to the patch :). Let me know if I can help with any
Charles, can you finish your patch in the next several days and post it for review?
I think we'll await this but we'll go look into some of the benchmarks.