
Re: Delay slot filling - what still matters, and what doesn't matter so much anymore?


On 04/17/2013 03:52 PM, Steven Bosscher wrote:
> First of all: What is still important to handle?
>
> It's clear that the expectations in reorg.c are "anything goes" but
> modern RISCs (everything since the PA-8000, say) probably have some
> limitations on what is helpful to have, or not have, in a delay slot.
> According to the comments in pa.h about MASK_JUMP_IN_DELAY, having
> jumps in delay slots of other jumps is one such thing: They don't
> bring benefit to the PA-8000 and they don't work with DWARF2 CFI. As
> far as I know, SPARC and MIPS don't allow jumps in delay slots, SH
> looks like it doesn't allow it either, and CRIS can do it for short
> branches but doesn't do so because the trade-off between benefit and
> machine description complexity comes out negative.
Note that SPARC and/or MIPS might use the adjust-the-return-pointer trick. I know it wasn't my idea when I added it to the PA.

Now the PA really can do jumps in the delay slot of another jump, but the semantics are such that it's not all that helpful and we've never tried to model it. You effectively get a single instruction executed at the first branch target, then you transfer to the second branch target, IIRC. It's actually pretty natural semantics once you look at how the PC queues work on the PA.



> On the scheduler
> implementation side: Branches as delayed insns in delay slots of other
> branches are impossible to express in the CFG (at least in GCC, but I
> think in general it can't be done cleanly). Therefore I want to drop
> support for branches in delay slots. What do you think about this?
Certainly no need to support it in the generic case. The only question is whether or not it's worth supporting the adjust-the-return-pointer-in-the-delay-slot trick. Given a target without a call/return predictor stack, it can be a significant advantage. Such things might exist in the embedded space.

> What about multiple delay slots? It looks like reorg.c has code to
> handle insns with multiple delay slots, but there currently are no GCC
> targets in the FSF tree that have insns with multiple delay slots and
> that use define_delay.
Ping Hans; I think he was the last person who tried to deal with reorg and multiple delay slots (c4x?). I certainly wouldn't lose any sleep if we killed the limited support for multiple delay slots.
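For reference, define_delay describes each slot with three expressions: which insns may fill the slot, which may be annulled if the branch is true, and which may be annulled if the branch is false. A minimal single-slot sketch (the "in_branch_delay" attribute name is illustrative; real ports define their own):

    ;; One delay slot after a branch, never annulled.  A port with two
    ;; delay slots would add a second (eligibility, annul-if-true,
    ;; annul-if-false) triple to this vector.
    (define_delay (eq_attr "type" "branch")
      [(eq_attr "in_branch_delay" "yes") (nil) (nil)])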




> Another thing I completely fail to grasp is how the pipeline
> scheduler and delay slots interact. Doesn't dbr_schedule destroy all
> the good work schedule_insns has tried to do? If so, how much does
> that hurt on modern RISCs?
It really depends on how the slot is filled and how far in the insn chain you had to look. You're usually just as likely to improve the schedule as you are to muck it up. Also remember you're dealing with stuff at block boundaries, where the scheduler really isn't helping much anyway.

There's always a tradeoff here. It could always be improved by having the scheduler mark insns which are good candidates (scheduling-wise) for filling slots. I certainly pondered this a couple of decades ago when I cared about delay slot filling on in-order targets :-) Oh yeah, those hints would have to be directional, since it may be good to move an insn earlier to fill a slot on a path leading to the insn, but it may not be good to move it later to fill a slot in a branch after the insn.



> Related question: What, if anything, currently prevents dbr_schedule
> from causing pipeline stalls by stuffing a long-latency insn in a
> delay slot? I'm currently using a cost function using:
This has generally been left to the ports to sort out. My experience was that loads/stores were often OK to put into a delay slot. A large part of the reason for this is that when we fill via the backwards walk, we're not doing anything speculatively.

A nullified slot is different in that it's usually implemented by cancelling out the last stage in the pipeline. So even if you nullify, you still have to go through the entire pipeline. For something like an fpsqrt or fpdiv, that's *really* bad.


> What do you think will be a good strategy to deal with this (short of
> integrating delay slot filling in the scheduler proper)? Should I try
> to find cost==0 delay slot candidates, and only fill slots with cost>0
> candidates if nothing cheap is available? Prefer a nop over cost>0
> candidates? Ignore insn_default_latency?
It's really been left to the backends to deal with. So for example, on the PA anything which touched the FPU was disallowed in a nullified slot.
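In define_delay terms, that sort of restriction lives in the annul condition for the slot. A rough sketch in the spirit of what pa.md does (attribute and type names simplified for illustration; not the actual port code):

    ;; FP ops are never eligible for a nullified (annulled) slot.
    (define_attr "in_nullified_branch_delay" "false,true"
      (if_then_else (eq_attr "type" "fpdivsgl,fpdivdbl,fpsqrtsgl,fpsqrtdbl")
                    (const_string "false")
                    (const_string "true")))

    ;; The second vector element is the annul-if-true condition, so a
    ;; conditional branch can only nullify non-FP insns.
    (define_delay (eq_attr "type" "cbranch")
      [(eq_attr "in_branch_delay" "true")
       (eq_attr "in_nullified_branch_delay" "true")
       (nil)])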





> Another thing I noticed about targets with delay slots that can be
> nullified is that at least some of the ifcvt.c transformations could
> be applied to fill more delay slots (obviously if_case_1 and
> if_case_2). In reorg.c, optimize_skip does some kind of if-conversion.
> Has anyone looked at whether optimize_skip still does something, and
> derived a test case for that?
I doubt anyone has looked at it recently. It pre-dates our if-conversion code by a decade or more.


Jeff

