This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH: reorg branch displacement fix


Jeff Law wrote:
> 
> In message <Pine.LNX.4.21.0211011443240.4972-100000@mail.kloo.net>, tm writes:
>  >It sounds like there should be a machine-independent branch-lengthening
>  >pass for machines which have limited branch displacements. There must be
>  >other processors which have this same problem?

I think it would actually be worthwhile to marry the local constant pool
generation (which is currently in machine_dependent_reorg, because it must be
sub-function granularity to ensure that the offsets reach) with an early
branch shortening pass.  This would allow us to put the constants for far
branches into larger constant pools, thus improving Icache / Dcache locality
and saving some bytes of alignment for 32 bit branches.  Also, we should
be able to increase the coverage range of unconditional branches as targets
for conditional branches, thus saving some more space.
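
To make the idea of a branch-lengthening pass concrete, here is a minimal
sketch of the usual fixed-point approach: start every jump in its short
form, lay out addresses, lengthen whatever cannot reach, and repeat until
nothing changes.  This is not GCC's shorten_branches code; the item format,
the function names, the 2 and 6 byte sizes and the 9 bit reach are all
illustrative assumptions, and displacements are measured from the jump's
own address for simplicity.

```python
def fits_signed(value, bits):
    """True if value fits in a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def lengthen_branches(items, short_bytes=2, short_bits=9, long_bytes=6):
    """Return the final size in bytes of every jump, in program order.

    items is a list of ("insn", size), ("label", name) or ("jump", label)
    tuples (a hypothetical format chosen for this sketch).
    """
    is_long = [False] * len(items)
    while True:
        # Pass 1: assign addresses under the current size assumptions.
        addr, labels, pc = [], {}, 0
        for i, (kind, arg) in enumerate(items):
            addr.append(pc)
            if kind == "label":
                labels[arg] = pc
            elif kind == "insn":
                pc += arg
            else:  # "jump"
                pc += long_bytes if is_long[i] else short_bytes
        # Pass 2: lengthen every short jump whose target is out of reach.
        changed = False
        for i, (kind, arg) in enumerate(items):
            if kind == "jump" and not is_long[i]:
                if not fits_signed(labels[arg] - addr[i], short_bits):
                    is_long[i] = True
                    changed = True
        if not changed:
            return [long_bytes if is_long[i] else short_bytes
                    for i, (kind, _) in enumerate(items) if kind == "jump"]


# Lengthening the second jump pushes label "a" out of reach of the first,
# so a second iteration is needed before the layout is stable.
program = [("jump", "a"), ("jump", "b"), ("insn", 250),
           ("label", "a"), ("insn", 10), ("label", "b")]
print(lengthen_branches(program))
```

Since jumps only ever grow in this scheme, the loop is guaranteed to stop.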
	
> Or the SH port should use the model that all the other ports with
> variable length insns + limited branch displacements + delay slots use.

That would be the fr30.  And the model it uses is to pretend it has
arbitrarily long branches, and then it takes 8 or 10 bytes per branch
when it gets out of range.
Looking in google for fr30, it appears as if Fujitsu doesn't care about
gcc at the moment; they talk about their 'Softune' IDE, but no word about
GNU tools.  Also the fr30 gdb port has been obsoleted...

In detail:
The SH has 9 bit signed offsets for conditional branches.  Longer branches
are synthesized using unconditional branches; these come in 13, 16 and 32
bit signed offset variants.  The 13 bit unconditional branches always have
a delay slot, the 16 and 32 bit ones have one if register scavenging
(which also happens in machine_dependent_reorg) has been successful.
13 bit jumps take 2 bytes, and if register scavenging is successful
(which it is most of the time, since there is usually a dead gpr
 at the start of a basic block) 16 bit ones take 6 bytes + slot, and 32
bit ones take 8-10 bytes + slot; if not, it's 10 bytes and 12-14 bytes,
respectively.
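
The size rules for unconditional jumps above can be written down as a small
lookup.  This is only a sketch: the function names are made up for
illustration, and offsets are treated as plain signed byte displacements,
which glosses over how the hardware actually scales and biases them.

```python
def fits_signed(displacement, bits):
    """True if displacement fits in a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= displacement < (1 << (bits - 1))


def sh_jump_size(displacement, scavenged=True):
    """Return (offset_bits, min_bytes, max_bytes, has_delay_slot).

    Sizes are the ones quoted in the text; the delay slot itself is not
    included in the byte counts.
    """
    if fits_signed(displacement, 13):
        # 13 bit jumps always take 2 bytes and always have a delay slot.
        return (13, 2, 2, True)
    if fits_signed(displacement, 16):
        # 6 bytes + slot with a scavenged register, 10 bytes without.
        return (16, 6, 6, True) if scavenged else (16, 10, 10, False)
    if fits_signed(displacement, 32):
        # 8-10 bytes + slot with a scavenged register, 12-14 bytes without.
        return (32, 8, 10, True) if scavenged else (32, 12, 14, False)
    raise ValueError("displacement does not fit in 32 bits")


print(sh_jump_size(1000))
print(sh_jump_size(20000, scavenged=False))
```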

I find that most branches can be done with 9 or 13 bit offset, and almost
all of the rest can be handled with 16 bit.  So let's define
'limited branch displacements' for the sake of this discussion as
'less than 16 bits'; things that happen very infrequently are not
interesting for optimization (as long as we get them right).

grep shows that the list of ports that mention define_delay at all is
rather limited:
config/arc/arc.md:2
config/c4x/c4x.md:5
config/cris/cris.md:1
config/d30v/d30v.md:8
config/fr30/fr30.md:1
config/frv/frv.md:8
config/m88k/m88k.md:2	
config/mips/mips.md:3
config/pa/pa.md:6
config/romp/romp.md:1
config/sh/sh.md:4
config/sparc/sparc.md:4

Now let's have a look at these ports:
	
- arc and m88k have (or pretend to have) uniform size branches.
- The c4x and cris do not even have a length attribute.
- d30v and frv mention define_delay only in comments.
- The fr30 has 9 bit conditional branch offsets.  It can annul-true
  the delay slots, just like the SH, but that is not described in the
  fr30.md file.  Short unconditional jumps only have a 9 bit offset,
  and there is only one other type in use for any given compilation,
  which takes 6 bytes for the small memory model, and 8 bytes for the
  large model.  Because of the small range for the 2 byte unconditional
  branch, the fr30 suffers even more from lack of intelligent branch
  splitting than the SH did, but apparently, no-one cares enough about
  the fr30 gcc code to make it generate decent code.
- mips: 'short' branches are 4 bytes long and have a 17 bit offset.
- pa: 'short' branches have a length of 4 and 14 bit offset; there
  is also a medium length variant available that takes 8 bytes and
  has a 19 bit offset, and still has just one delay slot, which also
  has the same annul direction as the one for the short branch.
  When the 19 bit offset is exceeded, and optimization is enabled,
  the compiler aborts.  Huh, what a nice role model.
- romp: branches have lengths of 2 (with 9 bit offset) or 4; again,
  the delay slot stays the same irrespective of insn length.
  The md file seems to assume that delay slots for 2 byte insns don't
  work, and that they can be disabled by testing the length attribute
  (that doesn't work because you'll always see the default length of 4
  during delay slot scheduling).  Maybe that is some remnant of gcc 1?
- sparc: 'short' branches have a length of 4 and 19 bit offset for
  integer and fp comparisons, 16 bit for sparc v9 register comparisons.
  When more offset is needed, a sequence with an unused second delay
  slot is used.
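
Applying the 'less than 16 bits' definition to the figures above picks out
the ports that really have limited branch displacements.  A small sketch;
the table just restates the offset widths of the shortest branch form given
in this message (for sparc the narrower of the two cases is used), and the
names are illustrative only.

```python
# Offset width in bits of the shortest branch form, as listed above.
SHORT_BRANCH_BITS = {
    "fr30": 9,
    "mips": 17,
    "pa": 14,
    "romp": 9,
    "sh": 9,
    "sparc": 16,  # v9 register comparisons; 19 for integer and fp compares
}

LIMIT_BITS = 16  # 'limited' means fewer than 16 bits of offset


def limited_ports(table, limit=LIMIT_BITS):
    """Return the ports whose short branch offset is narrower than limit."""
    return sorted(port for port, bits in table.items() if bits < limit)


print(limited_ports(SHORT_BRANCH_BITS))
```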

> That model is we don't worry about such things in reorg and deal with
> branch shortening entirely in the branch shortening pass.

This is inappropriate when a considerable number of your delay slots
depend on branch shortening.
	
-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658

