This is the mail archive of the gcc-prs@gcc.gnu.org mailing list for the GCC project.
Re: target/7856: [arm] invalid offset in constant pool reference
- From: Richard Earnshaw <rearnsha at cambridge dot arm dot com>
- To: nobody at gcc dot gnu dot org
- Cc: gcc-prs at gcc dot gnu dot org,
- Date: 9 Sep 2002 14:16:01 -0000
- Subject: Re: target/7856: [arm] invalid offset in constant pool reference
- Reply-to: Richard Earnshaw <rearnsha at cambridge dot arm dot com>
The following reply was made to PR target/7856; it has been noted by GNATS.
From: Richard Earnshaw <rearnsha@cambridge.arm.com>
To: gcc-gnats@gcc.gnu.org
Cc: Richard.Earnshaw@arm.com
Subject: Re: target/7856: [arm] invalid offset in constant pool reference
Date: Mon, 09 Sep 2002 15:12:33 +0100
stupid gnats setup...
------- Forwarded Message
Date: Mon, 09 Sep 2002 14:56:39 +0100
From: Richard Earnshaw <rearnsha@arm.com>
To: Philip Blundell <pb@nexus.co.uk>
cc: rearnsha@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org,
nickc@redhat.com, nobody@gcc.gnu.org, rearnsha@arm.com
Subject: Re: target/7856: [arm] invalid offset in constant pool reference
> On Mon, 2002-09-09 at 11:56, rearnsha@gcc.gnu.org wrote:
> > However, I'm not yet convinced that Nick's change is for the
> > best overall.
>
> Well, mmm, it's tricky to know which gives you better code on average.
> XScale makes it worse, because the penalty for using ldm/stm is greater
> on those devices - you pay something like a two-cycle startup cost, plus
> one cycle for every register transferred. So pushing LR unnecessarily
> could cost you three cycles at entry and the same at exit, if you only
> had one other register to save.
>
> Presumably we could avoid the particular problem at hand by making
> use_return_insn() detect this situation - it already does this for
> interworking, which is the other main case where
> output_return_instruction would generate a 2-instruction sequence.
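As a rough check of the "three cycles" figure in the quoted text, here is a minimal Python sketch of that cost model. It takes the quoted XScale numbers (a two-cycle startup plus one cycle per register for ldm/stm) and additionally assumes that a single-register ldr/str costs one cycle, which is the ldr cost used in the XScale figures below. The function names are illustrative only.

```python
# Rough XScale load/store-multiple cost model, as described in the quoted text:
# a fixed two-cycle startup plus one cycle per register transferred.
# Assumption: a single-register ldr/str costs one cycle.

LDM_STM_STARTUP = 2
SINGLE_TRANSFER = 1


def multiple_cost(num_regs: int) -> int:
    """Cycles for one ldm/stm transferring num_regs registers."""
    return LDM_STM_STARTUP + num_regs


def extra_cost_of_saving_lr(other_regs: int) -> int:
    """Extra cycles paid at entry (and again at exit) for also pushing LR."""
    if other_regs < 1:
        raise ValueError("model only covers functions saving at least one register")
    with_lr = multiple_cost(other_regs + 1)
    if other_regs == 1:
        without_lr = SINGLE_TRANSFER  # one register: a plain str/ldr suffices
    else:
        without_lr = multiple_cost(other_regs)
    return with_lr - without_lr


if __name__ == "__main__":
    print(extra_cost_of_saving_lr(1))  # 3: a 4-cycle stm/ldm instead of a 1-cycle str/ldr
```

With two or more other registers an ldm/stm is needed anyway, so under this model pushing LR costs only one extra cycle.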
OK, let's try a more detailed analysis:
For XScale:
Scenario 1:
b<cond> return_sequence @ cost 5/1 (predict NOT taken)
return_sequence:
ldr sl, [sp], #4 @ cost 1
bx lr @ cost 5
Condition true total = 11 cycles
Condition false total = 1 cycle
Scenario 2:
ldr<cond> sl, [sp], #4 @ cost 1/1
bx<cond> lr @ cost 5/1
Condition true total = 6 cycles
Condition false total = 2 cycles
Scenario 3:
ldm<cond> sp!, {sl, pc} @ cost 9/2? (not sure of the no-exec cost)
Condition true total = 9 cycles
Condition false total = 2 cycles
For arm10
Scenario 1:
b<cond> return_sequence @ cost 4/0 (predict NOT taken)
return_sequence:
ldr sl, [sp], #4 @ cost 1
bx lr @ cost 4
Condition true total = 9 cycles
Condition false total = 0 cycles
Scenario 2:
ldr<cond> sl, [sp], #4 @ cost 1/1
bx<cond> lr @ cost 4/2
Condition true total = 5 cycles
Condition false total = 3 cycles
Scenario 3:
ldm<cond> sp!, {sl, pc} @ cost 7/2
Condition true total = 7 cycles
Condition false total = 2 cycles
For arm9e
Scenario 1:
b<cond> return_sequence @ cost 3/1
return_sequence:
ldr sl, [sp], #4 @ cost 1
bx lr @ cost 3
Condition true total = 7 cycles
Condition false total = 1 cycle
Scenario 2:
ldr<cond> sl, [sp], #4 @ cost 1/1
bx<cond> lr @ cost 3/1
Condition true total = 4 cycles
Condition false total = 2 cycles
Scenario 3:
ldm<cond> sp!, {sl, pc} @ cost 6/1
Condition true total = 6 cycles
Condition false total = 1 cycle
For arm7tdmi
Scenario 1:
b<cond> return_sequence @ cost 3/1
return_sequence:
ldr sl, [sp], #4 @ cost 3
bx lr @ cost 3
Condition true total = 9 cycles
Condition false total = 1 cycle
Scenario 2:
ldr<cond> sl, [sp], #4 @ cost 3/1
bx<cond> lr @ cost 3/1
Condition true total = 6 cycles
Condition false total = 2 cycles
Scenario 3:
ldm<cond> sp!, {sl, pc} @ cost 6/1
Condition true total = 6 cycles
Condition false total = 1 cycle
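Each total above is a sum of per-instruction costs along one path. For scenario 1, the taken path pays the taken-branch cost plus the whole return_sequence, while the not-taken path pays only the branch. For scenarios 2 and 3, each conditional instruction contributes its executed cost when the condition is true and its no-exec cost when it is false. A small Python sketch that re-derives the totals from the per-instruction costs; the data layout and names are hypothetical, and the XScale ldm no-exec cost of 2 carries the same uncertainty noted in its table.

```python
# Per-instruction costs copied from the tables above, as (exec, no-exec) pairs.
# Scenario 1 is (conditional branch, [unconditional return_sequence costs]).
# Scenarios 2 and 3 are lists of conditionally executed instructions.
# The XScale ldm no-exec cost (2) is the uncertain "9/2?" entry.
CORES = {
    "XScale":   {"s1": ((5, 1), [1, 5]), "s2": [(1, 1), (5, 1)], "s3": [(9, 2)]},
    "arm10":    {"s1": ((4, 0), [1, 4]), "s2": [(1, 1), (4, 2)], "s3": [(7, 2)]},
    "arm9e":    {"s1": ((3, 1), [1, 3]), "s2": [(1, 1), (3, 1)], "s3": [(6, 1)]},
    "arm7tdmi": {"s1": ((3, 1), [3, 3]), "s2": [(3, 1), (3, 1)], "s3": [(6, 1)]},
}


def branch_to_return(branch, sequence):
    """Scenario 1: only the taken path runs the return_sequence."""
    taken, not_taken = branch
    return taken + sum(sequence), not_taken


def conditional_sequence(insns):
    """Scenarios 2/3: every instruction is issued, executed or not."""
    return sum(t for t, _ in insns), sum(f for _, f in insns)


def totals(core):
    """(condition true, condition false) cycle totals for each scenario."""
    c = CORES[core]
    return {
        "s1": branch_to_return(*c["s1"]),
        "s2": conditional_sequence(c["s2"]),
        "s3": conditional_sequence(c["s3"]),
    }


for core in CORES:
    print(core, totals(core))
```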
So it's fairly clear from this that, with the possible exception of the
7tdmi, we do not want to use an ldm instruction. Now, for both arm10 and
XScale, where branch prediction starts to kick in, it's also clear that we
want to use a branch instruction instead of a conditionally executed exit
sequence; this is particularly so inside a loop, where we would like to
benefit from the branch predictor eliminating the branch entirely.
However, branching to a separate return sequence does increase the cost of
actually returning quite significantly in all cases.
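To put a number on the loop case, here is a sketch that assumes the return is never taken inside the loop, so every iteration pays only the "condition false" cost of the chosen sequence. The figures are copied from the XScale and arm10 tables above; the sequence labels are illustrative.

```python
# "Condition false" totals from the tables above: the cost paid on every
# iteration in which the function does not return.
FALSE_COST = {
    "XScale": {"branch": 1, "cond_ldr_bx": 2, "cond_ldm": 2},
    "arm10":  {"branch": 0, "cond_ldr_bx": 3, "cond_ldm": 2},
}


def loop_overhead(core, iterations):
    """Total cycles spent on the not-taken exit over `iterations` passes."""
    return {seq: cost * iterations for seq, cost in FALSE_COST[core].items()}


print(loop_overhead("arm10", 1000))
# {'branch': 0, 'cond_ldr_bx': 3000, 'cond_ldm': 2000}
```

On arm10 the predicted-not-taken branch costs nothing per iteration under these figures, whereas either conditionally executed sequence pays on every pass.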
It's also fairly clear that scenario 2 is almost never the 'best' sequence
(except when we think a conditional return is very likely).
So I think the conclusion from all this is that we should make
use_return_insn () return false whenever a return sequence would be ldr
<reg> followed by a mov pc, lr. We should probably still allow an
unconditional return sequence to be up to two instructions long (I don't
think there's much to be gained from allowing it to be longer), and we
should then adjust the length attribute of the "return" insn to be 8 bytes
(currently 4) to indicate the longest sequence possible (we could make it
more accurate if we really needed to, but there seems to be little point).
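A minimal Python model of the proposed policy, purely illustrative: Epilogue, return_sequence and use_return_insn_sketch are hypothetical names and do not reflect the real use_return_insn () in the ARM back end. It reads the proposal as applying to conditional returns only, since unconditional two-instruction returns are still allowed, and it assumes the length attribute counts 4-byte ARM instructions, giving 8.

```python
from dataclasses import dataclass

INSN_BYTES = 4                  # one ARM instruction
RETURN_LENGTH = 2 * INSN_BYTES  # proposed "return" length attribute: 8


@dataclass
class Epilogue:
    """Hypothetical summary of what a function's epilogue must restore."""
    saved_regs: list  # callee-saved core registers other than LR, e.g. ["sl"]
    lr_saved: bool    # whether the prologue pushed LR alongside them


def return_sequence(e):
    """Instructions a return would need under this simplified model."""
    if e.lr_saved:
        # LR was pushed: restore everything and load PC with one ldm.
        return ["ldm sp!, {%s}" % ", ".join(e.saved_regs + ["pc"])]
    if not e.saved_regs:
        return ["mov pc, lr"]
    if len(e.saved_regs) == 1:
        return ["ldr %s, [sp], #4" % e.saved_regs[0], "mov pc, lr"]
    return ["ldm sp!, {%s}" % ", ".join(e.saved_regs), "mov pc, lr"]


def use_return_insn_sketch(e, iscond):
    """Refuse a conditional ldr <reg> / mov pc, lr pair; allow up to 2 insns."""
    seq = return_sequence(e)
    if iscond and seq[0].startswith("ldr") and seq[-1] == "mov pc, lr":
        return False
    return len(seq) * INSN_BYTES <= RETURN_LENGTH


one_reg = Epilogue(saved_regs=["sl"], lr_saved=False)
print(use_return_insn_sketch(one_reg, iscond=True))   # False: branch to an epilogue
print(use_return_insn_sketch(one_reg, iscond=False))  # True: two-insn return allowed
```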
R.
------- End of Forwarded Message