This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: modifying the ARM generation behavior?
- To: Josh Fryman <fryman at cc dot gatech dot edu>
- Subject: Re: modifying the ARM generation behavior?
- From: Richard Earnshaw <rearnsha at arm dot com>
- Date: Mon, 24 Sep 2001 12:00:25 +0100
- cc: gcc at gcc dot gnu dot org, Richard dot Earnshaw at arm dot com
- Organization: ARM Ltd.
- Reply-To: Richard dot Earnshaw at arm dot com
> here's the problem. gcc generates code with "data" values interspersed
> in the text segment. that is, it might generate something like this:
>
> in C
> ----
> void foo( void )
> {
> int myvar=42;
> printf("myvar=%d\n",myvar);
> }
>
> in ARM-ASM (edited to make shorter)
> ----------
> .section .rodata
> .LC0:
> .ascii "myvar=%d\012\000"
> .text
> foo:
> .....
> ldr r0, .L4
> ldr r1, [fp, #-16]
> bl printf
> b .L3
> .L4:
> .word .LC0
> .word myvar
> .L3:
> ldmea fp, {fp, sp, pc}
Hm, which compiler release are you using? For this example, the latest
release should at least move that literal pool out of the path of
execution, after the end of the function. Something like:
.text
foo:
.....
ldr r0, .L4
ldr r1, [fp, #-16]
bl printf
ldmea fp, {fp, sp, pc}
.L4:
.word .LC0
.word myvar
(In fact, with the optimizer on, this particular example will now
tail-call printf.)
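Moving the pool after the epilogue works because the pool only has to stay within reach of each ldr that uses it. As a rough sketch (the helper and macro names are hypothetical, not part of GCC), the reach check for an ARM-state PC-relative load can be written in C. It assumes the usual ARM-state rules: the PC reads as the instruction's address plus 8, and an LDR immediate offset is a 12-bit magnitude plus an add/subtract bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* In ARM state, reading the PC yields the current instruction address + 8. */
#define ARM_PC_READ_OFFSET 8

/* LDR immediate: 12-bit magnitude plus an up/down bit, i.e. -4095..+4095. */
#define ARM_LDR_MAX_OFFSET 4095

/* Hypothetical helper: true if "ldr rX, [pc, #off]" placed at insn_addr
   can reach a literal stored at literal_addr. */
bool ldr_pc_can_reach(uint32_t insn_addr, uint32_t literal_addr)
{
    /* Use a wider signed type so the difference cannot overflow. */
    int64_t off = (int64_t)literal_addr
                - ((int64_t)insn_addr + ARM_PC_READ_OFFSET);

    return off >= -ARM_LDR_MAX_OFFSET && off <= ARM_LDR_MAX_OFFSET;
}
```

For example, an ldr at 0x8000 can reach a literal at 0x8010 but not one at 0x9008, which is why the pool can move to the end of the function but not into an arbitrarily distant data section without changing the code sequence.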
>
> note how gcc has "stuck" into the middle of the instruction stream
> some data values at .L4. it would make my life much easier (for research
> purposes) if i could move these little random scattered data segments
> into the main data segment or an alternate data segment...
>
> i realize that the ARM doesn't support a decent load-immediate size
> (only 12 bits signed) for addresses or data, and that was probably
> why this approach was taken. however ... i'd like to make a fundamental
> change.
Hm, what you are describing is a position-independent data model (in ARM's
ATPCS parlance, RWPI: read-write position independent), but taken to the
extreme that even constants are pushed into the global data tables.
Analysis has shown that this is typically 3-4% less efficient than the
current model (see the ATPCS document on ARM's web pages). Note that to
make this work you would also need support from the linker, since you
wouldn't know the offsets from your base register until link time.
Further, any moderately large program is going to exceed the 4k offset
range of your base register. You will then either need to create one base
value per module (meaning more code at the start of each module to set the
base register up), or you will have to compile on the assumption that a
single ldr can't load a constant, and use something like:
add Rtmp, Rbase, #OFFSET_HIGH(offset)
ldr Rx, [Rtmp, #OFFSET_LOW(offset)]
For really large programs you might even need two add instructions to
reach all the data. In either case the linker would then have to be able to
fix up such sequences once the offset was finally known.
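To show why large offsets need extra add instructions, here is a minimal C sketch of splitting an offset into ADD immediates plus a final ldr offset. It assumes the standard ARM encodings (a data-processing immediate is an 8-bit value rotated right by an even amount; the LDR offset is 12 bits). The function names are hypothetical, and the split is greedy, so it gives an upper bound rather than a guaranteed minimum.

```c
#include <stdint.h>

/* True if v is encodable as an ARM data-processing immediate:
   an 8-bit value rotated right by an even amount (0..30). */
int arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotate left by rot; if that fits in 8 bits, v == imm8 ROR rot. */
        uint32_t r = rot ? (v << rot) | (v >> (32 - rot)) : v;
        if (r <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Hypothetical helper for a sequence like
       add Rtmp, Rbase, #chunks[0]
       add Rtmp, Rtmp,  #chunks[1]   (only if needed)
       ldr Rx, [Rtmp, #low]
   Writes the ADD immediates to chunks[] (at most 3 are ever needed for a
   32-bit offset) and the 12-bit ldr offset to *low. Returns the ADD count. */
int split_offset(uint32_t offset, uint32_t chunks[4], uint32_t *low)
{
    uint32_t rest = offset & ~0xFFFu;   /* part the ldr offset cannot cover */
    int n = 0;

    *low = offset & 0xFFFu;
    while (rest != 0) {
        int h = 31;
        while (!(rest & (1u << h)))
            h--;                        /* highest remaining set bit, >= 12 */

        int s = h - 7;                  /* 8-bit window ending at bit h */
        if (s & 1)
            s++;                        /* rotation amounts must be even */

        chunks[n++] = rest & (0xFFu << s);
        rest &= ~(0xFFu << s);
    }
    return n;
}
```

With this split, any offset below 1 MB (0x100000) needs at most one ADD, matching the single add above, while an offset such as 0x101000 already needs two.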
> with a few registers already pinned for other uses, like lr, sp, etc,
> i'd like to reserve "another" register for being a pointer to a
> special data segment of these values - say r11. then, at the very
> beginning of the program, r11 gets loaded with a pointer to the data
> segment containing all these address offsets, and we no longer have to
> mix data into the instruction stream. this is almost what happens now
> with "r3" throughout the program. it spends most of its life as a
> pointer to a block of these variable addresses...
Hm, so on the ARM we currently have 16 registers (well, 15 really, since
one is the PC). Of these, 5 are call-clobbered (r0-r3, ip) and one more
(lr) is effectively call-clobbered since it holds the return address.
That leaves 9 registers that are call-saved. But of these, 3 (sometimes 4)
already have designated fixed uses: sp is the stack pointer, fp (r11) is
needed as the frame pointer, and r10 is used as the PIC register (in some
compilation models, r9 is the PIC register and r10 is the stack-limit
register). That leaves 6, sometimes 5, call-saved registers for normal
use. You can't use r11 since it is already taken, so you would have to use
r9 (or, for some compilations, r8). That would use up 15-20% of the
remaining call-saved registers (one out of six, or one out of five), which
is likely to have a significant effect on the efficiency of the rest of
your code, since the compiler will now have to spill more often.
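The register arithmetic above can be checked with a small C table. The role assignments follow the two models described in this message; the enum, array, and function names are made up for illustration.

```c
/* Roles of r0..r15 as described above (illustrative names only). */
enum reg_role { CLOBBERED, LINK, FIXED, FREE_SAVED, PROG_COUNTER };

/* Model 1: r10 = PIC register, r11 = fp, r13 = sp. */
const enum reg_role model_pic_r10[16] = {
    CLOBBERED, CLOBBERED, CLOBBERED, CLOBBERED,      /* r0-r3 */
    FREE_SAVED, FREE_SAVED, FREE_SAVED, FREE_SAVED,  /* r4-r7 */
    FREE_SAVED, FREE_SAVED,                          /* r8, r9 */
    FIXED, FIXED,                                    /* r10 (PIC), r11 (fp) */
    CLOBBERED, FIXED, LINK, PROG_COUNTER             /* ip, sp, lr, pc */
};

/* Model 2: r9 = PIC register, r10 = stack-limit register. */
const enum reg_role model_pic_r9[16] = {
    CLOBBERED, CLOBBERED, CLOBBERED, CLOBBERED,      /* r0-r3 */
    FREE_SAVED, FREE_SAVED, FREE_SAVED, FREE_SAVED,  /* r4-r7 */
    FREE_SAVED, FIXED,                               /* r8, r9 (PIC) */
    FIXED, FIXED,                                    /* r10 (sl), r11 (fp) */
    CLOBBERED, FIXED, LINK, PROG_COUNTER             /* ip, sp, lr, pc */
};

/* Number of call-saved registers left for general allocation. */
int count_free_saved(const enum reg_role regs[16])
{
    int n = 0;
    for (int i = 0; i < 16; i++)
        if (regs[i] == FREE_SAVED)
            n++;
    return n;
}

/* Share (in percent) of those registers lost by pinning one more. */
double pin_cost_percent(const enum reg_role regs[16])
{
    return 100.0 / count_free_saved(regs);
}
```

Pinning one more register costs one of six free call-saved registers in the first model (about 16.7%) and one of five (20%) in the second.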
> we also avoid having the dozens of "ldr r3,<some-var-block>" throughout
> the code generation. this would make for more efficient code.
Please show me a real example where we get dozens of such accesses that
would be avoided by your model; the existing model uses the PC as an
effective base register, and you would lose that benefit with your
approach.
> i'd be happy to field any queries on more specifics or suggestions on
> existing ways to get around this...
>
I think it probable that code compiled the way you suggest could be made
to work, but I very much doubt that it would be more efficient.
R.