This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: modifying the ARM generation behavior?



> here's the problem.  gcc generates code with "data" values interspersed 
> in the text segment.  that is, it might generate something like this:
> 
>       in C
>       ----
>       void foo( void )
>       {
>          int myvar=42;
>          printf("myvar=%d\n",myvar);
>       }
> 
>       in ARM-ASM  (edited to make shorter)
>       ----------
>       .section        .rodata
>       .LC0:
>               .ascii  "myvar=%d\012\000"
>       .text
>       foo:
>          .....
>               ldr     r0, .L4
>               ldr     r1, [fp, #-16]
>               bl      printf
>               b       .L3
>       .L4:
>               .word   .LC0
>               .word   myvar
>       .L3:      
>               ldmea   fp, {fp, sp, pc}

Hm, which compiler release are you using?  The latest release should at 
least move that section outside of the code-stream for this example.  
Something like

      .text
      foo:
         .....
              ldr     r0, .L4
              ldr     r1, [fp, #-16]
              bl      printf
              ldmea   fp, {fp, sp, pc}
      .L4:
              .word   .LC0
              .word   myvar

(in fact, this particular example will now tail-call if the optimizer is 
on).

> 
> note how gcc has "stuck" into the middle of the instruction stream 
> some data values at .L4. it would make my life much easier (for research 
> purposes) if i could move these little random scattered data segments 
> into the main data segment or an alternate data segment...
> 
> i realize that the ARM doesn't support a decent load-immediate size 
> (only 12 bits signed) for addresses or data, and that was probably
> why this approach was taken.  however ... i'd like to make a fundamental
> change.

Hm, what you are describing is a position-independent data model (in ARM's 
ATPCS parlance, RWPI -- read-write position independent), but taken to the 
extreme that even constants are pushed into the global data tables.  
Analysis has shown that this is typically 3-4% less efficient than the 
current model used (see the ATPCS document on ARM's web pages).  Note that 
in order to make this work you would also need support of the linker, 
since you wouldn't know the offsets from your base register until link 
time.  Further, any moderately large program is going to exceed the 4k 
offset range of your base register, meaning that you will either need to 
create one base value per module (= more code at the start of each module 
to set the base register up) or you will have to compile on the assumption 
that a single ldr can't load a constant, something like

	add	Rtmp, Rbase, #OFFSET_HIGH(offset)
	ldr	Rx, [Rtmp, #OFFSET_LOW(offset)]

For really large programs you might even need two add instructions to get 
all the data.  In either case the linker would then have to be able to fix 
up such sequences once the offset was finally known.
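
To make the split concrete, here is a minimal C sketch of the arithmetic 
the linker would have to do once the offset is known.  The helper names 
(offset_low, offset_high, is_arm_immediate, adds_needed) are hypothetical; 
they stand in for the OFFSET_LOW/OFFSET_HIGH placeholders above, which are 
not real assembler operators.  The sketch assumes ARM-mode encodings (a 
12-bit ldr offset, and an add immediate that is an 8-bit value rotated 
right by an even amount) and only handles non-negative offsets.

```c
#include <stdint.h>

/* Hypothetical helper: the part of the offset that fits in the 12-bit
   immediate field of an ARM-mode ldr. */
uint32_t offset_low(uint32_t offset)
{
    return offset & 0xfffu;
}

/* Hypothetical helper: the remainder, which must be added to the base
   register before the ldr. */
uint32_t offset_high(uint32_t offset)
{
    return offset & ~0xfffu;
}

/* Return 1 if v can be encoded as a single ARM data-processing immediate,
   i.e. an 8-bit value rotated right by an even amount. */
int is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left by rot undoes a rotate right by rot. */
        uint32_t r = (v << rot) | (v >> ((32 - rot) & 31));
        if (r <= 0xffu)
            return 1;
    }
    return 0;
}

/* Count the add instructions needed to build the high part, peeling off
   one 8-bit chunk at an even bit position per add (a simple greedy count,
   assumed good enough for illustration). */
int adds_needed(uint32_t high)
{
    int n = 0;
    while (high != 0) {
        unsigned shift = 0;
        while (((high >> shift) & 3u) == 0)
            shift += 2;
        high &= ~(0xffu << shift);
        n++;
    }
    return n;
}
```

For example, an offset of 0x12345 splits into a high part of 0x12000, 
which fits in one add, and a low part of 0x345.  An offset of 0x123456 
leaves a high part of 0x123000, which is not a valid single immediate and 
so needs two adds.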

> with a few registers already pinned for other uses, like lr, sp, etc,
> i'd like to reserve "another" register for being a pointer to a 
> special data segment of these values - say r11.  then, at the very 
> beginning of the program, r11 gets loaded with a pointer to the data
> segment containing all these address offsets, and we no longer have to
> mix data into the instruction stream.  this is almost what happens now
> with "r3" throughout the program.  it spends most of its life as a 
> pointer to a block of these variable addresses...

Hm, so on the ARM we currently have 16 registers (well, 15 really, since 
one is the PC).  Of these 5 are call clobbered (r0-r3,ip) and one more 
(lr) is effectively call-clobbered since it holds the return address.  
That leaves 9 registers that are call-saved.  But of these we have 3 
(sometimes 4) that already have designated fixed uses (sp is the stack, fp 
(r11) is needed for a frame pointer and r10 is used as the pic register -- 
on some compilation models, r9 is the pic register and r10 is stack-limit 
register).  That leaves 6, sometimes 5, registers that are call-saved for 
normal use.  You can't use r11 since it is already used, so you would have 
to use r9 (or for some compilations r8), which would use up 15-20% of the 
remaining call-saved registers -- that's likely to have a significant 
effect on the efficiency of the rest of your code, since the compiler will 
now have to spill more often.
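
The register counting above can be checked with a few lines of C.  This is 
only a sketch of the arithmetic in this message: the function names are 
made up for illustration, and the counts are the ones quoted above, not 
something queried from the compiler.

```c
/* Register counts as stated in the message above. */
enum {
    TOTAL_REGS     = 16, /* r0-r15 */
    PC_REGS        = 1,  /* r15 is the PC */
    CALL_CLOBBERED = 5,  /* r0-r3, ip */
    LINK_REGS      = 1   /* lr, effectively call-clobbered */
};

/* Hypothetical helper: registers preserved across calls. */
int call_saved_regs(void)
{
    return TOTAL_REGS - PC_REGS - CALL_CLOBBERED - LINK_REGS;
}

/* Hypothetical helper: call-saved registers left for normal use once
   `fixed` of them have designated uses (3, sometimes 4). */
int free_call_saved(int fixed)
{
    return call_saved_regs() - fixed;
}

/* Hypothetical helper: percentage of the free call-saved registers
   consumed by reserving one more as a data-table base. */
double reserve_cost_percent(int fixed)
{
    return 100.0 / free_call_saved(fixed);
}
```

With three fixed registers, one more reserved register costs 1 of 6 (about 
17%); with four, it costs 1 of 5 (20%).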

> we also avoid having the dozens of "ldr r3,<some-var-block>" throughout
> the code generation.  this would make for more efficient code.

Please show me a real example where we get dozens of such accesses that 
would be avoided by your model; the existing model makes use of the PC as 
an effective base register, you would lose that benefit with your 
approach.

> i'd be happy to field any queries on more specifics or suggestions on
> existing ways to get around this...
> 

I think it probable that code compiled the way you suggest could be made 
to work, but I very much doubt that it would be more efficient.

R.

