This is the mail archive of the mailing list for the GCC project.


Re: Extending constraints using register subclasses

Jamie Prescott wrote:
On Mon, May 11, 2009 at 4:45 PM, Jamie Prescott wrote:

I wanted to add finer (one per) register subclasses, so that I can more [...] the register placement inside the inline assembly.

You don't need that. You can just use asm("registername") on variables. like so:

int f(int a)
{
  register int r0 __asm__("r0");
  asm("use %0": "+r"(r0) );
}

Andrew Pinski
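
For readers who want to try the local register variable approach on a host machine, here is a minimal sketch. The function name add_one_pinned, the choice of r12 on x86-64 and the one-instruction asm bodies are assumptions made for illustration and do not come from the thread. Targets other than x86-64 and AVR fall back to plain C.

```c
#include <assert.h>

/* Hypothetical helper (not from the thread): add 1 to `a` while the value
   is pinned to one specific register through a local register variable. */
static long add_one_pinned(long a)
{
#if defined(__GNUC__) && defined(__x86_64__)
    register long v __asm__("r12") = a;       /* v lives in r12 */
    __asm__ ("addq $1, %0" : "+r"(v) : : "cc");
    return v;
#elif defined(__GNUC__) && defined(__AVR__)
    register int v __asm__("r24") = (int)a;   /* v lives in r25:r24 */
    __asm__ ("adiw %A0, 1" : "+r"(v));
    return v;
#else
    return a + 1;                             /* portable fallback, no asm */
#endif
}

/* Usage: the result is the same whichever path was compiled. */
static void demo_add_one(void)
{
    assert(add_one_pinned(41) == 42);
}
```

The register is only guaranteed to hold the variable at the point where it is used as an asm operand, which is why the value goes through a "+r" operand rather than being named inside the asm string.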

That works with gcc 3.x but causes trouble and won't work smoothly with gcc 4.x.

Here are just two examples that will crash for target AVR:

// ice.c ////////////////////////////////////////////////
typedef union
{
   unsigned char  asByte[2];
   unsigned short asWord;
} data16_t;

data16_t mac10 (data16_t);

void put_sfrac16 (data16_t q)
{
   unsigned char i;
   for (i=0; i < 2; i++)
   {
       register data16_t digit asm ("r24");
       digit.asByte[1] = 0;

       digit.asByte[0] = q.asByte[0];
       digit = mac10 (digit);
       q.asByte[0] = digit.asByte[0];
   }
}

compiling this with

avr-gcc ice.c -Os -S

runs into an ICE in fwprop:

ice.c: In function 'put_sfrac16':
ice.c:21: internal compiler error: in propagate_rtx, at fwprop.c:469

GNU C (WinAVR 20090313) version 4.3.2 (avr) compiled by GNU C
version 3.4.5 (mingw-vista special r3), GMP version 4.2.3,
MPFR version 2.4.0.

The second example shows that global register variables are pretty much useless in gcc 4.x. I am using global registers in gcc 3.4.6 and it works smoothly. Without global regs I would have to pay considerable performance penalties, so I am still using avr-gcc 3.4.6 (which produces code that is both faster and smaller than code from any avr-gcc 4.x I have tested so far: from 4.1 up to 4.5). Here it is:

// bug.c ///////////////////////////////////////////////
// avr-gcc bug.c -Os -S -ffixed-2 -ffixed-3

register unsigned int currData asm ("r2");

// from a system header
typedef unsigned int uint8_t __attribute__((__mode__(__QI__)));

// from a system header
#define PINB  (*(volatile uint8_t *)(0x16 + 0x20))
#define PORTB (*(volatile uint8_t *)(0x18 + 0x20))

// from a system header
void __vector_1 (void) __attribute__ ((signal,used,externally_visible));
void __vector_1 (void)
{
   // Nothing special, just an ISR that depends on currData
   if (currData & 0x4000)
       PORTB |= 2;
   else
       PORTB &= ~2;

   currData = currData << 1;
}

void foo (void)
{
   for (;;)
   {
       while (PINB & 1);
       currData = 0x2123;
       while (!(PINB & 1));
   }
}

CSE decides that currData in foo is dead. As everyone can see, the produced assembler does not touch the global register R3:R2. Making global reg vars volatile is useless (gcc doesn't even warn about that, and gcc doesn't even have a flag to tag a global reg volatile).

There are just (conditional) branches left that jump around and poll SFRs:

/* prologue: function */
/* frame size = 0 */
.L13:
   sbic 54-32,0     ;  12    *sbix_branch    [length = 2]
   rjmp .L13
.L10:
   sbis 54-32,0     ;  22    *sbix_branch    [length = 2]
   rjmp .L10
   rjmp .L13     ;  50    jump    [length = 1]
   .size    foo, .-foo

Global register variables can boost performance in embedded applications. It's a pity that this is pretty much useless in "modern" gcc 4.x.
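
For comparison, the portable alternative to a global register variable is a volatile variable in RAM, which is exactly where the performance penalty mentioned above comes from. The following is a host-runnable sketch only: fake_portb, curr_data and isr_step are hypothetical names, the port is simulated by an ordinary variable, and the ISR body assumes that the second port write in bug.c is the else branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the I/O port so the sketch runs on a host;
   on a real AVR this would be the PORTB register. */
static volatile uint8_t fake_portb;

/* Shared state kept in RAM instead of a global register. `volatile` forces
   the compiler to perform every store, so a write in the main loop cannot
   be dropped as dead. */
static volatile uint16_t curr_data;

/* Body of the ISR from bug.c, written as a plain function so it can be
   called directly here. */
static void isr_step(void)
{
    if (curr_data & 0x4000)
        fake_portb |= 2;
    else
        fake_portb &= (uint8_t)~2;

    curr_data = (uint16_t)(curr_data << 1);
}

/* Usage: store the pattern as foo() does, then let the "ISR" shift it out. */
static void demo_shift_out(void)
{
    curr_data = 0x2123;
    isr_step();                 /* bit 14 was clear: port bit 1 is cleared */
    assert(fake_portb == 0);
    isr_step();                 /* bit 14 is now set: port bit 1 is set    */
    assert(fake_portb == 2);
}
```

On an 8-bit AVR a 16-bit store is not atomic, so real code would also have to disable interrupts around the write in the main loop. That is one more cost the register version avoids.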

Oh, this is bad news :/
Is the problem only with global register allocations, or even with local ones?
Wouldn't the approach of having register subclasses and handling them with ad-hoc constraints
(like the road I took before) be a more universally working solution?

I didn't try that yet.

In fact, I introduced fine-grained register classes (one class per GPR) for a new port of gcc 4.3.3 for the same reason Jamie Prescott did: to have better control in inline asm. However, I haven't yet tested how these fine-grained classes work in inline assembler (so little time...).

All I can say is that register allocation works fine and there are no performance losses due to the bulk of reg classes, at least with libs and testsuite; the compiler never ran on real-world code.

But I am sure that this approach won't help with the problems above like ICE in fwprop or optimizing away fixed registers.

One-class-per-reg could be helpful in the following scenario:

Suppose there is an assembler function which you want to call from C code. Providing a prototype and calling that function will make gcc emit some kind of call insn. The result is that every call-used reg must be assumed to be clobbered, but often such asm functions clobber far fewer regs. The idea is then to do a "transparent call", i.e. doing the call in inline asm so that gcc does not emit a call insn. Instead, you would list just the regs which the function clobbers, and you would have to pass and receive values in the GPRs the ABI specifies.

This technique is used in many gcc backends, AVR among them. But it is not possible to use that approach from C+inline asm. Note that local register variables will occupy the register for *all* of the function, which is not what you want. You just want a constraint that tells which value goes where.
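
The closest approximation available from C today looks roughly like the sketch below, with the caveat just described: the local register variable ties up r24 for its whole scope. Everything named here is hypothetical. asm_times10 stands for an assembler routine that takes and returns a 16-bit value in r25:r24 and clobbers only r18 and r19, times10_ref is a plain-C stand-in for it, and rcall assumes the routine is within relative-call range. On non-AVR hosts only the C fallback is compiled.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C reference for the hypothetical routine: multiply by 10. */
static uint16_t times10_ref(uint16_t x)
{
    return (uint16_t)(x * 10u);
}

/* "Transparent call": gcc sees an asm statement, not a call insn, so only
   the listed registers are treated as clobbered. */
static uint16_t times10_transparent(uint16_t x)
{
#if defined(__GNUC__) && defined(__AVR__)
    register uint16_t v __asm__("r24") = x;   /* argument and result reg */
    __asm__ volatile ("rcall asm_times10"
                      : "+r"(v)               /* in and out in r25:r24   */
                      :
                      : "r18", "r19");        /* assumed clobbers only   */
    return v;
#else
    return times10_ref(x);                    /* host fallback, no asm   */
#endif
}

/* Usage: the caller never sees how the call was made. */
static void demo_times10(void)
{
    assert(times10_transparent(7) == 70);
}
```

With one register class per GPR the register variable could be replaced by a constraint letter on the operand, which would free r24 outside the asm statement.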

