Poor x86 code generation with constrained assembler
Kevin.Hughes
Fri Sep 17 08:03:00 GMT 1999
I have discovered some poor x86 code generation in 2.95.1 when we use inline
assembler routines for accessing store.
The simple C test program is:
extern unsigned long long x;
extern unsigned long base;
static inline unsigned long long ReadLong1(const unsigned Offset)
{
    register unsigned long long temp;
    asm volatile ("movl %%fs:4(%1), %%eax
                   bswap %%eax
                   movl (%1), %%edx
                   bswap %%edx"
                  : "=A"(temp)
                  : "c"(Offset));
    return temp;
}
void testf1(const unsigned long offset)
{
    x=ReadLong1(offset+base);
}
static inline const unsigned long long ReadLong2(const unsigned Offset)
{
    register unsigned long long temp;
    asm volatile ("movl %%fs:4(%1), %%eax
                   bswap %%eax
                   movl (%1), %%edx
                   bswap %%edx"
                  : "=&A"(temp)
                  : "q"(Offset));
    return temp;
}
void testf2(const unsigned long offset)
{
    x=ReadLong2(offset+base);
}
The only difference between testf1 and testf2 is that they call different
versions of ReadLong, which differ only in the constraint on the address
operand: ReadLong1 explicitly uses ecx ("c"), while ReadLong2 lets the
compiler choose any of eax, ebx, ecx or edx ("q"). The change to the output
constraint for temp ("=A" becomes "=&A") is to ensure that the compiler
chooses a different register for the offset than it uses for temp.
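For reference, what both ReadLong variants compute can be sketched in portable C. This is only an illustration and not part of the test case: the names swap32 and read_long_portable are hypothetical, the %fs segment override on the first load is ignored, and a little-endian host (such as x86) is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value, as the x86 bswap
   instruction does. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical portable stand-in for ReadLong1/ReadLong2.
   Assumes a little-endian host and ignores the %fs override. */
static unsigned long long read_long_portable(const unsigned char *p)
{
    uint32_t hi, lo;
    memcpy(&hi, p, 4);      /* movl (%1), %%edx  then bswap %%edx */
    memcpy(&lo, p + 4, 4);  /* movl 4(%1), %%eax then bswap %%eax */
    /* "=A" returns the 64-bit value in the edx:eax pair, edx high. */
    return ((unsigned long long)swap32(hi) << 32) | swap32(lo);
}
```

In other words, the asm is a 64-bit big-endian load split into two 32-bit halves, which is why the result has to come back in the edx:eax pair.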
The resulting assembler code is:
.file "test_asm.c"
gcc2_compiled.:
___gnu_compiled_cplusplus:
.text
.align 2
.globl _testf1__FUl
_testf1__FUl:
LFB1:
movl _base,%ecx
addl 4(%esp),%ecx
/APP
movl %fs:4(%ecx), %eax
bswap %eax
movl (%ecx), %edx
bswap %edx
/NO_APP
movl %eax,_x
movl %edx,_x+4
ret
LFE1:
.align 2
.globl _testf2__FUl
_testf2__FUl:
LFB2:
movl _base,%ecx
pushl %ebx
LCFI0:
addl 8(%esp),%ecx
/APP
movl %fs:4(%ecx), %eax
bswap %eax
movl (%ecx), %edx
bswap %edx
/NO_APP
movl %eax,%ecx
movl %edx,%ebx
movl %ecx,_x
movl %ebx,_x+4
popl %ebx
ret
The problem is that the code where the compiler can choose its own register
for the offset contains 2 extra instructions that move eax/edx into ecx/ebx
before storing to x. This also causes ebx to be pushed and popped.
The code is better than that produced by 2.7.2, which insisted on using eax
for the address calculation and then copying it to ecx. That is why the
constraints in ReadLong1 are as they are.
I would very much appreciate any ideas as to how to make the compiler sort
this out.
Kevin