Poor x86 code generation with constrained assembler

Kevin.Hughes Kevin.Hughes
Fri Sep 17 08:03:00 GMT 1999


I have discovered some poor x86 code generation in gcc 2.95.1 when we use
inline assembler routines for accessing store.

The simple C test program is:

extern unsigned long long x;
extern unsigned long base;

static inline unsigned long long ReadLong1(const unsigned Offset) 
{
	register unsigned long long temp;

	asm volatile ("movl	%%fs:4(%1), %%eax
	bswap	%%eax
	movl	(%1), %%edx
	bswap	%%edx"
	: "=A"(temp)
	: "c"(Offset));
	return temp;
}

void testf1(const unsigned long offset)
{
	x=ReadLong1(offset+base);
}

static inline const unsigned long long ReadLong2(const unsigned Offset) 
{
	register unsigned long long temp;
	asm volatile ("movl	%%fs:4(%1), %%eax
	bswap	%%eax
	movl	(%1), %%edx
	bswap	%%edx"
	: "=&A"(temp)
	: "q"(Offset));
	return temp;
}

void testf2(const unsigned long offset)
{
	x=ReadLong2(offset+base);
}

The only difference between testf1 and testf2 is that they call different
versions of ReadLong, which differ only in the constraints. ReadLong1 forces
the address into ecx ("c"), while ReadLong2 lets the compiler choose the
register ("q", i.e. one of eax, ebx, ecx or edx). The change to the output
constraint for temp ("=&A" instead of "=A") is to ensure that the compiler
chooses a different register for the offset than it uses for temp.
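
To make clear what both ReadLong variants are meant to compute, here is a
rough plain C sketch. It is an illustration only, not the code under test: it
ignores the %fs segment override, assumes a little-endian host such as x86,
and the names swap32, read_long_portable and read_long_selfcheck are made up
for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value (what bswap does). */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical plain C equivalent of ReadLong1/ReadLong2.
   Assumes a little-endian host; no segment override is modelled. */
static uint64_t read_long_portable(const unsigned char *p)
{
	uint32_t lo, hi;

	memcpy(&lo, p + 4, sizeof lo);	/* movl 4(%1), %eax */
	memcpy(&hi, p, sizeof hi);	/* movl (%1), %edx  */
	/* "A" is the edx:eax pair: edx is the high half, eax the low half. */
	return ((uint64_t)swap32(hi) << 32) | swap32(lo);
}

/* Small check, valid only under the little-endian assumption above. */
static void read_long_selfcheck(void)
{
	const unsigned char buf[8] = { 0x01, 0x02, 0x03, 0x04,
				       0x05, 0x06, 0x07, 0x08 };

	assert(read_long_portable(buf) == 0x0102030405060708ULL);
}
```

In other words, the word at offset+4 lands in the low half (eax) and the word
at offset in the high half (edx), so the routine amounts to a big-endian
64-bit load.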

The resulting assembler code is:

	.file	"test_asm.c"
gcc2_compiled.:
___gnu_compiled_cplusplus:
.text
	.align 2
.globl _testf1__FUl
_testf1__FUl:
LFB1:
	movl _base,%ecx
	addl 4(%esp),%ecx
/APP
	movl	%fs:4(%ecx), %eax
	bswap	%eax
	movl	(%ecx), %edx
	bswap	%edx
/NO_APP
	movl %eax,_x
	movl %edx,_x+4
	ret
LFE1:
	.align 2
.globl _testf2__FUl
_testf2__FUl:
LFB2:
	movl _base,%ecx
	pushl %ebx
LCFI0:
	addl 8(%esp),%ecx
/APP
	movl	%fs:4(%ecx), %eax
	bswap	%eax
	movl	(%ecx), %edx
	bswap	%edx
/NO_APP
	movl %eax,%ecx
	movl %edx,%ebx
	movl %ecx,_x
	movl %ebx,_x+4
	popl %ebx
	ret

The problem is that the code where the compiler can choose its own register
for the offset (testf2) contains 2 extra instructions that move eax/edx into
ecx/ebx before the result is stored to x. This also causes ebx to be pushed
and popped.

The code is better than that produced by 2.7.2, which insisted on using eax
for the address calculation and then copying it to ecx - which is why the
constraints in ReadLong1 are as they are.

I would very much appreciate any ideas as to how to make the compiler sort
this out.



Kevin




