This is the mail archive of the mailing list for the GCC project.


Re: GCC asm block optimizations on x86_64

Rask Ingemann Lambertsen wrote:
On Mon, Aug 27, 2007 at 06:11:04AM +0100, Darryl L. Miles wrote:
	#define U64_DIVIDE_ASM(quotient, remainder, dividend, divisor, overflow)	do {	\
		__asm__ __volatile__(						\
			"\n\t"							\
			"xorl %0,%0\n\t"					\
			"divq %5\n\t"						\
			"jnc 1f\n\t"						\
			"incl %0\n"						\
			"1:\n\t"						\
			"movq %%rax,%2\n\t"					\
			"movq %%rdx,%1\n\t"					\
			: "=&g" (overflow),		/* return */		\
			  "=g" (*remainder),					\
			  "=g" (*quotient)					\
			: "d" (0),			/* argument */		\
			  "a" ((*dividend)),					\
			  "g" ((*divisor))					\
			/*: "rax", "rdx", you'd think you need this to */	\
			/* describe these registers as no longer containing */	\
			/* the assigned input values after asm block */		\
			/* execution, but will not compile with them set. */	\

I think you want

: "=&r" (overflow),	/* return */
  "=d" (*remainder),
  "=a" (*quotient)
: "1" (0),		/* argument */
  "2" (*dividend),
  "rm" (*divisor)

so the compiler knows that %rax and %rdx are modified.

Yes, and when doing that I can remove the two "movq" insns from the asm block, as the compiler will generate them for me.
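As a minimal sketch of the matching-constraint idea discussed above: binding the outputs directly to %rax and %rdx ("=a", "=d") and tying the inputs to them ("0", "1") leaves only the divq itself inside the asm block. The function name u64_divide and its return convention are hypothetical, not from the thread. This sketch checks for a zero divisor in C before dividing rather than testing the carry flag afterwards, and it falls back to plain C operators when not built for x86_64.

```c
#include <stdint.h>

/* Hypothetical helper for illustration (not from the thread).
   Divides *dividend by *divisor; returns 1 and leaves the outputs
   untouched if the divisor is zero, otherwise returns 0. */
static int u64_divide(uint64_t *quotient, uint64_t *remainder,
                      const uint64_t *dividend, const uint64_t *divisor)
{
    if (*divisor == 0)
        return 1;
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t q, r;
    /* %0 = q in rax, %1 = r in rdx, %2 = dividend (same register as %0),
       %3 = zero high half (same register as %1), %4 = divisor. */
    __asm__ ("divq %4"
             : "=a" (q), "=d" (r)
             : "0" (*dividend), "1" ((uint64_t)0), "rm" (*divisor)
             : "cc");
    *quotient = q;
    *remainder = r;
#else
    /* Portable fallback so the sketch builds on other targets. */
    *quotient = *dividend / *divisor;
    *remainder = *dividend % *divisor;
#endif
    return 0;
}
```

Because the high half in %rdx is always zero here, the quotient always fits in 64 bits, so a zero divisor is the only failure case this sketch has to guard against.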

I also kept the double parentheses because of the macro; it seems (*(dividend)) is correct, the idea being to allow for (*(&foo->bar.fubar)). I also made overflow an input constraint initialized to the value zero, and the most perfect __asm__ block I could churn out, AFAICS, became:

pseudo-prototype: extern void U64_DIVIDE_ASM(u_int64_t *quotient, u_int64_t *remainder, const u_int64_t *dividend, const u_int64_t *divisor, int &overflow);

#define U64_DIVIDE_ASM(quotient, remainder, dividend, divisor, overflow) do { \
	__asm__ __volatile__( \
		"\n\t" \
		"divq %6\n\t" \
		"jnc 1f\n\t" \
		"inc %0\n" \
		"1:\n\t" \
		: "=r" (overflow), /* return */ \
		  "=d" (*(remainder)), \
		  "=a" (*(quotient)) \
		: "0" (0), \
		  "1" (0), /* argument */ \
		  "2" (*(dividend)), \
		  "rm" (*(divisor)) \
		/*: "rax", "rdx"*/ /* no side effects */ \
	); \
} while(0)
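The reason for parenthesizing each macro argument before dereferencing it can be shown with a small sketch. The macro and function names below are hypothetical and exist only for illustration; the point is that a macro argument is substituted textually, so an expression such as vals + 1 changes meaning without the inner parentheses.

```c
#include <stdint.h>

/* Hypothetical macros for illustration only (not from the thread). */
#define DEREF_UNSAFE(p) (*p)    /* DEREF_UNSAFE(vals + 1) expands to (*vals + 1) */
#define DEREF_SAFE(p)   (*(p))  /* DEREF_SAFE(vals + 1) expands to (*(vals + 1)) */

/* Reads the first element and adds one, which is not what was intended. */
static uint64_t second_unsafe(const uint64_t *vals)
{
    return DEREF_UNSAFE(vals + 1);
}

/* Reads the second element, as intended. */
static uint64_t second_safe(const uint64_t *vals)
{
    return DEREF_SAFE(vals + 1);
}
```

An argument such as &foo->bar.fubar happens to work either way because of operator precedence, but the inner parentheses make the macro safe for any pointer expression.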

This version of U64_DIVIDE_ASM gives the compiler the most options for code generation. As a free-standing function it compiles to:

        movq    (%rdi), %rdi
        xorl    %r8d, %r8d
        movq    %rdx, %r9
        movl    %r8d, %edx
        movq    %rdi, %rax

        divq (%rsi)
        jnc 1f
        inc %r8d

        movq    %rdx, (%rcx)
        movq    %rax, (%r9)
        movl    %r8d, %eax

But this no longer demonstrates the problem from the original example: there, the constraints allowed for better code that the compiler failed to find.

To me that indicates a clear test case to improve upon, and other code might improve as a result of that work. Maybe the problem is that GCC treats "setup of inputs" and "allocation of extra registers" as a single phase, so in the example it did not see that %r8d was free for use, because it had been made busy helping to initialize the input %rdx.

Thank you for your interest in this matter.

