This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Re: GCC asm block optimizations on x86_64


Rask Ingemann Lambertsen wrote:
On Mon, Aug 27, 2007 at 06:11:04AM +0100, Darryl L. Miles wrote:
	#define U64_DIVIDE_ASM(quotient, remainder, dividend, divisor, overflow)	do {	\
		__asm__ __volatile__(						\
			"\n\t"							\
			"xorl %0,%0\n\t"					\
			"divq %5\n\t"						\
										\
			"jnc 1f\n\t"						\
			"incl %0\n"						\
			"1:\n\t"						\
			"movq %%rax,%2\n\t"					\
			"movq %%rdx,%1\n\t"					\
			: "=&g" (overflow),		/* return */		\
			  "=g" (*remainder),					\
			  "=g" (*quotient)					\
			: "d" (0),			/* argument */		\
			  "a" ((*dividend)),					\
			  "g" ((*divisor))					\
			/*: "rax", "rdx", you'd think you need this to */	\
			/* describe these registers as no longer containing */	\
			/* the assigned input values after asm block */		\
			/* execution, but will not compile with them set. */	\

I think you want


: "=&r" (overflow),	/* return */
  "=d" (*remainder),
  "=a" (*quotient)
: "1" (0),		/* argument */
  "2" (*dividend),
  "rm" (*divisor)

so the compiler knows that %rax and %rdx are modified.

Yes, and with those constraints I can remove the two "movq" instructions from the asm block, since the compiler generates them for me.


I also kept the double parentheses because of the macro; it seems (*(dividend)) is the correct form, the idea being to allow arguments such as (*(&foo->bar.fubar)). I also made overflow an input constraint initialized to zero, and the best __asm__ block I could come up with, as far as I can see, became:


pseudo-prototype: extern void U64_DIVIDE_ASM(u_int64_t *quotient, u_int64_t *remainder, const u_int64_t *dividend, const u_int64_t *divisor, int &overflow);


#define U64_DIVIDE_ASM(quotient, remainder, dividend, divisor, overflow) do { \
	__asm__ __volatile__( \
		"\n\t" \
		"divq %6\n\t" \
		\
		"jnc 1f\n\t" \
		"inc %0\n" \
		"1:\n\t" \
		: "=r" (overflow), /* return */ \
		  "=d" (*(remainder)), \
		  "=a" (*(quotient)) \
		: "0" (0), \
		  "1" (0), /* argument */ \
		  "2" (*(dividend)), \
		  "rm" (*(divisor)) \
		/*: "rax", "rdx"*/ /* no side effects */ \
	); \
} while(0)


This gives the compiler the most freedom for code generation. Compiled as a free-standing function, it results in:

u64_divide:
        movq    (%rdi), %rdi
        xorl    %r8d, %r8d
        movq    %rdx, %r9
        movl    %r8d, %edx
        movq    %rdi, %rax
#APP

        divq (%rsi)
        jnc 1f
        inc %r8d
1:

#NO_APP
        movq    %rdx, (%rcx)
        movq    %rax, (%r9)
        movl    %r8d, %eax
        ret


However, this revised version no longer demonstrates the point of the original example: the original constraints allowed code-generation choices that the compiler did not take advantage of.


To me, the original example is therefore a clear test case for an improvement, and other code may well benefit from the same work. Maybe the problem is that GCC treats "setup of inputs" and "allocation of extra registers" as a single phase, so in the original example it did not see that %r8d was free for use, because that register had been made busy helping to initialize the input %rdx.
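
As a side note, the same register tying can be written with the "+" read-write modifier instead of numbered matching constraints. The following is only a minimal sketch under my own assumptions, not the macro discussed in this thread: the function name u64_divmod and its return convention are hypothetical. It also leaves out the jnc/inc carry test, because as far as I can tell from the Intel instruction reference, DIV leaves CF undefined and reports a quotient that does not fit (including a zero divisor) with a #DE fault rather than through a flag. With the high half of the dividend fixed at zero, only a zero divisor can fault, so the sketch checks for that in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the original post.
   Returns 0 on success, 1 if the divisor is zero (divq would raise #DE). */
static inline int u64_divmod(uint64_t *quotient, uint64_t *remainder,
                             uint64_t dividend, uint64_t divisor)
{
    uint64_t lo = dividend;  /* RAX: low half of the dividend in, quotient out */
    uint64_t hi = 0;         /* RDX: high half of the dividend in, remainder out */

    assert(quotient != 0 && remainder != 0);

    if (divisor == 0)
        return 1;

    /* "+a" and "+d" tell the compiler that RAX and RDX are both read and
       written, so no explicit movq and no register clobber list is needed. */
    __asm__ ("divq %2"
             : "+a" (lo), "+d" (hi)
             : "rm" (divisor)
             : "cc");

    *quotient = lo;
    *remainder = hi;
    return 0;
}
```

Calling u64_divmod(&q, &r, 100, 7) should leave q == 14 and r == 2 and return 0. Because the outputs are ordinary C variables, the compiler remains free to pick a register or a memory operand for the divisor, as with the "rm" constraint in the macro.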

Thank you for your interest in this matter.


Darryl


