This is the mail archive of the
gcc-help@gcc.gnu.org
mailing list for the GCC project.
Re: GCC asm block optimizations on x86_64
On Tue, Aug 28, 2007 at 11:02:49PM +0100, Darryl Miles wrote:
> Thanks for the note on the peephole. Can the peephole substitute
> sequences when there are overlapping lifetimes of various processor
> features? Take the 'flags' bits, for example: you can't peephole a
> sequence that does a compare (setting flag bits), then loads a register
> with zero (not affecting flag bits), then does a branch based on flag
> bits. Replacing the load of zero with 'xor' on i386 would
> destroy the flags.
Peephole definitions check for cases like this. A peephole that clobbers
the flags register is not applied if the flags register is live at
that point.
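
To make the liveness check concrete, here is a toy model in C. It is only a sketch and not GCC's actual peephole machinery: the instruction kinds, the struct and the function names are all made up for illustration, and it handles straight-line code only. A "mov $0" becomes "xor" only when no later instruction reads the flags before something else sets them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds for a toy model; not GCC's RTL. */
enum insn_kind { INSN_CMP, INSN_MOV_ZERO, INSN_XOR_ZERO, INSN_JCC, INSN_OTHER };

struct insn { enum insn_kind kind; };

/* In this model, a compare and an xor both write the flags. */
static bool sets_flags(enum insn_kind k)
{
    return k == INSN_CMP || k == INSN_XOR_ZERO;
}

/* In this model, only a conditional jump reads the flags. */
static bool reads_flags(enum insn_kind k)
{
    return k == INSN_JCC;
}

/* Flags are live after insn i if a later insn reads them before any
   other insn overwrites them.  Assumes flags are dead at the end. */
static bool flags_live_after(const struct insn *code, size_t n, size_t i)
{
    for (size_t j = i + 1; j < n; j++) {
        if (reads_flags(code[j].kind))
            return true;
        if (sets_flags(code[j].kind))
            return false;
    }
    return false;
}

/* Replace "mov $0, reg" by "xor reg, reg" only where the flags are dead.
   Returns the number of replacements made. */
static size_t peephole_zero(struct insn *code, size_t n)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind == INSN_MOV_ZERO && !flags_live_after(code, n, i)) {
            code[i].kind = INSN_XOR_ZERO;
            replaced++;
        }
    }
    return replaced;
}
```

With the sequence compare, load zero, branch, the load is left alone. With load zero, compare, branch, the load is rewritten, because the compare overwrites the flags before the branch reads them.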
> 0000000000000090 <u64_divide>:
> 00: 49 89 d1 mov %rdx,%r9 <<- [1] save %rdx in %r9 for arg-as-return
> 03: 48 8b 07 mov (%rdi),%rax
> 06: ?? ?? ?? xor %edx,%edx <<- implicit zero of high 32 bits, would accept xorq %rdx,%rdx
Right, that's why I suggest using "gcc -S -dp": it clearly shows whether
GCC treats it as a 32-bit (*movsi_xxx) or a 64-bit (*movdi_xxx) instruction.
That is GCC's point of view only, since the actual CPU instruction is the same
in this and several other cases.
> 09: ?? ?? xor %r8d,%r8d
Likewise.
> 0b: 48 f7 36 divq (%rsi)
> 0e: 73 02 jae 12 <u64_divide+0x12>
> 10: ?? ?? inc %r8d
Can't you substitute the "jae; inc %r8d" sequence with "adcl $0, %r8d"?
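
As a sketch of that substitution, the two helpers below compute the same 0/1 value from the carry flag, once with the branch and once with adcl. The function names are made up for illustration, and cmpq is used here (instead of the divq from the listing) only to produce a known carry flag. On anything other than x86_64 with a GCC-compatible compiler, the code falls back to plain C.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
/* Returns 1 if a < b (unsigned), using "jae; inc" on the carry flag. */
static unsigned below_via_branch(uint64_t a, uint64_t b)
{
    unsigned r;
    __asm__ ("xorl %0, %0\n\t"      /* zero r before the flags are set */
             "cmpq %2, %1\n\t"      /* CF = 1 if a < b */
             "jae 1f\n\t"           /* skip the increment if CF = 0 */
             "incl %0\n"
             "1:"
             : "=&r" (r)            /* early clobber: written before inputs are read */
             : "r" (a), "r" (b)
             : "cc");
    return r;
}

/* Same result without a branch: add the carry flag into r. */
static unsigned below_via_adc(uint64_t a, uint64_t b)
{
    unsigned r;
    __asm__ ("xorl %0, %0\n\t"
             "cmpq %2, %1\n\t"
             "adcl $0, %0"          /* r = r + 0 + CF */
             : "=&r" (r)
             : "r" (a), "r" (b)
             : "cc");
    return r;
}
#else
/* Portable stand-ins so the sketch still builds elsewhere. */
static unsigned below_via_branch(uint64_t a, uint64_t b) { return a < b; }
static unsigned below_via_adc(uint64_t a, uint64_t b) { return a < b; }
#endif
```

Both helpers zero the register before the compare, because xor itself overwrites the flags.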
> 12: 49 89 01 mov %rax,(%r9) <<- [1] use saved %rdx to return argument
> 15: 48 89 11 mov %rdx,(%rcx)
> 18: ?? ?? mov %r8d,%eax
> 1a: c3 retq
> I also did not say which version of GCC I was using. It was 4.0.2, but
> I've just tried with 4.2.1 and the same code is generated, although -O6
> appears to try to inline things further, which led me to find an
> invalid constraint: "g" ((*divisor)) should be "r" ((*divisor)), since
> it tried to use a constant, although a register or memory via an
> indirect register is valid here.
You can use "rm" for such a constraint. It allows a register or a memory
operand, but not a constant.
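
A minimal sketch of a divide wrapper using "rm" for the divisor might look like the following. It is not the original u64_divide: the function name, its signature and the zeroed high half of the dividend are assumptions made for illustration. Outside x86_64 or a GCC-compatible compiler, it falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical wrapper: 64-bit unsigned divide returning quotient and
   remainder.  The divisor must be nonzero, as with any division. */
static void u64_divmod(uint64_t dividend, uint64_t divisor,
                       uint64_t *quotient, uint64_t *remainder)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t q, r;
    __asm__ ("divq %[div]"
             : "=a" (q), "=d" (r)        /* quotient in rax, remainder in rdx */
             : "0" (dividend),           /* low half of the dividend in rax */
               "1" ((uint64_t) 0),       /* high half of the dividend in rdx */
               [div] "rm" (divisor)      /* register or memory, never a constant */
             : "cc");
    *quotient = q;
    *remainder = r;
#else
    /* Portable stand-in so the sketch still builds elsewhere. */
    *quotient = dividend / divisor;
    *remainder = dividend % divisor;
#endif
}
```

Because the constraint excludes immediates, the compiler has to load a constant divisor into a register or memory first, even when the call is inlined.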
> Another concern that occurs to me is that if the __asm__ constraints are
> not 100% perfect, is there any way to test/permute every possible way
> the compiler might generate the code?
I suppose you could write a script which outputs "calls" to the asm
construct with a constant, local variable (which we assume will end up in a
register) or global variable for each operand in turn, then try compiling
and assembling (i.e. -c) the resulting code.
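
Such a generator could be sketched as below, written in C here rather than as a script. It assumes a hypothetical two-operand macro ASM_UNDER_TEST defined in a hypothetical header asm_under_test.h, and it emits every combination of constant, local and global for both operands rather than varying one operand at a time.

```c
#include <stdio.h>
#include <stddef.h>

/* The three operand forms to try: constant, local variable, global variable. */
static const char *const operand_forms[] = { "42", "local", "global_var" };
#define N_FORMS (sizeof operand_forms / sizeof operand_forms[0])

/* Writes one test function per operand combination to 'out' and returns
   the number of functions written.  ASM_UNDER_TEST and asm_under_test.h
   are placeholders for the construct being checked. */
static size_t emit_cases(FILE *out)
{
    size_t n = 0;

    fputs("#include \"asm_under_test.h\"\n", out);
    fputs("unsigned long global_var = 42;\n", out);

    for (size_t a = 0; a < N_FORMS; a++) {
        for (size_t b = 0; b < N_FORMS; b++) {
            fprintf(out,
                    "unsigned long case_%zu(void)\n"
                    "{\n"
                    "    unsigned long local = 42;\n"
                    "    return ASM_UNDER_TEST(%s, %s);\n"
                    "}\n",
                    n, operand_forms[a], operand_forms[b]);
            n++;
        }
    }
    return n;
}
```

The generated file would then be compiled and assembled with -c at several optimization levels, and any constraint the assembler rejects shows up as a build error.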
--
Rask Ingemann Lambertsen