[REVISED][PATCH/RFT] Fix PR middle-end/PR28690, modify swap_commutative_operands_p

Wed Jun 20 06:20:00 GMT 2007

On Tue, Jun 19, 2007 at 05:38:14PM +0200, Paolo Bonzini wrote:
> 
> >>>  So instead of writing
> >>>
> >>>    .p2align 4,,7
> >>>
> >>>can't you write the sequence:
> >>>
> >>>    .p2align 4,,7
> >>>    .p2align 3,,7
> >>>
> >>>which will have the exact effect you're asking for there?
> >>Yeah, that's the effect I'm looking for.  However, my comment was
> >>more of a hypothetical "Why doesn't it work that way?" type of
> >>question, rather than a "I'm really interested in this and am
> >>going to fix this!".  I guess I just thought it curious.
> >
> >Well, why should it work that way?  It's designed to give you a
> >tradeoff between alignment and saving code space.  As long as you can
> >specify precisely what you want--and, as Dave shows, you can--it
> >should let you do that, rather than guessing what you might want.
> 
> I think Peter is reading ".p2align 4,,7" as "give me *as much alignment 
> as you can* (up to 2^4) with a 7-byte sequence".  Instead, it is "give 
> me 2^4 alignment *if you can* with a 7-byte sequence".  Two remarks:
> 
> 1) I guess emitting an additional ".p2align 3,,7" is better anyway, 
> because it would improve performance.
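
To spell out the two readings: as far as I understand the documented gas
behaviour, a .p2align with a max-skip argument is dropped entirely when
reaching the boundary would need more padding than that.  A rough Python
sketch of that rule follows (the helper names are made up for
illustration; this is not the assembler's code):

```python
def pad_bytes(addr, power, max_skip):
    """Padding one `.p2align power,,max_skip` would emit at addr (model only)."""
    need = -addr % (1 << power)          # bytes to the next 2**power boundary
    # Assumed rule: if more than max_skip bytes are needed, emit nothing.
    return need if need <= max_skip else 0


def apply_directives(addr, directives):
    """Apply a list of (power, max_skip) directives in order; return new addr."""
    for power, max_skip in directives:
        addr += pad_bytes(addr, power, max_skip)
    return addr


for offset in (1, 9, 12):
    alone = apply_directives(offset, [(4, 7)])
    pair = apply_directives(offset, [(4, 7), (3, 7)])
    print(offset, alone, pair)
# 1 1 8
# 9 16 16
# 12 16 16
```

At offset 1 the 16-byte alignment needs 15 bytes, so ".p2align 4,,7"
alone does nothing, while the extra ".p2align 3,,7" still reaches an
8-byte boundary.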

For .p2align, assemblers newer than 2006-06-23 will generate a single
nop of up to 10 bytes:

0x66,0x2e,0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00

by default for 64-bit, and with -march=i686 or above for 32-bit, while
older assemblers can only generate a single nop of up to 7 bytes:

0x8d,0xb4,0x26,0x00,0x00,0x00,0x00

I think we should use ".p2align 4,,10" instead of ".p2align 4,,7"
for 64-bit.  The alignment would then be applied in more cases.
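
As a back-of-the-envelope check, here is a small Python sketch that
counts, for each starting offset within a 16-byte block, whether the
directive would be honoured.  It assumes the usual rule that the
alignment is skipped when it needs more than the max-skip bytes; the
function name is only illustrative:

```python
def aligned_offsets(power, max_skip):
    """Offsets (mod 2**power) at which `.p2align power,,max_skip` pads."""
    size = 1 << power
    # -off % size is the number of padding bytes needed from offset `off`.
    return [off for off in range(size) if -off % size <= max_skip]


print(len(aligned_offsets(4, 7)))   # 8
print(len(aligned_offsets(4, 10)))  # 11
```

So raising the limit from 7 to 10 covers 11 of the 16 possible offsets
instead of 8, and each of the extra cases (8, 9 or 10 bytes of padding)
is within the single-nop length quoted above.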

H.J.