[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

vvv at ru dot ru gcc-bugzilla@gcc.gnu.org
Wed May 13 11:43:00 GMT 2009



------- Comment #19 from vvv at ru dot ru  2009-05-13 11:42 -------
(In reply to comment #18)
> No, .p2align is the right thing to do, given that GCC doesn't have 100%
> accurate information about instruction sizes (e.g. for inline asms it can't
> have it; for stuff where branch shortening can decrease the size, it doesn't
> have it until the branch shortening phase, which is too late for this machine
> reorg; and in other cases the lengths are just upper bounds).  Say,
> .p2align 16,,5 means: insert a nop of up to 5 bytes if you can reach the
> 16-byte boundary with it, otherwise don't insert anything.  But not inserting
> anything necessarily means that there were fewer than 11 bytes in the same
> 16-byte page, and if the lower-bound insn size estimation determined that in
> 11 bytes you can't have 3 branch-changing instructions, you are fine.
> Breaking a fused compare and jump (32-bit code only) is unfortunate, but
> inserting the padding before the cmp would often mean unnecessarily large
> padding.
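A minimal Python sketch of the padding rule quoted above, for illustration only. It assumes GNU as semantics, where the first `.p2align` operand is the base-2 logarithm of the alignment (so a 16-byte boundary with a 5-byte cap is spelled `.p2align 4,,5`, or equivalently `.balign 16,,5`), and where nothing at all is emitted if reaching the boundary would take more than the cap. The helper name `p2align_padding` is hypothetical. The loop at the end checks the quoted inference: whenever no padding is emitted, fewer than 11 bytes precede that point in its 16-byte block.

```python
def p2align_padding(offset: int, log2_align: int = 4, max_skip: int = 5) -> int:
    """Padding bytes a `.p2align log2_align,,max_skip` would emit at `offset`.

    Assumed GNU as rule: pad up to the next 2**log2_align boundary, but emit
    nothing if that would need more than max_skip bytes.
    """
    align = 1 << log2_align
    needed = -offset % align  # bytes to the next boundary (0 if already aligned)
    return needed if needed <= max_skip else 0


# Check the quoted inference over several 16-byte blocks: if nothing is
# emitted, the insertion point sits at most 10 bytes into its block.
for off in range(64):
    if p2align_padding(off) == 0:
        assert off % 16 < 11

print(p2align_padding(13))  # 3 bytes reach the boundary at 16
print(p2align_padding(10))  # 6 bytes would be needed > 5, so 0
```

Offsets 11 to 15 within a block get 5 to 1 bytes of padding; offsets 1 to 10 would need 6 to 15 bytes and therefore get none.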

You are right if padding is required only for every 16-byte page with 4
branches in it. But Intel writes about a "16-byte chunk", not a "16-byte page".

Quote from the Intel 64 and IA-32 Architectures Optimization Reference Manual:

Assembly/Compiler Coding Rule 10. (M impact, L generality) Do not put
more than four branches in a 16-byte chunk.

IMHO, a "chunk" here means the memory range from x to x+10h, where x is _any_
address, not only a 16-byte-aligned one.
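To make the difference between the two readings concrete, here is a small Python sketch. The helper names and branch offsets are hypothetical, and it makes the simplifying assumption that a branch belongs to whichever chunk contains its first byte. It counts branches per aligned 16-byte block (the "page" reading the quoted `.p2align` argument relies on) and per arbitrary 16-byte window starting at any x (the "chunk" reading above). The made-up layout passes the first check but fails the second; which reading the hardware actually follows is the question under discussion, and the sketch does not settle it.

```python
from collections import Counter


def max_in_aligned_blocks(addrs, size=16):
    """Most branches sharing one aligned block [k*size, (k+1)*size)."""
    return max(Counter(a // size for a in addrs).values(), default=0)


def max_in_any_window(addrs, size=16):
    """Most branches inside any window [x, x+size), for arbitrary x.

    Sliding a window right until it starts at the first branch it contains
    never drops a branch, so only windows starting at a branch address need
    checking (two-pointer scan over the sorted addresses).
    """
    addrs = sorted(addrs)
    best = 0
    lo = 0
    for hi, a in enumerate(addrs):
        # Advance lo until addrs[lo] and a fit in one size-byte window.
        while a - addrs[lo] >= size:
            lo += 1
        best = max(best, hi - lo + 1)
    return best


# Hypothetical layout: start offsets of six 2-byte conditional jumps.
branches = [0x08, 0x0A, 0x0C, 0x0E, 0x10, 0x12]
print(max_in_aligned_blocks(branches))  # 4: within "at most four" per aligned block
print(max_in_any_window(branches))      # 6: window [0x08, 0x18) exceeds four
```

The aligned count can never exceed the sliding-window count, since every aligned block is one particular window; the converse does not hold, which is the point of the reading above.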


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942


