This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq
- From: "vvv at ru dot ru" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 12 May 2009 16:40:51 -0000
- Subject: [Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq
- References: <bug-39942-17483@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #17 from vvv at ru dot ru 2009-05-12 16:40 -------
(In reply to comment #16)
> Created an attachment (id=17783)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17783&action=view)
> gcc45-pr39942.patch
> Patch that attempts to take into account .p2align directives that are emitted
> for (some) CODE_LABELs and also the gen_align insns that the pass itself
> inserts. For a CODE_LABEL, say .p2align 16,,10 means either that the .p2align
> directive starts a new 16 byte page (then insns before it are never
> interesting), or nothing was skipped because more than 10 bytes would need to
> be skipped. But that means the current group could contain only 5 or less
> bytes of instructions before the label, so again, we don't have to look at
> instructions not in the last 5 bytes.
> Another fix is that for MAX_SKIP < 7, ASM_OUTPUT_MAX_SKIP_ALIGN shouldn't emit
> the second .p2align 3, which might (and often does) skip more than MAX_SKIP
> bytes (up to 7).
Nice patch. The code looks better. I checked it on Linux kernel 2.6.29.2.
But I have two notes:
1. There is no guarantee that .p2align will be translated into NOPs. Example:
# cat test.c
void f(int i)
{
if (i == 1) F(1);
if (i == 2) F(2);
if (i == 3) F(3);
if (i == 4) F(4);
if (i == 5) F(5);
}
# gcc -o test.s test.c -O2 -S
# cat test.s
.file "test.c"
.text
.p2align 4,,15
.globl f
.type f, @function
f:
.LFB0:
.cfi_startproc
cmpl $1, %edi
je .L7
cmpl $2, %edi
je .L7
cmpl $3, %edi
je .L7
cmpl $4, %edi
.p2align 4,,5 <------- attempted padding
je .L7
cmpl $5, %edi
je .L7
rep
ret
.p2align 4,,10
.p2align 3
.L7:
xorl %eax, %eax
jmp F
.cfi_endproc
.LFE0:
.size f, .-f
.ident "GCC: (GNU) 4.5.0 20090512 (experimental)"
.section .note.GNU-stack,"",@progbits
# gcc -o test.out test.s -O2 -c
# objdump -d test.out
0000000000000000 <f>:
0: 83 ff 01 cmp $0x1,%edi
3: 74 1b je 20 <f+0x20>
5: 83 ff 02 cmp $0x2,%edi
8: 74 16 je 20 <f+0x20>
a: 83 ff 03 cmp $0x3,%edi
d: 74 11 je 20 <f+0x20>
f: 83 ff 04 cmp $0x4,%edi
12: 74 0c je 20 <f+0x20> <---- no NOP here
14: 83 ff 05 cmp $0x5,%edi
17: 74 07 je 20 <f+0x20>
19: f3 c3 repz retq
IMHO, it would be better to insert NOPs directly rather than a .p2align directive. I mean this line:
emit_insn_before (gen_align (GEN_INT (padsize)), insn);
2. IMHO, it's a bad idea to insert something between a CMP and a conditional jump.
A quote from the Intel 64 and IA-32 Architectures Optimization Reference Manual:
>> 3.4.2.2 Optimizing for Macro-fusion
>> Macro-fusion merges two instructions to a single μop. Intel Core Microarchitecture
>> performs this hardware optimization under limited circumstances.
>> The first instruction of the macro-fused pair must be a CMP or TEST instruction. This
>> instruction can be REG-REG, REG-IMM, or a micro-fused REG-MEM comparison. The
>> second instruction (adjacent in the instruction stream) should be a conditional
>> branch.
So if we need to insert NOPs, it is better to insert them _before_ the CMP.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942