This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq



------- Comment #17 from vvv at ru dot ru  2009-05-12 16:40 -------
(In reply to comment #16)
> Created an attachment (id=17783)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17783&action=view)
> gcc45-pr39942.patch
> Patch that attempts to take into account .p2align directives that are emitted
> for (some) CODE_LABELs and also the gen_align insns that the pass itself
> inserts.  For a CODE_LABEL, say .p2align 16,,10 means either that the .p2align
> directive starts a new 16 byte page (then insns before it are never
> interesting), or nothing was skipped because more than 10 bytes would need to
> be skipped.  But that means the current group could contain only 5 or less
> bytes of instructions before the label, so again, we don't have to look at
> instructions not in the last 5 bytes.
> Another fix is that for MAX_SKIP < 7, ASM_OUTPUT_MAX_SKIP_ALIGN shouldn't emit
> the second .p2align 3, which might (and often does) skip more than MAX_SKIP
> bytes (up to 7).

Nice patch. The code looks better. I checked it on Linux kernel 2.6.29.2.
But two notes:

1. There is no guarantee that .p2align will be translated into NOPs. Example:

# cat test.c
void f(int i)
{
        if (i == 1) F(1);
        if (i == 2) F(2);
        if (i == 3) F(3);
        if (i == 4) F(4);
        if (i == 5) F(5);
}
# gcc -o test.s test.c -O2 -S
# cat test.s
        .file   "test.c"
        .text
        .p2align 4,,15
.globl f
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
        cmpl    $1, %edi
        je      .L7
        cmpl    $2, %edi
        je      .L7
        cmpl    $3, %edi
        je      .L7
        cmpl    $4, %edi
        .p2align 4,,5    <------- padding attempted here
        je      .L7
        cmpl    $5, %edi
        je      .L7
        rep
        ret
        .p2align 4,,10
        .p2align 3
.L7:
        xorl    %eax, %eax
        jmp     F
        .cfi_endproc
.LFE0:
        .size   f, .-f
        .ident  "GCC: (GNU) 4.5.0 20090512 (experimental)"
        .section        .note.GNU-stack,"",@progbits

# gcc -o test.out test.s -O2 -c
# objdump -d test.out
0000000000000000 <f>:
   0:   83 ff 01                cmp    $0x1,%edi
   3:   74 1b                   je     20 <f+0x20>
   5:   83 ff 02                cmp    $0x2,%edi
   8:   74 16                   je     20 <f+0x20>
   a:   83 ff 03                cmp    $0x3,%edi
   d:   74 11                   je     20 <f+0x20>
   f:   83 ff 04                cmp    $0x4,%edi
  12:   74 0c                   je     20 <f+0x20>      <---- no NOP here 
  14:   83 ff 05                cmp    $0x5,%edi
  17:   74 07                   je     20 <f+0x20>
  19:   f3 c3                   repz retq 

IMHO, it would be better to insert NOPs directly instead of .p2align. (I mean
the line emit_insn_before (gen_align (GEN_INT (padsize)), insn); )
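
The missing NOP at offset 0x12 follows from how the assembler treats the
max-skip operand: reaching the next 16-byte boundary from 0x12 needs 14 bytes,
which is more than the allowed 5, so nothing is emitted. A minimal sketch of
that arithmetic is below. The function name p2align_padding is hypothetical
(it is not part of GCC or binutils) and it assumes the documented GNU as
behaviour that alignment is skipped entirely when it would exceed max-skip.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GCC or binutils function.
   Returns the number of padding bytes that ".p2align power,,max_skip"
   would produce at byte offset `offset`.  The result is 0 when the
   offset is already aligned, or when the padding needed to reach the
   boundary is larger than max_skip (the directive is then ignored).  */
size_t p2align_padding(size_t offset, unsigned power, size_t max_skip)
{
    assert(power < 8 * sizeof(size_t));

    size_t align = (size_t)1 << power;
    /* Bytes needed to reach the next multiple of `align`; 0 if aligned.  */
    size_t needed = (align - (offset & (align - 1))) & (align - 1);

    return needed <= max_skip ? needed : 0;
}
```

For the listing above, p2align_padding(0x12, 4, 5) gives 0 (14 bytes would be
needed), while p2align_padding(0x1b, 4, 10) gives 5, which matches .L7 landing
at 0x20 in the objdump output.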

2. IMHO, it's a bad idea to insert something between a CMP and a conditional
jump. Quote from the Intel 64 and IA-32 Architectures Optimization Reference
Manual:

>> 3.4.2.2       Optimizing for Macro-fusion
>> Macro-fusion merges two instructions to a single μop. Intel Core Microarchitecture
>> performs this hardware optimization under limited circumstances.
>> The first instruction of the macro-fused pair must be a CMP or TEST instruction. This
>> instruction can be REG-REG, REG-IMM, or a micro-fused REG-MEM comparison. The
>> second instruction (adjacent in the instruction stream) should be a conditional
>> branch.

So if we need to insert NOPs, it is better to do it _before_ the CMP.
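
As a rough sketch of that rule, assuming a simplified instruction stream: the
enum insn_kind and the function padding_insert_index below are hypothetical
illustrations only, and do not use GCC's RTL API or reflect the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction classification.  */
enum insn_kind { INSN_CMP, INSN_TEST, INSN_JCC, INSN_OTHER };

/* Given that padding is wanted before insns[idx], return the index the
   padding should actually be placed before.  If insns[idx] is a
   conditional jump directly preceded by CMP or TEST, move the padding
   in front of the CMP/TEST so the pair stays adjacent and can still be
   macro-fused.  Otherwise the index is returned unchanged.  */
size_t padding_insert_index(const enum insn_kind *insns, size_t idx)
{
    assert(insns != NULL);

    if (idx > 0
        && insns[idx] == INSN_JCC
        && (insns[idx - 1] == INSN_CMP || insns[idx - 1] == INSN_TEST))
        return idx - 1;

    return idx;
}
```

In the example function f, padding requested before the fourth je would then
be placed before the preceding cmpl $4, %edi instead.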


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942

