Bug 23448 - .p2align before a jump instruction
Summary: .p2align before a jump instruction
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.0.2
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-17 21:43 UTC by H.J. Lu
Modified: 2005-08-17 23:53 UTC
CC List: 1 user

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Attachments
A testcase for gcc 4.0 (722 bytes, application/octet-stream)
2005-08-17 21:46 UTC, H.J. Lu

Description H.J. Lu 2005-08-17 21:43:22 UTC
GCC 4.0 and 4.1 generate .p2align before a jump instruction. minloc1_8_r8.o
in libgfortran has code like

        movl    $1, 12(%ecx)
        .p2align 4,,2
        jmp     .L19
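
For readers unfamiliar with the three-operand form: in GNU as, `.p2align 4,,2` asks for 16-byte alignment but permits at most 2 bytes of padding; if more would be needed, no padding is emitted at all. The following is a minimal sketch of that rule in C. The function name `p2align_padding` and its parameters are illustrative only and do not come from GCC or binutils.

```c
#include <stddef.h>

/* Number of padding bytes that `.p2align power,,max_skip` would insert
   when the location counter is at `offset`.
   Hypothetical helper for illustration; not part of GCC or binutils. */
size_t p2align_padding(size_t offset, unsigned power, size_t max_skip)
{
    size_t alignment = (size_t)1 << power;
    /* Distance to the next multiple of `alignment` (0 if already aligned). */
    size_t padding = (alignment - offset % alignment) % alignment;
    /* The directive is ignored entirely when it would skip too many bytes. */
    return padding <= max_skip ? padding : 0;
}
```

With `power` 4 and `max_skip` 2, an instruction at offset 14 is pushed to offset 16, while one at offset 13 is left where it is.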
Comment 1 H.J. Lu 2005-08-17 21:46:24 UTC
Created attachment 9522
A testcase for gcc 4.0

Here is the testcase for gcc 4.0. x.s is generated with "-O2". x86-64
has a similar problem.
Comment 2 Andrew Pinski 2005-08-17 21:54:08 UTC
Not a bug, it is aligning the loop:
.L5:
        incl    %edx
        cmpl    %edx, %ecx
        je      .L6
        incl    %edx
        cmpl    %edx, %ecx
        .p2align 4,,5
        jne     .L5
Comment 3 Andrew Pinski 2005-08-17 21:57:34 UTC
And next time don't attach a tar file as it is much harder to get at the testcase.
Comment 4 H.J. Lu 2005-08-17 23:02:24 UTC
Are you suggesting that

.L5:
        incl    %edx
        cmpl    %edx, %ecx
        je      .L6
        incl    %edx
        cmpl    %edx, %ecx
        jne     .L5

is slower? Where does this information come from?
Comment 5 Andrew Pinski 2005-08-17 23:07:02 UTC
(note 81 50 85 NOTE_INSN_LOOP_END)

(note 85 81 105 [bb 6] NOTE_INSN_BASIC_BLOCK)

(insn 105 85 91 (unspec_volatile [
            (const_int 4 [0x4])
        ] 68) -1 (nil)
    (nil))

Comment 6 Andrew Pinski 2005-08-17 23:11:28 UTC
  if (TARGET_FOUR_JUMP_LIMIT && optimize && !optimize_size)
    ix86_avoid_jump_misspredicts ();

/* Some CPU cores are not able to predict more than 4 branch instructions in
   the 16 byte window.  */
const int x86_four_jump_limit = m_PPRO | m_ATHLON_K8 | m_PENT4 | m_NOCONA;


So this is not a bug.
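
To make the constraint in that comment concrete, here is a rough sketch in C of a check for the condition it describes: more than 4 branch instructions in one 16-byte window. This is a simplification, not GCC's actual pass. It assumes windows are 16-byte aligned and that a branch counts toward the window containing its first byte. The function name `exceeds_branch_limit` and both constants are made up for this illustration.

```c
#include <stddef.h>

#define WINDOW_BYTES 16            /* window size named in the source comment */
#define MAX_BRANCHES_PER_WINDOW 4  /* limit named in the source comment */

/* Returns 1 if any aligned 16-byte window holds more than four branches.
   `branch_addrs` must list branch start addresses in ascending order.
   Hypothetical helper for illustration; not GCC code. */
int exceeds_branch_limit(const unsigned long *branch_addrs, size_t count)
{
    unsigned long current_window = 0;
    size_t branches_in_window = 0;

    for (size_t i = 0; i < count; i++) {
        unsigned long window = branch_addrs[i] / WINDOW_BYTES;
        if (i == 0 || window != current_window) {
            /* Entered a new window: restart the count. */
            current_window = window;
            branches_in_window = 0;
        }
        branches_in_window++;
        if (branches_in_window > MAX_BRANCHES_PER_WINDOW)
            return 1;
    }
    return 0;
}
```

Under these assumptions, branches at offsets 0, 2, 4, 6 and 8 exceed the limit, whereas moving the fifth branch to offset 16 (as padding before it would) puts it in the next window and the check passes.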
Comment 7 Andrew Pinski 2005-08-17 23:13:02 UTC
The alignment is so the stupid processor (yes, stupid) will not mispredict the jump.
Comment 8 Andrew Pinski 2005-08-17 23:53:36 UTC
From the gcc-patches list (since the archives look broken):

Looking at a recent copy of the Intel optimization manual, it has the same
hint as the AMD manual about 4 jumps per cache line.
I did a SPEC run on the P4 and there is no change except for bzip2, which
improves by about 3%. That is quite expected, as the scenario where 5
jumps happen to be in the same window is very rare.