23448 – .p2align before a jump instruction


Status:	RESOLVED INVALID

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.0.2

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-08-17 21:43 UTC by H.J. Lu
Modified:	2005-08-17 23:53 UTC
CC List:	1 user

See Also:
Host:	i686-pc-linux-gnu
Target:	i686-pc-linux-gnu
Build:	i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:

Attachments
A testcase for gcc 4.0 (722 bytes, application/octet-stream) 2005-08-17 21:46 UTC, H.J. Lu

Description H.J. Lu 2005-08-17 21:43:22 UTC

GCC 4.0 and 4.1 generate .p2align before a jump instruction. minloc1_8_r8.o
in libgfortran has code like

        movl    $1, 12(%ecx)
        .p2align 4,,2
        jmp     .L19
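
For readers unfamiliar with the directive: in GNU as, `.p2align 4,,2` asks for alignment to a 2^4 = 16-byte boundary, with the third operand capping the padding at 2 bytes; if more padding would be needed, the directive does nothing. The C sketch below only models that arithmetic. The function name `p2align_padding` is hypothetical and is not part of GCC or binutils.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes that
   ".p2align log2_align,,max_skip" would emit at byte offset `offset`.
   Returns 0 when the offset is already aligned or when the required
   padding exceeds max_skip (the directive is then ignored). */
static uint32_t p2align_padding(uint32_t offset, unsigned log2_align,
                                uint32_t max_skip)
{
    assert(log2_align < 32);
    uint32_t align = (uint32_t)1 << log2_align;
    /* Distance to the next multiple of `align`, or 0 if already there. */
    uint32_t pad = (align - (offset & (align - 1))) & (align - 1);
    return pad <= max_skip ? pad : 0;
}
```

Under this model, a jump that would start at offset 14 gets 2 bytes of padding from `.p2align 4,,2`, while one at offset 13 is left where it is because 3 bytes would be needed.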

Comment 1 H.J. Lu 2005-08-17 21:46:24 UTC

Created attachment 9522
A testcase for gcc 4.0

Here is the testcase for gcc 4.0. x.s is generated with "-O2". x86-64
has a similar problem.

Comment 2 Andrew Pinski 2005-08-17 21:54:08 UTC

Not a bug; it is aligning the loop:
.L5:
        incl    %edx
        cmpl    %edx, %ecx
        je      .L6
        incl    %edx
        cmpl    %edx, %ecx
        .p2align 4,,5
        jne     .L5

Comment 3 Andrew Pinski 2005-08-17 21:57:34 UTC

And next time, don't attach a tar file, as it makes the testcase much harder to get at.

Comment 4 H.J. Lu 2005-08-17 23:02:24 UTC

Are you suggesting that

.L5:
        incl    %edx
        cmpl    %edx, %ecx
        je      .L6
        incl    %edx
        cmpl    %edx, %ecx
        jne     .L5

is slower? Where does this information come from?

Comment 5 Andrew Pinski 2005-08-17 23:07:02 UTC

(note 81 50 85 NOTE_INSN_LOOP_END)

(note 85 81 105 [bb 6] NOTE_INSN_BASIC_BLOCK)

(insn 105 85 91 (unspec_volatile [
            (const_int 4 [0x4])
        ] 68) -1 (nil)
    (nil))

Comment 6 Andrew Pinski 2005-08-17 23:11:28 UTC

  if (TARGET_FOUR_JUMP_LIMIT && optimize && !optimize_size)
    ix86_avoid_jump_misspredicts ();

/* Some CPU cores are not able to predict more than 4 branch instructions in
   the 16 byte window.  */
const int x86_four_jump_limit = m_PPRO | m_ATHLON_K8 | m_PENT4 | m_NOCONA;


So this is not a bug.
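
The constraint named in that comment can be illustrated with a small check. This is only a sketch of the idea, not GCC's implementation: it assumes the window is an aligned 16-byte block and that a branch counts toward the block containing its first byte. The name `exceeds_jump_limit` and both macros are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_BYTES 16u  /* assumed: aligned 16-byte window */
#define MAX_BRANCHES 4u   /* limit quoted in the GCC comment */

/* Hypothetical check: returns 1 if any aligned 16-byte window contains
   the start address of more than MAX_BRANCHES branches, 0 otherwise.
   branch_addr must be sorted in ascending order. */
static int exceeds_jump_limit(const uint32_t *branch_addr, size_t n)
{
    size_t run = 0;       /* branches seen so far in the current window */
    uint32_t window = 0;  /* index of the current window */

    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || branch_addr[i - 1] <= branch_addr[i]);
        uint32_t w = branch_addr[i] / WINDOW_BYTES;
        if (i == 0 || w != window) {
            window = w;   /* entered a new window: restart the count */
            run = 0;
        }
        if (++run > MAX_BRANCHES)
            return 1;
    }
    return 0;
}
```

Five branches starting at offsets 0, 2, 4, 6 and 8 trip the check under these assumptions; padding that pushes the fifth one to offset 16 clears it, which is the effect the inserted alignment is after.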

Comment 7 Andrew Pinski 2005-08-17 23:13:02 UTC

The alignment is there so the stupid processor (yes, stupid) will not mispredict the jump.

Comment 8 Andrew Pinski 2005-08-17 23:53:36 UTC

From the gcc-patches list (since the archives look broken):

Looking at a recent copy of the Intel optimization manual, it has the same
hint as the AMD manual about 4 jumps per cache line.
I did a SPEC run on the P4 and there is no change except for bzip2, which
improves by about 3%. That is quite expected, as the scenario where 5
jumps happen to be in the same window is very rare.