Bug 84627 - powerpc excess padding generated for POWER9 target
Summary: powerpc excess padding generated for POWER9 target
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 8.0.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-03-01 03:28 UTC by Nicholas Piggin
Modified: 2018-03-05 22:42 UTC (History)
1 user (show)

See Also:
Host:
Target: powerpc
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Nicholas Piggin 2018-03-01 03:28:19 UTC
gcc version 8.0.1 20180207 (experimental) [trunk revision 257435] (Debian 8-20180207-2):

Test case:

void test(void (*fn)(void), unsigned long i)
{ 
        while (i--)
                fn();
}

Generates code with -O2 -mcpu=power9:

  2c:   18 00 41 f8     std     r2,24(r1)
  30:   00 00 00 60     nop
  34:   00 00 00 60     nop
  38:   00 00 00 60     nop
  3c:   00 00 42 60     ori     r2,r2,0
  40:   78 f3 cc 7f     mr      r12,r30
  44:   a6 03 c9 7f     mtctr   r30
  48:   ff ff ff 3b     addi    r31,r31,-1
  4c:   21 04 80 4e     bctrl
  50:   18 00 41 e8     ld      r2,24(r1)
  54:   ff ff bf 2f     cmpdi   cr7,r31,-1
  58:   e8 ff 9e 40     bne     cr7,40 <test+0x40>

On power9, I wonder if nops and ori should be avoided? Aligning to quad word boundary is reasonable for fetch, but is there any advantage for dispatch to add the extra padding?
Comment 1 Andrew Pinski 2018-03-01 03:34:06 UTC
Is this little endian or big-endian?  32bit or 64bit?
Comment 2 Nicholas Piggin 2018-03-01 03:35:29 UTC
Ah sorry, target is powerpc64le
Comment 3 Segher Boessenkool 2018-03-02 16:40:40 UTC
The nops are to align the following code (a jump target) to the correct
alignment.  GAS used an ori 2,2,0 for the last nop, to make sure a new
dispatch group starts after it, for p8.  Does your GAS support power9?
Does binutils trunk behave this way too?  Plain nop is ever so slightly
more efficient, so it is prefered.

The generated code is

        std 2,24(1)
        .p2align 5,,31
.L3:
        mr 12,30
        mtctr 30
        addi 31,31,-1
        bctrl

so there is nothing GCC can do about this.  Please follow up to binutils
bugzilla if necessary.  Thanks!
Comment 4 Nicholas Piggin 2018-03-05 22:42:48 UTC
After some more discussion and testing, it was determined that for POWER9, 16 bytes of padding is sufficient for these small loops and was adjusted.

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=258260

Author: 	segher
Date: 	Mon Mar 5 19:11:54 2018 UTC (3 hours, 27 minutes ago)
Log Message:

rs6000: Don't align tiny loops to 32 bytes for POWER9

For POWER4..POWER8 we align loops of 5..8 instructions to 32 bytes
(instead of to 16 bytes) because that executes faster.  This is no
longer the case on POWER9, so we can just as well only align to 16
bytes.


	* config/rs6000/rs6000.c (rs6000_loop_align): Don't align tiny loops
	to 32 bytes when compiling for POWER9.