Bug 97329 - POWER9 default cache and line sizes appear to be wrong
Summary: POWER9 default cache and line sizes appear to be wrong
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 11.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-08 03:35 UTC by Kip Warner
Modified: 2021-03-25 05:01 UTC
CC List: 5 users

See Also:
Host:
Target: power9
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-10-08 00:00:00


Attachments
Autoconf configuration log on POWER9. (6.66 KB, text/plain)
2020-10-08 23:53 UTC, Kip Warner

Description Kip Warner 2020-10-08 03:35:42 UTC
While investigating the memory hierarchy on my Romulus POWER9 (CPU revision 2.2) I discovered GCC's default L1 cache and line sizes on POWER9 are not correct. 

I think whoever specified the default cache size of 64KB may not have realized that the L1 cache is split, not unified. On POWER9 that 64KB is divided between separate instruction and data caches. Only 32KB is actually available for data.

GCC's documentation specifies that the l1-cache-size parameter is supposed to
refer to the data cache only, not the instruction cache.

Further, the default l1-cache-line-size is wrong. It's currently set at 32 bytes. The correct value is actually four times that at 128 bytes.

As things are right now, the resulting generated code may not be properly optimized because the optimizer plans around the wrong parameters.

When this happens, the generated program may have a higher cache miss rate than necessary. This could matter a great deal: fetching data from the L1 may take only one or two cycles, whereas a cache miss could cost several hundred while the block is transferred.
Comment 1 Kip Warner 2020-10-08 05:31:54 UTC
Just tested with Git head (11.0.0 20201008) and same issue:

$ gcc --version
gcc (GCC) 11.0.0 20201008 (experimental)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -Q -mcpu=power9 --help=param | grep -i cache
  --param=l1-cache-line-size= 		128
  --param=l1-cache-size=      		32
  --param=l2-cache-size=      		512

$ getconf -a | grep CACHE
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_ICACHE_ASSOC                32
LEVEL1_ICACHE_LINESIZE             128
LEVEL1_DCACHE_SIZE                 32768
LEVEL1_DCACHE_ASSOC                32
LEVEL1_DCACHE_LINESIZE             128
LEVEL2_CACHE_SIZE                  524288
LEVEL2_CACHE_ASSOC                 2048
LEVEL2_CACHE_LINESIZE              32
LEVEL3_CACHE_SIZE                  10485760
LEVEL3_CACHE_ASSOC                 40960
LEVEL3_CACHE_LINESIZE              32
LEVEL4_CACHE_SIZE                  0
LEVEL4_CACHE_ASSOC                 0
LEVEL4_CACHE_LINESIZE              0

$ cat /proc/cpuinfo
processor       : 0
cpu             : POWER9, altivec supported
clock           : 2933.000000MHz
revision        : 2.2 (pvr 004e 1202)

(...)

processor       : 143
cpu             : POWER9, altivec supported
clock           : 2166.000000MHz
revision        : 2.2 (pvr 004e 1202)

timebase        : 512000000
platform        : PowerNV
model           : 0000000000000000
machine         : PowerNV 0000000000000000
firmware        : OPAL
MMU             : Radix
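
The two listings above use different units: GCC documents l1-cache-size and l2-cache-size in kilobytes (and l1-cache-line-size in bytes), while getconf reports bytes throughout. A minimal sketch of the comparison, using only the numbers shown above; the helper names bytes_to_kb and check_cache are hypothetical, not part of GCC or glibc:

```sh
#!/bin/sh
# Convert a cache size in bytes (as printed by getconf) to the kB unit
# used by GCC's l1-cache-size and l2-cache-size params.
bytes_to_kb() {
    echo $(( $1 / 1024 ))
}

# Compare a hardware value in bytes with a GCC param value in kB.
# Arguments: param name, hardware size in bytes, GCC value in kB.
check_cache() {
    name=$1; hw_bytes=$2; gcc_kb=$3
    hw_kb=$(bytes_to_kb "$hw_bytes")
    if [ "$hw_kb" -eq "$gcc_kb" ]; then
        echo "$name: ok (${hw_kb} kB)"
    else
        echo "$name: mismatch (hardware ${hw_kb} kB, GCC ${gcc_kb} kB)"
    fi
}

# Values copied from the getconf and gcc -mcpu=power9 output above.
check_cache l1-cache-size 32768 32
check_cache l2-cache-size 524288 512
```

With these inputs both params agree with the hardware, which matches the observation that -mcpu=power9 on this trunk build reports the expected values.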
Comment 2 Kip Warner 2020-10-08 05:39:53 UTC
Sorry, it is not the same issue after all. It appears as though this was fixed in gcc-11.
Comment 3 Segher Boessenkool 2020-10-08 19:41:36 UTC
At least as far back as GCC 5 we report D-L1 size 64kB (for most CPUs,
not just p9).  Confirmed.
Comment 4 Kip Warner 2020-10-08 20:10:47 UTC
I'm going to do some more testing tonight and report back after.
Comment 5 Segher Boessenkool 2020-10-08 20:11:13 UTC
So both the cache line size and the cache size are wrong for GCC 10
and before, but okay on trunk, on all compilers I tested (I tested on
Linux only so far).
Comment 6 Kip Warner 2020-10-08 23:53:37 UTC
Created attachment 49333
Autoconf configuration log on POWER9.
Comment 7 Kip Warner 2020-10-08 23:53:52 UTC
So it looks like even with GCC 11 in trunk it's still sometimes wrong on power9.

Wrong L2 cache size when no -mcpu specified:

$ gcc -Q --help=param | grep -i cache
  --param=l1-cache-line-size= 		128
  --param=l1-cache-size=      		32
  --param=l2-cache-size=      		256

Correct when manually specifying native (power9) cpu:

$ gcc -Q -mcpu=native --help=param | grep -i cache
  --param=l1-cache-line-size= 		128
  --param=l1-cache-size=      		32
  --param=l2-cache-size=      		512

Correct when manually specifying power9 cpu:

$ gcc -Q -mcpu=power9 --help=param | grep -i cache
  --param=l1-cache-line-size= 		128
  --param=l1-cache-size=      		32
  --param=l2-cache-size=      		512

Wrong L2 cache size when powerpc64le is selected in place of power9:

$ gcc -Q -mcpu=powerpc64le --help=param | grep -i cache
  --param=l1-cache-line-size= 		128
  --param=l1-cache-size=      		32
  --param=l2-cache-size=      		256

Looks like this might be a clue. GCC did not identify the host/build/target as power9 automatically:

$ gcc -dumpmachine
powerpc64le-unknown-linux-gnu

I built it from trunk last night on a power9 machine. I've attached my config.log.

$ gcc --version
gcc (GCC) 11.0.0 20201008 (experimental)
(...)
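
To compare several -mcpu settings without reading the listings by eye, a single param can be pulled out of the `gcc -Q --help=param` output. A minimal sketch, assuming the two-column layout shown above; the helper name param_value is hypothetical, and the sample input is the default (no -mcpu) output quoted in this comment:

```sh
#!/bin/sh
# Print the value of one param from `gcc -Q --help=param` output on stdin.
# $1 is the param name without the leading "--param=" or trailing "=".
param_value() {
    awk -v key="--param=$1=" '$1 == key { print $2 }'
}

# Sample input copied from the output above (no -mcpu given).
param_value l2-cache-size <<'EOF'
  --param=l1-cache-line-size= 		128
  --param=l1-cache-size=      		32
  --param=l2-cache-size=      		256
EOF
```

On a machine with the compiler installed, the same helper could be fed directly, for example `gcc -Q -mcpu=power9 --help=param | param_value l2-cache-size`.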
Comment 8 Segher Boessenkool 2020-10-09 12:55:25 UTC
The default -mcpu= for a compiler targeting powerpc64le-linux is
normally power8 (you can change this with the --with-cpu= configure
option though).  -mcpu=powerpc64le is also (currently) equal to
-mcpu=power8.  But the numbers for Power8 (in power8_cost) are
wrong it seems: it has a 64kB L1-D cache, and a 512kB L2 cache (it
looks like we have simply copied the Power7 numbers here; 32 and
256 is correct for Power7).
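
Independently of the built-in cost tables, the three values can be set per compilation with GCC's --param options. A minimal sketch that builds the flags, assuming the Power8 figures stated in this comment (128-byte line, 64kB L1-D, 512kB L2); the helper name cache_param_flags is hypothetical:

```sh
#!/bin/sh
# cache_param_flags is a hypothetical helper, not part of GCC.
# Arguments: L1 line size in bytes, L1 data cache in kB, L2 cache in kB.
cache_param_flags() {
    printf '%s\n' "--param=l1-cache-line-size=$1 --param=l1-cache-size=$2 --param=l2-cache-size=$3"
}

# Power8 figures as given in the comment above (taken from this thread,
# not independently verified here).
cache_param_flags 128 64 512
```

The output can be spliced into a compile command, for example `gcc -O2 -mcpu=power8 $(cache_param_flags 128 64 512) -c file.c`, where file.c is a placeholder.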
Comment 9 Xionghu Luo (luoxhu@gcc.gnu.org) 2021-03-23 06:54:39 UTC
Yes, it seems to be a copy-paste error from Power7 to Power8.  Is this supposed to be fixed in gcc-12 stage1? And is any performance evaluation required?


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 616dae35bae..34c4edae20e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1055,7 +1055,7 @@ struct processor_costs power8_cost = {
   COSTS_N_INSNS (17),  /* ddiv */
   128,                 /* cache line size */
   32,                  /* l1 cache */
-  256,                 /* l2 cache */
+  512,                 /* l2 cache */
   12,                  /* prefetch streams */
   COSTS_N_INSNS (3),   /* SF->DF convert */
 };
Comment 10 Segher Boessenkool 2021-03-23 22:44:10 UTC
GCC 11 stage 4 will be fine.

I doubt you can ever measure a difference, but you can try :-)
Comment 11 Xionghu Luo (luoxhu@gcc.gnu.org) 2021-03-25 01:05:51 UTC
Fixed with r11-7821-g08103e4d6ada9b57366f2df2a2b745babfab914c.
Comment 12 GCC Commits 2021-03-25 04:53:48 UTC
The releases/gcc-10 branch has been updated by Xiong Hu Luo <luoxhu@gcc.gnu.org>:

https://gcc.gnu.org/g:52eacca2455f7468d7ddb990259e8583028c5185

commit r10-9541-g52eacca2455f7468d7ddb990259e8583028c5185
Author: Xionghu Luo <luoxhu@linux.ibm.com>
Date:   Wed Mar 24 23:45:58 2021 -0500

    rs6000: Correct Power8 cost of l2 cache size [PR97329]
    
    This patch is a backport to gcc 10 from master.
    L2 cache size for Power8 is 512kB, it was copied from Power7 before
    public.  Tested no performance change for SPEC2017.
    
    gcc/
    2021-03-25  Xionghu Luo  <luoxhu@linux.ibm.com>
    
            PR target/97329
            * config/rs6000/rs6000.c (power8_costs): Change l2 cache
            from 256 to 512.
    
    (cherry picked from commit 08103e4d6ada9b57366f2df2a2b745babfab914c)
Comment 13 GCC Commits 2021-03-25 05:01:11 UTC
The releases/gcc-9 branch has been updated by Xiong Hu Luo <luoxhu@gcc.gnu.org>:

https://gcc.gnu.org/g:48354138267e0682f61866003b67a9851d3be3a2

commit r9-9307-g48354138267e0682f61866003b67a9851d3be3a2
Author: Xionghu Luo <luoxhu@linux.ibm.com>
Date:   Wed Mar 24 23:45:58 2021 -0500

    rs6000: Correct Power8 cost of l2 cache size [PR97329]
    
    This patch is a backport to gcc 9 from master.
    L2 cache size for Power8 is 512kB, it was copied from Power7 before
    public.  Tested no performance change for SPEC2017.
    
    gcc/
    2021-03-25  Xionghu Luo  <luoxhu@linux.ibm.com>
    
            PR target/97329
            * config/rs6000/rs6000.c (power8_costs): Change l2 cache
            from 256 to 512.
    
    (cherry picked from commit 08103e4d6ada9b57366f2df2a2b745babfab914c)