Bug 86448 - GCC 9 compiler generates slower code for spec 2006 milc on a power9 using -mcpu=power9 than using -mcpu=power8
Status: RESOLVED WORKSFORME
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 9.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: spec
Reported: 2018-07-09 20:22 UTC by Michael Meissner
Modified: 2019-02-22 22:57 UTC

See Also:
Host:
Target: powerpc
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Michael Meissner 2018-07-09 20:22:02 UTC
I built spec 2006 on a DD2.2 power9 and ran it.  I noticed that the milc benchmark was 2% slower using either the trunk or the GCC 8 branch (subversion id 262483) if I compiled the code using -mcpu=power9 compared to -mcpu=power8.
Comment 1 kelvin 2018-07-23 23:50:14 UTC
Using trunk on a dedicated DD2.2 power9, I get the following performance comparisons:

          -mcpu=power8   -mcpu=power9   delta     Percent
          28.92          28.97
          28.37          28.99
          28.13          28.26
          29.06          28.12
          28.8           28.23
          28.9           28.69
          28.37          28.48
          28.3           28.08
average   28.60625       28.4775        0.12875   0.45%
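
For anyone checking the summary row, the following is a minimal Python sketch that recomputes it from the eight runs listed above. The variable names are illustrative only, and the percentage is assumed to be the delta relative to the -mcpu=power8 average, which matches the 0.45% shown.

```python
from statistics import mean

# SPEC ratios copied from the table above (one entry per run).
power8 = [28.92, 28.37, 28.13, 29.06, 28.8, 28.9, 28.37, 28.3]
power9 = [28.97, 28.99, 28.26, 28.12, 28.23, 28.69, 28.48, 28.08]

avg_p8 = mean(power8)
avg_p9 = mean(power9)

# Assumption: delta is power8 minus power9, and the percentage is
# taken relative to the power8 average.
delta = avg_p8 - avg_p9
percent = 100.0 * delta / avg_p8

print(f"average {avg_p8:.5f} {avg_p9:.5f} delta {delta:.5f} ({percent:.2f}%)")
```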
Comment 2 kelvin 2018-07-24 18:15:54 UTC
Using the GCC8 branch, svn version id 262483 (the same version tested by Michael), I'm getting the following results:

          -mcpu=power8   -mcpu=power9   delta     Percent
          28.57          28.79
          28.41          28.61
          28.54          28.21
          28.53          28.55
          29.02          28.59
          28.54          27.34
          28.25          26.63
          28.56          29.13
average   28.5525        28.23125       0.32125   1.13%

As with my trunk measurements, I'm not seeing a 2% difference.  And I am seeing that targeting power9 produces slightly better performance than targeting power8.

It may be that we're running with different optimization flags.  I used

OPTIMIZE        = -Ofast -mcpu=power9   (or -mcpu=power8)                  
LDOPT           = -m64 -Wl,-q  -Wl,-rpath=%{BASE_DIR}/lib64

I'm inclined to close this issue unless Michael can point me to a different set of options to explore...
Comment 3 Michael Meissner 2018-08-01 23:34:20 UTC
The options I use for spec are:
-O3 -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -msave-toc-indirect -mno-pointers-to-nested-functions -fno-aggressive-loop-optimizations -ffast-math -mveclibabi=mass -mrecip=rsqrt -mcpu=power<x>

For C files I use: -fgnu89-inline
For C++ files I use: -std=gnu++98
For Fortran files I use: -fstack-arrays

I use -fno-strict-aliasing on milc (and perlbench) because they play pointer games for which earlier compilers would generate the wrong code.  If memory serves, the -fno-strict-aliasing may not show the bug on power{7,8,9} systems.  I know in the perlbench case, the code in spec violates the ISO C standard.  I don't recall what the milc code is.

I use -fno-aggressive-loop-optimizations because some of the benchmarks as written go beyond the end of arrays, and GCC over-optimizes these.

I use version 8.1.3 of the MASS library.  However, milc is not one of the benchmarks that heavily use the math library, so you can omit using MASS and -mveclibabi=mass.
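
As a rough illustration of how the options above combine for a given benchmark, here is a small Python sketch. The helper `spec_flags` and the constant names are hypothetical and are not part of SPEC or GCC; only the flag strings themselves come from this comment.

```python
# Hypothetical helper for illustration; only the flag strings are from the comment.
COMMON_FLAGS = [
    "-O3", "-fpeel-loops", "-funroll-loops", "-ftree-vectorize",
    "-fvect-cost-model", "-msave-toc-indirect",
    "-mno-pointers-to-nested-functions",
    "-fno-aggressive-loop-optimizations", "-ffast-math",
    "-mveclibabi=mass", "-mrecip=rsqrt",
]

# Extra flag added per source language.
PER_LANGUAGE_FLAGS = {
    "c": ["-fgnu89-inline"],
    "c++": ["-std=gnu++98"],
    "fortran": ["-fstack-arrays"],
}

# Benchmarks built with -fno-strict-aliasing.
NO_STRICT_ALIASING = {"milc", "perlbench"}


def spec_flags(benchmark, language, cpu):
    """Return the full flag list for one benchmark, language, and -mcpu value."""
    flags = list(COMMON_FLAGS)
    flags.append(f"-mcpu={cpu}")
    flags.extend(PER_LANGUAGE_FLAGS[language])
    if benchmark in NO_STRICT_ALIASING:
        flags.append("-fno-strict-aliasing")
    return flags


print(" ".join(spec_flags("milc", "c", "power9")))
```

Calling `spec_flags("milc", "c", "power8")` gives the same list with only the -mcpu value changed, which is the comparison this bug is about.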
Comment 4 kelvin 2018-08-02 20:46:31 UTC
There are aspects of Michael's recent comment that I may not fully understand.

I checked the source for milc, and it is C, so I added -fgnu89-inline to the list of OPTIMIZE options.  Then I reran my tests with gcc8 (svn version 262483) on a DD2.2 power9 machine.


OPTIMIZE        = -O3 -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-strict-aliasing -msave-toc-indirect -mno-pointers-to-nested-functions -fno-aggressive-loop-optimizations -ffast-math -mveclibabi=mass -mrecip=rsqrt -fgnu89-inline -mcpu=power9 (vs. -mcpu=power8)
LDOPT           = -m64 -Wl,-q  -Wl,-rpath=%{BASE_DIR}/lib64

I'm still not seeing the performance degradation Michael saw.  Here are my most recent results:

          gcc8     gcc9     delta   % delta
          28.79    28.14
          29.01    28.84
          28.51    28.5
          28.55    28.39
          29.02    29.07
          29.1     28.51
average   28.83    28.575   0.255   0.88%

Does anyone see anything I may be doing wrong?
Comment 5 kelvin 2018-08-08 14:49:35 UTC
I apologize for an error in the previous comment.  The two columns should have been labeled -mcpu=power8 (left) and -mcpu=power9 (right) instead of gcc8 and gcc9.
Comment 6 kelvin 2018-08-08 15:52:55 UTC
I should also clarify, regarding all of the above comments, that the numbers I have been reporting are the spec ratios.  I had mistakenly assumed that these ratios were encoded such that smaller values represented better performance.  So some of my "interpretation remarks" are incorrect.  Still, my measurements do not show the 2% difference that Michael observed, so there remains a question of whether there is enough of a performance change to merit further exploration.
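
One rough way to frame that question is to compare the gap between the two averages with the run-to-run spread. The Python sketch below does this for the GCC 8 branch numbers from Comment 2. It is only a crude heuristic under the assumption that the runs are independent samples, not a formal significance test, and the variable names are illustrative.

```python
from statistics import mean, stdev

# SPEC ratios from Comment 2 (GCC 8 branch); for SPEC ratios, larger is better.
power8 = [28.57, 28.41, 28.54, 28.53, 29.02, 28.54, 28.25, 28.56]
power9 = [28.79, 28.61, 28.21, 28.55, 28.59, 27.34, 26.63, 29.13]

# Gap between the two averages (power8 minus power9).
diff = mean(power8) - mean(power9)

# Sample standard deviation of each column as a measure of run-to-run noise.
spread_p8 = stdev(power8)
spread_p9 = stdev(power9)

# Crude check: is the gap smaller than the larger of the two spreads?
within_noise = abs(diff) < max(spread_p8, spread_p9)

print(f"diff={diff:.5f} spread_p8={spread_p8:.3f} spread_p9={spread_p9:.3f}")
print("gap is within run-to-run spread" if within_noise else "gap exceeds run-to-run spread")
```

With these numbers the -mcpu=power9 column varies by more than the gap between the averages, so more runs would be needed before reading much into the difference.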
Comment 7 Bill Schmidt 2019-02-22 22:57:55 UTC
Not confirmed at this time.  Let's close it until we have something more definitive to look at.