86448 – GCC 9 compiler generates slower code for spec 2006 milc on a power9 using -mcpu=power9 than using -mcpu=power8

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	9.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:	spec

Reported:	2018-07-09 20:22 UTC by Michael Meissner
Modified:	2019-02-22 22:57 UTC
CC List:	5 users

See Also:
Host:
Target:	powerpc
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Michael Meissner 2018-07-09 20:22:02 UTC

I built spec 2006 on a DD2.2 power9 and ran it.  I noticed that the milc benchmark was 2% slower using either the trunk or the GCC 8 branch (subversion id 262483) if I compiled the code using -mcpu=power9 compared to -mcpu=power8.

Comment 1 kelvin 2018-07-23 23:50:14 UTC

Using trunk on a dedicated DD2.2 power9, I get the following performance comparisons:

           -mcpu=power8   -mcpu=power9   delta     Percent
           28.92          28.97
           28.37          28.99
           28.13          28.26
           29.06          28.12
           28.8           28.23
           28.9           28.69
           28.37          28.48
           28.3           28.08
average    28.60625       28.4775        0.12875   0.45%
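
For anyone wanting to re-derive the summary row, here is a minimal sketch in Python. The values are copied from the table above; the function name `summarize` and the convention of taking the delta as a percentage of the -mcpu=power8 average are assumptions inferred from the reported numbers, not something stated in this comment.

```python
from statistics import mean

# SPEC ratios copied from the table above (trunk, dedicated DD2.2 power9).
power8 = [28.92, 28.37, 28.13, 29.06, 28.8, 28.9, 28.37, 28.3]
power9 = [28.97, 28.99, 28.26, 28.12, 28.23, 28.69, 28.48, 28.08]


def summarize(base, other):
    """Return (mean of base, mean of other, delta, delta as % of base mean)."""
    m_base = mean(base)
    m_other = mean(other)
    delta = m_base - m_other
    return m_base, m_other, delta, 100.0 * delta / m_base


m8, m9, delta, pct = summarize(power8, power9)
print(f"average {m8:.5f} {m9:.5f}  delta {delta:.5f}  percent {pct:.2f}%")
```

The same helper can be applied to any other pair of columns in this report.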

Comment 2 kelvin 2018-07-24 18:15:54 UTC

Using the GCC8 branch, svn version id 262483 (the same version tested by Michael), I'm getting the following results:

           -mcpu=power8   -mcpu=power9   delta     Percent
           28.57          28.79
           28.41          28.61
           28.54          28.21
           28.53          28.55
           29.02          28.59
           28.54          27.34
           28.25          26.63
           28.56          29.13
average    28.5525        28.23125       0.32125   1.13%

As with my trunk measurements, I'm not seeing a 2% difference.  And I am seeing that targeting power9 produces slightly better performance than targeting power8.

It may be that we're running with different optimization flags.  I used

OPTIMIZE        = -Ofast -mcpu=power9   (or -mcpu=power8)                  
LDOPT           = -m64 -Wl,-q  -Wl,-rpath=%{BASE_DIR}/lib64

I'm inclined to close this issue unless Michael can point me to a different set of options to explore...

Comment 3 Michael Meissner 2018-08-01 23:34:20 UTC

The options I use for spec are:
-O3 -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -msave-toc-indirect -mno-pointers-to-nested-functions -fno-aggressive-loop-optimizations -ffast-math -mveclibabi=mass -mrecip=rsqrt -mcpu=power<x>

For C files I use: -fgnu89-inline
For C++ files I use: -std=gnu++98
For Fortran files I use: -fstack-arrays

I use -fno-strict-aliasing on milc (and perlbench) because they play pointer games for which earlier compilers would generate wrong code.  If memory serves, the bug may not show up on power{7,8,9} systems even without -fno-strict-aliasing.  I know that in the perlbench case, the code in spec violates the ISO C standard.  I don't recall what the problem in the milc code is.

I use -fno-aggressive-loop-optimizations because some of the benchmarks as written go beyond the end of arrays, and GCC over-optimizes these.

I use version 8.1.3 of the MASS library.  However, milc is not one of the benchmarks that heavily use the math library, so you can omit using MASS and -mveclibabi=mass.
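
The option set described in this comment can be sketched as a small Python helper that assembles the flags for one benchmark. This is only an illustration: `build_flags`, its parameters, and the table names are hypothetical and are not part of the SPEC tooling or of any config file used here; the flags themselves are copied from the text above.

```python
# Flags common to all benchmarks, as listed in the comment above.
COMMON = [
    "-O3", "-fpeel-loops", "-funroll-loops", "-ftree-vectorize",
    "-fvect-cost-model", "-msave-toc-indirect",
    "-mno-pointers-to-nested-functions",
    "-fno-aggressive-loop-optimizations", "-ffast-math",
    "-mveclibabi=mass", "-mrecip=rsqrt",
]

# Extra flag per source language, as listed in the comment above.
PER_LANGUAGE = {
    "c": ["-fgnu89-inline"],
    "c++": ["-std=gnu++98"],
    "fortran": ["-fstack-arrays"],
}

# Benchmarks that are built with -fno-strict-aliasing.
NO_STRICT_ALIASING = {"milc", "perlbench"}


def build_flags(benchmark, language, cpu, use_mass=True):
    """Assemble a flag string (hypothetical helper, not a SPEC API)."""
    flags = [f for f in COMMON if use_mass or f != "-mveclibabi=mass"]
    flags += PER_LANGUAGE[language]
    if benchmark in NO_STRICT_ALIASING:
        flags.append("-fno-strict-aliasing")
    flags.append(f"-mcpu={cpu}")
    return " ".join(flags)


# milc is C and does not lean on the math library, so MASS can be omitted.
print(build_flags("milc", "c", "power9", use_mass=False))
```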

Comment 4 kelvin 2018-08-02 20:46:31 UTC

There are aspects of Michael's recent comment that I may not fully understand.

I checked the source for milc, and it is C, so I added -fgnu89-inline to the list of OPTIMIZE options.  Then I reran my tests with gcc8 (svn version 262483) on a DD2.2 power9 machine.


OPTIMIZE        = -O3 -fpeel-loops -funroll-loops -ftree-vectorize -fvect-cost-model -fno-strict-aliasing -msave-toc-indirect -mno-pointers-to-nested-functions -fno-aggressive-loop-optimizations -ffast-math -mveclibabi=mass -mrecip=rsqrt -fgnu89-inline -mcpu=power9 (vs. -mcpu=power8)
LDOPT           = -m64 -Wl,-q  -Wl,-rpath=%{BASE_DIR}/lib64

I'm still not seeing the performance degradation Michael saw.  Here are my most recent results:

           gcc8      gcc9      delta    % delta
           28.79     28.14
           29.01     28.84
           28.51     28.5
           28.55     28.39
           29.02     29.07
           29.1      28.51
average    28.83     28.575    0.255    0.88%
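
One rough way to judge whether a gap like this stands out from run-to-run variation is to compare it with the spread within each column. The sketch below is only a sanity check under stated assumptions: the threshold of twice the larger sample standard deviation is an arbitrary illustrative choice, not a formal significance test, and `spread_report` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Values copied from the two columns of the table above (left, right).
col_left = [28.79, 29.01, 28.51, 28.55, 29.02, 29.1]
col_right = [28.14, 28.84, 28.5, 28.39, 29.07, 28.51]


def spread_report(a, b, factor=2.0):
    """Return (delta of means, larger sample stdev, delta exceeds factor*stdev).

    The factor is an illustrative assumption, not a statistical standard.
    """
    delta = mean(a) - mean(b)
    noise = max(stdev(a), stdev(b))
    return delta, noise, abs(delta) > factor * noise


delta, noise, exceeds = spread_report(col_left, col_right)
print(f"delta {delta:.3f}, larger stdev {noise:.3f}, exceeds threshold: {exceeds}")
```

With only six runs per column, a check like this can at best suggest whether more runs are needed.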

Does anyone see anything I may be doing wrong?

Comment 5 kelvin 2018-08-08 14:49:35 UTC

I apologize for an error in the previous comment.  The two columns should have been labeled -mcpu=power8 (left) and -mcpu=power9 (right) instead of gcc8 and gcc9.

Comment 6 kelvin 2018-08-08 15:52:55 UTC

I should also clarify regarding all of the above comments that the numbers I have been reporting are the spec ratios.  I had mistakenly assumed that smaller values represented better performance; for spec ratios, larger values are better.  So some of my "interpretation remarks" are incorrect.  Still, my measurements do not show the 2% difference that Michael observed, so there remains a question of whether there is enough of a performance change to merit further exploration.
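
To make the direction of the metric concrete: a SPEC ratio is the reference run time divided by the measured run time, so a larger ratio means a faster run. The sketch below illustrates this with the trunk averages from Comment 1. The reference time used is a made-up placeholder rather than the real milc reference time, and both function names are hypothetical.

```python
def spec_ratio(reference_seconds, measured_seconds):
    """SPEC ratio: reference time over measured time (higher is better)."""
    return reference_seconds / measured_seconds


def runtime_change_percent(ratio_base, ratio_other):
    """Run-time change of `other` relative to `base`, in percent.

    Positive means `other` takes longer, i.e. is slower.
    """
    return 100.0 * (ratio_base / ratio_other - 1.0)


# Placeholder reference time, NOT the real value for milc.
REFERENCE_SECONDS = 9000.0

# A faster run yields a larger ratio.
print(spec_ratio(REFERENCE_SECONDS, 300.0), spec_ratio(REFERENCE_SECONDS, 250.0))

# Trunk averages from Comment 1: -mcpu=power8 vs -mcpu=power9.
change = runtime_change_percent(28.60625, 28.4775)
print(f"-mcpu=power9 run time vs -mcpu=power8: {change:+.2f}%")
```

Read this way, the lower -mcpu=power9 average in Comment 1 corresponds to a slightly longer run time, well under the 2% originally reported.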

Comment 7 Bill Schmidt 2019-02-22 22:57:55 UTC

Not confirmed at this time.  Let's close it until we have something more definitive to look at.