AVX generic mode tuning discussion.

harsha.jagasia@amd.com
Tue Jul 12 22:26:00 GMT 2011


We would like to propose changing AVX generic mode tuning to generate 128-bit
AVX instead of 256-bit AVX. As H.J. suggested, we have reviewed the various
tuning choices made for generic mode with respect to AMD's upcoming Bulldozer
processor. At the moment, this is the most significant change we have to
propose. While we are willing to re-engineer generic mode, this change needs
immediate discussion because its performance impact on Bulldozer is
significant.
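To make the two code-generation choices concrete, here is a minimal sketch of the kind of loop the auto-vectorizer turns into AVX code. The kernel and the file name saxpy.c are hypothetical and not taken from CPU2006. The first command in the comment uses the flags quoted in this message; the second assumes GCC's -mprefer-avx128 option as the way to request 128-bit AVX, which may not be how the avx128 numbers in this message were produced.

```c
#include <stddef.h>

/*
 * Hypothetical kernel, for illustration only.
 *
 * Flags quoted in this message (vectorizer may use 256-bit AVX):
 *   gcc -Ofast -mtune=generic -mavx -c saxpy.c
 *
 * Assumed way to ask the vectorizer for 128-bit AVX instead:
 *   gcc -Ofast -mtune=generic -mavx -mprefer-avx128 -c saxpy.c
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Each iteration is independent, so the loop is vectorizable. */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Comparing the assembly from the two commands (for example with -S) should show ymm registers in the first case and xmm registers in the second, assuming the vectorizer handles the loop in both.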

Here is the relative CPU2006 performance data we have gathered using gcc on AMD
Bulldozer (BD) and Intel Sandybridge (SB) machines with "-Ofast -mtune=generic
-mavx".

                % gain/loss, avx256 vs. avx128
                (negative % indicates a loss,
                 positive % indicates a gain)

                  AMD BD  Intel SB
410.bwaves         -2.34     -1.52
416.gamess         -1.11     -0.30
433.milc            0.47     -1.75
434.zeusmp         -3.61      0.68
435.gromacs        -0.54     -0.38
436.cactusADM     -23.56     21.49
437.leslie3d       -0.44      1.56
444.namd            0.00      0.00
447.dealII         -0.36     -0.23
450.soplex         -0.43     -0.29
453.povray          0.50      3.63
454.calculix       -8.29      1.38
459.GemsFDTD        2.37     -1.54
465.tonto           0.00      0.00
470.lbm             0.00      0.21
481.wrf            -4.80      0.00
482.sphinx3       -10.20     -3.65
SpecFP             -3.29      1.01

400.perlbench       0.93      1.47
401.bzip2           0.60      0.00
403.gcc             0.00      0.00
429.mcf             0.00     -0.36
445.gobmk          -1.03      0.37
456.hmmer          -0.64      0.38
458.sjeng           1.74      0.00
462.libquantum      0.31      0.00
464.h264ref         0.00      0.00
471.omnetpp        -1.27      0.00
473.astar           0.00      0.46
483.xalancbmk       0.51      0.00
SpecINT             0.09      0.19

According to the data, the 1.01% SpecFP gain on Intel Sandybridge is outweighed
by the 3.29% SpecFP loss on AMD Bulldozer.

For the data above, generic mode splits both 256-bit misaligned loads and
stores, as is currently the case in trunk. 
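For readers unfamiliar with the splitting mentioned above, the following is a rough illustration using AVX intrinsics. It is a sketch of the idea only, not GCC's implementation: the compiler makes this choice during code generation (GCC documents the options -mavx256-split-unaligned-load and -mavx256-split-unaligned-store for it). The function names are hypothetical, and the target attribute is used here only so the file builds without -mavx; running it assumes an x86 CPU with AVX.

```c
#include <immintrin.h>

/* Unsplit form: one 256-bit unaligned load of 8 floats. */
__attribute__((target("avx")))
void load256_whole(const float *p, float *out)
{
    __m256 v = _mm256_loadu_ps(p);
    _mm256_storeu_ps(out, v);
}

/* Split form: two 128-bit unaligned loads combined into one 256-bit value. */
__attribute__((target("avx")))
void load256_split(const float *p, float *out)
{
    __m128 lo = _mm_loadu_ps(p);        /* elements 0..3 */
    __m128 hi = _mm_loadu_ps(p + 4);    /* elements 4..7 */
    /* Place lo in the low half, then insert hi into the high half. */
    __m256 v = _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
    _mm256_storeu_ps(out, v);
}
```

Both functions produce the same eight values; they differ only in how the misaligned data is fetched, which is the trade-off the tuning flags control.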

Even if we disable 256-bit misaligned load splitting, AVX 256-bit performance
on SpecFP improves by only ~1.4% on AMD Bulldozer, while it drops by 0.12% on
Intel Sandybridge. With load splitting disabled, AVX 256 compared to AVX 128
therefore shows a cumulative gain of about 0.9% on Intel Sandybridge (1.01% -
0.12%) against a loss of about 1.9% on AMD Bulldozer (-3.29% + 1.4%). Hence
AVX 256 is still not a fair choice for generic mode.

Please share your thoughts. It would be great if H.J. could verify the Intel
Sandybridge data.

Thanks,
Harsha




More information about the Gcc-patches mailing list