This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
AVX generic mode tuning discussion.
- From: <harsha dot jagasia at amd dot com>
- To: <gcc-patches at gcc dot gnu dot org>, <hubicka at ucw dot cz>, <ubizjak at gmail dot com>, <hjl dot tools at gmail dot com>, <Changpeng dot Fang at amd dot com>, <rth at redhat dot com>
- Cc: <harsha dot jagasia at amd dot com>
- Date: Tue, 12 Jul 2011 16:22:02 -0500
- Subject: AVX generic mode tuning discussion.
We would like to propose changing AVX generic mode tuning to generate 128-bit
AVX instead of 256-bit AVX. As per H.J.'s suggestion, we have reviewed the
various tuning choices made for generic mode with respect to AMD's upcoming
Bulldozer processor. At this moment, this is the most significant change we
have to propose. While we are willing to re-engineer generic mode, this
feature needs immediate discussion since the performance impact on Bulldozer
is significant.
Here is the relative CPU2006 performance data we have gathered using gcc on AMD
Bulldozer (BD) and Intel Sandybridge (SB) machines with "-Ofast -mtune=generic
-mavx".
%gain/loss, avx256 vs avx128
(negative % indicates loss, positive % indicates gain)

                    AMD BD   Intel SB
410.bwaves           -2.34      -1.52
416.gamess           -1.11      -0.30
433.milc              0.47      -1.75
434.zeusmp           -3.61       0.68
435.gromacs          -0.54      -0.38
436.cactusADM       -23.56      21.49
437.leslie3d         -0.44       1.56
444.namd              0.00       0.00
447.dealII           -0.36      -0.23
450.soplex           -0.43      -0.29
453.povray            0.50       3.63
454.calculix         -8.29       1.38
459.GemsFDTD          2.37      -1.54
465.tonto             0.00       0.00
470.lbm               0.00       0.21
481.wrf              -4.80       0.00
482.sphinx3         -10.20      -3.65
SpecFP               -3.29       1.01

400.perlbench         0.93       1.47
401.bzip2             0.60       0.00
403.gcc               0.00       0.00
429.mcf               0.00      -0.36
445.gobmk            -1.03       0.37
456.hmmer            -0.64       0.38
458.sjeng             1.74       0.00
462.libquantum        0.31       0.00
464.h264ref           0.00       0.00
471.omnetpp          -1.27       0.00
473.astar             0.00       0.46
483.xalancbmk         0.51       0.00
SpecINT               0.09       0.19

As per the data, the 1% performance gain for Intel Sandybridge on SpecFP is
eclipsed by a 3% degradation for AMD Bulldozer.
For the data above, generic mode splits both 256-bit misaligned loads and
stores, as is currently the case in trunk.
Even if we disable 256-bit misaligned load splitting, AVX 256-bit performance
improves by only ~1.4% on SpecFP for AMD Bulldozer, while it drops by 0.12% on
Intel Sandybridge. With load splitting disabled, AVX 256 versus AVX 128 thus
shows a cumulative 0.9% gain on Intel Sandybridge against a 1.9% loss on AMD
Bulldozer, so AVX 256 is still not a fair choice for generic mode.
Please share your thoughts. It would be great if H.J. could verify the Intel
Sandybridge data.
Thanks,
Harsha