This is the mail archive of the gcc-patches@gcc.gnu.org mailing list
for the GCC project.
Core 2/i7 tuning results and analysis
- From: Maxim Kuvyrkov <maxim at codesourcery dot com>
- To: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Cc: "H.J. Lu" <hongjiu dot lu at intel dot com>, Bernd Schmidt <bernds at codesourcery dot com>
- Date: Fri, 15 Oct 2010 14:08:57 +0400
- Subject: Core 2/i7 tuning results and analysis
[Resending without printed out version of the spreadsheet to fit into
gcc-patches@ size requirements.]
I've been investigating performance regressions for Core 2 and Core i7
processors. The impact of certain small tuning changes on x86
performance may be interesting to a wider audience, so here are my
results and analysis.
Attached is a tar of the patch set I tested. Most of these patches are
split out of Bernd's earlier work for Core 2/i7.
+ 0001-Basic-support-for-Core-i7.patch
+ 0002-Enable-Core-i7-architectural-features.patch
? 0003-Extend-Core-2-tune-features-to-Core-i7.patch
? 0004-Tweak-tuning-for-Core-i7.patch
+ 0005-Add-PROMOTE_HI_CONSTANTS-tuning.patch
+ 0006-Define-Core-i7-costs.patch
+ 0007-Use-64-bit-alignment-for-Core-i7-32-bit-mode.patch
+ 0008-Configure-bits-for-Core-i7.patch
+ 0009-Core-i7-DFA-model.patch
? 0010-Define-issue_rate-for-Core-i7.patch
+ 0011-Model-Core-i7-pipeline-domains.patch
- 0012-Update-Core-2-tuning.patch
+ 0013-Use-Core-2-DFA-model-for-Core-2.patch
+ 0014-Update-PentiumPro-tuning.patch
- 0015-Handle-privileged-insns.patch
+ 0016-Model-Core2-i7-decoder-bottleneck.patch
Some of these patches (marked with '+') tend to improve average
performance, others (marked with '-') tend to regress it, and those
marked with '?' give mixed results (see the analysis of each patch).
We will be posting the '+' patches for review once I get benchmark
numbers without the regressing patches.
Attached is an Excel spreadsheet with results for SPECCPU2000. The
interesting part is the graphs visualizing the performance impact of
each of the patches. The "line" graph shows the performance change in
percent relative to the *baseline*, i.e., current -mtune=core2 for
Core 2 and -mtune=generic[64] for Core i7. The "column" graph shows the
performance change in percent relative to the *previous* patch. I find
the "column" graph more interesting as it shows the impact of
individual changes on performance. SPECint and SPECfp results are
highlighted in purple and red, respectively, on the column graph.
Tuning flags: -O2 -ffast-math -msse2 -mfpmath=sse -mtune={core2, corei7}
{-m32/-m64}
Patches that are no-ops from a performance point of view for a
particular CPU are not included in the data. I did confirm in one of
the test runs that these patches indeed do not affect performance.
Now, analysis of the patches:
+ 0001-Basic-support-for-Core-i7.patch
Baseline.
The patch makes GCC recognize "corei7" for -mtune= and -march= options.
The patch sets tuning for Core i7 to that of -mtune=generic or
-mtune=generic64 depending on the {-m32/-m64} option. The generic CPU
is special in the sense that it has different tuning for 32-bit and
64-bit modes. The patch adds the same capability to use different
tuning for different ABIs for Core i7.
+ 0002-Enable-Core-i7-architectural-features.patch
Nearly noise from a performance point of view.
Enable supported ISA extensions for Core i7.
? 0003-Extend-Core-2-tune-features-to-Core-i7.patch
Improves SPECfp in 32-bit mode, but degrades SPECint in 64-bit mode.
Set tuning for Core i7 to be the same as for Core 2.
? 0004-Tweak-tuning-for-Core-i7.patch
Regresses SPECint and SPECfp in 32-bit mode, but improves SPECint for
64-bit mode.
Adjust tuning for Core i7.
+ 0005-Add-PROMOTE_HI_CONSTANTS-tuning.patch
Improves SPECint. Add new tuning option to promote HI constants.
+ 0006-Define-Core-i7-costs.patch
Slightly regresses SPECint, but improves SPECfp. Define rtx costs for
Core i7.
The biggest regression is 164.gzip. We don't know why.
+ 0007-Use-64-bit-alignment-for-Core-i7-32-bit-mode.patch
Significantly improves Core i7 performance in 32-bit mode. Increase
alignment for 32-bit mode for Core i7 to match 64-bit mode.
+ 0008-Configure-bits-for-Core-i7.patch
Performance no-op. Add support for configure options --with-arch=,
etc., for Core i7.
+ 0009-Core-i7-DFA-model.patch
Improves SPECfp. DFA model for Core i7.
? 0010-Define-issue_rate-for-Core-i7.patch
Improves SPECint, regresses SPECfp. Increase issue_rate to 4 for Core i7.
This one-line change makes 200.sixtrack regress from +1.75% to -2.0% for
Core i7 32-bit mode. I spent a lot of time investigating and trying to
fix this regression, but didn't succeed. The slowdown can be tracked
down to a hot loop that fits on a screen, but the slowdown seems to be
evenly distributed all over the loop. The loop does floating-point
computations with around 6 variables and streams data from memory.
The instructions within the loop are all the same before and after the
patch; the only difference is in their order.
At first I thought that the loop hits the decoder bottleneck, i.e.,
instructions that can be decoded only by the D0 decoder get assigned to
secondary decoders. I implemented modeling of the Core2/i7 decoder to
make the scheduler aware of that
(0016-Model-Core2-i7-decoder-bottleneck.patch). That didn't fix the
regression, so now I suspect that the register ports may be responsible
for the slowdown. I don't have proof, though.
Maybe it is worth trying to set the issue rate to 3 for Core2/i7?
+ 0011-Model-Core-i7-pipeline-domains.patch
Improves SPECfp. Adjust scheduling costs for instructions that cross
Core i7 pipeline domains, i.e., an instruction generates uops for both
integer and floating-point domains that need to pass data between each
other.
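To make the idea of a domain-crossing cost concrete, here is a minimal
sketch. It is not the code from the patch: the function name, the
domain labels and the one-cycle penalty are all assumptions chosen for
illustration only.

```python
# Hypothetical extra cost, in cycles, for each pipeline-domain crossing.
CROSS_DOMAIN_PENALTY = 1


def adjusted_cost(base_cost, uop_domains, penalty=CROSS_DOMAIN_PENALTY):
    """Scheduling cost of an instruction whose uops run in `uop_domains`.

    `uop_domains` lists the domain ("int" or "fp" here) of each uop.
    An instruction that stays in one domain keeps its base cost; every
    additional distinct domain adds `penalty` cycles.
    """
    crossings = max(len(set(uop_domains)) - 1, 0)
    return base_cost + penalty * crossings


# An instruction with only floating-point uops keeps its base cost.
print(adjusted_cost(3, ["fp", "fp"]))
# An instruction with uops in both domains is charged for the crossing.
print(adjusted_cost(3, ["int", "fp"]))
```

The point of the sketch is only that the scheduler sees a longer cost
for instructions whose uops must pass data between domains, so it
tends to place dependent instructions further away from them.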
- 0012-Update-Core-2-tuning.patch
No definitive result for 32-bit mode; SPECfp regresses in 64-bit mode.
Adjust tuning for Core 2.
+ 0013-Use-Core-2-DFA-model-for-Core-2.patch
Improves SPECfp for 64-bit mode; improves and regresses SPECint and
SPECfp in equal proportion for 32-bit mode. Switch DFA model for Core 2.
187.facerec regresses by 7% on 32-bit Core2 with this change.
+ 0014-Update-PentiumPro-tuning.patch
No data, but should be an improvement. Enable PROMOTE_HI_CONSTANTS
tuning for PentiumPro and, hence, -mtune=generic.
- 0015-Handle-privileged-insns.patch
Improves some tests, but regresses others, no conclusive result.
Attempt to make the scheduler smarter about which instructions to
prioritize. The theory was that the scheduler should not distinguish
between the *first* instruction in the ready list and subsequent
instructions that are essentially the same as the first.
[Rank_for_schedule() is used to sort the ready list and it has several
tie-breaking checks to make the sort stable. From the
choose_ready/max_issue perspective these tie-breaking checks shrink
the optimization space for no good reason. Apparently, the theory does
not agree with experiment in this case.]
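One way to read the theory above is sketched below. This is a
simplified illustration, not the patch and not GCC's actual
rank_for_schedule(): the "priority" field stands for the real ranking
criteria, "luid" stands for a tie-breaker that only exists to make the
order deterministic, and both names are assumptions.

```python
def rank_key(insn):
    # Primary criterion: higher priority first.
    # Tie-breaker: lower luid first, only to make the order stable.
    return (-insn["priority"], insn["luid"])


def sorted_ready_list(ready):
    """Ready list ordered the way a stable, tie-broken sort would."""
    return sorted(ready, key=rank_key)


def leading_equivalent_group(ready):
    """Instructions that differ from the first one only by tie-breaker.

    Under the theory, all of these would be treated as equally good
    candidates rather than favouring the one that happens to be first.
    """
    ordered = sorted_ready_list(ready)
    if not ordered:
        return []
    top = ordered[0]["priority"]
    return [insn for insn in ordered if insn["priority"] == top]


ready = [
    {"name": "c", "priority": 5, "luid": 12},
    {"name": "a", "priority": 7, "luid": 10},
    {"name": "b", "priority": 7, "luid": 11},
]
print([i["name"] for i in sorted_ready_list(ready)])
print([i["name"] for i in leading_equivalent_group(ready)])
```

Here "a" sorts ahead of "b" purely because of the tie-breaker, which
is the distinction the patch tried to remove.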
+ 0016-Model-Core2-i7-decoder-bottleneck.patch
Improves SPECint, though it was designed to fix the regression in
SPECfp's 200.sixtrack. The patch makes the scheduler aware of decoder
restrictions on Core 2/i7. New hooks to multipass scheduling allow the
backend to remove from the search space instructions that can no
longer be issued on the current cycle, e.g., because they would not
fit into the rest of the IFETCH block or could not be decoded by the
secondary decoders.
Strictly speaking, it is theoretically possible to model this in the
DFA, but it would require immensely more work and would not be nearly
as comprehensible as using target hooks.
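The filtering described above can be sketched as follows. This is a
toy model, not the hooks from the patch: the 16-byte IFETCH block, the
four decoders per cycle and all names (Insn, d0_only, filter_ready)
are assumptions made for the example.

```python
from dataclasses import dataclass

IFETCH_BLOCK_BYTES = 16  # assumed size of one fetch block
NUM_DECODERS = 4         # assumed: D0 plus three secondary decoders


@dataclass
class Insn:
    name: str
    length: int    # encoded length in bytes
    d0_only: bool  # True if only the D0 decoder can handle it


def can_issue_this_cycle(insn, bytes_used, insns_decoded):
    """Could `insn` still be decoded on the current cycle?

    `bytes_used` is how much of the fetch block is already consumed,
    `insns_decoded` is how many instructions were decoded this cycle.
    """
    if insns_decoded >= NUM_DECODERS:
        return False  # no decoder left
    if bytes_used + insn.length > IFETCH_BLOCK_BYTES:
        return False  # does not fit into the rest of the block
    if insn.d0_only and insns_decoded > 0:
        return False  # D0 is already taken; secondary decoders can't help
    return True


def filter_ready(ready, bytes_used, insns_decoded):
    """Drop candidates that cannot be issued on the current cycle."""
    return [i for i in ready
            if can_issue_this_cycle(i, bytes_used, insns_decoded)]


ready = [Insn("a", 4, False), Insn("b", 7, True), Insn("c", 13, False)]
# Start of a cycle: nothing decoded yet, the whole block is available.
print([i.name for i in filter_ready(ready, 0, 0)])
# After one 4-byte instruction: "b" needs D0, "c" no longer fits.
print([i.name for i in filter_ready(ready, 4, 1)])
```

Keeping such checks in a plain function is what makes the hook
approach easier to follow than encoding the same state in an
automaton.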
Your comments [and patches fixing the regressions :)] are welcome.
Thank you,
--
Maxim Kuvyrkov
CodeSourcery
maxim@codesourcery.com
(650) 331-3385 x724
Attachment: core2i7-patches.xlsx (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
Attachment: 0001-Basic-support-for-Core-i7.patch (Text document)
Attachment: 0002-Enable-Core-i7-architectural-features.patch (Text document)
Attachment: 0003-Extend-Core-2-tune-features-to-Core-i7.patch (Text document)
Attachment: 0004-Tweak-tuning-for-Core-i7.patch (Text document)
Attachment: 0005-Add-PROMOTE_HI_CONSTANTS-tuning.patch (Text document)
Attachment: 0006-Define-Core-i7-costs.patch (Text document)
Attachment: 0007-Use-64-bit-alignment-for-Core-i7-32-bit-mode.patch (Text document)
Attachment: 0008-Configure-bits-for-Core-i7.patch (Text document)
Attachment: 0009-Core-i7-DFA-model.patch (Text document)
Attachment: 0010-Define-issue_rate-for-Core-i7.patch (Text document)
Attachment: 0011-Model-Core-i7-pipeline-domains.patch (Text document)
Attachment: 0012-Update-Core-2-tuning.patch (Text document)
Attachment: 0013-Use-Core-2-DFA-model-for-Core-2.patch (Text document)
Attachment: 0014-Update-PentiumPro-tuning.patch (Text document)
Attachment: 0015-Handle-privileged-insns.patch (Text document)
Attachment: 0016-Model-Core2-i7-decoder-bottleneck.patch (Text document)