Core 2 and Core i7 tuning
H.J. Lu
hjl.tools@gmail.com
Mon Aug 23 15:31:00 GMT 2010
On Fri, Aug 20, 2010 at 1:07 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> Here's something I've been working on for a while. This adds a corei7
> processor type, a Core 2/Core i7 scheduling description, and twiddles a
> few of the x86 tuning flags. I'm not terribly happy with it yet due to
> the relatively small performance improvement, but I'd promised some
> folks I'd post it this week, so...
>
> The scheduling description is heavily based on ppro.md. There seems to
> be no publicly available, detailed information from Intel about the Core
> 2 pipeline, so this work is based on Agner Fog's manuals. It should be
> correct in the essentials, at least as well as ppro.md (we aren't really
> able to do a good job with the execution ports since we have no concept
> of the out-of-order core). I have not tried to implement latencies or
> port reservations for every last MMX or SSE instruction, since who knows
> whether the information is totally accurate anyway.
>
> The i386 port has a lot of tuning flags, and I've mostly been running
> SPEC2000 benchmarks for the last few weeks, trying to find a set of them
> that works well on these processors. This is slightly tricky since
> there's some inherent noise in the results.
>
> Not using the LEAVE instruction seemed to make a difference on my Penryn
> laptop in 64-bit mode, but that's probably moot now that
> -fomit-frame-pointer is the default. I've changed a few others, but
> mostly these attempts resulted in lower or unchanged performance, for
> example:
>
> * using push/pop insns more often (there are about six of these tuning
> flags). I would have expected this to be a win.
> * reusing the PentiumPro code in ix86_adjust_cost for Core 2 and i7
> * upping the branch cost to 5; initial results looked good for Core i7
> but in a full SPEC2000 run it seemed to be a slight loss, and a large
> loss on Core 2
> * using different string algorithms (from tune_generic)
> * enabling SPLIT_LONG_MOVES
> * enabling the flags related to partial reg stalls
> * reducing code alignments (based on a comment in Agner's manual that
> they aren't important anymore)
>
> I've implemented a new tuning flag, X86_TUNE_PROMOTE_HI_CONSTANTS, based
> on the recommendation in Agner's manual not to use operand size prefixes
> when they change the length of the instruction (i.e. if there's an
> immediate operand). That happens in the second of the following four
> instructions, and is said to cause a decoder stall:
>
> $ as
>         orl $32768,%eax
>         orw $32768,%ax
>         orl $8,%eax
>         orw $8,%ax
>
>    0:   0d 00 80 00 00          or     $0x8000,%eax
>    5:   66 0d 00 80             or     $0x8000,%ax
>    9:   83 c8 08                or     $0x8,%eax
>    c:   66 83 c8 08             or     $0x8,%ax
>
> However, this didn't seem to have a large impact either.
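
To make the rule concrete, here is a minimal C sketch of the check as I
understand it from the description above. The helper name is hypothetical
and is not taken from the patch. It assumes an instruction that has a
sign-extended 8-bit immediate form (as OR does) and a 32-bit default operand
size, so that 16-bit operands need the 0x66 prefix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch.  Returns true when a
   16-bit operation with an immediate would need an operand size prefix
   that changes the instruction length, i.e. when the immediate has to
   be encoded as imm16 instead of imm32.  Assumes the opcode also has a
   sign-extended imm8 form, whose length does not depend on the prefix.  */
bool
length_changing_prefix_p (int operand_bits, int32_t imm)
{
  if (operand_bits != 16)
    return false;               /* No 0x66 prefix is needed at all.  */

  /* Interpret the low 16 bits as a signed value.  */
  int v = (int) (imm & 0xffff);
  if (v >= 0x8000)
    v -= 0x10000;

  /* With an imm8 encoding the prefix leaves the length unchanged.  */
  bool fits_imm8 = v >= -128 && v <= 127;
  return !fits_imm8;
}
```

For the four instructions in the listing, this reports a length-changing
prefix only for orw $32768,%ax.
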
>
> On my last test run, I had
> SPECfp2000:
>   -mtune=generic   3023
>   -mtune=core2     3036
> SPECint2000:
>   -mtune=generic   2774
>   -mtune=core2     2794
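
For scale, the scores quoted above work out to a gain of roughly 0.43% on
SPECfp2000 and 0.72% on SPECint2000 for -mtune=core2 over -mtune=generic.
The arithmetic, as a small sketch (the function name is mine, purely for
illustration):

```c
/* Relative change of SCORE over BASELINE, in percent.
   Illustrative helper only; not part of the patch or of SPEC.  */
double
percent_gain (double baseline, double score)
{
  return (score - baseline) / baseline * 100.0;
}
```

Calling percent_gain (3023, 3036) gives the SPECfp2000 figure and
percent_gain (2774, 2794) the SPECint2000 one.
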
>
> This is a Westmere Xeon, i.e. essentially a Core i7, in 32-bit mode.
> SPEC was locked to core 0 with schedtool, core 0 set to 3.2GHz manually
> with cpufreq-set (1 step below maximum, which seems to avoid turbo mode
> effectively).
> Compile flags were -O3 -mpc64 -frename-registers. The tree is a few
> weeks old so it doesn't have -fomit-frame-pointer by default. I also
> had -mtune=corei7 numbers, but they were a little lower since I was
> using that run for an experiment with higher branch costs.
>
> These numbers pretty much match the differences I was seeing on the Core
> 2 laptop during development. I'd welcome it if other people also ran
> benchmarks.
>
Here are my results on Core 2 and Core i7 running Fedora 13. There are
many regressions and a few improvements.
--
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-r163419-core2-corei7.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 16244 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20100823/8ff49e3c/attachment.xlsx>