This is the mail archive of the
mailing list for the GCC project.
Re: Core 2 and Core i7 tuning
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Bernd Schmidt <bernds at codesourcery dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Maxim Kuvyrkov <maxim at codesourcery dot com>, Paul Brook <paul at codesourcery dot com>
- Date: Fri, 20 Aug 2010 13:17:14 -0700
- Subject: Re: Core 2 and Core i7 tuning
- References: <4C6EE072.firstname.lastname@example.org>
On Fri, Aug 20, 2010 at 1:07 PM, Bernd Schmidt <email@example.com> wrote:
> Here's something I've been working on for a while. This adds a corei7
> processor type, a Core 2/Core i7 scheduling description, and twiddles a
> few of the x86 tuning flags. I'm not terribly happy with it yet due to
> the relatively small performance improvement, but I'd promised some
> folks I'd post it this week, so...
> The scheduling description is heavily based on ppro.md. There seems to
> be no publicly available, detailed information from Intel about the Core
> 2 pipeline, so this work is based on Agner Fog's manuals. It should be
> correct in the essentials, at least as well as ppro.md (we aren't really
> able to do a good job with the execution ports since we have no concept
> of the out-of-order core). I have not tried to implement latencies or
> port reservations for every last MMX or SSE instruction, since who knows
> whether the information is totally accurate anyway.
> The i386 port has a lot of tuning flags, and I've mostly been running
> SPEC2000 benchmarks for the last few weeks, trying to find a set of them
> that works well on these processors. This is slightly tricky since
> there's some inherent noise in the results.
> Not using the LEAVE instruction seemed to make a difference on my Penryn
> laptop in 64 bit mode, but that's probably moot now that
> -fomit-frame-pointer is the default. I've changed a few others, but
> mostly these attempts resulted in lower or unchanged performance, for
>  * using push/pop insns more often (there are about six of these tuning
>    flags). I would have expected this to be a win.
>  * reusing the PentiumPro code in ix86_adjust_cost for Core 2 and i7
>  * upping the branch cost to 5; initial results looked good for Core i7
>    but in a full SPEC2000 run it seemed to be a slight loss, and a large
>    loss on Core 2
>  * using different string algorithms (from tune_generic)
>  * enabling SPLIT_LONG_MOVES
>  * enabling the flags related to partial reg stalls
>  * reducing code alignments (based on a comment in Agner's manual that
>    they aren't important anymore)
> I've implemented a new tuning flag, X86_TUNE_PROMOTE_HI_CONSTANTS, based
> on the recommendation in Agner's manual not to use operand size prefixes
> when they change the length of the instruction (i.e. if there's an
> immediate operand). That happens in the second of the following four
> instructions, and is said to cause a decoder stall:
> $ as
> orl $32768,%eax
> orw $32768,%ax
> orl $8,%eax
> orw $8,%ax
>    0:   0d 00 80 00 00          or     $0x8000,%eax
>    5:   66 0d 00 80             or     $0x8000,%ax
>    9:   83 c8 08                or     $0x8,%eax
>    c:   66 83 c8 08             or     $0x8,%ax
> This didn't seem to have a large impact either, however.
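
As a rough illustration of the encodings in the listing above, here is a
minimal Python sketch that reproduces the four byte sequences and flags the
case where the 0x66 operand-size prefix changes the length of the rest of
the instruction. The helper names (encode_or_acc, prefix_changes_length) are
made up for this example and are not part of GCC or binutils. The sketch
only covers OR with an immediate into %eax/%ax, and it assumes the immediate
is given as a signed Python integer.

```python
import struct


def encode_or_acc(imm: int, operand_bits: int) -> bytes:
    """Encode `or $imm,%eax` (32) or `or $imm,%ax` (16) as in the listing.

    Hypothetical helper for illustration; not an assembler.
    """
    if operand_bits not in (16, 32):
        raise ValueError("operand_bits must be 16 or 32")
    # In 32-bit or 64-bit code, a 16-bit operand needs the 0x66 prefix.
    prefix = b"\x66" if operand_bits == 16 else b""
    if -128 <= imm <= 127:
        # 83 /1 ib: sign-extended 8-bit immediate.
        # ModRM 0xC8 = mod 11, reg 001 (OR), rm 000 (eAX).
        return prefix + bytes([0x83, 0xC8, imm & 0xFF])
    # 0D iw / 0D id: accumulator form with a full-width immediate.
    fmt, mask = ("<H", 0xFFFF) if operand_bits == 16 else ("<I", 0xFFFFFFFF)
    return prefix + b"\x0d" + struct.pack(fmt, imm & mask)


def prefix_changes_length(imm: int) -> bool:
    """True if the bytes after 0x66 differ in length from the 32-bit form."""
    len16_without_prefix = len(encode_or_acc(imm, 16)) - 1
    return len16_without_prefix != len(encode_or_acc(imm, 32))


for imm in (32768, 8):
    for bits in (32, 16):
        print(bits, imm, encode_or_acc(imm, bits).hex(" "))
    print("prefix changes length:", prefix_changes_length(imm))
```

With these inputs only the $32768 case reports a length change, which
corresponds to the second instruction in the listing; the $8 case keeps the
same three bytes after the prefix.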
> On my last test run, I had
>  -mtune=generic  3023
>  -mtune=core2    3036
>  -mtune=generic  2774
>  -mtune=core2    2794
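
For a sense of scale, the relative gain of -mtune=core2 over -mtune=generic
in the two pairs of scores above can be worked out as follows. This is plain
arithmetic on the numbers as posted; the function name is only for
illustration, and the pairs are left unlabeled because the message does not
say which suite each pair belongs to.

```python
def relative_gain_percent(baseline: float, tuned: float) -> float:
    """Percentage change of `tuned` relative to `baseline`."""
    return (tuned - baseline) / baseline * 100.0


# (generic, core2) scores exactly as listed in the message.
pairs = [(3023, 3036), (2774, 2794)]

for generic, core2 in pairs:
    gain = relative_gain_percent(generic, core2)
    print(f"generic {generic} -> core2 {core2}: {gain:+.2f}%")
# generic 3023 -> core2 3036: +0.43%
# generic 2774 -> core2 2794: +0.72%
```

Both differences are well under one percent, which fits the remark earlier
in the message about the improvement being relatively small and the results
being somewhat noisy.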
> This is a Westmere Xeon, i.e. essentially a Core i7, in 32 bit mode.
> SPEC was locked to core 0 with schedtool, core 0 set to 3.2GHz manually
> with cpufreq-set (1 step below maximum, which seems to avoid turbo mode
> Compile flags were -O3 -mpc64 -frename-registers. The tree is a few
> weeks old so it doesn't have -fomit-frame-pointer by default. I also
> had -mtune=corei7 numbers, but they were a little lower since I was
> using that run for an experiment with higher branch costs.
> These numbers pretty much match the differences I was seeing on the Core
> 2 laptop during development. I'd welcome if other people would also run
> Comments? Is this OK?
I will run SPEC CPU 2K/2006. It will take a while.