Core 2 and Core i7 tuning

H.J. Lu hjl.tools@gmail.com
Mon Aug 23 15:31:00 GMT 2010


On Fri, Aug 20, 2010 at 1:07 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> Here's something I've been working on for a while.  This adds a corei7
> processor type, a Core 2/Core i7 scheduling description, and twiddles a
> few of the x86 tuning flags.  I'm not terribly happy with it yet due to
> the relatively small performance improvement, but I'd promised some
> folks I'd post it this week, so...
>
> The scheduling description is heavily based on ppro.md.  There seems to
> be no publicly available, detailed information from Intel about the Core
> 2 pipeline, so this work is based on Agner Fog's manuals.  It should be
> correct in the essentials, at least as well as ppro.md (we aren't really
> able to do a good job with the execution ports since we have no concept
> of the out-of-order core).  I have not tried to implement latencies or
> port reservations for every last MMX or SSE instruction, since who knows
> whether the information is totally accurate anyway.
>
> The i386 port has a lot of tuning flags, and I've mostly been running
> SPEC2000 benchmarks for the last few weeks, trying to find a set of them
> that works well on these processors.  This is slightly tricky since
> there's some inherent noise in the results.
>
> Not using the LEAVE instruction seemed to make a difference on my Penryn
> laptop in 64-bit mode, but that's probably moot now that
> -fomit-frame-pointer is the default.  I've changed a few others, but
> mostly these attempts resulted in lower or unchanged performance, for
> example:
>
>  * using push/pop insns more often (there are about six of these tuning
>   flags).  I would have expected this to be a win.
>  * reusing the PentiumPro code in ix86_adjust_cost for Core 2 and i7
>  * upping the branch cost to 5; initial results looked good for Core i7
>   but in a full SPEC2000 run it seemed to be a slight loss, and a large
>   loss on Core 2
>  * using different string algorithms (from tune_generic)
>  * enabling SPLIT_LONG_MOVES
>  * enabling the flags related to partial reg stalls
>  * reducing code alignments (based on a comment in Agner's manual that
>   they aren't important anymore)
>
> I've implemented a new tuning flag, X86_TUNE_PROMOTE_HI_CONSTANTS, based
> on the recommendation in Agner's manual not to use operand size prefixes
> when they change the length of the instruction (i.e. if there's an
> immediate operand).  That happens in the second of the following four
> instructions, and is said to cause a decoder stall:
>
> $ as
> orl $32768,%eax
> orw $32768,%ax
> orl $8,%eax
> orw $8,%ax
>
>   0:   0d 00 80 00 00          or     $0x8000,%eax
>   5:   66 0d 00 80             or     $0x8000,%ax
>   9:   83 c8 08                or     $0x8,%eax
>   c:   66 83 c8 08             or     $0x8,%ax
>
> However, this didn't seem to have a large impact either.
>
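
To illustrate the rule behind the four encodings quoted above, here is a
minimal Python sketch. It is not part of the patch, and the function names
are made up for illustration. It assumes 32- or 64-bit mode, where a 16-bit
operand needs the 0x66 prefix, and it assumes the instruction has a
sign-extended imm8 form, as OR does. Instructions without such a form
(MOV or TEST with a 16-bit immediate, for example) would always carry an
imm16 and are not modelled here.

```python
def fits_signed_imm8(value: int, width_bits: int) -> bool:
    """True if value, read as a width_bits two's-complement integer,
    fits in a sign-extended 8-bit immediate."""
    mask = (1 << width_bits) - 1
    v = value & mask
    if v >= 1 << (width_bits - 1):
        v -= 1 << width_bits          # reinterpret as a negative number
    return -128 <= v <= 127


def has_length_changing_prefix(operand_bits: int, imm: int) -> bool:
    """True if the 0x66 prefix changes the instruction length, i.e. a
    16-bit operand whose immediate must be encoded as imm16.

    Assumption: the instruction has an imm8 form (true for OR)."""
    if operand_bits != 16:
        return False                  # no operand-size prefix needed
    return not fits_signed_imm8(imm, 16)


if __name__ == "__main__":
    cases = [("orl $32768,%eax", 32, 32768),
             ("orw $32768,%ax", 16, 32768),
             ("orl $8,%eax", 32, 8),
             ("orw $8,%ax", 16, 8)]
    for text, bits, imm in cases:
        print(text, has_length_changing_prefix(bits, imm))
```

Only the second case is flagged: its prefix shrinks the immediate from four
bytes to two, whereas "orw $8,%ax" keeps the one-byte immediate.
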
> On my last test run, I had
> SPECfp2000:
>  -mtune=generic  3023
>  -mtune=core2    3036
> SPECint2000:
>  -mtune=generic  2774
>  -mtune=core2    2794
>
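
For scale, the scores quoted above can be turned into relative gains with a
few lines of Python. This is only a worked-arithmetic sketch; the helper name
is made up, and the inputs are the four numbers from the quoted run.

```python
def relative_gain_percent(baseline: float, tuned: float) -> float:
    """Percentage change of tuned over baseline (positive = faster)."""
    return (tuned - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Scores from the quoted run: -mtune=generic vs. -mtune=core2.
    print("SPECfp2000:  %.2f%%" % relative_gain_percent(3023, 3036))
    print("SPECint2000: %.2f%%" % relative_gain_percent(2774, 2794))
```

Both gains come out below one percent, which is in the range where the
run-to-run noise mentioned earlier matters.
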
> This is a Westmere Xeon, i.e. essentially a Core i7, in 32-bit mode.
> SPEC was locked to core 0 with schedtool, core 0 set to 3.2GHz manually
> with cpufreq-set (1 step below maximum, which seems to avoid turbo mode
> effectively).
> Compile flags were -O3 -mpc64 -frename-registers.  The tree is a few
> weeks old so it doesn't have -fomit-frame-pointer by default.  I also
> had -mtune=corei7 numbers, but they were a little lower since I was
> using that run for an experiment with higher branch costs.
>
> These numbers pretty much match the differences I was seeing on the Core
> 2 laptop during development.  I'd welcome it if other people also ran
> benchmarks.
>

Here are my results on Core 2 and Core i7 running Fedora 13 (see the
attached spreadsheet). There are many regressions and a few improvements.

-- 
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-r163419-core2-corei7.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 16244 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20100823/8ff49e3c/attachment.xlsx>