This is the mail archive of the
mailing list for the GCC project.
Re: Core 2 and Core i7 tuning
- From: Bernd Schmidt <bernds at codesourcery dot com>
- To: Andi Kleen <andi at firstfloor dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>, Maxim Kuvyrkov <maxim at codesourcery dot com>, Paul Brook <paul at codesourcery dot com>
- Date: Mon, 23 Aug 2010 15:33:27 +0200
- Subject: Re: Core 2 and Core i7 tuning
- References: <4C6EE072.firstname.lastname@example.org> <email@example.com>
On 08/23/2010 03:17 PM, Andi Kleen wrote:
> First I'm surprised that you wrote that the pipeline description
> in the optimization manual wasn't good enough. Did you use
> 2.1 in http://www.intel.com/assets/pdf/manual/248966.pdf
> as a reference?
Not sure it's the same one, but I have an Intel optimization manual
which only seems to have general information about which instructions go
to which ports; the Agner Fog document has tables which at least try to
provide full information. In the end, it may not be relevant since I
doubt there's much to be gained from trying to get this 100% accurate.
> As a general comment, Core i7 is not a good name to use here because
> it's a marketing name used for different microarchitectures
> (already the case). I made this mistake in another project
> and am still suffering from it :-)
Most of these points also apply to Core 2, which has two different
variants and a couple of Xeons with the same basic core.
> Comparing costs with my own model:
The i7 table is just copied from the Core 2 table for the moment. I've
only adjusted the L2 cache size.
>> + 2, /* cost of moving SSE register
> Too high?
Likely. I changed that in the pipeline description IIRC but this
probably needs changing as well.
> 1 now. Inter unit moves got a lot cheaper.
As far as I know there are still stalls?
>> + 32, /* size of l1 cache. */
>> + 256, /* size of l2 cache. */
> I used the L3 here. Makes more sense?
>> + 3, /* Branch cost */
>> + COSTS_N_INSNS (3), /* cost of FADD and FSUB insns. */
>> + COSTS_N_INSNS (5), /* cost of FMUL instruction. */
>> + COSTS_N_INSNS (32), /* cost of FDIV instruction. */
>> + COSTS_N_INSNS (1), /* cost of FABS instruction. */
>> + COSTS_N_INSNS (1), /* cost of FCHS instruction. */
>> + COSTS_N_INSNS (58), /* cost of FSQRT instruction. */
> I suspect some of these costs are also outdated, but needs measurements.
FADD and FMUL are correct, I think, but Maxim pointed me at an earlier
patch from Vlad which got better results by changing them.
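For readers not familiar with these tables, here is a rough sketch of how the quoted x87 entries could be read. It is not the actual GCC definition: the struct name, its fields and the helper are made up for illustration, and the factor of 4 in COSTS_N_INSNS is an assumption based on how I understand GCC's rtl.h to define it.

```c
#include <assert.h>

/* Assumption: costs are scaled so that one simple insn costs 4 units. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, cut-down stand-in for the x87 part of a cost table. */
struct fp_costs {
    int fadd;   /* cost of FADD and FSUB insns */
    int fmul;   /* cost of FMUL instruction */
    int fdiv;   /* cost of FDIV instruction */
    int fabs;   /* cost of FABS instruction */
    int fchs;   /* cost of FCHS instruction */
    int fsqrt;  /* cost of FSQRT instruction */
};

/* The values quoted in the patch hunk above. */
static const struct fp_costs quoted_costs = {
    COSTS_N_INSNS (3),
    COSTS_N_INSNS (5),
    COSTS_N_INSNS (32),
    COSTS_N_INSNS (1),
    COSTS_N_INSNS (1),
    COSTS_N_INSNS (58),
};

/* Convert a scaled cost back into "number of simple insns". */
static int cost_in_insns(int cost)
{
    assert(cost >= 0);
    return cost / COSTS_N_INSNS (1);
}
```

Read this way, the table says an FDIV is treated as roughly ten times as expensive as an FADD, which is the kind of ratio that would need measuring before changing it.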
>> /* X86_TUNE_PAD_RETURNS */
>> - m_AMD_MULTIPLE | m_CORE2 | m_GENERIC,
>> + m_AMD_MULTIPLE | m_GENERIC,
> Not sure why?
Everything I looked at seemed to say this is an AMD-only thing.
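To illustrate what the hunk above changes, here is a simplified sketch of how such a tuning mask can be tested against the selected processor. The enum, the PROC_* names and tune_enabled are hypothetical; only the m_AMD_MULTIPLE, m_CORE2 and m_GENERIC names come from the patch, and their real definitions in GCC cover more processors than shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened processor list; the real one is longer. */
enum processor { PROC_K8, PROC_AMDFAM10, PROC_CORE2, PROC_GENERIC, PROC_MAX };

/* One bit per processor, in the style of the m_* masks in the patch. */
#define m_K8           (1u << PROC_K8)
#define m_AMDFAM10     (1u << PROC_AMDFAM10)
#define m_AMD_MULTIPLE (m_K8 | m_AMDFAM10)   /* simplified */
#define m_CORE2        (1u << PROC_CORE2)
#define m_GENERIC      (1u << PROC_GENERIC)  /* simplified */

/* X86_TUNE_PAD_RETURNS before and after the quoted change. */
static const unsigned pad_returns_before = m_AMD_MULTIPLE | m_CORE2 | m_GENERIC;
static const unsigned pad_returns_after  = m_AMD_MULTIPLE | m_GENERIC;

/* A tuning is on for a processor if that processor's bit is in the mask. */
static bool tune_enabled(unsigned mask, enum processor cpu)
{
    assert(cpu < PROC_MAX);
    return (mask & (1u << cpu)) != 0;
}
```

With the new mask, tune_enabled (pad_returns_after, PROC_CORE2) is false while the AMD and generic entries are unaffected, so only the Core 2 tuning stops padding returns.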