This is the mail archive of the
mailing list for the GCC project.
Re: [patch] tuning gcc for Intel Core2
On Tue, Nov 14, 2006 at 09:05:07AM +0100, Jan Hubicka wrote:
> > On Mon, Nov 13, 2006 at 12:01:17PM -0500, Vladimir Makarov wrote:
> > > Here is the patch for tuning gcc for Intel Core2 processor. I did
> > > about 30 SPEC2000 runs to find good parameters which are practically
> > > the same as what Intel gave and recommended in their optimization guide
> > > made public a few days ago.
> > >
> > > The patch increases SPECINT2000 score to 1963 from 1925 (for
> > > generic) or 1901 (for nocona). SPECFP2000 score is the same as for
> > > generic, 1875 (nocona has 1856). One benchmark (gcc) did particularly
> > > well -- about 20% improvement (1788 for generic tuning vs 2210 for
> > > core2). The size of code generated for Core2 is smaller (0.46% for
> > > SPECInt and 0.54% for SPECFp) than that for generic.
> > >
> > This patch is the first step for Core 2 optimization. But I am afraid
> > that it isn't very useful. I compared -mtune=generic and -mtune=core2
> > on Core 2. The main difference is in 176.gcc. What this patch does is
> > to turn on
> > x86_rep_movl_optimal
> > for Core 2, which will avoid external calls to memset. There
> > are known serious performance problems with x86-64 memory functions,
> > especially on Core 2. We are working on improving x86-64 memory
> > functions. A better external memset can improve gcc in SPEC CPU 2K by
> > more than 20%.
> BTW I still have a patch for memcpy/memset generation that allows you to
> choose between the basic algorithms (rep/movq, rep/movl, loop, unrolled
> loop, library call) based on the -mtune switch and the expected size of
> the copied block. It also has a simple benchmark utility that allows you
> to set proper limits. I am happy to see -mtune=core2 in place, as that
> patch also contained a basic -mtune=core2 switch and I was basically
> holding it because I didn't have time to play with the other arguments
> carefully enough and because the profile-driven memcpy/memset
> infrastructure is not in place yet.
> I will be sending it shortly (I have a non-GCC deadline to meet on the
> 23rd, so probably after that).
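To make the idea of choosing a copy algorithm from the expected block size concrete, here is a minimal sketch in C. It is not the patch described above: the enum, the struct, the function name and especially the size limits are hypothetical placeholders, where a real table would be filled per -mtune target from benchmark measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strategy names mirroring the list in the mail above;
   these are not GCC identifiers.  */
enum copy_strategy {
  COPY_LOOP,
  COPY_UNROLLED_LOOP,
  COPY_REP_MOVL,
  COPY_REP_MOVQ,
  COPY_LIBCALL
};

/* One table row: use STRATEGY for blocks of at most MAX_SIZE bytes.  */
struct size_rule {
  size_t max_size;
  enum copy_strategy strategy;
};

/* Placeholder limits for one imaginary tuning target.  These numbers are
   assumptions for illustration only, NOT measured values.  Rows must be
   sorted by increasing max_size.  */
static const struct size_rule example_rules[] = {
  { 16, COPY_LOOP },
  { 64, COPY_UNROLLED_LOOP },
  { 8192, COPY_REP_MOVQ },
};

/* Return the strategy of the first rule whose limit covers EXPECTED_SIZE.
   Blocks larger than every limit fall back to the library call.  */
enum copy_strategy
choose_copy_strategy (const struct size_rule *rules, size_t n_rules,
                      size_t expected_size)
{
  assert (rules != NULL || n_rules == 0);
  for (size_t i = 0; i < n_rules; i++)
    if (expected_size <= rules[i].max_size)
      return rules[i].strategy;
  return COPY_LIBCALL;
}
```

With one such table per tuning target, changing -mtune would only swap the table, and a benchmark utility would only need to adjust the max_size limits.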
> What is the particular problem in x86-64 library string functions making
> core2 unhappy about them? 20% sounds quite serious and I don't remember
> anything particularly crazy about the implementation.
x86-64 memory functions were written for the first generation of
Opteron more than 4 years ago. For example, in memset.S, there is
/* This is somehow experimental and could made dependend on the cache
   size.  */
#define LARGE $120000
120000 is much smaller than the cache size of Nocona and Core 2. Better
memset/memcpy can help 176.gcc a lot.
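As a rough illustration of what "dependent on the cache size" could mean, here is a small sketch in C. It is not the glibc code. It assumes that LARGE is the block size at which memset.S switches to its large-block path. The helper names and the divisor parameter are hypothetical, and the caller is expected to supply the cache size by whatever means the platform offers.

```c
#include <assert.h>
#include <stddef.h>

/* The fixed cut-off quoted from memset.S.  */
#define FIXED_LARGE ((size_t) 120000)

/* Hypothetical helper: derive the cut-off from the cache size instead of
   hard-coding it.  CACHE_BYTES == 0 means "unknown", in which case the
   fixed value is kept.  DIVISOR selects which fraction of the cache to
   use; a sensible value is an assumption that would need measuring.  */
size_t
large_threshold (size_t cache_bytes, size_t divisor)
{
  assert (divisor != 0);
  if (cache_bytes == 0)
    return FIXED_LARGE;
  return cache_bytes / divisor;
}

/* Return 1 when a fill of LEN bytes should take the large-block path,
   0 otherwise.  */
int
is_large_fill (size_t len, size_t cache_bytes, size_t divisor)
{
  return len >= large_threshold (cache_bytes, divisor);
}
```

For a 4 MiB cache (an example figure, not one taken from memset.S) and a divisor of 2, the cut-off becomes 2097152 bytes, so a 120000-byte fill would no longer be treated as large.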
Here is what I got with gcc 4.3. Taking out gcc, -mtune=core2 doesn't
-mtune=generic vs. -mtune=core2
Est. SPECint_base2000    0.934579%
Est. SPECfp_base2000    -0.683177%