This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] tuning gcc for Intel Core2


On Tue, Nov 14, 2006 at 09:05:07AM +0100, Jan Hubicka wrote:
> > On Mon, Nov 13, 2006 at 12:01:17PM -0500, Vladimir Makarov wrote:
> > >  Here is the patch for tuning gcc for the Intel Core2 processor.  I did
> > > about 30 SPEC2000 runs to find good parameters, which are practically
> > > the same as what Intel gave and recommended in their optimization guide
> > > made public a few days ago.
> > > 
> > >  The patch increases the SPECINT2000 score to 1963 from 1925 (for
> > > generic) or 1901 (for nocona).  The SPECFP2000 score is the same as for
> > > generic, 1875 (nocona has 1856).  One benchmark (gcc) did particularly
> > > well -- about 20% improvement (1788 for generic tuning vs 2210 for
> > > core2).  The size of code generated for Core2 is smaller (0.46% for
> > > SPECInt and 0.54% for SPECFp) than that for generic.
> > > 
> > 
> > This patch is the first step for Core 2 optimization. But I am afraid
> > that it isn't very useful. I compared -mtune=generic and -mtune=core2
> > on Core 2. The main difference is in 176.gcc. What this patch does is
> > to turn on
> > 
> > x86_rep_movl_optimal
> > 
> > for Core 2, which will avoid external calls to memset. There
> > are known serious performance problems with x86-64 memory functions,
> > especially on Core 2. We are working on improving x86-64 memory
> > functions. A better external memset can improve gcc in SPEC CPU 2K by
> > more than 20%.
> BTW I still have a patch for memcpy/memset generation that allows you to
> choose between the basic algorithms (rep/movq, rep/movl, loop, unrolled
> loop, library call) based on the -mtune switch and the expected size of
> the copied block.  It also has a simple benchmark utility that allows you
> to set proper limits.  I am happy to see -mtune=core2 in place, as that
> patch also contained a basic -mtune=core2 switch, and I was basically
> holding it because I didn't have time to play with the other arguments
> carefully enough and because the profile-driven memcpy/memset
> infrastructure is not in place yet.
> 
> I will be sending it shortly (I have a non-GCC deadline to meet on the
> 23rd, so probably after that).
> 
> What is the particular problem in x86-64 library string functions making
> core2 unhappy about them? 20% sounds quite serious and I don't remember
> anything particularly crazy about the implementation.
> 
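
The size-based selection described above can be sketched roughly as follows. This is only an illustration, not the patch itself: the enum, the rule table, the function name and every threshold are hypothetical, and real limits would have to come from benchmarking on the target CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strategy names, mirroring the algorithms listed in the
   quoted message (rep/movq, rep/movl, loop, unrolled loop, library call). */
enum copy_strategy {
    COPY_LOOP,
    COPY_UNROLLED_LOOP,
    COPY_REP_MOVL,
    COPY_REP_MOVQ,
    COPY_LIBCALL
};

/* One table row: use `strategy` for blocks of at most `max_size` bytes. */
struct copy_rule {
    size_t max_size;
    enum copy_strategy strategy;
};

/* Made-up limits for one tuning target; rows must be sorted by max_size.
   These numbers are placeholders, not measured values. */
static const struct copy_rule example_core2_rules[] = {
    { 16,         COPY_LOOP },
    { 64,         COPY_UNROLLED_LOOP },
    { 8192,       COPY_REP_MOVQ },
    { (size_t)-1, COPY_LIBCALL },
};

/* Return the strategy of the first rule whose limit covers expected_size.
   Falls back to the library call if no rule matches. */
enum copy_strategy
choose_copy_strategy(const struct copy_rule *rules, size_t n_rules,
                     size_t expected_size)
{
    assert(rules != NULL && n_rules > 0);
    for (size_t i = 0; i < n_rules; i++)
        if (expected_size <= rules[i].max_size)
            return rules[i].strategy;
    return COPY_LIBCALL;
}
```

A different -mtune value would simply select a different table, which is why a small benchmark utility for finding the limits is useful.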

The x86-64 memory functions were written for the first generation of
Opteron more than 4 years ago. For example, in memset.S, there is

/* This is somehow experimental and could made dependend on the cache
   size.  */
#define LARGE $120000

120000 is much smaller than the cache size of Nocona and Core 2. Better
memset/memcpy can help 176.gcc a lot.
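
As a rough sketch of what making the limit depend on the cache size could look like in C (the real code is assembly, and this is not how glibc does it): the function names and the divisor are assumptions made for illustration, and `_SC_LEVEL2_CACHE_SIZE` is a glibc extension to `sysconf` that may be missing or report 0 on some systems.

```c
#include <assert.h>
#include <unistd.h>

/* The fixed limit from memset.S, used when the cache size is unknown. */
#define FALLBACK_LARGE 120000L

/* Hypothetical heuristic: treat a block as "large" once it exceeds
   cache_bytes / divisor.  The right divisor is an open tuning question. */
long large_threshold(long cache_bytes, long divisor)
{
    assert(divisor > 0);
    if (cache_bytes <= 0)
        return FALLBACK_LARGE;      /* cache size unknown */
    long threshold = cache_bytes / divisor;
    return threshold > 0 ? threshold : FALLBACK_LARGE;
}

/* Ask the C library for the L2 cache size; returns a value <= 0 when the
   query is unsupported or the size is not reported. */
long detect_l2_cache_bytes(void)
{
#ifdef _SC_LEVEL2_CACHE_SIZE
    return sysconf(_SC_LEVEL2_CACHE_SIZE);
#else
    return -1;
#endif
}
```

With these definitions, `large_threshold(detect_l2_cache_bytes(), 2)` gives half of the reported L2 size, or 120000 when the size cannot be determined.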

Here is what I got with gcc 4.3.  Taking out gcc, -mtune=core2 doesn't
help much.



H.J.
------
                        -mtune=generic vs. -mtune=core2
164.gzip                -0.723589%
175.vpr                  0.700935%
176.gcc                 14.9149%
181.mcf                 -0.46608%
186.crafty              -1%
197.parser               0%
252.eon                 -1.11155%
253.perlbmk              1.26084%
254.gap                  0.0899685%
255.vortex              -1.7014%
256.bzip2               -0.301811%
300.twolf                0.707269%
Est. SPECint_base2000    0.934579%

168.wupwise             -1.25054%
171.swim                -0.770925%
172.mgrid                0.233236%
173.applu               -1.76429%
177.mesa                 0.119%
178.galgel               0.726676%
179.art                 -1.17735%
183.equake               0.211327%
187.facerec             -0.22805%
188.ammp                 1.04167%
189.lucas               -2.85331%
191.fma3d               -0.365344%
200.sixtrack            -1.76991%
301.apsi                -1.48924%
Est. SPECfp_base2000    -0.683177%

