This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: RFA: patch - tuning gcc for Intel Nocona (64 bit).


> It is in the first chapter of Intel's optimization manual.  It is 4 for int
> and 12 for fp (up from 2 and 9 for Northwood).  It is definitely designed

I see, I missed it.  Thanks.
Indeed, the numbers are quite a bit larger than I had guessed.

> for higher frequency (Intel's marketing drum).  I wish they would solve the
> heat dissipation problem for Prescott/Nocona, which is much less of a
> problem for AMD.  Governments should give a rebate for processors with
> lower heat dissipation, as they do for appliances.
> 
> >Why do you expect larger move ratios to give better results?
> > 
> >
> I meant MOVE_RATIO.  I took the same values as for K8/Athlon.  The bigger
> the blocks, the better the behaviour for an OOO processor (but we should
> remember the trace cache size).  There are a lot of parameters to play with.

I see, increasing the move ratio makes sense.
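For readers unfamiliar with the macro: MOVE_RATIO is the GCC target macro giving the maximum number of scalar memory-to-memory moves the compiler will emit inline for a block copy before falling back to a string move instruction or a library call such as memcpy. A minimal sketch for observing its effect, assuming an x86-64 GCC: compile the file below with `gcc -O2 -S` and different `-mtune=` values, then check whether the struct assignment became a run of mov instructions or a call/rep movs. The struct size and the names are hypothetical, and the outcome also depends on the GCC version's block-move strategy, not only on MOVE_RATIO.

```c
/* Hypothetical 48-byte aggregate (on x86-64) used only to exercise
   block copies. */
struct block {
    long words[6];
};

/* A plain structure assignment.  Whether GCC expands this inline as a
   sequence of moves or emits a memcpy call / rep movs depends on
   MOVE_RATIO and the block-move strategy of the selected -mtune. */
void copy_block(struct block *dst, const struct block *src)
{
    *dst = *src;
}
```

Changing the array length in `struct block` moves the copy across the inline/out-of-line threshold, which makes it easier to see where a given tuning draws the line.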

> 
> I thought about this.  Earlier, nocona meant only the use of SSE3.  I
> checked the difference between -mtune=nocona and -mtune=nocona -mno-sse3.
> Only two test programs differ (perl and eon); all SPECfp test programs
> are identical.  So I believe the results should be the same (at least
> for SPECfp).

This is weird.  Did you use the Red Hat internal version by any chance?
-mtune=nocona, even in its previous incarnation, definitely switched CPU
tuning from K8 to Pentium 4.
This affects, for instance, how constants are loaded from memory.
-march=k8 results in:
        movlpd  .LC1, %xmm0
while -march=nocona results in:
        movsd   .LC1, %xmm0
so you should see a difference in any program that loads a double-precision
constant from memory.  This happened even with the first version of the
patch that added -march=nocona support.
> 
> >I dug out the results relative to K8 and they are consistent with yours
> >(though I have only the C part of SPECfp, which is not that interesting).
> >
> >The main difference comes from the K8-optimized code generation for SSE
> >reg-reg moves and loads, which prevents almost any OOO reordering on
> >Pentium 4 based cores.
> >If we get into the business of generating code that works well on both
> >chips, I guess we can just use the natural move instructions (movsd for
> >loads and reg-reg moves), which carry only a moderate penalty on K8.
> >
> At first look at Whetstone, the biggest difference is the use of movsd
> for nocona.  The second one is the absence of inc/dec.  The third one I
> found is fewer multiplies.
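To look for the inc/dec difference in your own code, a minimal sketch, assuming an x86-64 GCC: compile with `gcc -O2 -S -mtune=k8` and `-mtune=nocona` and compare how the counter updates are emitted. Pentium 4 class cores are generally tuned to prefer add/sub $1 over inc/dec, because inc/dec leave the carry flag unchanged and so only partially update EFLAGS. The function is hypothetical, and GCC may turn the conditional increment into a setcc/add or similar branch-free sequence, in which case no inc appears for either tuning.

```c
/* Hypothetical kernel with two +1 updates (the loop index and the
   counter) that a compiler may emit either as inc or as add $1,
   depending on the tuning. */
int count_positive(const double *v, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (v[i] > 0.0)
            count++;
    return count;
}
```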

Yes, this is consistent with my experience.  SSE is extremely picky on
both CPUs :(

Honza
> 
> > 
> >
> 

