This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: RFA: patch - tuning gcc for Intel Nocona (64 bit).
> It is in 1st chapter of Intel's optimization manual. It is 4 for int
> and 12 for fp (up from 2 and 9 for Northwood). It is definitely designed
I see, I've missed it. Thanks.
Indeed, the numbers are quite a bit larger than I had guessed.
> for higher frequency (Intel's marketing drum. I wish they would solve the heat
> dissipation problem for Prescott/Nocona, which is much less of a problem for
> AMD. Governments should give a rebate for processors with less heat
> dissipation, as they do for appliances).
>
> >Why do you expect larger move ratios to give better results?
> >
> >
> I meant MOVE_RATIO. I've the same from K8/Athlon. The bigger the
> blocks, the better the behaviour for an OOO processor (but we should remember
> the trace cache size). There are a lot of parameters to play with.
I see, increasing move ratio makes sense.
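
For reference, MOVE_RATIO is the GCC target macro that sets the threshold number of scalar memory-to-memory moves below which a block copy is expanded into individual move instructions instead of a string move or a library call. A minimal sketch of the kind of code it influences is below; the struct and function names are hypothetical, and whether this particular copy is expanded inline depends on the GCC version, the optimization level and the selected tuning.

```c
#include <string.h>

/* Hypothetical example type: four doubles, i.e. 32 bytes on x86-64. */
struct sample {
    double a, b, c, d;
};

/* A fixed-size block copy.  With a larger MOVE_RATIO the compiler is
   more likely to expand this into a short run of plain moves; with a
   smaller one it may use a string move or call memcpy instead. */
void copy_sample(struct sample *dst, const struct sample *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Compiling this with -O2 -S under different -mtune settings and comparing the assembly is one way to see the effect of the parameter.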
>
> I thought about this. Earlier, nocona meant only the use of SSE3. I
> checked the difference of -mtune=nocona vs -mtune=nocona -mno-sse3.
> Only two test codes are different (perl and eon). All SPECFP test
> codes are the same. So I believe the results should be the same (at
> least for SPECFP).
This is weird. Did you use a Red Hat internal version by any chance?
Definitely -mtune=nocona, even in its previous incarnation, switched CPU
tuning from K8 to pentium4.
This affects, for instance, how constants are loaded from memory.
-march=k8 results in:
movlpd .LC1, %xmm0
while -march=nocona in:
movsd .LC1, %xmm0
so you should see a difference in any program loading a double precision
memory constant. This happened even with the first version of the patch that
added -march=nocona support.
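
A minimal way to check this, as a sketch only: the function below (its name is hypothetical) returns a double precision constant, so the compiler has to load it from memory into %xmm0. Compiling the file once with -O2 -S -march=k8 and once with -O2 -S -march=nocona and diffing the two .s files should show the movlpd versus movsd difference with the compiler discussed here; other GCC versions may pick different instructions.

```c
/* Hypothetical test function.  The constant 1.5 cannot be encoded as
   an immediate, so it is placed in a constant pool (a .LC label) and
   loaded into the SSE return register. */
double load_constant(void)
{
    return 1.5;
}
```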
>
> >I dug out the results relative to K8 and they are consistent with yours
> >(though I have only the C part of SPECfp, which is not that interesting).
> >
> >Main difference comes from K8 optimized SSE reg-reg and loads code
> >generation that prevents almost any OOO reordering in Pentium4 based
> >cores.
> >If we get into the business of generating code that works well on both
> >chips, I guess we can use just the natural move instructions (movsd for
> >loads/reg-reg moves) that have just a moderate penalty on K8.
> >
> At my first look at whetstone, the biggest difference is in the usage of
> movsd for nocona. The second one is the absence of inc/dec. The third one
> I found is fewer multiplies.
Yes, this is consistent with my experience. SSE is extremely picky on
both CPUs :(
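
The inc/dec difference can be looked at in the same way. The sketch below uses a hypothetical function with a counter that is bumped by one; with Pentium 4 style tuning GCC tends to avoid inc/dec (they leave the carry flag untouched, which creates a dependency on earlier flag results on that core) and emit add/sub with an immediate of 1 instead. The exact output is an assumption that depends on the compiler version and flags, since the optimizer may also turn the increment into something else entirely.

```c
/* Hypothetical example: count the elements of v[0..n-1] that are
   strictly greater than limit.  The loop index and the hit counter
   are both candidates for inc (K8 tuning) or add $1 (nocona tuning). */
unsigned count_above(const double *v, unsigned n, double limit)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < n; i++) {
        if (v[i] > limit)
            hits++;
    }
    return hits;
}
```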
Honza