[patch] tuning gcc for Intel Core2
Vladimir Makarov
vmakarov@redhat.com
Tue Nov 14 15:27:00 GMT 2006
H. J. Lu wrote:
>On Tue, Nov 14, 2006 at 09:05:07AM +0100, Jan Hubicka wrote:
>
>
>>>On Mon, Nov 13, 2006 at 12:01:17PM -0500, Vladimir Makarov wrote:
>>>
>>>
>>>> Here is the patch for tuning gcc for the Intel Core2 processor. I did
>>>>about 30 SPEC2000 runs to find good parameters, which are practically
>>>>the same as what Intel gave and recommended in their optimization guide
>>>>made public a few days ago.
>>>>
>>>> The patch increases SPECINT2000 score to 1963 from 1925 (for
>>>>generic) or 1901 (for nocona). SPECFP2000 score is the same as for
>>>>generic 1875 (nocona has 1856). One benchmark (gcc) did particularly
>>>>well -- about 20% improvement (1788 for generic tuning vs 2210 for
>>>>core2). The size of code generated for Core2 is smaller (0.46% for
>>>>SPECInt and 0.54% for SPECFp) than that for generic.
>>>>
>>>>
>>>>
>>>This patch is the first step for Core 2 optimization. But I am afraid
>>>that it isn't very useful. I compared -mtune=generic and -mtune=core2
>>>on Core 2. The main difference is in 176.gcc. What this patch does is
>>>to turn on
>>>
>>>x86_rep_movl_optimal
>>>
>>>for Core 2, which will avoid external calls to memset. There
>>>are known serious performance problems with x86-64 memory functions,
>>>especially on Core 2. We are working on improving x86-64 memory
>>>functions. A better external memset can improve gcc in SPEC CPU 2K by
>>>more than 20%.
>>>
>>>
>>BTW I still have a patch for memcpy/memset generation that allows you to
>>choose between the basic algorithms (rep/movq, rep/movl, loop, unrolled
>>loop, library call) based on the -mtune switch and the expected size of
>>the copied block. It also has a simple benchmark utility that allows you
>>to set proper limits. I am happy to see -mtune=core2 in place, as that
>>patch also contained a basic -mtune=core2 switch and I was basically
>>holding it because I didn't have time to play with the other arguments
>>carefully enough and because the profile-driven memcpy/memset
>>infrastructure is not in place yet.
>>
>>I will be sending it shortly (I have a non-GCC deadline to meet on the
>>23rd, so probably after that).
>>
>>What is the particular problem in x86-64 library string functions making
>>core2 unhappy about them? 20% sounds quite serious and I don't remember
>>anything particularly crazy about the implementation.
>>
>>
>>
>
>x86-64 memory functions were written for the first generation of
>Opteron more than 4 years ago. For example, in memset.S, there are
>
>/* This is somehow experimental and could made dependend on the cache
> size. */
>#define LARGE $120000
>
>120000 is much smaller than the cache size of Nocona and Core 2. Better
>memset/memcpy can help 176.gcc a lot.
>
>Here is what I got with gcc 4.3. Taking out gcc, -mtune=core2 doesn't
>help much.
>
>
>
>H.J.
>------
> -mtune=generic vs. -mtune=core2
>164.gzip -0.723589%
>175.vpr 0.700935%
>176.gcc 14.9149%
>181.mcf -0.46608%
>186.crafty -1%
>197.parser 0%
>252.eon -1.11155%
>253.perlbmk 1.26084%
>254.gap 0.0899685%
>255.vortex -1.7014%
>256.bzip2 -0.301811%
>300.twolf 0.707269%
>Est. SPECint_base2000 0.934579%
>
>168.wupwise -1.25054%
>171.swim -0.770925%
>172.mgrid 0.233236%
>173.applu -1.76429%
>177.mesa 0.119%
>178.galgel 0.726676%
>179.art -1.17735%
>183.equake 0.211327%
>187.facerec -0.22805%
>188.ammp 1.04167%
>189.lucas -2.85331%
>191.fma3d -0.365344%
>200.sixtrack -1.76991%
>301.apsi -1.48924%
>Est. SPECfp_base2000 -0.683177%
>
>
I have different results on a mainline that is a few days old. What kind of
processor did you use? What stepping? Mine is 6 (I know there were
pre-production versions).
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Genuine Intel(R) CPU @ 2.66GHz
stepping : 4
cpu MHz : 2666.763
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips : 5339.08
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Genuine Intel(R) CPU @ 2.66GHz
stepping : 4
cpu MHz : 2666.763
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 6
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips : 5332.65
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Genuine Intel(R) CPU @ 2.66GHz
stepping : 4
cpu MHz : 2666.763
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips : 5332.57
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Genuine Intel(R) CPU @ 2.66GHz
stepping : 4
cpu MHz : 2666.763
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 7
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips : 5332.62
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
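
The cache size reported above can be set against the LARGE threshold quoted from memset.S earlier in the thread. This is a minimal sketch, not part of the patch: it assumes the threshold is a byte count (the thread does not state the unit), and `cache_size_bytes` is a hypothetical helper. The sample text is copied from the first processor entry above.

```python
import re

# Threshold quoted from memset.S in the thread.
# Treating it as a number of bytes is an assumption.
LARGE = 120000

# Excerpt of the /proc/cpuinfo output above (first processor only).
CPUINFO = """\
processor : 0
model name : Genuine Intel(R) CPU @ 2.66GHz
stepping : 4
cache size : 4096 KB
"""


def cache_size_bytes(cpuinfo_text):
    """Return the first 'cache size' entry converted to bytes (hypothetical helper)."""
    match = re.search(r"^cache size\s*:\s*(\d+)\s*KB", cpuinfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'cache size' line found")
    return int(match.group(1)) * 1024


cache = cache_size_bytes(CPUINFO)
ratio = cache / LARGE
print(f"cache = {cache} bytes, LARGE = {LARGE}, cache/LARGE = {ratio:.1f}")
```

Under that assumption the reported cache is roughly 35 times the threshold, which is consistent with H.J.'s remark that 120000 is much smaller than the cache size.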
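SPECint2000: base columns are -mtune=generic, peak columns are -mtune=core2
Benchmarks   Base Ref Time  Base Run Time  Base Ratio  Peak Ref Time  Peak Run Time  Peak Ratio
------------ -------- -------- -------- -------- -------- --------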
164.gzip 1400 102 1376* 1400 103 1361
164.gzip 1400 102 1377 1400 103 1361
164.gzip 1400 102 1375 1400 103 1361*
175.vpr 1400 86.8 1613 1400 87.0 1610
175.vpr 1400 87.1 1608 1400 86.8 1612*
175.vpr 1400 86.9 1612* 1400 86.7 1615
176.gcc 1100 61.5 1789 1100 49.8 2209*
176.gcc 1100 61.5 1788* 1100 49.8 2210
176.gcc 1100 61.5 1788 1100 49.9 2206
181.mcf 1800 114 1573 1800 114 1575*
181.mcf 1800 115 1560 1800 114 1580
181.mcf 1800 115 1560* 1800 115 1563
186.crafty 1000 38.0 2631 1000 38.8 2579
186.crafty 1000 38.0 2630* 1000 38.8 2578*
186.crafty 1000 38.0 2629 1000 38.8 2578
197.parser 1800 149 1206 1800 149 1206
197.parser 1800 149 1207 1800 149 1207
197.parser 1800 149 1207* 1800 149 1207*
252.eon 1300 52.7 2465* 1300 52.5 2477
252.eon 1300 52.7 2467 1300 52.4 2480
252.eon 1300 52.8 2464 1300 52.5 2478*
253.perlbmk 1800 70.2 2564* 1800 71.5 2519*
253.perlbmk 1800 70.2 2564 1800 71.1 2532
253.perlbmk 1800 70.2 2564 1800 71.5 2517
254.gap 1100 54.1 2033 1100 53.3 2062*
254.gap 1100 54.2 2029 1100 53.4 2060
254.gap 1100 54.1 2032* 1100 53.3 2063
255.vortex 1900 84.6 2247 1900 85.0 2235*
255.vortex 1900 84.6 2247* 1900 85.0 2234
255.vortex 1900 84.5 2248 1900 85.0 2235
256.bzip2 1500 80.7 1858 1500 79.5 1886
256.bzip2 1500 80.5 1864 1500 79.5 1886*
256.bzip2 1500 80.7 1860* 1500 79.5 1886
300.twolf 3000 122 2460 3000 118 2537
300.twolf 3000 122 2462 3000 118 2537
300.twolf 3000 122 2461* 3000 118 2537*
=========================================================================
164.gzip 1400 102 1376* 1400 103 1361*
175.vpr 1400 86.9 1612* 1400 86.8 1612*
176.gcc 1100 61.5 1788* 1100 49.8 2209*
181.mcf 1800 115 1560* 1800 114 1575*
186.crafty 1000 38.0 2630* 1000 38.8 2578*
197.parser 1800 149 1207* 1800 149 1207*
252.eon 1300 52.7 2465* 1300 52.5 2478*
253.perlbmk 1800 70.2 2564* 1800 71.5 2519*
254.gap 1100 54.1 2032* 1100 53.3 2062*
255.vortex 1900 84.6 2247* 1900 85.0 2235*
256.bzip2 1500 80.7 1860* 1500 79.5 1886*
300.twolf 3000 122 2461* 3000 118 2537*
Est. SPECint_base2000 1925
Est. SPECint2000 1963
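
The two summary figures can be cross-checked from the starred ratios. This is a minimal sketch, assuming each estimate is the geometric mean of the twelve per-benchmark ratios, which is how SPEC CPU2000 composite scores are defined. The names `RATIOS` and `geometric_mean` are illustrative only.

```python
import math

# Starred (selected) ratios from the summary rows above: (base, peak).
RATIOS = {
    "164.gzip": (1376, 1361),
    "175.vpr": (1612, 1612),
    "176.gcc": (1788, 2209),
    "181.mcf": (1560, 1575),
    "186.crafty": (2630, 2578),
    "197.parser": (1207, 1207),
    "252.eon": (2465, 2478),
    "253.perlbmk": (2564, 2519),
    "254.gap": (2032, 2062),
    "255.vortex": (2247, 2235),
    "256.bzip2": (1860, 1886),
    "300.twolf": (2461, 2537),
}


def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid a huge product."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


base = geometric_mean(b for b, _ in RATIOS.values())
peak = geometric_mean(p for _, p in RATIOS.values())
print(f"base = {base:.1f}, peak = {peak:.1f}, gain = {100 * (peak / base - 1):.2f}%")
```

Because the inputs are the rounded ratios from the table, the result should land within a point or two of the reported 1925 and 1963 rather than match them exactly.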
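SPECfp2000: base columns are -mtune=generic, peak columns are -mtune=core2
Benchmarks   Base Ref Time  Base Run Time  Base Ratio  Peak Ref Time  Peak Run Time  Peak Ratio
------------ -------- -------- -------- -------- -------- --------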
168.wupwise 1600 78.1 2048* 1600 79.2 2019
168.wupwise 1600 78.1 2048 1600 79.2 2020*
168.wupwise 1600 78.2 2047 1600 79.2 2021
171.swim 3100 139 2235* 3100 139 2233
171.swim 3100 139 2234 3100 139 2233*
171.swim 3100 139 2236 3100 139 2232
172.mgrid 1800 110 1632* 1800 113 1600
172.mgrid 1800 110 1631 1800 113 1599*
172.mgrid 1800 110 1633 1800 113 1599
173.applu 2100 180 1167 2100 171 1232
173.applu 2100 180 1167* 2100 171 1231
173.applu 2100 180 1167 2100 171 1231*
177.mesa 1400 59.5 2355* 1400 59.9 2337
177.mesa 1400 59.3 2359 1400 60.0 2333
177.mesa 1400 59.5 2355 1400 60.0 2335*
178.galgel 2900 69.5 4171 2900 70.0 4141*
178.galgel 2900 69.9 4150 2900 70.0 4142
178.galgel 2900 69.6 4165* 2900 70.1 4140
179.art 2600 42.8 6076 2600 42.0 6184
179.art 2600 42.5 6111* 2600 42.8 6073*
179.art 2600 42.4 6137 2600 43.1 6026
183.equake 1300 70.6 1843 1300 70.7 1838*
183.equake 1300 70.6 1842* 1300 70.8 1837
183.equake 1300 70.6 1842 1300 70.7 1839
187.facerec 1900 131 1445* 1900 132 1439
187.facerec 1900 131 1445 1900 132 1440*
187.facerec 1900 132 1443 1900 132 1441
188.ammp 2200 124 1770 2200 125 1758
188.ammp 2200 124 1770* 2200 125 1759*
188.ammp 2200 124 1767 2200 125 1760
189.lucas 2000 116 1726* 2000 116 1728
189.lucas 2000 116 1727 2000 116 1728*
189.lucas 2000 116 1726 2000 116 1728
191.fma3d 2100 163 1292 2100 162 1295*
191.fma3d 2100 163 1292 2100 162 1295
191.fma3d 2100 163 1292* 2100 162 1295
200.sixtrack 1100 122 901 1100 122 904*
200.sixtrack 1100 122 900 1100 122 904
200.sixtrack 1100 122 900* 1100 122 904
301.apsi 2600 193 1345* 2600 197 1319*
301.apsi 2600 193 1344 2600 197 1319
301.apsi 2600 193 1345 2600 197 1319
========================================================================
168.wupwise 1600 78.1 2048* 1600 79.2 2020*
171.swim 3100 139 2235* 3100 139 2233*
172.mgrid 1800 110 1632* 1800 113 1599*
173.applu 2100 180 1167* 2100 171 1231*
177.mesa 1400 59.5 2355* 1400 60.0 2335*
178.galgel 2900 69.6 4165* 2900 70.0 4141*
179.art 2600 42.5 6111* 2600 42.8 6073*
183.equake 1300 70.6 1842* 1300 70.7 1838*
187.facerec 1900 131 1445* 1900 132 1440*
188.ammp 2200 124 1770* 2200 125 1759*
189.lucas 2000 116 1726* 2000 116 1728*
191.fma3d 2100 163 1292* 2100 162 1295*
200.sixtrack 1100 122 900* 1100 122 904*
301.apsi 2600 193 1345* 2600 197 1319*
Est. SPECfp_base2000 1875
Est. SPECfp2000 1872
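
To compare these results with the percentage table H.J. posted, the per-benchmark change can be derived from the starred ratios. This is a minimal sketch, assuming the left column is the generic tuning and the right column is core2, as the summary figures in the original patch message suggest. `percent_change` is a hypothetical helper.

```python
# Starred (selected) SPECfp ratios from the summary rows above: (generic, core2).
FP_RATIOS = {
    "168.wupwise": (2048, 2020),
    "171.swim": (2235, 2233),
    "172.mgrid": (1632, 1599),
    "173.applu": (1167, 1231),
    "177.mesa": (2355, 2335),
    "178.galgel": (4165, 4141),
    "179.art": (6111, 6073),
    "183.equake": (1842, 1838),
    "187.facerec": (1445, 1440),
    "188.ammp": (1770, 1759),
    "189.lucas": (1726, 1728),
    "191.fma3d": (1292, 1295),
    "200.sixtrack": (900, 904),
    "301.apsi": (1345, 1319),
}


def percent_change(old, new):
    """Relative change from old to new, in percent (hypothetical helper)."""
    return 100.0 * (new - old) / old


changes = {name: percent_change(g, c) for name, (g, c) in FP_RATIOS.items()}

# Print in the same "benchmark percentage" layout as the quoted table.
for name, delta in changes.items():
    print(f"{name} {delta:+.3f}%")
```

A positive value means the core2 tuning scored higher than generic on that benchmark, so the signs line up with the "-mtune=generic vs. -mtune=core2" table quoted earlier.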