[patch] tuning gcc for Intel Core2

Vladimir Makarov vmakarov@redhat.com
Tue Nov 14 15:27:00 GMT 2006


H. J. Lu wrote:

>On Tue, Nov 14, 2006 at 09:05:07AM +0100, Jan Hubicka wrote:
>  
>
>>>On Mon, Nov 13, 2006 at 12:01:17PM -0500, Vladimir Makarov wrote:
>>>      
>>>
>>>> Here is the patch for tuning gcc for the Intel Core2 processor.  I did
>>>>about 30 SPEC2000 runs to find good parameters, which are practically
>>>>the same as what Intel recommended in their optimization guide,
>>>>made public a few days ago.
>>>>
>>>> The patch increases the SPECINT2000 score to 1963 from 1925 (for
>>>>generic) or 1901 (for nocona).  The SPECFP2000 score is the same as for
>>>>generic, 1875 (nocona has 1856).  One benchmark (gcc) did particularly
>>>>well -- about 20% improvement (1788 for generic tuning vs 2210 for
>>>>core2).  The code generated for Core2 is smaller (by 0.46% for
>>>>SPECInt and 0.54% for SPECFp) than that for generic.
>>>>
>>>>        
>>>>
>>>This patch is the first step for Core 2 optimization. But I am afraid
>>>that it isn't very useful. I compared -mtune=generic and -mtune=core2
>>>on Core 2. The main difference is in 176.gcc. What this patch does is
>>>to turn on
>>>
>>>x86_rep_movl_optimal
>>>
>>>for Core 2, which will avoid external calls to memset. There
>>>are known serious performance problems with x86-64 memory functions,
>>>especially on Core 2. We are working on improving x86-64 memory
>>>functions. A better external memset can improve gcc in SPEC CPU 2K by
>>>more than 20%.
>>>      
>>>
>>BTW I still have a patch for memcpy/memset generation that allows you to
>>choose between basic algorithms (rep/movq, rep/movl, loop, unrolled loop,
>>library call) based on the -mtune switch and the expected size of the
>>copied block.  It also has a simple benchmark utility that allows you to
>>set proper limits.  I am happy to see -mtune=core2 in place, as that patch
>>also contained a basic -mtune=core2 switch and I was basically holding it
>>because I didn't have time to play with the other arguments carefully
>>enough and because the profile-driven memcpy/memset infrastructure is not
>>in place yet.
>>
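
The size-based selection described above can be sketched roughly as follows. This is only an illustration, not the code from that patch: the table name `STRATEGY_TABLE`, the function `choose_algorithm`, and every size limit are hypothetical, and real limits would have to come from measurements such as the benchmark utility mentioned.

```python
# Hypothetical per-tuning tables: (largest block size in bytes, algorithm).
# A limit of None marks the catch-all entry for larger or unknown sizes.
# The limits are invented for illustration, not measured values.
STRATEGY_TABLE = {
    "generic": [(32, "unrolled loop"), (8192, "rep/movq"), (None, "library call")],
    "core2": [(16, "loop"), (64, "unrolled loop"), (None, "rep/movl")],
}


def choose_algorithm(tune, expected_size=None):
    """Pick a memcpy/memset algorithm for a -mtune value and expected size."""
    table = STRATEGY_TABLE[tune]
    if expected_size is None:
        # No size estimate (for example, no profile data): use the catch-all.
        return table[-1][1]
    for limit, algorithm in table:
        if limit is None or expected_size <= limit:
            return algorithm
    raise ValueError("strategy table lacks a catch-all entry")


print(choose_algorithm("core2", 8))
print(choose_algorithm("core2", 4096))
print(choose_algorithm("generic"))
```

Keeping the limits in a per-tuning table means that retuning for a new processor only changes data, not the selection logic.
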
>>I will be sending it shortly (I have a non-GCC deadline to meet on the
>>23rd, so probably after that).
>>
>>What is the particular problem in the x86-64 library string functions that
>>makes core2 unhappy? 20% sounds quite serious and I don't remember
>>anything particularly crazy about the implementation.
>>
>>    
>>
>
>x86-64 memory functions were written for the first generation of
>Opteron more than 4 years ago. For example, in memset.S, there are
>
>/* This is somehow experimental and could made dependend on the cache
>   size.  */
>#define LARGE $120000
>
>120000 is much smaller than the cache size of Nocona and Core 2. Better
>memset/memcpy can help 176.gcc a lot.
>
>Here is what I got with gcc 4.3.  Taking out gcc, -mtune=core2 doesn't
>help much.
>
>
>
>H.J.
>------
>		-mtune=generic vs. -mtune=core2
>164.gzip 		 -0.723589%
>175.vpr 		 0.700935%
>176.gcc 		 14.9149%
>181.mcf 		 -0.46608%
>186.crafty 		 -1%
>197.parser 		 0%
>252.eon 		 -1.11155%
>253.perlbmk 		 1.26084%
>254.gap 		 0.0899685%
>255.vortex 		 -1.7014%
>256.bzip2 		 -0.301811%
>300.twolf 		 0.707269%
>Est. SPECint_base2000 	 0.934579%
>
>168.wupwise 		 -1.25054%
>171.swim 		 -0.770925%
>172.mgrid 		 0.233236%
>173.applu 		 -1.76429%
>177.mesa 		 0.119%
>178.galgel 		 0.726676%
>179.art 		 -1.17735%
>183.equake 		 0.211327%
>187.facerec 		 -0.22805%
>188.ammp 		 1.04167%
>189.lucas 		 -2.85331%
>191.fma3d 		 -0.365344%
>200.sixtrack 		 -1.76991%
>301.apsi 		 -1.48924%
>Est. SPECfp_base2000 	 -0.683177%
>  
>
I get different results on a mainline that is a few days old.  What kind of
processor did you use?  What stepping?  Mine is 6 (I know there were
pre-production versions).

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Genuine Intel(R) CPU                  @ 2.66GHz
stepping	: 4
cpu MHz		: 2666.763
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips	: 5339.08
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Genuine Intel(R) CPU                  @ 2.66GHz
stepping	: 4
cpu MHz		: 2666.763
cache size	: 4096 KB
physical id	: 3
siblings	: 2
core id		: 6
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips	: 5332.65
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Genuine Intel(R) CPU                  @ 2.66GHz
stepping	: 4
cpu MHz		: 2666.763
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips	: 5332.57
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Genuine Intel(R) CPU                  @ 2.66GHz
stepping	: 4
cpu MHz		: 2666.763
cache size	: 4096 KB
physical id	: 3
siblings	: 2
core id		: 7
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips	: 5332.62
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

              ------ -mtune=generic ------   ------- -mtune=core2 -------
Benchmarks    Ref Time  Run Time   Ratio     Ref Time  Run Time   Ratio
------------  --------  --------  --------   --------  --------  --------
164.gzip          1400     102        1376*     1400     103        1361
164.gzip          1400     102        1377      1400     103        1361
164.gzip          1400     102        1375      1400     103        1361*
175.vpr           1400      86.8      1613      1400      87.0      1610
175.vpr           1400      87.1      1608      1400      86.8      1612*
175.vpr           1400      86.9      1612*     1400      86.7      1615
176.gcc           1100      61.5      1789      1100      49.8      2209*
176.gcc           1100      61.5      1788*     1100      49.8      2210
176.gcc           1100      61.5      1788      1100      49.9      2206
181.mcf           1800     114        1573      1800     114        1575*
181.mcf           1800     115        1560      1800     114        1580
181.mcf           1800     115        1560*     1800     115        1563
186.crafty        1000      38.0      2631      1000      38.8      2579
186.crafty        1000      38.0      2630*     1000      38.8      2578*
186.crafty        1000      38.0      2629      1000      38.8      2578
197.parser        1800     149        1206      1800     149        1206
197.parser        1800     149        1207      1800     149        1207
197.parser        1800     149        1207*     1800     149        1207*
252.eon           1300      52.7      2465*     1300      52.5      2477
252.eon           1300      52.7      2467      1300      52.4      2480
252.eon           1300      52.8      2464      1300      52.5      2478*
253.perlbmk       1800      70.2      2564*     1800      71.5      2519*
253.perlbmk       1800      70.2      2564      1800      71.1      2532
253.perlbmk       1800      70.2      2564      1800      71.5      2517
254.gap           1100      54.1      2033      1100      53.3      2062*
254.gap           1100      54.2      2029      1100      53.4      2060
254.gap           1100      54.1      2032*     1100      53.3      2063
255.vortex        1900      84.6      2247      1900      85.0      2235*
255.vortex        1900      84.6      2247*     1900      85.0      2234
255.vortex        1900      84.5      2248      1900      85.0      2235
256.bzip2         1500      80.7      1858      1500      79.5      1886
256.bzip2         1500      80.5      1864      1500      79.5      1886*
256.bzip2         1500      80.7      1860*     1500      79.5      1886
300.twolf         3000     122        2460      3000     118        2537
300.twolf         3000     122        2462      3000     118        2537
300.twolf         3000     122        2461*     3000     118        2537*
=========================================================================
164.gzip          1400     102        1376*     1400     103        1361*
175.vpr           1400      86.9      1612*     1400      86.8      1612*
176.gcc           1100      61.5      1788*     1100      49.8      2209*
181.mcf           1800     115        1560*     1800     114        1575*
186.crafty        1000      38.0      2630*     1000      38.8      2578*
197.parser        1800     149        1207*     1800     149        1207*
252.eon           1300      52.7      2465*     1300      52.5      2478*
253.perlbmk       1800      70.2      2564*     1800      71.5      2519*
254.gap           1100      54.1      2032*     1100      53.3      2062*
255.vortex        1900      84.6      2247*     1900      85.0      2235*
256.bzip2         1500      80.7      1860*     1500      79.5      1886*
300.twolf         3000     122        2461*     3000     118        2537*
Est. SPECint_base2000                 1925
Est. SPECint2000                                                    1963
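
The two estimates above can be cross-checked, since SPEC composite scores are geometric means of the per-benchmark ratios. The following is a minimal sketch: the ratios are the starred values from the summary rows above, with the first column group taken as -mtune=generic and the second as -mtune=core2 (inferred from the 1788 vs 2210 figures for gcc quoted earlier). The helper names `geometric_mean` and `percent_change` are made up for this sketch.

```python
import math

# Starred ratios from the SPECint summary rows: (generic, core2).
# The generic/core2 column assignment is an inference, see the note above.
RATIOS = {
    "164.gzip": (1376, 1361),
    "175.vpr": (1612, 1612),
    "176.gcc": (1788, 2209),
    "181.mcf": (1560, 1575),
    "186.crafty": (2630, 2578),
    "197.parser": (1207, 1207),
    "252.eon": (2465, 2478),
    "253.perlbmk": (2564, 2519),
    "254.gap": (2032, 2062),
    "255.vortex": (2247, 2235),
    "256.bzip2": (1860, 1886),
    "300.twolf": (2461, 2537),
}


def geometric_mean(values):
    """Geometric mean, computed via logarithms to avoid huge products."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


generic = geometric_mean(g for g, _ in RATIOS.values())
core2 = geometric_mean(c for _, c in RATIOS.values())
print(f"generic {generic:.1f}  core2 {core2:.1f}")
for name, (g, c) in RATIOS.items():
    print(f"{name:12s} {percent_change(g, c):+7.2f}%")
```

Because the composite is a geometric mean, a large gain in one benchmark such as 176.gcc moves the overall score by only about a twelfth of that gain in log terms.
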


              ------ -mtune=generic ------  ------- -mtune=core2 -------
Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
------------  --------  --------  --------  --------  --------  --------
168.wupwise       1600      78.1      2048*    1600      79.2      2019
168.wupwise       1600      78.1      2048     1600      79.2      2020*
168.wupwise       1600      78.2      2047     1600      79.2      2021
171.swim          3100     139        2235*    3100     139        2233
171.swim          3100     139        2234     3100     139        2233*
171.swim          3100     139        2236     3100     139        2232
172.mgrid         1800     110        1632*    1800     113        1600
172.mgrid         1800     110        1631     1800     113        1599*
172.mgrid         1800     110        1633     1800     113        1599
173.applu         2100     180        1167     2100     171        1232
173.applu         2100     180        1167*    2100     171        1231
173.applu         2100     180        1167     2100     171        1231*
177.mesa          1400      59.5      2355*    1400      59.9      2337
177.mesa          1400      59.3      2359     1400      60.0      2333
177.mesa          1400      59.5      2355     1400      60.0      2335*
178.galgel        2900      69.5      4171     2900      70.0      4141*
178.galgel        2900      69.9      4150     2900      70.0      4142
178.galgel        2900      69.6      4165*    2900      70.1      4140
179.art           2600      42.8      6076     2600      42.0      6184
179.art           2600      42.5      6111*    2600      42.8      6073*
179.art           2600      42.4      6137     2600      43.1      6026
183.equake        1300      70.6      1843     1300      70.7      1838*
183.equake        1300      70.6      1842*    1300      70.8      1837
183.equake        1300      70.6      1842     1300      70.7      1839
187.facerec       1900     131        1445*    1900     132        1439
187.facerec       1900     131        1445     1900     132        1440*
187.facerec       1900     132        1443     1900     132        1441
188.ammp          2200     124        1770     2200     125        1758
188.ammp          2200     124        1770*    2200     125        1759*
188.ammp          2200     124        1767     2200     125        1760
189.lucas         2000     116        1726*    2000     116        1728
189.lucas         2000     116        1727     2000     116        1728*
189.lucas         2000     116        1726     2000     116        1728
191.fma3d         2100     163        1292     2100     162        1295*
191.fma3d         2100     163        1292     2100     162        1295
191.fma3d         2100     163        1292*    2100     162        1295
200.sixtrack      1100     122         901     1100     122         904*
200.sixtrack      1100     122         900     1100     122         904
200.sixtrack      1100     122         900*    1100     122         904
301.apsi          2600     193        1345*    2600     197        1319*
301.apsi          2600     193        1344     2600     197        1319
301.apsi          2600     193        1345     2600     197        1319
========================================================================
168.wupwise       1600      78.1      2048*    1600      79.2      2020*
171.swim          3100     139        2235*    3100     139        2233*
172.mgrid         1800     110        1632*    1800     113        1599*
173.applu         2100     180        1167*    2100     171        1231*
177.mesa          1400      59.5      2355*    1400      60.0      2335*
178.galgel        2900      69.6      4165*    2900      70.0      4141*
179.art           2600      42.5      6111*    2600      42.8      6073*
183.equake        1300      70.6      1842*    1300      70.7      1838*
187.facerec       1900     131        1445*    1900     132        1440*
188.ammp          2200     124        1770*    2200     125        1759*
189.lucas         2000     116        1726*    2000     116        1728*
191.fma3d         2100     163        1292*    2100     162        1295*
200.sixtrack      1100     122         900*    1100     122         904*
301.apsi          2600     193        1345*    2600     197        1319*
Est. SPECfp_base2000                  1875
Est. SPECfp2000                                                     1872




