This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] tuning gcc for Intel Core2


H. J. Lu wrote:

On Tue, Nov 14, 2006 at 09:05:07AM +0100, Jan Hubicka wrote:


On Mon, Nov 13, 2006 at 12:01:17PM -0500, Vladimir Makarov wrote:


Here is the patch for tuning gcc for the Intel Core2 processor.  I did
about 30 SPEC2000 runs to find good parameters, which are practically
the same as what Intel recommends in its optimization guide, made
public a few days ago.

The patch increases the SPECINT2000 score to 1963 from 1925 (for
generic) or 1901 (for nocona).  The SPECFP2000 score is essentially the
same as for generic, 1875 (nocona has 1856).  One benchmark (gcc) did
particularly well -- about 20% improvement (1788 for generic tuning vs
2210 for core2).  The code generated for Core2 is also smaller than that
for generic (by 0.46% for SPECint and 0.54% for SPECfp).
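For reference, the relative changes implied by the scores quoted above work out as follows (plain arithmetic on the quoted numbers; a positive value means the -mtune=core2 build scored higher):

```latex
\[
\frac{1963 - 1925}{1925} \approx 1.97\%\ \text{(SPECint vs.\ generic)},\qquad
\frac{1963 - 1901}{1901} \approx 3.26\%\ \text{(SPECint vs.\ nocona)},\qquad
\frac{2210 - 1788}{1788} \approx 23.6\%\ \text{(176.gcc)}
\]
```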



This patch is the first step for Core 2 optimization, but I am afraid
that it isn't very useful. I compared -mtune=generic and -mtune=core2
on Core 2. The main difference is in 176.gcc. What this patch does is
turn on

x86_rep_movl_optimal

for Core 2, which avoids external calls to memset. There are known
serious performance problems with the x86-64 memory functions,
especially on Core 2, and we are working on improving them. A better
external memset could improve 176.gcc in SPEC CPU2000 by more than 20%.
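As we understand the thread, enabling x86_rep_movl_optimal lets GCC expand a memset inline as a rep-prefixed string instruction instead of emitting a call to the library memset (for stores the instruction is rep stos). A minimal sketch of what such an inline expansion amounts to, written by hand with GNU C inline assembly; the helper name fill_words_rep_stosl is hypothetical and this is not GCC's own expansion code. On non-x86 targets it falls back to a plain loop so it still runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill `count` 32-bit words at `dst` with `value`.
   On x86 with a GNU-compatible compiler this is a single `rep stosl`,
   the kind of inline sequence a compiler can emit instead of calling
   the library memset.  Elsewhere it falls back to a plain loop. */
static void *fill_words_rep_stosl(void *dst, uint32_t value, size_t count)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    /* rep stosl: store EAX at [EDI/RDI], advance by 4, repeat ECX/RCX times.
       The System V ABI guarantees the direction flag is clear here. */
    __asm__ volatile ("rep stosl"
                      : "+D" (d), "+c" (count)
                      : "a" (value)
                      : "memory");
#else
    uint32_t *p = dst;
    for (size_t i = 0; i < count; i++)
        p[i] = value;
#endif
    return dst;
}
```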


BTW, I still have a patch for memcpy/memset generation that lets you
choose between the basic algorithms (rep movq, rep movl, loop, unrolled
loop, library call) based on the -mtune switch and the expected size of
the copied block.  It also has a simple benchmark utility that lets you
set proper limits.  I am happy to see -mtune=core2 in place, as that
patch also contained a basic -mtune=core2 switch.  I was holding it back
because I didn't have time to play with the other arguments carefully
enough, and because the profile-driven memcpy/memset infrastructure is
not in place yet.
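Jan's description suggests choosing an expansion from a per -mtune table keyed on the expected block size. A minimal sketch of that kind of selection in C; the enum, the struct stringop_tuning fields, the "size 0 means unknown" convention, and the thresholds used in the example are all hypothetical and are not taken from his patch or from GCC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Candidate expansions named in the thread. */
enum stringop_alg {
    ALG_LIBCALL,        /* call memcpy/memset in the C library */
    ALG_UNROLLED_LOOP,  /* straight-line / unrolled moves */
    ALG_LOOP,           /* simple move loop */
    ALG_REP_MOVL,       /* rep-prefixed 4-byte string moves/stores */
    ALG_REP_MOVQ        /* rep-prefixed 8-byte string moves/stores (64-bit only) */
};

/* Hypothetical per -mtune table; field names and values are illustrative. */
struct stringop_tuning {
    size_t max_unrolled;  /* sizes up to this use an unrolled loop */
    size_t max_loop;      /* ... then a simple loop */
    size_t max_rep;       /* ... then a rep-prefixed instruction */
    bool   prefer_movq;   /* use 8-byte rep moves when available */
};

/* Pick an algorithm from the expected block size; 0 means "size unknown". */
static enum stringop_alg choose_stringop_alg(const struct stringop_tuning *t,
                                             size_t expected_size, bool is_64bit)
{
    if (expected_size == 0)
        return ALG_LIBCALL;
    if (expected_size <= t->max_unrolled)
        return ALG_UNROLLED_LOOP;
    if (expected_size <= t->max_loop)
        return ALG_LOOP;
    if (expected_size <= t->max_rep)
        return (t->prefer_movq && is_64bit) ? ALG_REP_MOVQ : ALG_REP_MOVL;
    return ALG_LIBCALL;  /* very large blocks: leave it to the library */
}
```

For example, with the table { 16, 64, 8192, true } a 1024-byte copy on a 64-bit target picks ALG_REP_MOVQ, while an unknown size falls back to the library call.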

I will be sending it shortly (I have a non-GCC deadline to meet on the
23rd, so probably after that).

What particular problem in the x86-64 library string functions makes
Core2 unhappy? 20% sounds quite serious, and I don't remember anything
particularly crazy about the implementation.




The x86-64 memory functions were written for the first generation of Opteron more than 4 years ago. For example, memset.S contains

/* This is somehow experimental and could made dependend on the cache
  size.  */
#define LARGE $120000

120000 bytes is much smaller than the cache size of Nocona and Core 2.
A better memset/memcpy can help 176.gcc a lot.
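The memset.S comment above suggests making LARGE depend on the cache size. A minimal C sketch of deriving such a threshold at run time from /proc/cpuinfo-style text; the function names, the LARGE_FALLBACK macro, and the half-of-the-cache policy are hypothetical illustrations, not glibc code.

```c
#include <stdlib.h>
#include <string.h>

/* Fallback: the constant hard-coded as LARGE in memset.S. */
#define LARGE_FALLBACK 120000UL

/* Parse the first "cache size : N KB" line of /proc/cpuinfo-style text.
   Returns the size in bytes, or 0 if the field is missing or not in KB. */
static unsigned long parse_cache_size_bytes(const char *cpuinfo)
{
    const char *field = strstr(cpuinfo, "cache size");
    if (field == NULL)
        return 0;
    const char *colon = strchr(field, ':');
    if (colon == NULL)
        return 0;
    char *end;
    unsigned long kb = strtoul(colon + 1, &end, 10); /* skips leading blanks */
    while (*end == ' ')
        end++;
    if (strncmp(end, "KB", 2) != 0)
        return 0;
    return kb * 1024UL;
}

/* Hypothetical policy: treat blocks larger than half the cache as "large";
   fall back to the memset.S constant when the cache size is unknown. */
static unsigned long large_threshold(unsigned long cache_bytes)
{
    return cache_bytes != 0 ? cache_bytes / 2 : LARGE_FALLBACK;
}
```

For the 4096 KB cache size reported in the /proc/cpuinfo output later in this thread, this policy would give a threshold of 2097152 bytes instead of 120000.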

Here is what I got with gcc 4.3.  Excluding 176.gcc, -mtune=core2
doesn't help much.



H.J.
------
Benchmark              -mtune=generic vs. -mtune=core2
164.gzip                 -0.723589%
175.vpr                   0.700935%
176.gcc                  14.9149%
181.mcf                  -0.46608%
186.crafty               -1%
197.parser                0%
252.eon                  -1.11155%
253.perlbmk               1.26084%
254.gap                   0.0899685%
255.vortex               -1.7014%
256.bzip2                -0.301811%
300.twolf                 0.707269%
Est. SPECint_base2000     0.934579%

168.wupwise              -1.25054%
171.swim                 -0.770925%
172.mgrid                 0.233236%
173.applu                -1.76429%
177.mesa                  0.119%
178.galgel                0.726676%
179.art                  -1.17735%
183.equake                0.211327%
187.facerec              -0.22805%
188.ammp                  1.04167%
189.lucas                -2.85331%
191.fma3d                -0.365344%
200.sixtrack             -1.76991%
301.apsi                 -1.48924%
Est. SPECfp_base2000     -0.683177%


I have different results with a mainline build from a few days ago. What kind of processor did you use? What stepping? Mine is 6 (I know there were pre-production versions).

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Genuine Intel(R) CPU                  @ 2.66GHz
stepping	: 4
cpu MHz		: 2666.763
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips	: 5339.08
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Genuine Intel(R) CPU                  @ 2.66GHz
stepping	: 4
cpu MHz		: 2666.763
cache size	: 4096 KB
physical id	: 3
siblings	: 2
core id		: 6
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips	: 5332.65
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Genuine Intel(R) CPU                  @ 2.66GHz
stepping	: 4
cpu MHz		: 2666.763
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips	: 5332.57
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Genuine Intel(R) CPU                  @ 2.66GHz
stepping	: 4
cpu MHz		: 2666.763
cache size	: 4096 KB
physical id	: 3
siblings	: 2
core id		: 7
cpu cores	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm pni monitor ds_cpl est tm2 cx16 xtpr
bogomips	: 5332.62
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

                 -mtune=generic (base)           -mtune=core2 (peak)
Benchmarks    Ref Time  Run Time   Ratio     Ref Time  Run Time   Ratio
------------  --------  --------  --------   --------  --------  --------
164.gzip          1400     102        1376*     1400     103        1361
164.gzip          1400     102        1377      1400     103        1361
164.gzip          1400     102        1375      1400     103        1361*
175.vpr           1400      86.8      1613      1400      87.0      1610
175.vpr           1400      87.1      1608      1400      86.8      1612*
175.vpr           1400      86.9      1612*     1400      86.7      1615
176.gcc           1100      61.5      1789      1100      49.8      2209*
176.gcc           1100      61.5      1788*     1100      49.8      2210
176.gcc           1100      61.5      1788      1100      49.9      2206
181.mcf           1800     114        1573      1800     114        1575*
181.mcf           1800     115        1560      1800     114        1580
181.mcf           1800     115        1560*     1800     115        1563
186.crafty        1000      38.0      2631      1000      38.8      2579
186.crafty        1000      38.0      2630*     1000      38.8      2578*
186.crafty        1000      38.0      2629      1000      38.8      2578
197.parser        1800     149        1206      1800     149        1206
197.parser        1800     149        1207      1800     149        1207
197.parser        1800     149        1207*     1800     149        1207*
252.eon           1300      52.7      2465*     1300      52.5      2477
252.eon           1300      52.7      2467      1300      52.4      2480
252.eon           1300      52.8      2464      1300      52.5      2478*
253.perlbmk       1800      70.2      2564*     1800      71.5      2519*
253.perlbmk       1800      70.2      2564      1800      71.1      2532
253.perlbmk       1800      70.2      2564      1800      71.5      2517
254.gap           1100      54.1      2033      1100      53.3      2062*
254.gap           1100      54.2      2029      1100      53.4      2060
254.gap           1100      54.1      2032*     1100      53.3      2063
255.vortex        1900      84.6      2247      1900      85.0      2235*
255.vortex        1900      84.6      2247*     1900      85.0      2234
255.vortex        1900      84.5      2248      1900      85.0      2235
256.bzip2         1500      80.7      1858      1500      79.5      1886
256.bzip2         1500      80.5      1864      1500      79.5      1886*
256.bzip2         1500      80.7      1860*     1500      79.5      1886
300.twolf         3000     122        2460      3000     118        2537
300.twolf         3000     122        2462      3000     118        2537
300.twolf         3000     122        2461*     3000     118        2537*
=========================================================================
164.gzip          1400     102        1376*     1400     103        1361*
175.vpr           1400      86.9      1612*     1400      86.8      1612*
176.gcc           1100      61.5      1788*     1100      49.8      2209*
181.mcf           1800     115        1560*     1800     114        1575*
186.crafty        1000      38.0      2630*     1000      38.8      2578*
197.parser        1800     149        1207*     1800     149        1207*
252.eon           1300      52.7      2465*     1300      52.5      2478*
253.perlbmk       1800      70.2      2564*     1800      71.5      2519*
254.gap           1100      54.1      2032*     1100      53.3      2062*
255.vortex        1900      84.6      2247*     1900      85.0      2235*
256.bzip2         1500      80.7      1860*     1500      79.5      1886*
300.twolf         3000     122        2461*     3000     118        2537*
Est. SPECint_base2000                 1925
Est. SPECint2000                                                    1963
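Under SPEC CPU2000's reporting rules, each Est. SPEC figure is the geometric mean of the per-benchmark ratios, using the starred (median) run of each benchmark. A small C sketch that recomputes the two SPECint summary numbers from the starred rows above; the function and array names are ours, and the n-th root is taken with Newton's method rather than log()/exp() only so the example builds without linking libm.

```c
#include <stddef.h>

/* x raised to a non-negative integer power, by repeated multiplication. */
static double ipow(double x, size_t n)
{
    double r = 1.0;
    while (n-- > 0)
        r *= x;
    return r;
}

/* Geometric mean of n > 0 positive ratios: (r_1 * ... * r_n)^(1/n).
   The n-th root is found with Newton's method on f(x) = x^n - product,
   starting from the arithmetic mean (never below the geometric mean),
   so the iterates decrease toward the root.  The plain product is fine
   for a dozen ratios in the low thousands. */
static double spec_geomean(const double *ratios, size_t n)
{
    double product = 1.0, sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        product *= ratios[i];
        sum += ratios[i];
    }
    double x = sum / (double)n;
    for (int iter = 0; iter < 200; iter++) {
        double next = ((double)(n - 1) * x + product / ipow(x, n - 1)) / (double)n;
        if (next >= x)   /* no further decrease: converged */
            break;
        x = next;
    }
    return x;
}

/* Starred (median) SPECint ratios from the table above, in table order. */
static const double generic_int[12] = {
    1376, 1612, 1788, 1560, 2630, 1207, 2465, 2564, 2032, 2247, 1860, 2461
};
static const double core2_int[12] = {
    1361, 1612, 2209, 1575, 2578, 1207, 2478, 2519, 2062, 2235, 1886, 2537
};
```

With these inputs the two results round to the 1925 and 1963 shown in the table.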


                 -mtune=generic (base)           -mtune=core2 (peak)
Benchmarks    Ref Time  Run Time   Ratio     Ref Time  Run Time   Ratio
------------  --------  --------  --------   --------  --------  --------
168.wupwise       1600      78.1      2048*     1600      79.2      2019
168.wupwise       1600      78.1      2048      1600      79.2      2020*
168.wupwise       1600      78.2      2047      1600      79.2      2021
171.swim          3100     139        2235*     3100     139        2233
171.swim          3100     139        2234      3100     139        2233*
171.swim          3100     139        2236      3100     139        2232
172.mgrid         1800     110        1632*     1800     113        1600
172.mgrid         1800     110        1631      1800     113        1599*
172.mgrid         1800     110        1633      1800     113        1599
173.applu         2100     180        1167      2100     171        1232
173.applu         2100     180        1167*     2100     171        1231
173.applu         2100     180        1167      2100     171        1231*
177.mesa          1400      59.5      2355*     1400      59.9      2337
177.mesa          1400      59.3      2359      1400      60.0      2333
177.mesa          1400      59.5      2355      1400      60.0      2335*
178.galgel        2900      69.5      4171      2900      70.0      4141*
178.galgel        2900      69.9      4150      2900      70.0      4142
178.galgel        2900      69.6      4165*     2900      70.1      4140
179.art           2600      42.8      6076      2600      42.0      6184
179.art           2600      42.5      6111*     2600      42.8      6073*
179.art           2600      42.4      6137      2600      43.1      6026
183.equake        1300      70.6      1843      1300      70.7      1838*
183.equake        1300      70.6      1842*     1300      70.8      1837
183.equake        1300      70.6      1842      1300      70.7      1839
187.facerec       1900     131        1445*     1900     132        1439
187.facerec       1900     131        1445      1900     132        1440*
187.facerec       1900     132        1443      1900     132        1441
188.ammp          2200     124        1770      2200     125        1758
188.ammp          2200     124        1770*     2200     125        1759*
188.ammp          2200     124        1767      2200     125        1760
189.lucas         2000     116        1726*     2000     116        1728
189.lucas         2000     116        1727      2000     116        1728*
189.lucas         2000     116        1726      2000     116        1728
191.fma3d         2100     163        1292      2100     162        1295*
191.fma3d         2100     163        1292      2100     162        1295
191.fma3d         2100     163        1292*     2100     162        1295
200.sixtrack      1100     122         901      1100     122         904*
200.sixtrack      1100     122         900      1100     122         904
200.sixtrack      1100     122         900*     1100     122         904
301.apsi          2600     193        1345*     2600     197        1319*
301.apsi          2600     193        1344      2600     197        1319
301.apsi          2600     193        1345      2600     197        1319
=========================================================================
168.wupwise       1600      78.1      2048*     1600      79.2      2020*
171.swim          3100     139        2235*     3100     139        2233*
172.mgrid         1800     110        1632*     1800     113        1599*
173.applu         2100     180        1167*     2100     171        1231*
177.mesa          1400      59.5      2355*     1400      60.0      2335*
178.galgel        2900      69.6      4165*     2900      70.0      4141*
179.art           2600      42.5      6111*     2600      42.8      6073*
183.equake        1300      70.6      1842*     1300      70.7      1838*
187.facerec       1900     131        1445*     1900     132        1440*
188.ammp          2200     124        1770*     2200     125        1759*
189.lucas         2000     116        1726*     2000     116        1728*
191.fma3d         2100     163        1292*     2100     162        1295*
200.sixtrack      1100     122         900*     1100     122         904*
301.apsi          2600     193        1345*     2600     197        1319*
Est. SPECfp_base2000                  1875
Est. SPECfp2000                                                     1872



