This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch] tuning gcc for Intel Core2


H. J. Lu wrote:

On Mon, Nov 13, 2006 at 12:01:17PM -0500, Vladimir Makarov wrote:


Here is the patch for tuning gcc for the Intel Core2 processor.  I did
about 30 SPEC2000 runs to find good parameters, which are practically
the same as what Intel gave and recommended in their optimization guide
made public a few days ago.

The patch increases the SPECINT2000 score to 1963 from 1925 (for
generic) or 1901 (for nocona).  The SPECFP2000 score is the same as for
generic, 1875 (nocona has 1856).  One benchmark (gcc) did particularly
well -- about 20% improvement (1788 for generic tuning vs 2210 for
core2).  The code generated for Core2 is smaller (by 0.46% for
SPECInt and 0.54% for SPECFp) than the code for generic.
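
For reference, the relative gains implied by the scores quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name `percent_gain` is made up for illustration and is not part of the patch or of the SPEC tools.

```python
def percent_gain(new: float, old: float) -> float:
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# Scores as quoted in the message (higher is better).
specint = {"core2": 1963, "generic": 1925, "nocona": 1901}
gcc_bench = {"core2": 2210, "generic": 1788}

vs_generic = percent_gain(specint["core2"], specint["generic"])
vs_nocona = percent_gain(specint["core2"], specint["nocona"])
gcc_gain = percent_gain(gcc_bench["core2"], gcc_bench["generic"])

print(f"SPECint, core2 vs generic: {vs_generic:.1f}%")  # 2.0%
print(f"SPECint, core2 vs nocona:  {vs_nocona:.1f}%")   # 3.3%
print(f"176.gcc, core2 vs generic: {gcc_gain:.1f}%")    # 23.6%
```

So the "about 20%" quoted for gcc works out to roughly 23.6% on these numbers, while the overall SPECint gain over generic is about 2%.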




This patch is the first step for Core 2 optimization. But I am afraid
that it isn't very useful. I compared -mtune=generic and -mtune=core2
on Core 2. The main difference is in 176.gcc. What this patch does is
to turn on


H.J., I don't pretend that the code is final. I will be glad if people find a better tuning. I did this tuning because it was needed for my work. It is a strange situation that we have no tuning for a major processor which has been publicly available for a few months. Ideally, tuning should be available before the processor becomes available to the public (that is what IBM is doing for their power6 now). Unfortunately, only Intel can do that, because they make microarchitecture documentation public too late. For example, I started the work long before the documentation was made public (a few days ago), using assorted unverified information from the internet.

And the patch is useful even if I get the same score because, as I wrote, the code is smaller.

x86_rep_movl_optimal

for Core 2, which will avoid external calls to memset. There
are known serious performance problems with x86-64 memory functions,
especially on Core 2. We are working on improving x86-64 memory
functions. A better external memset can improve gcc in SPEC CPU 2K by
more than 20%.

BTW, when Jan and I were working on -mtune=generic, we determined that
turning on x86_rep_movl_optimal wasn't a good idea. What do you get
if you turn off x86_rep_movl_optimal?


It is the same as generic (maybe a bit better).


Here are the results with -m32, as you asked, on the mainline from a few days ago. The code is smaller and the Int score is a bit better (not because of gcc, which is actually 0.4% worse).


base: -O2 -m32 -mtune=generic
peak: -O2 -m32 -mtune=core2

Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
------------  --------  --------  --------  --------  --------  --------
164.gzip          1400     112        1251      1400     110        1273
164.gzip          1400     112        1251      1400     110        1272*
164.gzip          1400     112        1251*     1400     110        1272
175.vpr           1400      88.0      1592      1400      86.7      1615
175.vpr           1400      87.9      1592      1400      86.9      1612
175.vpr           1400      87.9      1592*     1400      86.7      1614*
176.gcc           1100      46.4      2371*     1100      46.5      2364
176.gcc           1100      46.4      2373      1100      46.7      2356
176.gcc           1100      46.4      2370      1100      46.6      2362*
181.mcf           1800      72.0      2500      1800      71.1      2533
181.mcf           1800      71.4      2521*     1800      71.4      2521*
181.mcf           1800      71.4      2522      1800      72.4      2486
186.crafty        1000      58.2      1717      1000      58.8      1701*
186.crafty        1000      58.3      1715      1000      58.9      1699
186.crafty        1000      58.3      1716*     1000      58.8      1701
197.parser        1800     140        1287*     1800     138        1301*
197.parser        1800     140        1287      1800     138        1302
197.parser        1800     140        1286      1800     138        1301
252.eon           1300      78.6      1653      1300      78.1      1665
252.eon           1300      78.6      1653*     1300      78.1      1665*
252.eon           1300      78.7      1653      1300      78.1      1665
253.perlbmk       1800      74.9      2404      1800      79.6      2263
253.perlbmk       1800      74.9      2403      1800      79.5      2264
253.perlbmk       1800      74.9      2404*     1800      79.6      2263*
254.gap           1100      60.0      1834      1100      56.4      1950
254.gap           1100      59.9      1837      1100      56.4      1950*
254.gap           1100      60.0      1834*     1100      56.4      1949
255.vortex        1900      94.6      2008      1900      95.2      1997
255.vortex        1900      94.7      2006      1900      95.3      1993
255.vortex        1900      94.7      2006*     1900      95.2      1996*
256.bzip2         1500      93.4      1607*     1500      90.6      1655
256.bzip2         1500      93.5      1604      1500      90.9      1651
256.bzip2         1500      93.0      1612      1500      90.8      1652*
300.twolf         3000     129        2323*     3000     128        2345*
300.twolf         3000     129        2323      3000     128        2345
300.twolf         3000     129        2321      3000     128        2344
========================================================================
164.gzip          1400     112        1251*     1400     110        1272*
175.vpr           1400      87.9      1592*     1400      86.7      1614*
176.gcc           1100      46.4      2371*     1100      46.6      2362*
181.mcf           1800      71.4      2521*     1800      71.4      2521*
186.crafty        1000      58.3      1716*     1000      58.8      1701*
197.parser        1800     140        1287*     1800     138        1301*
252.eon           1300      78.6      1653*     1300      78.1      1665*
253.perlbmk       1800      74.9      2404*     1800      79.6      2263*
254.gap           1100      60.0      1834*     1100      56.4      1950*
255.vortex        1900      94.7      2006*     1900      95.2      1996*
256.bzip2         1500      93.4      1607*     1500      90.8      1652*
300.twolf         3000     129        2323*     3000     128        2345*
Est. SPECint_base2000                 1832
Est. SPECint2000                                                    1843
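
As a rough cross-check of the summary lines above: each ratio appears to be 100 * (ref time / run time), and the overall estimate appears to be the geometric mean of the starred per-benchmark ratios. The Python sketch below works under that assumption; the function names are illustrative and not taken from the SPEC tools, and the inputs are the rounded ratios printed in the table, so the results can differ from the reported figures by a point or so.

```python
import math

def spec_ratio(ref_time: float, run_time: float) -> float:
    """Ratio as it appears in the table: 100 * reference time / run time."""
    return 100.0 * ref_time / run_time

def geometric_mean(values):
    """Geometric mean computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Starred ratios from the CINT2000 summary rows, in table order
# (164.gzip ... 300.twolf).
base = [1251, 1592, 2371, 2521, 1716, 1287, 1653, 2404, 1834, 2006, 1607, 2323]
peak = [1272, 1614, 2362, 2521, 1701, 1301, 1665, 2263, 1950, 1996, 1652, 2345]

print(f"176.gcc base ratio: {spec_ratio(1100, 46.4):.0f}")
print(f"base (-mtune=generic) geometric mean: {geometric_mean(base):.1f}")
print(f"peak (-mtune=core2)   geometric mean: {geometric_mean(peak):.1f}")
```

Both means land within about one point of the reported 1832 and 1843. The same calculation applies to the CFP2000 summary rows.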

----------------CINT2000-----------------
-1.494%          34266          33754 164.gzip
-1.089%         133691         132235 175.vpr
-0.556%        1274503        1267415 176.gcc
-0.961%           9992           9896 181.mcf
-0.537%         199666         198594 186.crafty
-0.855%          89789          89021 197.parser
-0.176%         399886         399182 252.eon
-0.383%         480158         478318 253.perlbmk
-1.756%         432878         425278 254.gap
-0.194%         553673         552601 255.vortex
-1.613%          29763          29283 256.bzip2
-1.235%         187824         185504 300.twolf
Average = -0.723261%
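
The per-benchmark percentages in the size table can be reproduced under one assumption: the first size column is the -mtune=generic build and the second is the -mtune=core2 build. The message does not label the columns or give the unit, so that reading, and the helper name `size_change_percent`, are guesses for illustration.

```python
def size_change_percent(generic_size: int, core2_size: int) -> float:
    """Size change of the core2 build relative to the generic build, in percent."""
    return (core2_size - generic_size) / generic_size * 100.0

# (assumed generic size, assumed core2 size) for a few rows of the CINT2000 table.
sizes = {
    "164.gzip": (34266, 33754),
    "176.gcc": (1274503, 1267415),
    "254.gap": (432878, 425278),
}

for name, (generic_size, core2_size) in sizes.items():
    change = size_change_percent(generic_size, core2_size)
    print(f"{change:7.3f}%  {generic_size:>8}  {core2_size:>8}  {name}")
```

With that column order, the computed values match the first column of the table for these rows (-1.494%, -0.556%, -1.756%).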

Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
------------  --------  --------  --------  --------  --------  --------
168.wupwise       1600      92.2      1736*     1600      93.9      1704*
168.wupwise       1600      92.2      1736      1600      93.8      1705
168.wupwise       1600      92.2      1735      1600      93.9      1703
171.swim          3100     129        2399      3100     129        2396
171.swim          3100     129        2400      3100     129        2397
171.swim          3100     129        2400*     3100     129        2397*
172.mgrid         1800     207         870*     1800     206         872
172.mgrid         1800     207         870      1800     206         873*
172.mgrid         1800     207         870      1800     206         873
173.applu         2100     189        1108*     2100     191        1102
173.applu         2100     189        1109      2100     190        1103
173.applu         2100     190        1108      2100     190        1102*
177.mesa          1400     103        1353      1400     100        1398
177.mesa          1400     103        1354      1400     100        1399
177.mesa          1400     103        1353*     1400     100        1398*
178.galgel        2900      71.5      4055      2900      71.1      4077
178.galgel        2900      71.5      4056*     2900      71.1      4081*
178.galgel        2900      71.3      4066      2900      70.9      4089
179.art           2600      90.9      2860      2600      94.7      2747
179.art           2600      90.0      2888      2600      93.7      2774*
179.art           2600      90.5      2873*     2600      93.5      2781
183.equake        1300      73.0      1781      1300      73.0      1782*
183.equake        1300      73.0      1782      1300      73.0      1782
183.equake        1300      73.0      1782*     1300      73.0      1781
187.facerec       1900     138        1373*     1900     139        1370*
187.facerec       1900     138        1372      1900     139        1371
187.facerec       1900     138        1373      1900     139        1369
188.ammp          2200     197        1119      2200     197        1119*
188.ammp          2200     196        1120      2200     197        1119
188.ammp          2200     196        1120*     2200     197        1119
189.lucas         2000     146        1372      2000     147        1364
189.lucas         2000     146        1372*     2000     147        1364
189.lucas         2000     146        1373      2000     147        1364*
191.fma3d         2100     177        1185      2100     178        1181*
191.fma3d         2100     177        1186      2100     178        1181
191.fma3d         2100     177        1186*     2100     178        1181
200.sixtrack      1100     155         710*     1100     154         716*
200.sixtrack      1100     155         711      1100     154         716
200.sixtrack      1100     155         710      1100     154         716
301.apsi          2600     218        1192      2600     217        1200
301.apsi          2600     218        1192*     2600     216        1201
301.apsi          2600     218        1192      2600     217        1201*
========================================================================
168.wupwise       1600      92.2      1736*     1600      93.9      1704*
171.swim          3100     129        2400*     3100     129        2397*
172.mgrid         1800     207         870*     1800     206         873*
173.applu         2100     189        1108*     2100     190        1102*
177.mesa          1400     103        1353*     1400     100        1398*
178.galgel        2900      71.5      4056*     2900      71.1      4081*
179.art           2600      90.5      2873*     2600      93.7      2774*
183.equake        1300      73.0      1782*     1300      73.0      1782*
187.facerec       1900     138        1373*     1900     139        1370*
188.ammp          2200     196        1120*     2200     197        1119*
189.lucas         2000     146        1372*     2000     147        1364*
191.fma3d         2100     177        1186*     2100     178        1181*
200.sixtrack      1100     155         710*     1100     154         716*
301.apsi          2600     218        1192*     2600     217        1201*
Est. SPECfp_base2000                  1479
Est. SPECfp2000                                                     1477

----------------CFP2000-----------------
-0.628%          28047          27871 168.wupwise
-1.000%           7999           7919 171.swim
-0.819%          15634          15506 172.mgrid
-0.863%          51921          51473 173.applu
-0.974%         466501         461957 177.mesa
-1.030%         174045         172253 178.galgel
-1.343%          13109          12933 179.art
-0.751%          17038          16910 183.equake
-0.776%          68023          67495 187.facerec
-0.603%         106054         105414 188.ammp
-0.472%          47507          47283 189.lucas
-0.611%        1015909        1009701 191.fma3d
-0.685%         875414         869414 200.sixtrack
-0.695%         124355         123491 301.apsi
Average = -0.661717%



