This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Simplify expansion of operations on memory


Hi,
this patch stops the expander from loading memory operands into registers
(so globvar++ is no longer expanded into the RTL equivalent of reg=globvar,
reg++, globvar=reg).  This used to be needed to make CSE useful on
memory operands, but with SSA we should no longer need to rely on that.
The patch saves about 15% of compilation time for tramp3d with profiling,
as all those tons of counter increments can now be direct.  It also
seems to have a mildly positive effect on SPEC (more so on SPECfp, actually,
which makes me believe it :)
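
For illustration only, here is a minimal C sketch of the two shapes described
above.  The names bump_direct and bump_via_temp are hypothetical and do not
appear in the patch.  Both functions have the same effect; the second spells
out by hand the load/increment/store sequence that the expander used to emit
as RTL for the first.  On i386 an increment of a memory location can be
encoded as a single instruction with a memory destination, which is the form
the direct expansion is meant to allow.

```c
/* Global counter, standing in for globvar or a profiling counter. */
long globvar;

/* Source form from the mail: a read-modify-write on a memory operand. */
void bump_direct(void)
{
    globvar++;
}

/* Hand-written equivalent of the old expansion through a temporary. */
void bump_via_temp(void)
{
    long reg = globvar;  /* reg = globvar */
    reg++;               /* reg++         */
    globvar = reg;       /* globvar = reg */
}
```

The sketch only shows the source-level equivalence; the instructions actually
generated depend on the target and on the later RTL passes.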

I am somewhat concerned about the comment saying that CSE might produce
instructions the "machine cannot support".  This comment dates back to the
creation of the EGCS CVS repository, and I think it relates to the fact that
the i386 backend didn't check that the source memory operand matches the
destination memory operand, so it might have resulted in more reloading,
not in CSE producing unrecognizable insns in some corner cases...
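
As a rough illustration of the matching-memory check mentioned above, here is
a toy C model of the decision the i386 unary expanders make after this patch.
Everything in it is an assumption made for the example: struct operand,
operand_equal (a stand-in for rtx_equal_p), keeps_memory_dest and extra_copies
are hypothetical names, not GCC code.  A memory destination is kept only when
it is the same location as the source; otherwise the operands go through
registers.

```c
#include <stdbool.h>
#include <string.h>

/* Toy operand: a register or a memory location, identified by name. */
enum operand_kind { OP_REG, OP_MEM };

struct operand {
    enum operand_kind kind;
    const char *name;
};

/* Toy stand-in for rtx_equal_p: same kind and same name. */
static bool operand_equal(struct operand a, struct operand b)
{
    return a.kind == b.kind && strcmp(a.name, b.name) == 0;
}

/* A memory destination can stay in memory only if it matches the source. */
bool keeps_memory_dest(struct operand dst, struct operand src)
{
    return dst.kind == OP_MEM && operand_equal(dst, src);
}

/* Count the register copies this toy expander needs for dst = op (src). */
int extra_copies(struct operand dst, struct operand src)
{
    bool matching = keeps_memory_dest(dst, src);
    int copies = 0;

    if (dst.kind == OP_MEM && !matching)
        copies++;  /* compute into a fresh register, store back later */
    if (src.kind == OP_MEM && !matching)
        copies++;  /* load the source into a register first */
    return copies;
}
```

In this model an in-place operation on a single memory location needs no
copies, while an operation between two different memory locations needs two.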

Bootstrapped and regtested on i686-pc-linux-gnu, OK?

Honza

Size of binaries (SPECint2000):
 164.gzip: Base: 71562 bytes
 164.gzip: Peak: 71498 bytes
 175.vpr: Base: 180696 bytes
 175.vpr: Peak: 180504 bytes
 176.gcc: Base: 1832382 bytes
 176.gcc: Peak: 1830142 bytes
 181.mcf: Base: 26815 bytes
 181.mcf: Peak: 26847 bytes
 186.crafty: Base: 238116 bytes
 186.crafty: Peak: 237252 bytes
 197.parser: Base: 185566 bytes
 197.parser: Peak: 184894 bytes
 252.eon: Base: 661565 bytes
 252.eon: Peak: 661469 bytes
 253.perlbmk: Base: 747199 bytes
 253.perlbmk: Peak: 747167 bytes
 254.gap: Base: 606174 bytes
 254.gap: Peak: 606174 bytes
 255.vortex: Base: 662449 bytes
 255.vortex: Peak: 662481 bytes
 256.bzip2: Base: 63050 bytes
 256.bzip2: Peak: 63050 bytes
 300.twolf: Base: 239937 bytes
 300.twolf: Peak: 240801 bytes
 =============================
 Total: Base: 5515511 bytes
 Total: Peak: 5512279 bytes

Compile times for benchmarks:
164.gzip base: 5 s
175.vpr base: 7 s
176.gcc base: 89 s
181.mcf base: 1 s
186.crafty base: 12 s
197.parser base: 8 s
252.eon base: 76 s
253.perlbmk base: 35 s
254.gap base: 26 s
255.vortex base: 21 s
256.bzip2 base: 2 s
300.twolf base: 15 s
164.gzip peak: 3 s
175.vpr peak: 7 s
176.gcc peak: 81 s
181.mcf peak: 1 s
186.crafty peak: 11 s
197.parser peak: 8 s
252.eon peak: 75 s
253.perlbmk peak: 34 s
254.gap peak: 26 s
255.vortex peak: 21 s
256.bzip2 peak: 2 s
300.twolf peak: 14 s
======================================
Total time for base compilation: 297 s
Total time for peak compilation: 283 s

                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   164.gzip          1400  160            874      1400  159            879 
   164.gzip          1400  156            900      1400  156            897 
   164.gzip          1400  158            887*     1400  158            885*
   175.vpr           1400  179            784      1400  176            797 
   175.vpr           1400  175            800      1400  178            788*
   175.vpr           1400  178            786*     1400  178            788 
   176.gcc           1100  102           1075*     1100   99.6         1104 
   176.gcc           1100  102           1079      1100  101           1093*
   176.gcc           1100  104           1061      1100  104           1060 
   181.mcf           1800  421            427      1800  426            423*
   181.mcf           1800  424            425      1800  425            423 
   181.mcf           1800  423            426*     1800  428            421 
   186.crafty        1000   63.0         1586*     1000   64.0         1561*
   186.crafty        1000   62.9         1590      1000   63.8         1567 
   186.crafty        1000   65.2         1534      1000   66.2         1511 
   197.parser        1800  266            676      1800  265            680 
   197.parser        1800  264            682      1800  266            675*
   197.parser        1800  266            677*     1800  267            674 
   252.eon           1300   88.1         1476      1300   83.9         1549*
   252.eon           1300   85.6         1519*     1300   86.2         1508 
   252.eon           1300   85.5         1521      1300   83.3         1560 
   253.perlbmk       1800  168           1069      1800  171           1055 
   253.perlbmk       1800  166           1085      1800  168           1070 
   253.perlbmk       1800  168           1070*     1800  170           1057*
   254.gap           1100  127            863*     1100  128            862*
   254.gap           1100  129            850      1100  129            851 
   254.gap           1100  127            868      1100  127            866 
   255.vortex        1900    0.0460          X     1900    0.0439          X
   255.vortex        1900    0.0120          X     1900    0.0119          X
   255.vortex        1900    0.0120          X     1900    0.0119          X
   256.bzip2         1500  178            842      1500  178            844 
   256.bzip2         1500  175            857      1500  175            858 
   256.bzip2         1500  177            847*     1500  177            847*
   300.twolf         3000  326            921      3000  326            921*
   300.twolf         3000  329            913      3000  326            920 
   300.twolf         3000  327            916*     3000  325            924 
   ========================================================================
   164.gzip          1400  158            887*     1400  158            885*
   175.vpr           1400  178            786*     1400  178            788*
   176.gcc           1100  102           1075*     1100  101           1093*
   181.mcf           1800  423            426*     1800  426            423*
   186.crafty        1000   63.0         1586*     1000   64.0         1561*
   197.parser        1800  266            677*     1800  266            675*
   252.eon           1300   85.6         1519*     1300   83.9         1549*
   253.perlbmk       1800  168           1070*     1800  170           1057*
   254.gap           1100  127            863*     1100  128            862*
   255.vortex                                X                             X
   256.bzip2         1500  177            847*     1500  177            847*
   300.twolf         3000  327            916*     3000  326            921*
   Est. SPECint_base2000                  914
   Est. SPECint2000                                                     915

Size of binaries (SPECfp2000):
 168.wupwise: Base: 45455 bytes
 168.wupwise: Peak: 45455 bytes
 171.swim: Base: 23314 bytes
 171.swim: Peak: 23314 bytes
 172.mgrid: Base: 28922 bytes
 172.mgrid: Peak: 28922 bytes
 173.applu: Base: 89763 bytes
 173.applu: Peak: 89763 bytes
 177.mesa: Base: 635606 bytes
 177.mesa: Peak: 635638 bytes
 179.art: Base: 32032 bytes
 179.art: Peak: 32080 bytes
 183.equake: Base: 42823 bytes
 183.equake: Peak: 42823 bytes
 187.facerec: Base: 84852 bytes
 187.facerec: Peak: 84852 bytes
 188.ammp: Base: 174450 bytes
 188.ammp: Peak: 174290 bytes
 189.lucas: Base: 81841 bytes
 189.lucas: Peak: 81841 bytes
 191.fma3d: Base: 1265779 bytes
 191.fma3d: Peak: 1258355 bytes
 200.sixtrack: Base: 1054784 bytes
 200.sixtrack: Peak: 1054272 bytes
 301.apsi: Base: 155751 bytes
 301.apsi: Peak: 155815 bytes
 =============================
 Total: Base: 3715372 bytes
 Total: Peak: 3707420 bytes

Compile times for benchmarks:
168.wupwise base: 3 s
171.swim base: 1 s
172.mgrid base: 1 s
173.applu base: 4 s
177.mesa base: 29 s
178.galgel base: 0 s
179.art base: 1 s
183.equake base: 1 s
187.facerec base: 3 s
188.ammp base: 7 s
189.lucas base: 3 s
191.fma3d base: 156 s
200.sixtrack base: 68 s
301.apsi base: 6 s
168.wupwise peak: 3 s
171.swim peak: 0 s
172.mgrid peak: 1 s
173.applu peak: 3 s
177.mesa peak: 28 s
178.galgel peak: 0 s
179.art peak: 1 s
183.equake peak: 2 s
187.facerec peak: 3 s
188.ammp peak: 8 s
189.lucas peak: 4 s
191.fma3d peak: 162 s
200.sixtrack peak: 76 s
301.apsi peak: 6 s
======================================
Total time for base compilation: 283 s
Total time for peak compilation: 297 s

                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ------------  --------  --------  --------  --------  --------  --------
   168.wupwise       1600   169       946    *     1600   165       967    *
   168.wupwise       1600   167       958          1600   167       956     
   168.wupwise       1600   170       943          1600   165       968     
   171.swim          3100   496       624    *     3100   492       630     
   171.swim          3100   497       624          3100   492       630    *
   171.swim          3100   496       625          3100   490       633     
   172.mgrid         1800   243       741    *     1800   241       748     
   172.mgrid         1800   241       748          1800   240       751    *
   172.mgrid         1800   244       738          1800   239       752     
   173.applu         2100   322       652          2100   315       666     
   173.applu         2100   321       655    *     2100   315       666     
   173.applu         2100   320       657          2100   315       666    *
   177.mesa          1400   119      1176          1400   117      1195    *
   177.mesa          1400   116      1203          1400   119      1176     
   177.mesa          1400   118      1182    *     1400   117      1196     
   178.galgel        2900        --          X     2900        --          X
   179.art           2600   358       725          2600   355       733     
   179.art           2600   364       715          2600   360       721     
   179.art           2600   362       718    *     2600   355       731    *
   183.equake        1300   168       775          1300   160       811     
   183.equake        1300   162       801          1300   162       800     
   183.equake        1300   164       794    *     1300   161       810    *
   187.facerec       1900   262       725          1900   263       724     
   187.facerec       1900   260       731    *     1900   261       729     
   187.facerec       1900   258       737          1900   261       728    *
   188.ammp          2200   235       936          2200   233       944     
   188.ammp          2200   237       929    *     2200   231       954     
   188.ammp          2200   237       929          2200   233       946    *
   189.lucas         2000   244       820    *     2000   246       812     
   189.lucas         2000   245       815          2000   246       814    *
   189.lucas         2000   243       823          2000   246       815     
   191.fma3d         2100   305       687          2100   303       692     
   191.fma3d         2100   305       689    *     2100   303       693    *
   191.fma3d         2100   305       689          2100   303       693     
   200.sixtrack      1100   234       471          1100   233       472     
   200.sixtrack      1100   233       473          1100   230       478     
   200.sixtrack      1100   233       472    *     1100   233       473    *
   301.apsi          2600   297       875          2600   295       881     
   301.apsi          2600   296       879    *     2600   295       881    *
   301.apsi          2600   295       881          2600   295       880     
   ========================================================================
   168.wupwise       1600   169       946    *     1600   165       967    *
   171.swim          3100   496       624    *     3100   492       630    *
   172.mgrid         1800   243       741    *     1800   240       751    *
   173.applu         2100   321       655    *     2100   315       666    *
   177.mesa          1400   118      1182    *     1400   117      1195    *
   178.galgel                                X                             X
   179.art           2600   362       718    *     2600   355       731    *
   183.equake        1300   164       794    *     1300   161       810    *
   187.facerec       1900   260       731    *     1900   261       728    *
   188.ammp          2200   237       929    *     2200   233       946    *
   189.lucas         2000   244       820    *     2000   246       814    *
   191.fma3d         2100   305       689    *     2100   303       693    *
   200.sixtrack      1100   233       472    *     1100   233       473    *
   301.apsi          2600   296       879    *     2600   295       881    *
   Est. SPECfp_base2000               765    
   Est. SPECfp2000                                                  773    


2005-07-29  Jan Hubicka  <jh@suse.cz>

	* expr.c (expand_expr_real_1): Do not load mem targets into register.
	* config/i386/i386.c (ix86_fixup_binary_operands): Likewise.
	(ix86_expand_unary_operator): Likewise.
	(ix86_expand_fp_absneg_operator): Likewise.
	* optabs.c (expand_vec_cond_expr): Validate dest.

Index: expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.806
diff -c -3 -p -r1.806 expr.c
*** expr.c	25 Jul 2005 12:04:45 -0000	1.806
--- expr.c	29 Jul 2005 12:10:40 -0000
*************** expand_expr_real_1 (tree exp, rtx target
*** 6578,6595 ****
        target = 0;
      }
  
-   /* If will do cse, generate all results into pseudo registers
-      since 1) that allows cse to find more things
-      and 2) otherwise cse could produce an insn the machine
-      cannot support.  An exception is a CONSTRUCTOR into a multi-word
-      MEM: that's much more likely to be most efficient into the MEM.
-      Another is a CALL_EXPR which must return in memory.  */
- 
-   if (! cse_not_expected && mode != BLKmode && target
-       && (!REG_P (target) || REGNO (target) < FIRST_PSEUDO_REGISTER)
-       && ! (code == CONSTRUCTOR && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
-       && ! (code == CALL_EXPR && aggregate_value_p (exp, exp)))
-     target = 0;
  
    switch (code)
      {
--- 6578,6583 ----
Index: optabs.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/optabs.c,v
retrieving revision 1.287
diff -c -3 -p -r1.287 optabs.c
*** optabs.c	12 Jul 2005 09:20:02 -0000	1.287
--- optabs.c	29 Jul 2005 12:10:41 -0000
*************** expand_vec_cond_expr (tree vec_cond_expr
*** 5475,5481 ****
    if (icode == CODE_FOR_nothing)
      return 0;
  
!   if (!target)
      target = gen_reg_rtx (mode);
  
    /* Get comparison rtx.  First expand both cond expr operands.  */
--- 5475,5481 ----
    if (icode == CODE_FOR_nothing)
      return 0;
  
!   if (!target || !insn_data[icode].operand[0].predicate (target, mode))
      target = gen_reg_rtx (mode);
  
    /* Get comparison rtx.  First expand both cond expr operands.  */
Index: config/i386/i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.843
diff -c -3 -p -r1.843 i386.c
*** config/i386/i386.c	18 Jul 2005 06:39:18 -0000	1.843
--- config/i386/i386.c	29 Jul 2005 12:10:42 -0000
*************** ix86_fixup_binary_operands (enum rtx_cod
*** 8154,8170 ****
        && GET_RTX_CLASS (code) != RTX_COMM_ARITH)
      src1 = force_reg (mode, src1);
  
-   /* If optimizing, copy to regs to improve CSE */
-   if (optimize && ! no_new_pseudos)
-     {
-       if (GET_CODE (dst) == MEM)
- 	dst = gen_reg_rtx (mode);
-       if (GET_CODE (src1) == MEM)
- 	src1 = force_reg (mode, src1);
-       if (GET_CODE (src2) == MEM)
- 	src2 = force_reg (mode, src2);
-     }
- 
    src1 = operands[1] = src1;
    src2 = operands[2] = src2;
    return dst;
--- 8390,8395 ----
*************** ix86_expand_unary_operator (enum rtx_cod
*** 8274,8288 ****
    if (MEM_P (src) && !matching_memory)
      src = force_reg (mode, src);
  
-   /* If optimizing, copy to regs to improve CSE.  */
-   if (optimize && ! no_new_pseudos)
-     {
-       if (GET_CODE (dst) == MEM)
- 	dst = gen_reg_rtx (mode);
-       if (GET_CODE (src) == MEM)
- 	src = force_reg (mode, src);
-     }
- 
    /* Emit the instruction.  */
  
    op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_e (code, mode, src));
--- 8499,8504 ----
*************** ix86_expand_fp_absneg_operator (enum rtx
*** 8410,8416 ****
    matching_memory = false;
    if (MEM_P (dst))
      {
!       if (rtx_equal_p (dst, src) && (!optimize || no_new_pseudos))
  	matching_memory = true;
        else
  	dst = gen_reg_rtx (mode);
--- 8626,8632 ----
    matching_memory = false;
    if (MEM_P (dst))
      {
!       if (rtx_equal_p (dst, src))
  	matching_memory = true;
        else
  	dst = gen_reg_rtx (mode);

