This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Simplify expansion of operations on memory
- From: Jan Hubicka <jh at suse dot cz>
- To: gcc-patches at gcc dot gnu dot org
- Date: Fri, 29 Jul 2005 22:48:53 +0200
- Subject: Simplify expansion of operations on memory
Hi,
this patch stops us from loading memory operands into registers when expanding
(so globvar++ is no longer expanded as the RTL equivalent of reg=globvar, reg++,
globvar=reg). This used to be needed to help CSE be useful on
memory operands, but with SSA we should no longer need to rely on that.
The patch saves about 15% of compilation time for tramp3d with profiling,
as all those tons of counter increments can now be done directly in memory. It also
seems to have a relatively positive effect on SPEC (in SPECfp actually
more than I can quite believe :)
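
For illustration, the two shapes of globvar++ described above can be written
out by hand in C. This is only a sketch: the function names are made up, and
the comments describe the intended correspondence to the expansion, not an
actual RTL dump.

```c
#include <assert.h>

/* Hypothetical global standing in for a profiling counter.  */
static long globvar;

/* Old shape: load the memory operand into a temporary, operate on the
   temporary, then store it back.  */
static void
increment_via_register (void)
{
  long reg = globvar;   /* reg = globvar */
  reg++;                /* reg++ */
  globvar = reg;        /* globvar = reg */
}

/* New shape: the operation names the memory operand directly, so a
   target such as i386 can match it with a read-modify-write insn.  */
static void
increment_in_memory (void)
{
  globvar++;
}

/* Apply both shapes N times each and return the final counter value;
   both must have the same effect on memory.  */
long
run_both (int n)
{
  globvar = 0;
  for (int i = 0; i < n; i++)
    {
      increment_via_register ();
      increment_in_memory ();
    }
  assert (globvar == 2L * n);
  return globvar;
}
```

An optimizing compiler will normally emit the same machine code for both
functions; the difference is in how many insns expansion generates up front
and later passes then have to clean up.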
I am somewhat concerned about the comment that CSE might produce an insn the
"machine cannot support". This comment dates back to the creation of the EGCS CVS
repository, and I think it refers to the fact that the i386 backend didn't check
that a source memory operand matches the destination memory operand, so it might
have resulted in more reloading, rather than to CSE producing unrecognizable
insns in some corner cases...
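
The matching-memory rule can be sketched as a toy C model of the unary case,
following the logic visible in the i386.c context of the patch below (a memory
destination is kept only when the source is the same memory, and a memory
source is forced into a register otherwise). The types and function names here
are hypothetical and are not GCC's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much simplified operand: a register or a memory location.  */
enum operand_kind { OPERAND_REG, OPERAND_MEM };

struct operand
{
  enum operand_kind kind;
  unsigned long where;          /* register number or address */
};

/* Which operands would have to be replaced by registers.  */
struct fixup
{
  bool dst_needs_reg;
  bool src_needs_reg;
};

struct operand
reg_no (unsigned long n)
{
  struct operand op = { OPERAND_REG, n };
  return op;
}

struct operand
mem_at (unsigned long addr)
{
  struct operand op = { OPERAND_MEM, addr };
  return op;
}

static bool
same_operand (struct operand a, struct operand b)
{
  return a.kind == b.kind && a.where == b.where;
}

/* Toy version of the check for dst = op (src).  */
struct fixup
fixup_unary_operands (struct operand dst, struct operand src)
{
  struct fixup f = { false, false };
  bool matching_memory = false;

  if (dst.kind == OPERAND_MEM)
    {
      if (same_operand (dst, src))
        matching_memory = true;   /* mem = op (same mem): keep as is */
      else
        f.dst_needs_reg = true;   /* compute into a fresh register */
    }

  /* A memory source is only usable when it matches the destination.  */
  if (src.kind == OPERAND_MEM && !matching_memory)
    f.src_needs_reg = true;

  return f;
}
```

In this model fixup_unary_operands (mem_at (0x1000), mem_at (0x1000)) asks for
no registers at all, while two different addresses force both operands into
registers.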
Bootstrapped/regtested on i686-pc-linux-gnu, OK?
Honza
Size of binaries:
164.gzip: Base: 71562 bytes
164.gzip: Peak: 71498 bytes
175.vpr: Base: 180696 bytes
175.vpr: Peak: 180504 bytes
176.gcc: Base: 1832382 bytes
176.gcc: Peak: 1830142 bytes
181.mcf: Base: 26815 bytes
181.mcf: Peak: 26847 bytes
186.crafty: Base: 238116 bytes
186.crafty: Peak: 237252 bytes
197.parser: Base: 185566 bytes
197.parser: Peak: 184894 bytes
252.eon: Base: 661565 bytes
252.eon: Peak: 661469 bytes
253.perlbmk: Base: 747199 bytes
253.perlbmk: Peak: 747167 bytes
254.gap: Base: 606174 bytes
254.gap: Peak: 606174 bytes
255.vortex: Base: 662449 bytes
255.vortex: Peak: 662481 bytes
256.bzip2: Base: 63050 bytes
256.bzip2: Peak: 63050 bytes
300.twolf: Base: 239937 bytes
300.twolf: Peak: 240801 bytes
=============================
Total: Base: 5515511 bytes
Total: Peak: 5512279 bytes
Compile times for benchmarks:
164.gzip base: 5 s
175.vpr base: 7 s
176.gcc base: 89 s
181.mcf base: 1 s
186.crafty base: 12 s
197.parser base: 8 s
252.eon base: 76 s
253.perlbmk base: 35 s
254.gap base: 26 s
255.vortex base: 21 s
256.bzip2 base: 2 s
300.twolf base: 15 s
164.gzip peak: 3 s
175.vpr peak: 7 s
176.gcc peak: 81 s
181.mcf peak: 1 s
186.crafty peak: 11 s
197.parser peak: 8 s
252.eon peak: 75 s
253.perlbmk peak: 34 s
254.gap peak: 26 s
255.vortex peak: 21 s
256.bzip2 peak: 2 s
300.twolf peak: 14 s
======================================
Total time for base compilation: 297 s
Total time for peak compilation: 283 s
                                 Estimated                     Estimated
                  Base      Base      Base      Peak      Peak      Peak
Benchmarks    Ref Time  Run Time     Ratio  Ref Time  Run Time     Ratio
------------  --------  --------  --------  --------  --------  --------
164.gzip          1400       160       874      1400       159       879
164.gzip          1400       156       900      1400       156       897
164.gzip          1400       158       887*     1400       158       885*
175.vpr           1400       179       784      1400       176       797
175.vpr           1400       175       800      1400       178       788*
175.vpr           1400       178       786*     1400       178       788
176.gcc           1100       102      1075*     1100      99.6      1104
176.gcc           1100       102      1079      1100       101      1093*
176.gcc           1100       104      1061      1100       104      1060
181.mcf           1800       421       427      1800       426       423*
181.mcf           1800       424       425      1800       425       423
181.mcf           1800       423       426*     1800       428       421
186.crafty        1000      63.0      1586*     1000      64.0      1561*
186.crafty        1000      62.9      1590      1000      63.8      1567
186.crafty        1000      65.2      1534      1000      66.2      1511
197.parser        1800       266       676      1800       265       680
197.parser        1800       264       682      1800       266       675*
197.parser        1800       266       677*     1800       267       674
252.eon           1300      88.1      1476      1300      83.9      1549*
252.eon           1300      85.6      1519*     1300      86.2      1508
252.eon           1300      85.5      1521      1300      83.3      1560
253.perlbmk       1800       168      1069      1800       171      1055
253.perlbmk       1800       166      1085      1800       168      1070
253.perlbmk       1800       168      1070*     1800       170      1057*
254.gap           1100       127       863*     1100       128       862*
254.gap           1100       129       850      1100       129       851
254.gap           1100       127       868      1100       127       866
255.vortex        1900    0.0460         X      1900    0.0439         X
255.vortex        1900    0.0120         X      1900    0.0119         X
255.vortex        1900    0.0120         X      1900    0.0119         X
256.bzip2         1500       178       842      1500       178       844
256.bzip2         1500       175       857      1500       175       858
256.bzip2         1500       177       847*     1500       177       847*
300.twolf         3000       326       921      3000       326       921*
300.twolf         3000       329       913      3000       326       920
300.twolf         3000       327       916*     3000       325       924
========================================================================
164.gzip          1400       158       887*     1400       158       885*
175.vpr           1400       178       786*     1400       178       788*
176.gcc           1100       102      1075*     1100       101      1093*
181.mcf           1800       423       426*     1800       426       423*
186.crafty        1000      63.0      1586*     1000      64.0      1561*
197.parser        1800       266       677*     1800       266       675*
252.eon           1300      85.6      1519*     1300      83.9      1549*
253.perlbmk       1800       168      1070*     1800       170      1057*
254.gap           1100       127       863*     1100       128       862*
255.vortex                               X                             X
256.bzip2         1500       177       847*     1500       177       847*
300.twolf         3000       327       916*     3000       326       921*
Est. SPECint_base2000 914
Est. SPECint2000 915
Size of binaries:
168.wupwise: Base: 45455 bytes
168.wupwise: Peak: 45455 bytes
171.swim: Base: 23314 bytes
171.swim: Peak: 23314 bytes
172.mgrid: Base: 28922 bytes
172.mgrid: Peak: 28922 bytes
173.applu: Base: 89763 bytes
173.applu: Peak: 89763 bytes
177.mesa: Base: 635606 bytes
177.mesa: Peak: 635638 bytes
179.art: Base: 32032 bytes
179.art: Peak: 32080 bytes
183.equake: Base: 42823 bytes
183.equake: Peak: 42823 bytes
187.facerec: Base: 84852 bytes
187.facerec: Peak: 84852 bytes
188.ammp: Base: 174450 bytes
188.ammp: Peak: 174290 bytes
189.lucas: Base: 81841 bytes
189.lucas: Peak: 81841 bytes
191.fma3d: Base: 1265779 bytes
191.fma3d: Peak: 1258355 bytes
200.sixtrack: Base: 1054784 bytes
200.sixtrack: Peak: 1054272 bytes
301.apsi: Base: 155751 bytes
301.apsi: Peak: 155815 bytes
=============================
Total: Base: 3715372 bytes
Total: Peak: 3707420 bytes
Compile times for benchmarks:
168.wupwise base: 3 s
171.swim base: 1 s
172.mgrid base: 1 s
173.applu base: 4 s
177.mesa base: 29 s
178.galgel base: 0 s
179.art base: 1 s
183.equake base: 1 s
187.facerec base: 3 s
188.ammp base: 7 s
189.lucas base: 3 s
191.fma3d base: 156 s
200.sixtrack base: 68 s
301.apsi base: 6 s
168.wupwise peak: 3 s
171.swim peak: 0 s
172.mgrid peak: 1 s
173.applu peak: 3 s
177.mesa peak: 28 s
178.galgel peak: 0 s
179.art peak: 1 s
183.equake peak: 2 s
187.facerec peak: 3 s
188.ammp peak: 8 s
189.lucas peak: 4 s
191.fma3d peak: 162 s
200.sixtrack peak: 76 s
301.apsi peak: 6 s
======================================
Total time for base compilation: 283 s
Total time for peak compilation: 297 s
                                 Estimated                     Estimated
                  Base      Base      Base      Peak      Peak      Peak
Benchmarks    Ref Time  Run Time     Ratio  Ref Time  Run Time     Ratio
------------  --------  --------  --------  --------  --------  --------
168.wupwise       1600       169       946*     1600       165       967*
168.wupwise       1600       167       958      1600       167       956
168.wupwise       1600       170       943      1600       165       968
171.swim          3100       496       624*     3100       492       630
171.swim          3100       497       624      3100       492       630*
171.swim          3100       496       625      3100       490       633
172.mgrid         1800       243       741*     1800       241       748
172.mgrid         1800       241       748      1800       240       751*
172.mgrid         1800       244       738      1800       239       752
173.applu         2100       322       652      2100       315       666
173.applu         2100       321       655*     2100       315       666
173.applu         2100       320       657      2100       315       666*
177.mesa          1400       119      1176      1400       117      1195*
177.mesa          1400       116      1203      1400       119      1176
177.mesa          1400       118      1182*     1400       117      1196
178.galgel        2900        --         X      2900        --         X
179.art           2600       358       725      2600       355       733
179.art           2600       364       715      2600       360       721
179.art           2600       362       718*     2600       355       731*
183.equake        1300       168       775      1300       160       811
183.equake        1300       162       801      1300       162       800
183.equake        1300       164       794*     1300       161       810*
187.facerec       1900       262       725      1900       263       724
187.facerec       1900       260       731*     1900       261       729
187.facerec       1900       258       737      1900       261       728*
188.ammp          2200       235       936      2200       233       944
188.ammp          2200       237       929*     2200       231       954
188.ammp          2200       237       929      2200       233       946*
189.lucas         2000       244       820*     2000       246       812
189.lucas         2000       245       815      2000       246       814*
189.lucas         2000       243       823      2000       246       815
191.fma3d         2100       305       687      2100       303       692
191.fma3d         2100       305       689*     2100       303       693*
191.fma3d         2100       305       689      2100       303       693
200.sixtrack      1100       234       471      1100       233       472
200.sixtrack      1100       233       473      1100       230       478
200.sixtrack      1100       233       472*     1100       233       473*
301.apsi          2600       297       875      2600       295       881
301.apsi          2600       296       879*     2600       295       881*
301.apsi          2600       295       881      2600       295       880
========================================================================
168.wupwise       1600       169       946*     1600       165       967*
171.swim          3100       496       624*     3100       492       630*
172.mgrid         1800       243       741*     1800       240       751*
173.applu         2100       321       655*     2100       315       666*
177.mesa          1400       118      1182*     1400       117      1195*
178.galgel                               X                             X
179.art           2600       362       718*     2600       355       731*
183.equake        1300       164       794*     1300       161       810*
187.facerec       1900       260       731*     1900       261       728*
188.ammp          2200       237       929*     2200       233       946*
189.lucas         2000       244       820*     2000       246       814*
191.fma3d         2100       305       689*     2100       303       693*
200.sixtrack      1100       233       472*     1100       233       473*
301.apsi          2600       296       879*     2600       295       881*
Est. SPECfp_base2000 765
Est. SPECfp2000 773
2005-07-29  Jan Hubicka  <jh@suse.cz>

	* expr.c (expand_expr_real_1): Do not load mem targets into register.
	* i386.c (ix86_fixup_binary_operands): Likewise.
	(ix86_expand_unary_operator): Likewise.
	(ix86_expand_fp_absneg_operator): Likewise.
	* optabs.c (expand_vec_cond_expr): Validate dest.
Index: expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.806
diff -c -3 -p -r1.806 expr.c
*** expr.c 25 Jul 2005 12:04:45 -0000 1.806
--- expr.c 29 Jul 2005 12:10:40 -0000
*************** expand_expr_real_1 (tree exp, rtx target
*** 6578,6595 ****
target = 0;
}
- /* If will do cse, generate all results into pseudo registers
- since 1) that allows cse to find more things
- and 2) otherwise cse could produce an insn the machine
- cannot support. An exception is a CONSTRUCTOR into a multi-word
- MEM: that's much more likely to be most efficient into the MEM.
- Another is a CALL_EXPR which must return in memory. */
-
- if (! cse_not_expected && mode != BLKmode && target
- && (!REG_P (target) || REGNO (target) < FIRST_PSEUDO_REGISTER)
- && ! (code == CONSTRUCTOR && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
- && ! (code == CALL_EXPR && aggregate_value_p (exp, exp)))
- target = 0;
switch (code)
{
--- 6578,6583 ----
Index: optabs.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/optabs.c,v
retrieving revision 1.287
diff -c -3 -p -r1.287 optabs.c
*** optabs.c 12 Jul 2005 09:20:02 -0000 1.287
--- optabs.c 29 Jul 2005 12:10:41 -0000
*************** expand_vec_cond_expr (tree vec_cond_expr
*** 5475,5481 ****
if (icode == CODE_FOR_nothing)
return 0;
! if (!target)
target = gen_reg_rtx (mode);
/* Get comparison rtx. First expand both cond expr operands. */
--- 5475,5481 ----
if (icode == CODE_FOR_nothing)
return 0;
! if (!target || !insn_data[icode].operand[0].predicate (target, mode))
target = gen_reg_rtx (mode);
/* Get comparison rtx. First expand both cond expr operands. */
Index: config/i386/i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.843
diff -c -3 -p -r1.843 i386.c
*** config/i386/i386.c 18 Jul 2005 06:39:18 -0000 1.843
--- config/i386/i386.c 29 Jul 2005 12:10:42 -0000
*************** ix86_fixup_binary_operands (enum rtx_cod
*** 8154,8170 ****
&& GET_RTX_CLASS (code) != RTX_COMM_ARITH)
src1 = force_reg (mode, src1);
- /* If optimizing, copy to regs to improve CSE */
- if (optimize && ! no_new_pseudos)
- {
- if (GET_CODE (dst) == MEM)
- dst = gen_reg_rtx (mode);
- if (GET_CODE (src1) == MEM)
- src1 = force_reg (mode, src1);
- if (GET_CODE (src2) == MEM)
- src2 = force_reg (mode, src2);
- }
-
src1 = operands[1] = src1;
src2 = operands[2] = src2;
return dst;
--- 8390,8395 ----
*************** ix86_expand_unary_operator (enum rtx_cod
*** 8274,8288 ****
if (MEM_P (src) && !matching_memory)
src = force_reg (mode, src);
- /* If optimizing, copy to regs to improve CSE. */
- if (optimize && ! no_new_pseudos)
- {
- if (GET_CODE (dst) == MEM)
- dst = gen_reg_rtx (mode);
- if (GET_CODE (src) == MEM)
- src = force_reg (mode, src);
- }
-
/* Emit the instruction. */
op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_e (code, mode, src));
--- 8499,8504 ----
*************** ix86_expand_fp_absneg_operator (enum rtx
*** 8410,8416 ****
matching_memory = false;
if (MEM_P (dst))
{
! if (rtx_equal_p (dst, src) && (!optimize || no_new_pseudos))
matching_memory = true;
else
dst = gen_reg_rtx (mode);
--- 8626,8632 ----
matching_memory = false;
if (MEM_P (dst))
{
! if (rtx_equal_p (dst, src))
matching_memory = true;
else
dst = gen_reg_rtx (mode);