Summary: | autoincrement generation is poor | ||
---|---|---|---|
Product: | gcc | Reporter: | Jorn Wolfgang Rennecke <amylaar> |
Component: | rtl-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED WONTFIX | ||
Severity: | enhancement | CC: | gcc-bugs, giovannibajo, hp, pinskia |
Priority: | P2 | Keywords: | missed-optimization |
Version: | 2.95 | ||
Target Milestone: | --- | ||
See Also: | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31241 | ||
Host: | Target: | sh*-*-* | |
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2013-12-24 00:00:00 | |
Bug Depends on: | 19078 | ||
Bug Blocks: | 29842, 17652, 23111 | ||
Attachments: |
fix discover_flags_reg & garbage collection
baseline sizes baseline times Sizes with patch for PR20211 times with patch for PR20211 applied This is the patch I tested with CSiBE sh base sizes sh sizes with optimize_related_values sh base sizes for 20050517 baseline sh sizes with 20050517 patch ppc sizes with 20050517 baseline ppc sizes with 20050517 patch ppc sizes with 20050517 patch & 20050518 amendment add-on patch |
Description
Jorn Wolfgang Rennecke
2005-02-25 16:19:13 UTC
What is the compile-time impact of this patch on cc-i compilation, the usual C++ testcases, and SPEC? I am sure this is something worthwile to mention for a review. And BTW, out of curiosity, does the new pass have to live in regmove.c? (Interested because I see this for CRIS too. For CRIS v32, it's even more interesting, because it has no reg+offset addressing.) Subject: Re: autoincrement generation is poor giovannibajo at libero dot it wrote: >------- Additional Comments From giovannibajo at libero dot it 2005-02-25 18:59 ------- >What is the compile-time impact of this patch on cc-i compilation, the usual >C++ testcases, and SPEC? I am sure this is something worthwile to mention for >a review. > > Sorry, I don't have such numbers at the moment. Note that this optimization is specific to AUTO_INC_DEC targets (in principle it is possible to use it to reduce register pressure on other targets too, but there is also a cost to pay in reduced flexibility for sched2 till you make the scheduler aware of the possible transformations). Note that there is a specific flag to enable or disable this optimization, so for each target this can be tuned by OPTIMIZATION_OPTIONS. I made this switch default-on at -O2 mainly because it makes testing the patch much simpler. For integration, I'm also fine with having it default-off and having only the targets that want it turn it on in OPTIMIZATION_OPTIONS. >And BTW, out of curiosity, does the new pass have to live in regmove.c? > > Not absolutely, but it makes sense insofar as an aspect of this optimization is to avoid register-register moves which are required when you end up with awkward register assignments for the addds, and it is also nice to have sched1 run first. And there is also the issue fo inserting add instructions, which needs infrastructure from regmove to avoid issues with targets that have a flags register that can be clobbered by the new adds. For the c4x, it would make sense to run this pass earlier, as detailed in the 1999 thread. (In reply to comment #3) >> What is the compile-time impact of this patch on cc-i compilation, the usual >> C++ testcases, and SPEC? I am sure this is something worthwile to >> mention for a review. > Sorry, I don't have such numbers at the moment. Note that this > optimization is specific to AUTO_INC_DEC targets There are both primary and secondary platforms among the AUTO_INC_DEC targets. So it is probably good to gain some wider test coverage about the compile- time/run-time impact of this. E.g. testing CSIBE on ARM, or SPEC on RS6000. > I made this switch default-on at > -O2 mainly because it makes testing the patch much simpler. > For integration, I'm also fine with having it > default-off and having only the targets that want it turn it on in > OPTIMIZATION_OPTIONS. IMHO it is better to activate new optimization passes at either -O2 or -O3, depending on their impact on compile time and code generation (see above). So I am fine with that once it is proven that this is doing well. > >And BTW, out of curiosity, does the new pass have to live in regmove.c? > Not absolutely, but it makes sense insofar as an aspect of this > optimization is to avoid register-register > moves which are required when you end up with awkward register > assignments for the addds, > and it is also nice to have sched1 run first. And there is also the > issue fo inserting add instructions, > which needs infrastructure from regmove to avoid issues with targets > that have a flags register that > can be clobbered by the new adds. One of the common complaints of RTL code is that monolithic files containing multiple optimization passes (with interwinded functions) and with undocumented public interface is a maintenance mess. Compare that with tree-ssa where each pass lives in its own small[1] file and just exposes the pass description structure. I believe it would be great if the new pass could be live in its own file. The utility functions currently in regmove could be extracted in regmovelib or whatever name fits best. [1] ok, let's ignore ivopts for the sake of this discussion ;) > A patch against 4.0 20050218 is here:
> http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01612.html
discover_flags_reg also needs to be updated to understand INSNs.
(In reply to comment #4) > There are both primary and secondary platforms among the AUTO_INC_DEC targets. > So it is probably good to gain some wider test coverage about the compile- > time/run-time impact of this. E.g. testing CSIBE on ARM, or SPEC on RS6000. I currently don't have an ARM or RS6000 platform to run benchmarks on. We are looking into setting up a powermac for this task, though. AFAICS RS6000 provides register+offset addressing for all modes except the altivec vector modes, so to see any significant benefit from optimize_related_values, the benchmark should have multiple altivec memory accesses from the same structure / array in a single basic block. > One of the common complaints of RTL code is that monolithic files containing > multiple optimization passes (with interwinded functions) and with undocumented > public interface is a maintenance mess. Compare that with tree-ssa where each > pass lives in its own small[1] file and just exposes the pass description > structure. > > I believe it would be great if the new pass could be live in its own file. The > utility functions currently in regmove could be extracted in regmovelib or > whatever name fits best. I am willing to make that change, but would first like to hear from a global or middle-end maintainer that they agree that this is a move in the right direction. Also, there are some details that should be agreed on first. I.e.: Should tehre remain a toplevel regmove function that calls mark_flags_life_zones first, and then the sub-passes, or should each sub-pass call mark_flags_life_zones? If there is such a toplevel functiojn, would it live in the regmove* files, or would that be rest_of_handle_regmove in passes.c ? Where does the combine_stack_adjustments pass belong? Should it be in flow.c, or in it's own file (combine_stack_adj.c ?) This code is waiting for review since 1999: http://gcc.gnu.org/ml/gcc-patches/1999-11n/msg00583.html (In reply to comment #4) > There are both primary and secondary platforms among the AUTO_INC_DEC targets. > So it is probably good to gain some wider test coverage about the compile- > time/run-time impact of this. E.g. testing CSIBE on ARM, or SPEC on RS6000. I have been trying to install Linux/GNU on an POwer Macintosh G3, but it appears that this is not going to work. Inserting a Yellowdog 3.0.1 install 1 CD-rom into the drive freezes the machine solid. BootX versions later than BootX_1.1.3 won't unpack. BootX 1.1.3 freezes the machine about 60% of the time rather than starting a kernel. rocklinux gets a CRC error on the ramdisk and then can't find init. ubuntu manages to continue after ramdisk unpacking errors, and even goes as far as partitioning if you restrict the partitions to 13 GB or less, but then it say it can't find the CD_ROM it just has read lots of files from. So, since there is no native platform I can test, would you like me to benchmark cross-compiling a suitable package? IIRC powerpc-eabisim is a possible cross target. Would CSIBE cross-compile time measurements fit your requirements? (In reply to comment #8) > (In reply to comment #4) > > There are both primary and secondary platforms among the AUTO_INC_DEC targets. > > So it is probably good to gain some wider test coverage about the compile- > > time/run-time impact of this. E.g. testing CSIBE on ARM, or SPEC on RS6000. > > I have been trying to install Linux/GNU on an POwer Macintosh G3, but it > appears that this is not going to work. Inserting a Yellowdog 3.0.1 > install 1 CD-rom into the drive freezes the machine solid. BootX versions > later than BootX_1.1.3 won't unpack. BootX 1.1.3 freezes the machine about > 60% of the time rather than starting a kernel. rocklinux gets a CRC error > on the ramdisk and then can't find init. ubuntu manages to continue after > ramdisk unpacking errors, and even goes as far as partitioning if you > restrict the partitions to 13 GB or less, but then it say it can't find the > CD_ROM it just has read lots of files from. Mac OS X and darwin works on the G3, just fine. (In reply to comment #9) > Mac OS X and darwin works on the G3, just fine. The Mac OS X tiger system requirements say that it needs built-in firewire. This Mac doesn't have firewire. It also came with 32 MB RAM originally, although it has been upgraded to 384 MB. (In reply to comment #10) > (In reply to comment #9) > > Mac OS X and darwin works on the G3, just fine. > > The Mac OS X tiger system requirements say that it needs built-in firewire. > This Mac doesn't have firewire. It also came with 32 MB RAM originally, > although it has been upgraded to 384 MB. Oh, it is a Yikes, oh well. Created attachment 8758 [details]
fix discover_flags_reg & garbage collection
This patch fixes the bitrot of discover_flags_regs which prevents
the optimize_related_values optimization - and some older parts of
regmove too - from actually doing anything.
It also includes regmove.c into the garbage collector header mechanism.
An updated patch is here: http://gcc.gnu.org/ml/gcc-patches/2005-04/msg02898.html Created attachment 8875 [details]
baseline sizes
These are ppc-eabisim CSiBE sizes for a 2005-05-11 snapshot.
Created attachment 8876 [details]
baseline times
Created attachment 8877 [details] Sizes with patch for PR20211 Created attachment 8878 [details] times with patch for PR20211 applied The only change is the following: before: bzip2-1.0.2,compress,9408 bzip2-1.0.2,decompress,10604 after: bzip2-1.0.2,compress,9428 bzip2-1.0.2,decompress,10640 Created attachment 8879 [details] This is the patch I tested with CSiBE I had tested this together with the patch for PR rtl-optimization/20756 because I had sh-elf build problems with aborts in sched-rgn.c. It turned out to be a an issue of REG_NOTES interaction with fixup_match_1, though. Another difference to previous patches was that when setting last_base_insn after perform_addition, I had to add some code to actually find the insns where the base register dies in case the addition is performed in multiple steps. All in all, the differences seem to be only noise for ppc-eabisim. The net size difference is four bytes in favour of the unpatched version; I don't know where I should begin to get a total for the times; there are probably too unreliable for the small differences anyway. As said before the powerpc does not benefit from this optimization because it already has register + offset addressing, unless you bring altivec modes into the picture. Subject: Re: autoincrement generation is poor pinskia at gcc dot gnu dot org wrote: >------- Additional Comments From pinskia at gcc dot gnu dot org 2005-05-12 18:59 ------- >The only change is the following: >before: >bzip2-1.0.2,compress,9408 >bzip2-1.0.2,decompress,10604 >after: >bzip2-1.0.2,compress,9428 >bzip2-1.0.2,decompress,10640 > > > Well, there were also lots of small changes. As said before, if you add it all up, the net difference is just four bytes. Created attachment 8903 [details]
sh base sizes
Created attachment 8904 [details]
sh sizes with optimize_related_values
The compiler was run with -m4 -ml in addition to -Os.
bash-2.05b$ tail +4 ~/sh-base-result-size.csv | awk 'FS="," {sum += $3} END {
print sum }'
3051741
bash-2.05b$ tail +4 ~/sh-orv-result-size.csv | awk 'FS="," {sum += $3} END {
print sum }'
3040971
I.e. the sizes with the optimize_related_values patch are on average a third of
a percent smaller.
updated patch: http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01768.html Created attachment 8917 [details]
sh base sizes for 20050517 baseline
Created attachment 8918 [details]
sh sizes with 20050517 patch
bash-2.05b$ tail +4 ~/20050517-sh-base-result-size.csv | awk 'FS="," {sum +=
$3} END {print sum }'
3051893
bash-2.05b$ tail +4 ~/20050517-sh-orv-result-size.csv | awk 'FS="," {sum += $3}
END {print sum }'
3037343
I.e. with the add_limits logic restored, we are up to 0.477% size improvement.
Created attachment 8920 [details]
ppc sizes with 20050517 baseline
Created attachment 8921 [details]
ppc sizes with 20050517 patch
bash-2.05b$ tail +4 ~/20050517-ppc-base-result-size.csv | awk 'FS="," {sum +=
$3} END {print sum }'
3511496
bash-2.05b$ tail +4 ~/20050517-ppc-orv-result-size.csv | awk 'FS="," {sum +=
$3} END {print sum }'
3514464
I.e. with the add_limits logic restored, we now see a 0.0845% code size
increase.
I think that can be explained by the fact that linking chains together
only works when we have non-zero add_limuts, and the main point of linking
chains is to avoid reg-reg copies. If a three-address add is available,
starting a new chain in a new register doesn't cost extra insns, if the
register allocation stays the same. On the other hand, linking chains
together reduces the freedom of the register allocator.
Created attachment 8924 [details] ppc sizes with 20050517 patch & 20050518 amendment After applying this patch for i686-pc-linux-gnu bootstrap: http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01837.html and this to recognize the availability of three-address adds: http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01863.html The code bloat is gone: bash-2.05b$ tail +4 ~/20050518-ppc-orv-result-size.csv | awk 'FS="," {sum += $3} END {print sum }' 3511488 Subject: Bug 20211 CVSROOT: /cvs/gcc Module name: gcc Branch: sh-elf-4_1-branch Changes by: amylaar@gcc.gnu.org 2005-09-19 16:52:39 Modified files: gcc/doc : invoke.texi Log message: 2005-09-19 J"orn Rennecke <joern.rennecke@st.com> Bernd Schmidt <bernds@redhat.com> PR rtl-optimization/20211 * common.opt: Add optimize-related-values entry. * opts.c (decode_options): Set flag_optimize_related_values. * optabs.c (gen_add3_insn): If direct addition is not possible, try to move the constant into the destination register first. * regmove.c (obstack.h, ggc.h, optabs.h): Include. (related, rel_use_chain, rel_mod, rel_use): New structures. (related_baseinfo, update): Likewise. (lookup_related, rel_build_chain): New functions. (recognize_related_for_insn, record_reg_use, create_rel_use): Likewise. (new_reg_use, rel_record_mem, new_base, invalidate_related): Likewise. (find_related, find_related_toplev, chain_starts_earlier): Likewise. (chain_ends_later, mod_before, remove_setting_insns): Likewise. (perform_addition, modify_address): Likewise. (optimize_related_values_1, optimize_related_values_0): Likewise. (optimize_related_values, count_sets, link_chains): Likewise. (init_add_limits): Likewise. (REL_USE_HASH_SIZE, REL_USE_HASH, rel_alloc, rel_new): New macros. (regno_related, rel_base_list, unrelatedly_used): New variables. (related_obstack, add_limits): Likewise. (regmove_optimize): Call optimize_related_values. Include gt-regmove.h. (have_3addr_const_add): New variable. * Makefile.in (gt-regmove.h): New rule. (regmove.o): Depend on $(OPTABS_H) and gt-regmove.h. (GTFILES): Add regmove.c. * doc/invoke.texi: Document -foptimize-related-values. 2005-09-19 J"orn Rennecke <joern.rennecke@st.com> * regmove.c (discover_flags_reg): Use the PATTERN of an INSN. * regmove.c (fixup_match_1): When moving a death note, check if it needs changing into a REG_UNUSED note. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/doc/invoke.texi.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.598.2.4&r2=1.598.2.5 Subject: Bug 20211 CVSROOT: /cvs/gcc Module name: gcc Branch: sh-elf-4_1-branch Changes by: amylaar@gcc.gnu.org 2005-09-19 16:54:05 Modified files: gcc : ChangeLog common.opt optabs.c opts.c regmove.c Log message: 2005-09-19 J"orn Rennecke <joern.rennecke@st.com> Bernd Schmidt <bernds@redhat.com> PR rtl-optimization/20211 * common.opt: Add optimize-related-values entry. * opts.c (decode_options): Set flag_optimize_related_values. * optabs.c (gen_add3_insn): If direct addition is not possible, try to move the constant into the destination register first. * regmove.c (obstack.h, ggc.h, optabs.h): Include. (related, rel_use_chain, rel_mod, rel_use): New structures. (related_baseinfo, update): Likewise. (lookup_related, rel_build_chain): New functions. (recognize_related_for_insn, record_reg_use, create_rel_use): Likewise. (new_reg_use, rel_record_mem, new_base, invalidate_related): Likewise. (find_related, find_related_toplev, chain_starts_earlier): Likewise. (chain_ends_later, mod_before, remove_setting_insns): Likewise. (perform_addition, modify_address): Likewise. (optimize_related_values_1, optimize_related_values_0): Likewise. (optimize_related_values, count_sets, link_chains): Likewise. (init_add_limits): Likewise. (REL_USE_HASH_SIZE, REL_USE_HASH, rel_alloc, rel_new): New macros. (regno_related, rel_base_list, unrelatedly_used): New variables. (related_obstack, add_limits): Likewise. (regmove_optimize): Call optimize_related_values. Include gt-regmove.h. (have_3addr_const_add): New variable. * Makefile.in (gt-regmove.h): New rule. (regmove.o): Depend on $(OPTABS_H) and gt-regmove.h. (GTFILES): Add regmove.c. * doc/invoke.texi: Document -foptimize-related-values. 2005-09-19 J"orn Rennecke <joern.rennecke@st.com> * regmove.c (discover_flags_reg): Use the PATTERN of an INSN. * regmove.c (fixup_match_1): When moving a death note, check if it needs changing into a REG_UNUSED note. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=2.8142.2.30&r2=2.8142.2.31 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/common.opt.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.66.2.3&r2=1.66.2.4 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/optabs.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.268.2.4&r2=1.268.2.5 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/opts.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.99.2.3&r2=1.99.2.4 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/regmove.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.166.18.2&r2=1.166.18.3 Test results for sh-elf-4_1-branch with http://gcc.gnu.org/ml/gcc-patches/2005-09/msg01176.html are: http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00925.html http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00926.html http://gcc.gnu.org/ml/gcc-testresults/2005-09/msg00927.html Subject: Bug 20211 Author: amylaar Date: Wed Nov 2 21:50:22 2005 New Revision: 106401 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106401 Log: Belated Makefile.in checkin for: 2005-09-19 J"orn Rennecke <joern.rennecke@st.com> Bernd Schmidt <bernds@redhat.com> PR rtl-optimization/20211 http://gcc.gnu.org/ml/gcc-patches/2005-09/msg01176.html * common.opt: Add optimize-related-values entry. * opts.c (decode_options): Set flag_optimize_related_values. * optabs.c (gen_add3_insn): If direct addition is not possible, try to move the constant into the destination register first. * regmove.c (obstack.h, ggc.h, optabs.h): Include. (related, rel_use_chain, rel_mod, rel_use): New structures. (related_baseinfo, update): Likewise. (lookup_related, rel_build_chain): New functions. (recognize_related_for_insn, record_reg_use, create_rel_use): Likewise. (new_reg_use, rel_record_mem, new_base, invalidate_related): Likewise. (find_related, find_related_toplev, chain_starts_earlier): Likewise. (chain_ends_later, mod_before, remove_setting_insns): Likewise. (perform_addition, modify_address): Likewise. (optimize_related_values_1, optimize_related_values_0): Likewise. (optimize_related_values, count_sets, link_chains): Likewise. (init_add_limits): Likewise. (REL_USE_HASH_SIZE, REL_USE_HASH, rel_alloc, rel_new): New macros. (regno_related, rel_base_list, unrelatedly_used): New variables. (related_obstack, add_limits): Likewise. (regmove_optimize): Call optimize_related_values. Include gt-regmove.h. (have_3addr_const_add): New variable. * Makefile.in (gt-regmove.h): New rule. (regmove.o): Depend on $(OPTABS_H) and gt-regmove.h. (GTFILES): Add regmove.c. * doc/invoke.texi: Document -foptimize-related-values. Modified: branches/sh-elf-4_1-branch/gcc/Makefile.in Created attachment 12718 [details]
add-on patch
I've found that this patch is also necessary to avoid invalid transformations.
This should be fixed when the dataflow branch is merged. There is a new pass there dedicated to generating auto-inc/dec insns. If the pass in auto-inc-dec.c on the dataflow branch does not fix this issue, a proper fix for this bug will have to be implemented there instead of in regmove. Just removing patch keyword as the patch is no longer applies after the dataflow branch merge. No 4.3 pending patch anymore. Maybe this issue migrated into PR31849? (In reply to comment #38) > Maybe this issue migrated into PR31849? Not entirely, PR31849 is tree-optimization, and a lot of auto-increment opportunities are only visible at the rtl level. Just hit this PR by accident. I wonder how many address mode related PRs are hanging around there... I hope that the AMS pass will help. The current branch is https://github.com/erikvarga/gcc After 12 years of changes, the value of the patches in this bug report, and indeed the bug report itself, is difficult to gauge. If this is still an issue, and that issue isn't covered by one of the more specific auto-inc-dec issues, please open a new PR with a test case and with the expected code. For now, wontfix. |