Bug 20211 - autoincrement generation is poor
Summary: autoincrement generation is poor
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 2.95
: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 19078
Blocks: 29842 17652 23111
  Show dependency treegraph
 
Reported: 2005-02-25 16:19 UTC by Jorn Wolfgang Rennecke
Modified: 2013-12-24 22:52 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-12-24 00:00:00


Attachments
fix discover_flags_reg & garbage collection (1.08 KB, patch)
2005-04-28 18:37 UTC, Jorn Wolfgang Rennecke
Details | Diff
baseline sizes (7.46 KB, text/plain)
2005-05-12 18:52 UTC, Jorn Wolfgang Rennecke
Details
baseline times (8.93 KB, text/plain)
2005-05-12 18:52 UTC, Jorn Wolfgang Rennecke
Details
Sizes with patch for PR20211 (7.46 KB, text/plain)
2005-05-12 18:54 UTC, Jorn Wolfgang Rennecke
Details
times with patch for PR20211 applied (9.02 KB, text/plain)
2005-05-12 18:55 UTC, Jorn Wolfgang Rennecke
Details
This is the patch I tested with CSiBE (22.66 KB, patch)
2005-05-12 19:11 UTC, Jorn Wolfgang Rennecke
Details | Diff
sh base sizes (7.40 KB, text/plain)
2005-05-16 19:07 UTC, Jorn Wolfgang Rennecke
Details
sh sizes with optimize_related_values (7.39 KB, text/plain)
2005-05-16 19:11 UTC, Jorn Wolfgang Rennecke
Details
sh base sizes for 20050517 baseline (7.38 KB, text/plain)
2005-05-18 09:18 UTC, Jorn Wolfgang Rennecke
Details
sh sizes with 20050517 patch (7.38 KB, text/plain)
2005-05-18 09:22 UTC, Jorn Wolfgang Rennecke
Details
ppc sizes with 20050517 baseline (7.46 KB, text/plain)
2005-05-18 12:11 UTC, Jorn Wolfgang Rennecke
Details
ppc sizes with 20050517 patch (7.47 KB, text/plain)
2005-05-18 12:25 UTC, Jorn Wolfgang Rennecke
Details
ppc sizes with 20050517 patch & 20050518 amendment (7.47 KB, text/plain)
2005-05-18 17:12 UTC, Jorn Wolfgang Rennecke
Details
add-on patch (924 bytes, patch)
2006-11-30 20:04 UTC, Jorn Wolfgang Rennecke
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jorn Wolfgang Rennecke 2005-02-25 16:19:13 UTC
When a processor does not allow register+offset addressing for a register
class, as for the floating point registers on the SH3E / SH4, the way to
avoid excessive reloads and to expose the issue to the rtl optimizers is
to disallow this addressing mode for the machine modes for which pseudo
registers are usually allocated to the register class in question, i.e.
SFmode / DFmode in our example.
Thus, when there is a structure access to a member with such a mode and a
non-zero offset, the address cannot be expressed directly, and thus,
during rtl expansion, the sum is calculated into a pseudo register first.
cse typically places these additions together at the start of a basic
block; the idea there is that we might find some cse opportunities, and
if not, combine can do something with these sums.  However, that doesn't
work when these sums are used as addresses and the machine mode of the
access does not allow reg+offset addressing.  The auto-increment generation
is flow can't do anything useful with these sums either, since the auto-inc
generation in flow only looks for cases where a memory access already matches
an exitsing add.  Thus we end up with lots of adds and sky-high register
pressure.  On two-address machines, there i an added problem that the
adds are so arranged that (at least, not counting reloads...) a
two-instruction sequence is needed to do the additions.

What is required is an optimization pass that finds all the uses of a sum
of a base register and an offset in a basic block, and figures out where an
auto-increment addressing mode can be profitably used, and also to reduce the
register pressure and number of reg-reg copies.

A patch against 4.0 20050218 is here:
http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01612.html
Comment 1 Giovanni Bajo 2005-02-25 18:59:52 UTC
What is the compile-time impact of this patch on cc-i compilation, the usual 
C++ testcases, and SPEC? I am sure this is something worthwile to mention for 
a review.

And BTW, out of curiosity, does the new pass have to live in regmove.c?
Comment 2 Hans-Peter Nilsson 2005-02-26 12:56:30 UTC
(Interested because I see this for CRIS too.  For CRIS v32, it's even more
interesting, because it has no reg+offset addressing.)
Comment 3 joern.rennecke@st.com 2005-02-28 13:46:36 UTC
Subject: Re:  autoincrement generation is poor

giovannibajo at libero dot it wrote:

>------- Additional Comments From giovannibajo at libero dot it  2005-02-25 18:59 -------
>What is the compile-time impact of this patch on cc-i compilation, the usual 
>C++ testcases, and SPEC? I am sure this is something worthwile to mention for 
>a review.
>  
>

Sorry, I don't have such numbers at the moment.  Note that this 
optimization is specific to
AUTO_INC_DEC targets (in principle it is possible to use it to reduce 
register pressure on
other targets too, but there is also a cost to pay in reduced 
flexibility for sched2 till you make
the scheduler aware of the possible transformations).
Note that there is a specific flag to enable or disable this 
optimization, so for each target this
can be tuned by OPTIMIZATION_OPTIONS.  I made this switch default-on at 
-O2 mainly
because it makes testing the patch much simpler.  For integration, I'm 
also fine with having it
default-off and having only the targets that want it turn it on in 
OPTIMIZATION_OPTIONS.

>And BTW, out of curiosity, does the new pass have to live in regmove.c?
>  
>
Not absolutely, but it makes sense insofar as an aspect of this 
optimization is to avoid register-register
moves  which are required when you end up with awkward register 
assignments for the addds,
and it is also nice to have sched1 run first. And there is also the 
issue fo inserting add instructions,
which needs infrastructure from regmove to avoid issues with targets 
that have a flags register that
can be clobbered by the new adds.
For the c4x, it would make sense to run this pass earlier, as detailed 
in the 1999 thread.

Comment 4 Giovanni Bajo 2005-03-15 02:12:21 UTC
(In reply to comment #3)

>> What is the compile-time impact of this patch on cc-i compilation, the usual 
>> C++ testcases, and SPEC? I am sure this is something worthwile to
>> mention for a review.

> Sorry, I don't have such numbers at the moment. Note that this 
> optimization is specific to AUTO_INC_DEC targets 

There are both primary and secondary platforms among the AUTO_INC_DEC targets. 
So it is probably good to gain some wider test coverage about the compile-
time/run-time impact of this. E.g. testing CSIBE on ARM, or SPEC on RS6000.

> I made this switch default-on at 
> -O2 mainly because it makes testing the patch much simpler.  
> For integration, I'm also fine with having it
> default-off and having only the targets that want it turn it on in 
> OPTIMIZATION_OPTIONS.

IMHO it is better to activate new optimization passes at either -O2 or -O3, 
depending on their impact on compile time and code generation (see above). So I 
am fine with that once it is proven that this is doing well.

> >And BTW, out of curiosity, does the new pass have to live in regmove.c?

> Not absolutely, but it makes sense insofar as an aspect of this 
> optimization is to avoid register-register
> moves  which are required when you end up with awkward register 
> assignments for the addds,
> and it is also nice to have sched1 run first. And there is also the 
> issue fo inserting add instructions,
> which needs infrastructure from regmove to avoid issues with targets 
> that have a flags register that
> can be clobbered by the new adds.

One of the common complaints of RTL code is that monolithic files containing 
multiple optimization passes (with interwinded functions) and with undocumented 
public interface is a maintenance mess. Compare that with tree-ssa where each 
pass lives in its own small[1] file and just exposes the pass description 
structure.

I believe it would be great if the new pass could be live in its own file. The 
utility functions currently in regmove could be extracted in regmovelib or 
whatever name fits best. 

[1] ok, let's ignore ivopts for the sake of this discussion ;)
Comment 5 Jorn Wolfgang Rennecke 2005-03-15 18:08:41 UTC
> A patch against 4.0 20050218 is here:
> http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01612.html

discover_flags_reg also needs to be updated to understand INSNs.

Comment 6 Jorn Wolfgang Rennecke 2005-03-17 16:37:30 UTC
(In reply to comment #4)
> There are both primary and secondary platforms among the AUTO_INC_DEC targets. 
> So it is probably good to gain some wider test coverage about the compile-
> time/run-time impact of this. E.g. testing CSIBE on ARM, or SPEC on RS6000.

I currently don't have an ARM or RS6000 platform to run benchmarks on.
We are looking into setting up a powermac for this task, though.
AFAICS RS6000 provides register+offset addressing for all modes except the
altivec vector modes, so to see any significant benefit from
optimize_related_values, the benchmark should have multiple altivec memory
accesses from the same structure / array in a single basic block.

> One of the common complaints of RTL code is that monolithic files containing 
> multiple optimization passes (with interwinded functions) and with undocumented 
> public interface is a maintenance mess. Compare that with tree-ssa where each 
> pass lives in its own small[1] file and just exposes the pass description 
> structure.
> 
> I believe it would be great if the new pass could be live in its own file. The 
> utility functions currently in regmove could be extracted in regmovelib or 
> whatever name fits best.

I am willing to make that change, but would first like to hear from a global
or middle-end maintainer that they agree that this is a move in the right direction.
Also, there are some details that should be agreed on first.
I.e.:
Should tehre remain a toplevel regmove function that calls 
mark_flags_life_zones first, and then the sub-passes, or should
each sub-pass call mark_flags_life_zones?
If there is such a toplevel functiojn, would it live in the regmove*
files, or would that be rest_of_handle_regmove in passes.c ?
Where does the combine_stack_adjustments pass belong?  Should it
be in flow.c, or in it's own file (combine_stack_adj.c ?)
Comment 7 Jorn Wolfgang Rennecke 2005-04-12 18:06:47 UTC
This code is waiting for review since 1999:

http://gcc.gnu.org/ml/gcc-patches/1999-11n/msg00583.html
Comment 8 Jorn Wolfgang Rennecke 2005-04-27 13:37:46 UTC
(In reply to comment #4)
> There are both primary and secondary platforms among the AUTO_INC_DEC targets. 
> So it is probably good to gain some wider test coverage about the compile-
> time/run-time impact of this. E.g. testing CSIBE on ARM, or SPEC on RS6000.

I have been trying to install Linux/GNU on an POwer Macintosh G3, but it
appears that this is not going to work.  Inserting a Yellowdog 3.0.1
install 1 CD-rom into the drive freezes the machine solid.  BootX versions
later than BootX_1.1.3 won't unpack.  BootX 1.1.3 freezes the machine about
60% of the time rather than starting a kernel.  rocklinux gets a CRC error
on the ramdisk and then can't find init.  ubuntu manages to continue after
ramdisk unpacking errors, and even goes as far as partitioning if you
restrict the partitions to 13 GB or less, but then it say it can't find the
CD_ROM it just has read lots of files from.

So, since there is no native platform I can test, would you like me to
benchmark cross-compiling a suitable package?  IIRC powerpc-eabisim is
a possible cross target.  Would CSIBE cross-compile time measurements
fit your requirements?
Comment 9 Andrew Pinski 2005-04-27 13:41:56 UTC
(In reply to comment #8)
> (In reply to comment #4)
> > There are both primary and secondary platforms among the AUTO_INC_DEC targets. 
> > So it is probably good to gain some wider test coverage about the compile-
> > time/run-time impact of this. E.g. testing CSIBE on ARM, or SPEC on RS6000.
> 
> I have been trying to install Linux/GNU on an POwer Macintosh G3, but it
> appears that this is not going to work.  Inserting a Yellowdog 3.0.1
> install 1 CD-rom into the drive freezes the machine solid.  BootX versions
> later than BootX_1.1.3 won't unpack.  BootX 1.1.3 freezes the machine about
> 60% of the time rather than starting a kernel.  rocklinux gets a CRC error
> on the ramdisk and then can't find init.  ubuntu manages to continue after
> ramdisk unpacking errors, and even goes as far as partitioning if you
> restrict the partitions to 13 GB or less, but then it say it can't find the
> CD_ROM it just has read lots of files from.

Mac OS X and darwin works on the G3, just fine.
Comment 10 Jorn Wolfgang Rennecke 2005-04-28 13:36:52 UTC
(In reply to comment #9)
> Mac OS X and darwin works on the G3, just fine.

The Mac OS X tiger system requirements say that it needs built-in firewire.
This Mac doesn't have firewire.  It also came with 32 MB RAM originally,
although it has been upgraded to 384 MB.

Comment 11 Andrew Pinski 2005-04-28 13:42:59 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Mac OS X and darwin works on the G3, just fine.
> 
> The Mac OS X tiger system requirements say that it needs built-in firewire.
> This Mac doesn't have firewire.  It also came with 32 MB RAM originally,
> although it has been upgraded to 384 MB.

Oh, it is a Yikes, oh well.
Comment 12 Jorn Wolfgang Rennecke 2005-04-28 18:37:29 UTC
Created attachment 8758 [details]
fix discover_flags_reg & garbage collection

This patch fixes the bitrot of discover_flags_regs which prevents
the optimize_related_values optimization - and some older parts of
regmove too - from actually doing anything.
It also includes regmove.c into the garbage collector header mechanism.
Comment 13 Jorn Wolfgang Rennecke 2005-04-28 19:21:45 UTC
An updated patch is here:
http://gcc.gnu.org/ml/gcc-patches/2005-04/msg02898.html
Comment 14 Jorn Wolfgang Rennecke 2005-05-12 18:52:20 UTC
Created attachment 8875 [details]
baseline sizes

These are ppc-eabisim CSiBE sizes for a 2005-05-11 snapshot.
Comment 15 Jorn Wolfgang Rennecke 2005-05-12 18:52:58 UTC
Created attachment 8876 [details]
baseline times
Comment 16 Jorn Wolfgang Rennecke 2005-05-12 18:54:32 UTC
Created attachment 8877 [details]
Sizes with patch for PR20211
Comment 17 Jorn Wolfgang Rennecke 2005-05-12 18:55:46 UTC
Created attachment 8878 [details]
times with patch for PR20211 applied
Comment 18 Andrew Pinski 2005-05-12 18:59:50 UTC
The only change is the following:
before:
bzip2-1.0.2,compress,9408
bzip2-1.0.2,decompress,10604
after:
bzip2-1.0.2,compress,9428
bzip2-1.0.2,decompress,10640
Comment 19 Jorn Wolfgang Rennecke 2005-05-12 19:11:35 UTC
Created attachment 8879 [details]
This is the patch I tested with CSiBE

I had tested this together with the patch for
PR rtl-optimization/20756 because I had sh-elf build
problems with aborts in sched-rgn.c.  It turned out to
be a an issue of REG_NOTES interaction with fixup_match_1, though.
Another difference to previous patches was that when setting
last_base_insn after perform_addition, I had to add some code to
actually find the insns where the base register dies in case the
addition is performed in multiple steps.

All in all, the differences seem to be only noise for ppc-eabisim.
The net size difference is four bytes in favour of the unpatched version;
I don't know where I should begin to get a total for the times; there are
probably too unreliable for the small differences anyway.
As said before the powerpc does not benefit from this optimization because
it already has register + offset addressing, unless you bring altivec modes
into the picture.
Comment 20 joern.rennecke@st.com 2005-05-12 19:30:56 UTC
Subject: Re:  autoincrement generation is poor

pinskia at gcc dot gnu dot org wrote:

>------- Additional Comments From pinskia at gcc dot gnu dot org  2005-05-12 18:59 -------
>The only change is the following:
>before:
>bzip2-1.0.2,compress,9408
>bzip2-1.0.2,decompress,10604
>after:
>bzip2-1.0.2,compress,9428
>bzip2-1.0.2,decompress,10640
>
>  
>
Well, there were also lots of small changes.  As said before, if you add 
it all up, the net difference is just four bytes.
Comment 21 Jorn Wolfgang Rennecke 2005-05-16 19:07:48 UTC
Created attachment 8903 [details]
sh base sizes
Comment 22 Jorn Wolfgang Rennecke 2005-05-16 19:11:46 UTC
Created attachment 8904 [details]
sh sizes with optimize_related_values

The compiler was run with -m4 -ml in addition to -Os.

bash-2.05b$ tail +4 ~/sh-base-result-size.csv | awk 'FS="," {sum += $3} END {
print sum }'
3051741
bash-2.05b$ tail +4 ~/sh-orv-result-size.csv | awk 'FS="," {sum += $3} END {
print sum }'
3040971

I.e. the sizes with the optimize_related_values patch are on average a third of
a percent smaller.
Comment 23 Jorn Wolfgang Rennecke 2005-05-17 19:20:38 UTC
updated patch:
http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01768.html
Comment 24 Jorn Wolfgang Rennecke 2005-05-18 09:18:52 UTC
Created attachment 8917 [details]
sh base sizes for 20050517 baseline
Comment 25 Jorn Wolfgang Rennecke 2005-05-18 09:22:40 UTC
Created attachment 8918 [details]
sh sizes with 20050517 patch

bash-2.05b$ tail +4 ~/20050517-sh-base-result-size.csv | awk 'FS="," {sum +=
$3} END {print sum }'
3051893
bash-2.05b$ tail +4 ~/20050517-sh-orv-result-size.csv | awk 'FS="," {sum += $3}
END {print sum }'
3037343

I.e. with the add_limits logic restored, we are up to 0.477% size improvement.
Comment 26 Jorn Wolfgang Rennecke 2005-05-18 12:11:37 UTC
Created attachment 8920 [details]
ppc sizes with 20050517 baseline
Comment 27 Jorn Wolfgang Rennecke 2005-05-18 12:25:29 UTC
Created attachment 8921 [details]
ppc sizes with 20050517 patch

bash-2.05b$ tail +4 ~/20050517-ppc-base-result-size.csv | awk 'FS="," {sum +=
$3} END {print sum }'
3511496
bash-2.05b$ tail +4 ~/20050517-ppc-orv-result-size.csv | awk 'FS="," {sum +=
$3} END {print sum }'
3514464
I.e. with the add_limits logic restored, we now see a 0.0845% code size
increase.
I think that can be explained by the fact that linking chains together
only works when we have non-zero add_limuts, and the main point of linking
chains is to avoid reg-reg copies.  If a three-address add is available,
starting a new chain in a new register doesn't cost extra insns, if the
register allocation stays the same.  On the other hand, linking chains
together reduces the freedom of the register allocator.
Comment 28 Jorn Wolfgang Rennecke 2005-05-18 17:12:02 UTC
Created attachment 8924 [details]
ppc sizes with 20050517 patch & 20050518 amendment

After applying this patch for i686-pc-linux-gnu bootstrap:
http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01837.html
and this to recognize the availability of three-address adds:
http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01863.html

The code bloat is gone:

bash-2.05b$ tail +4 ~/20050518-ppc-orv-result-size.csv	| awk 'FS="," {sum +=
$3} END {print sum }'
3511488
Comment 29 CVS Commits 2005-09-19 16:52:46 UTC
Subject: Bug 20211

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	sh-elf-4_1-branch
Changes by:	amylaar@gcc.gnu.org	2005-09-19 16:52:39

Modified files:
	gcc/doc        : invoke.texi 

Log message:
	2005-09-19  J"orn Rennecke <joern.rennecke@st.com>
	Bernd Schmidt  <bernds@redhat.com>
	
	PR rtl-optimization/20211
	* common.opt: Add optimize-related-values entry.
	* opts.c (decode_options): Set flag_optimize_related_values.
	* optabs.c (gen_add3_insn): If direct addition is not possible,
	try to move the constant into the destination register first.
	* regmove.c (obstack.h, ggc.h, optabs.h): Include.
	(related, rel_use_chain, rel_mod, rel_use): New structures.
	(related_baseinfo, update): Likewise.
	(lookup_related, rel_build_chain): New functions.
	(recognize_related_for_insn, record_reg_use, create_rel_use): Likewise.
	(new_reg_use, rel_record_mem, new_base, invalidate_related): Likewise.
	(find_related, find_related_toplev, chain_starts_earlier): Likewise.
	(chain_ends_later, mod_before, remove_setting_insns): Likewise.
	(perform_addition, modify_address): Likewise.
	(optimize_related_values_1, optimize_related_values_0): Likewise.
	(optimize_related_values, count_sets, link_chains): Likewise.
	(init_add_limits): Likewise.
	(REL_USE_HASH_SIZE, REL_USE_HASH, rel_alloc, rel_new): New macros.
	(regno_related, rel_base_list, unrelatedly_used): New variables.
	(related_obstack, add_limits): Likewise.
	(regmove_optimize): Call optimize_related_values.
	Include gt-regmove.h.
	(have_3addr_const_add): New variable.
	* Makefile.in (gt-regmove.h): New rule.
	(regmove.o): Depend on $(OPTABS_H) and gt-regmove.h.
	(GTFILES): Add regmove.c.
	* doc/invoke.texi: Document -foptimize-related-values.
	
	2005-09-19  J"orn Rennecke <joern.rennecke@st.com>
	
	* regmove.c (discover_flags_reg): Use the PATTERN of an INSN.
	
	* regmove.c (fixup_match_1): When moving a death note, check if
	it needs changing into a REG_UNUSED note.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/doc/invoke.texi.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.598.2.4&r2=1.598.2.5

Comment 30 CVS Commits 2005-09-19 16:54:16 UTC
Subject: Bug 20211

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	sh-elf-4_1-branch
Changes by:	amylaar@gcc.gnu.org	2005-09-19 16:54:05

Modified files:
	gcc            : ChangeLog common.opt optabs.c opts.c regmove.c 

Log message:
	2005-09-19  J"orn Rennecke <joern.rennecke@st.com>
	Bernd Schmidt  <bernds@redhat.com>
	
	PR rtl-optimization/20211
	* common.opt: Add optimize-related-values entry.
	* opts.c (decode_options): Set flag_optimize_related_values.
	* optabs.c (gen_add3_insn): If direct addition is not possible,
	try to move the constant into the destination register first.
	* regmove.c (obstack.h, ggc.h, optabs.h): Include.
	(related, rel_use_chain, rel_mod, rel_use): New structures.
	(related_baseinfo, update): Likewise.
	(lookup_related, rel_build_chain): New functions.
	(recognize_related_for_insn, record_reg_use, create_rel_use): Likewise.
	(new_reg_use, rel_record_mem, new_base, invalidate_related): Likewise.
	(find_related, find_related_toplev, chain_starts_earlier): Likewise.
	(chain_ends_later, mod_before, remove_setting_insns): Likewise.
	(perform_addition, modify_address): Likewise.
	(optimize_related_values_1, optimize_related_values_0): Likewise.
	(optimize_related_values, count_sets, link_chains): Likewise.
	(init_add_limits): Likewise.
	(REL_USE_HASH_SIZE, REL_USE_HASH, rel_alloc, rel_new): New macros.
	(regno_related, rel_base_list, unrelatedly_used): New variables.
	(related_obstack, add_limits): Likewise.
	(regmove_optimize): Call optimize_related_values.
	Include gt-regmove.h.
	(have_3addr_const_add): New variable.
	* Makefile.in (gt-regmove.h): New rule.
	(regmove.o): Depend on $(OPTABS_H) and gt-regmove.h.
	(GTFILES): Add regmove.c.
	* doc/invoke.texi: Document -foptimize-related-values.
	
	2005-09-19  J"orn Rennecke <joern.rennecke@st.com>
	
	* regmove.c (discover_flags_reg): Use the PATTERN of an INSN.
	
	* regmove.c (fixup_match_1): When moving a death note, check if
	it needs changing into a REG_UNUSED note.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=2.8142.2.30&r2=2.8142.2.31
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/common.opt.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.66.2.3&r2=1.66.2.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/optabs.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.268.2.4&r2=1.268.2.5
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/opts.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.99.2.3&r2=1.99.2.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/regmove.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.166.18.2&r2=1.166.18.3

Comment 32 Jorn Wolfgang Rennecke 2005-11-02 21:50:28 UTC
Subject: Bug 20211

Author: amylaar
Date: Wed Nov  2 21:50:22 2005
New Revision: 106401

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106401
Log:
Belated Makefile.in checkin for:

2005-09-19  J"orn Rennecke <joern.rennecke@st.com>
	    Bernd Schmidt  <bernds@redhat.com>

	PR rtl-optimization/20211
	http://gcc.gnu.org/ml/gcc-patches/2005-09/msg01176.html
	* common.opt: Add optimize-related-values entry.
	* opts.c (decode_options): Set flag_optimize_related_values.
	* optabs.c (gen_add3_insn): If direct addition is not possible,
	try to move the constant into the destination register first.
	* regmove.c (obstack.h, ggc.h, optabs.h): Include.
	(related, rel_use_chain, rel_mod, rel_use): New structures.
	(related_baseinfo, update): Likewise.
	(lookup_related, rel_build_chain): New functions.
	(recognize_related_for_insn, record_reg_use, create_rel_use): Likewise.
	(new_reg_use, rel_record_mem, new_base, invalidate_related): Likewise.
	(find_related, find_related_toplev, chain_starts_earlier): Likewise.
	(chain_ends_later, mod_before, remove_setting_insns): Likewise.
	(perform_addition, modify_address): Likewise.
	(optimize_related_values_1, optimize_related_values_0): Likewise.
	(optimize_related_values, count_sets, link_chains): Likewise.
	(init_add_limits): Likewise.
	(REL_USE_HASH_SIZE, REL_USE_HASH, rel_alloc, rel_new): New macros.
	(regno_related, rel_base_list, unrelatedly_used): New variables.
	(related_obstack, add_limits): Likewise.
	(regmove_optimize): Call optimize_related_values.
	Include gt-regmove.h.
	(have_3addr_const_add): New variable.
	* Makefile.in (gt-regmove.h): New rule.
	(regmove.o): Depend on $(OPTABS_H) and gt-regmove.h.
	(GTFILES): Add regmove.c.
	* doc/invoke.texi: Document -foptimize-related-values.

Modified:
    branches/sh-elf-4_1-branch/gcc/Makefile.in

Comment 33 Jorn Wolfgang Rennecke 2006-11-15 20:11:26 UTC
4.x patches:
http://gcc.gnu.org/ml/gcc-patches/2006-06/msg01184.html
Comment 34 Jorn Wolfgang Rennecke 2006-11-30 20:04:56 UTC
Created attachment 12718 [details]
add-on patch

I've found that this patch is also necessary to avoid invalid transformations.
Comment 35 Steven Bosscher 2006-12-17 14:47:42 UTC
This should be fixed when the dataflow branch is merged.  There is a new pass there dedicated to generating auto-inc/dec insns.  If the pass in auto-inc-dec.c on the dataflow branch does not fix this issue, a proper fix for this bug will have to be implemented there instead of in regmove.
Comment 36 Andrew Pinski 2007-06-18 02:03:06 UTC
Just removing patch keyword as the patch is no longer applies after the dataflow branch merge.
Comment 37 Richard Biener 2007-09-17 20:06:15 UTC
No 4.3 pending patch anymore.
Comment 38 Hans-Peter Nilsson 2008-05-23 23:13:25 UTC
Maybe this issue migrated into PR31849?
Comment 39 Jorn Wolfgang Rennecke 2010-02-22 06:04:55 UTC
(In reply to comment #38)
> Maybe this issue migrated into PR31849?

Not entirely, PR31849 is tree-optimization, and a lot of auto-increment opportunities are only visible at the rtl level.