Bug 20070 - If-conversion can't match equivalent code, and cross-jumping only works for literal matches
If-conversion can't match equivalent code, and cross-jumping only works for l...
Status: ASSIGNED
Product: gcc
Classification: Unclassified
Component: rtl-optimization
3.3
: P2 enhancement
: ---
Assigned To: Steven Bosscher
: missed-optimization, patch
Depends on:
Blocks: 29842 40730 17652 21803 23111 28177 30905 4.4pending 37515 4.7pending
  Show dependency treegraph
 
Reported: 2005-02-19 03:29 UTC by Jorn Wolfgang Rennecke
Modified: 2011-03-17 16:01 UTC (History)
8 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2005-02-19 05:04:56


Attachments
incremental patch to prevent invalid moves during cross-jump (2.39 KB, patch)
2005-03-16 15:35 UTC, Jorn Wolfgang Rennecke
Details | Diff
Incremental patch to fix problem found by valgrind. (1.85 KB, patch)
2005-04-18 15:48 UTC, Jorn Wolfgang Rennecke
Details | Diff
testcase (1.19 KB, text/plain)
2005-07-08 16:35 UTC, Jorn Wolfgang Rennecke
Details
incremental patch (1.83 KB, patch)
2005-07-08 17:37 UTC, Jorn Wolfgang Rennecke
Details | Diff
Incremental patch for input conflict problem (1.69 KB, patch)
2005-07-11 21:43 UTC, Jorn Wolfgang Rennecke
Details | Diff
Don't reorder inputs unless there is a checkpoint. (1.84 KB, patch)
2005-07-15 13:42 UTC, Jorn Wolfgang Rennecke
Details | Diff
Incremental patch to properly check for exactly one input (394 bytes, patch)
2005-07-27 14:02 UTC, Jorn Wolfgang Rennecke
Details | Diff
Patch agains r148322, works pre-RA only (14.48 KB, patch)
2009-06-14 19:54 UTC, Steven Bosscher
Details | Diff
Patch agains r155731, works pre-RA only (14.79 KB, patch)
2010-01-09 21:27 UTC, Steven Bosscher
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jorn Wolfgang Rennecke 2005-02-19 03:29:03 UTC
If-conversion currently lacks the ability to merge two
basic blocks that are identical except for an input register
that can be set up using a conditional move, and possibly some
different local registers.

Cross-jumping lacks the ability to recognize that two blocks are
equivalent when there are different local registers, and (for -Os)
to find opportunities where setting up input registers will allow
to do cross jumping.

The problem has been discussed here:
http://gcc.gnu.org/ml/gcc-patches/2004-01/msg03281.html
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00672.html

A patch against 4.0 20050218 has been posted here:
http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01066.html
Comment 1 Andrew Pinski 2005-02-19 05:04:56 UTC
Confirmed based on RTH's comments to the patch.
Comment 2 Jorn Wolfgang Rennecke 2005-02-21 15:44:32 UTC
regression testing of the patch in gcc 4.0 20050218 was successful.
Comment 3 Jorn Wolfgang Rennecke 2005-02-21 16:38:27 UTC
While merging the patch in a 3.4.3 based branch, I noticed three small issues:

- The rest_of_handle_life prototype in passes.c is not necessary.  It is a
  vestige from a previous (more intrusive) attempt to provide life
  information to if_convert.
- In passes.c:rest_of_handle_if_conversion, at the end of the
  flag_expensive_optimizations code, EXIT_BLOCK_PTR->global_live_at_start
  has to be cleared for compatibility with this patch:

  2004-05-17  J"orn Rennecke <joern.rennecke@superh.com>

        * cse.c (trivially_dead_nonlocal_regs): New variable.
        (note_dead_set): New function.
        (delete_trivially_dead_insns): If life info is available, update it.

  For completeness, we might also clear ENTRY_BLOCK_PTR->global_live_at_end.
- config/pa/pa.c:pa_commutative_p is missing the parameter list:
  (rtx x, int outer_code)
Comment 4 Jorn Wolfgang Rennecke 2005-03-02 22:06:29 UTC
The recog.c / recog.h part of the patch has been committed as part
of another patch:
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00133.html
Comment 5 Andrew Pinski 2005-03-03 02:44:37 UTC
(In reply to comment #4)
> The recog.c / recog.h part of the patch has been committed as part
> of another patch:
> http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00133.html

Yes this also caused a regression on ppc-darwin (it might also be reproducable with ppc-elf and ppc-
linux).
Comment 6 joern.rennecke@st.com 2005-03-03 14:59:07 UTC
Subject: Re:  If-conversion can't match equivalent code, and cross-jumping only works for literal matches

pinskia at gcc dot gnu dot org wrote:

>------- Additional Comments From pinskia at gcc dot gnu dot org  2005-03-03 02:44 -------
>(In reply to comment #4)
>  
>
>>The recog.c / recog.h part of the patch has been committed as part
>>of another patch:
>>http://gcc.gnu.org/ml/gcc-patches/2005-03/msg00133.html
>>    
>>
>
>Yes this also caused a regression on ppc-darwin (it might also be reproducable with ppc-elf and ppc-
>linux).
>
>  
>
I've tried ppc-eabisim, but can't reproduce a failure there building gcc 
or newlib.  powerpc-apple-darwin7.4.0
wants lots of include files which I don't have.  Could you send me a 
preprocessed file so that I can test my patch
against that?
Comment 7 Jorn Wolfgang Rennecke 2005-03-16 15:35:15 UTC
Created attachment 8403 [details]
incremental patch to prevent invalid moves during cross-jump
Comment 8 Jorn Wolfgang Rennecke 2005-04-12 18:56:54 UTC
PR and patch predate gcc 4.0 branch.
Comment 9 Jorn Wolfgang Rennecke 2005-04-18 15:48:13 UTC
Created attachment 8677 [details]
Incremental patch to fix problem found by valgrind.
Comment 10 Jorn Wolfgang Rennecke 2005-06-01 11:59:52 UTC
For the purposes of PR20070, it is sufficient that PR21767 is fixed on mainline.
Comment 11 Jorn Wolfgang Rennecke 2005-07-07 18:29:43 UTC
An updated patch is here:
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00482.html
Comment 12 Jorn Wolfgang Rennecke 2005-07-08 16:35:43 UTC
Created attachment 9231 [details]
testcase

The jave/parse.c from the current sh-elf-4_1-branch is miscompiled
at -march=i686 -O2 when the most recently posted patch is applied.
This testcase shows the pruned-down function; the result of not_accessible_p
is ignored.
Comment 13 Jorn Wolfgang Rennecke 2005-07-08 17:37:56 UTC
Created attachment 9232 [details]
incremental patch

The problem was that at the end of the insn processing loop, cancel_changes
cancelled any changes that had not been committed while processing an
equivalence.
The wider issue here is that we potentially built up a large number of changes
when processing lots of insns.	This is fixed in the attached patch by changing

struct_equiv_improve_checkpoint so that whenever it commits to the current set
of changes, it confirms or cancels them.
Comment 14 Jorn Wolfgang Rennecke 2005-07-11 21:43:29 UTC
Created attachment 9245 [details]
Incremental patch for input conflict problem

sh-elf regression testing in the sh-elf-4_1-branch revealed another problem:
When struct_equiv_block_eq is run a second time, and
struct_equiv_improve_checkpoint resolves an input conflict,
check_input_conflict
is reset at the end of struct_equiv_block_eq.  This causes the final phase not
to resolve input conflicts.  Fixed in attched patch with the addition of the
had_input_conflict field.  Visual inspection of the code also showed that there

was a potential problem when an input conflict is resolved in
struct_equiv_improve_checkpoint, but the checkpoint is not actually taken;
then confirmed inputs could be mixed with ones that are rolled back later.
Fixed with a small code rearrangent.
Comment 15 Jorn Wolfgang Rennecke 2005-07-12 13:14:13 UTC
An updated patch is here:
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00843.html
Comment 16 CVS Commits 2005-07-12 13:30:29 UTC
Subject: Bug 20070

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	sh-elf-4_1-branch
Changes by:	amylaar@gcc.gnu.org	2005-07-12 13:29:58

Modified files:
	gcc            : ChangeLog basic-block.h cfgcleanup.c ifcvt.c 
	                 recog.c recog.h 

Log message:
	PR rtl-optimization/20070
	http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00843.html
	* basic-block.h (STRUCT_EQUIV_START, STRUCT_EQUIV_RERUN): Define.
	(STRUCT_EQUIV_FINAL, STRUCT_EQUIV_MAX_LOCAL): Likewise.
	(struct struct_equiv_checkpoint, struct equiv_info): Likewise.
	(struct_equiv_block_eq): Declare.
	* cfgcleanup.c (reload.h, expr.h): #include.
	(IMPOSSIBLE_MOVE_FACTOR): Define.
	(flow_find_cross_jump): Remove.
	(assign_reg_reg_set, struct_equiv, struct_equiv_set): New functions.
	(struct_equiv_dst_mem, struct_equiv_make_checkpoint): Likewise.
	(struct_equiv_improve_checkpoint): Likewise.
	(struct_equiv_restore_checkpoint, struct_equiv_death): Likewise.
	(struct_equiv_block_eq): Likewise.
	(find_dying_inputs, resolve_input_conflict): Likewise.
	(try_crossjump_to_edge): Use struct_equiv_block_eq instead of
	flow_find_cross_jump.
	* ifcvt.c (noce_try_complex_cmove): New function.
	(noce_process_if_block): Call it.
	For flag_expensive_optimizations, update cond_exec_changed_p on
	success.
	(rest_of_handle_if_conversion): For flag_expensive_optimizations,
	provide if_convert with register lifeness info.
	
	Back out this change:
	2005-03-07  Kazu Hirata  <kazu@cs.umass.edu>
	* recog.c (verify_changes): Make it static.
	* recog.h: Remove the corresponding prototype.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=2.8142.2.17&r2=2.8142.2.18
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/basic-block.h.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.246.2.2&r2=1.246.2.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cfgcleanup.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.143.2.1&r2=1.143.2.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ifcvt.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.184.2.1&r2=1.184.2.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/recog.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.221.4.1&r2=1.221.4.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/recog.h.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.55.4.1&r2=1.55.4.2

Comment 17 Jorn Wolfgang Rennecke 2005-07-15 13:42:28 UTC
Created attachment 9280 [details]
Don't reorder inputs unless there is a checkpoint.

In reply #14 I wrote:
> Visual inspection of the code also showed that there
> was a potential problem when an input conflict is resolved in
> struct_equiv_improve_checkpoint, but the checkpoint is not actually taken;
> then confirmed inputs could be mixed with ones that are rolled back later.
> Fixed with a small code rearrange[me]nt.

But somehow that bit got lost.	Here it is again, along with a change to
resolve_input_conflict to make sure it doesn't leave the inputs in a different
order if it can't resolve the conflict.
Comment 18 Jorn Wolfgang Rennecke 2005-07-15 14:07:39 UTC
An updated patch is here:
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01041.html
Comment 19 CVS Commits 2005-07-15 14:25:35 UTC
Subject: Bug 20070

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	sh-elf-4_1-branch
Changes by:	amylaar@gcc.gnu.org	2005-07-15 14:25:24

Modified files:
	gcc            : cfgcleanup.c ChangeLog 

Log message:
	Update this patch:
	2005-07-14  J"orn Rennecke <joern.rennecke@st.com>
	
	PR rtl-optimization/20070
	http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01041.html
	* basic-block.h (STRUCT_EQUIV_START, STRUCT_EQUIV_RERUN): Define.
	(STRUCT_EQUIV_FINAL, STRUCT_EQUIV_MAX_LOCAL): Likewise.
	(struct struct_equiv_checkpoint, struct equiv_info): Likewise.
	(struct_equiv_block_eq): Declare.
	* cfgcleanup.c (reload.h, expr.h): #include.
	(IMPOSSIBLE_MOVE_FACTOR): Define.
	(flow_find_cross_jump): Remove.
	(assign_reg_reg_set, struct_equiv, struct_equiv_set): New functions.
	(struct_equiv_dst_mem, struct_equiv_make_checkpoint): Likewise.
	(struct_equiv_improve_checkpoint): Likewise.
	(struct_equiv_restore_checkpoint, struct_equiv_death): Likewise.
	(struct_equiv_block_eq): Likewise.
	(find_dying_inputs, resolve_input_conflict): Likewise.
	(try_crossjump_to_edge): Use struct_equiv_block_eq instead of
	flow_find_cross_jump.
	* ifcvt.c (noce_try_complex_cmove): New function.
	(noce_process_if_block): Call it.
	For flag_expensive_optimizations, update cond_exec_changed_p on
	success.
	(rest_of_handle_if_conversion): For flag_expensive_optimizations,
	provide if_convert with register lifeness info.
	
	Back out this change:
	2005-03-07  Kazu Hirata  <kazu@cs.umass.edu>
	* recog.c (verify_changes): Make it static.
	* recog.h: Remove the corresponding prototype.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cfgcleanup.c.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=1.143.2.3&r2=1.143.2.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=sh-elf-4_1-branch&r1=2.8142.2.22&r2=2.8142.2.23

Comment 20 Jorn Wolfgang Rennecke 2005-07-27 14:02:47 UTC
Created attachment 9371 [details]
Incremental patch to properly check for exactly one input

Regression testing of sh64-elf in mainline from 20050722 found a regression
of gcc.c-torture/execute/20000715-1.c at -O2.
The attached one-line incremental patch fixes this.
Comment 21 Steven Bosscher 2005-12-01 20:17:41 UTC
For Bug 21803 we could use similar infrastructure to what is proposed for this bug.
Comment 22 Jorn Wolfgang Rennecke 2005-12-07 13:31:46 UTC
Subject: Bug 20070

Author: amylaar
Date: Wed Dec  7 13:31:41 2005
New Revision: 108164

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108164
Log:
2005-12-07  J"orn Rennecke <joern.rennecke@st.com>

	Preparation for PR rtl-optimization/20070 / part1
	* basic-block.h (insns_match_p, flow_find_cross_jump): Declare.
	* cfgcleanup.c (condjump_equiv_p): New function, broken out of
	outgoing_edges_match.
	(outgoing_edges_match): Use condjump_equiv_p.
	(merge_memattrs, insns_match_p, flow_find_cross_jump): Move from here
	into..
	* struct-equiv.c: New file.
	(death_notes_match_p) New function, broken out of insns_match_p.
	* Makefile.in (OBJS-common): Add struct-equiv.o.
	(struct-equiv.o): New target.

Added:
    trunk/gcc/struct-equiv.c
      - copied, changed from r107723, trunk/gcc/cfgcleanup.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/basic-block.h
    trunk/gcc/cfgcleanup.c

Comment 23 Jorn Wolfgang Rennecke 2005-12-13 13:04:29 UTC
Subject: Bug 20070

Author: amylaar
Date: Tue Dec 13 13:04:18 2005
New Revision: 108480

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108480
Log:
	PR rtl-optimization/20070 / part1
	* flow.c (update_life_info): If PROP_POST_REGSTACK is set, call
	count_or_remove_death_notes with kill == -1.
	(mark_set_1): Don't add REG_DEAD / REG_UNUSED notes for stack
	registers if PROP_POST_REGSTACK is set.
	(mark_used_reg): Likewise.
	(count_or_remove_death_notes): If kill is -1, don't remove REG_DEAD /
	REG_UNUSED notes for stack regs.
	* cfgcleanup.c (condjump_equiv_p): Change parameters and processing
	to match rtx_equiv_p machinery.  Change caller.
	(outgoing_edges_match): Likewise.
	(try_crossjump_to_edge): Use struct_equiv_block_eq
	instead of flow_find_cross_jump.
	* basic-block.h (PROP_POST_REGSTACK, STRUCT_EQUIV_START): Define.
	(STRUCT_EQUIV_RERUN, STRUCT_EQUIV_FINAL): Likewise.
	(STRUCT_EQUIV_NEED_FULL_BLOCK, STRUCT_EQUIV_MATCH_JUMPS): Likewise.
	(STRUCT_EQUIV_MAX_LOCAL): Likewise.
	(struct struct_equiv_checkpoint, struct equiv_info): Likewise.
	(insns_match_p): Update prototype.
	(flow_find_cross_jump): Remove prototype.
	(struct_equiv_block_eq, struct_equiv_init): Declare.
	(rtx_equiv_p, condjump_equiv_p): Likewise.
	* struct-equiv.c: Include reload.h.
	(IMPOSSIBLE_MOVE_FACTOR): Define.
	(assign_reg_reg_set, struct_equiv_make_checkpoint): New functions.
	(struct_equiv_improve_checkpoint): Likewise.
	(struct_equiv_restore_checkpoint, rtx_equiv_p): Likewise.
	(set_dest_equiv_p, set_dest_addr_equiv_p, struct_equiv_init): Likewise.
	(struct_equiv_merge, find_dying_input): Likewise.
	(resolve_input_conflict, note_local_live): Likewise.
	(death_notes_match_p): Change parameters and processing
	to match rtx_equiv_p machinery.  Change caller.
	(insns_match_p): Likewise.
	(flow_find_cross_jump): Replace with:
	(struct_equiv_block_eq).

	Back out this change:
	2005-03-07  Kazu Hirata  <kazu@cs.umass.edu>
          * recog.c (verify_changes): Make it static.
          * recog.h: Remove the corresponding prototype.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/basic-block.h
    trunk/gcc/cfgcleanup.c
    trunk/gcc/flow.c
    trunk/gcc/recog.c
    trunk/gcc/recog.h
    trunk/gcc/struct-equiv.c

Comment 24 Jorn Wolfgang Rennecke 2005-12-19 14:37:03 UTC
Subject: Bug 20070

Author: amylaar
Date: Mon Dec 19 14:36:59 2005
New Revision: 108792

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108792
Log:
	* cfgcleanup.c: Temporarily revert patches for PR 20070 till Bernd
	comes back.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgcleanup.c

Comment 25 Paolo Bonzini 2006-03-21 08:27:48 UTC
can we close this?
Comment 26 Jorn Wolfgang Rennecke 2006-03-21 14:24:13 UTC
(In reply to comment #25)
> can we close this?
> 
No, we still have to actually re-enable the cross-jumping code,
and we haven't even gotten to the review of the if-conversion part.

Comment 27 Jorn Wolfgang Rennecke 2006-11-17 18:58:24 UTC
We've been using this patch:
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01488.html
successfully in a 4.1.1 based compiler.

It is not optimal, though: the check for equality of the regsets in struct_equiv_init is unnecessary, and it can also lead skipping possible
optimizations.  The frequent live updates are also not necessary.
This optimization can work with register life info that is inexact, as long
as it is conservative, i.e. it may flag registers as live that are not, but
it most not pretend that live registers are dead.

Note also that re-integration of the if-conversion improvements was not
within the scope of this patch, i.e. this is still an outstanding issue.
Comment 28 Steven Bosscher 2009-06-14 19:54:24 UTC
Created attachment 17995 [details]
Patch agains r148322, works pre-RA only

Joern's original ifcvt.c patch only dealt with pre-reload if-conversion.  The subsequent changes to make struct-equiv work for crossjumping and after reload, made the code too complicated IMHO.

So I've gone back to the roots of the patch.  I've simplified things a bit -- mostly by using the DF machinery.  This new attached patch is far from complete though.  The struct-equiv code should use rtx_equal_p_cb, but the rtx_equal_p_cb needs to be modified first (to be more like for_each_rtx: 3-state and passing around a pointer to auxiliary data).  The local_reg_p stuff should probably go into df-problems.c as a _p function.  And so on.

But the patch does work.  I wanted to let folks now that this bug is not yet forgotten!
Comment 29 Rahul Kharche 2009-09-04 14:51:28 UTC
I am testing Steven's Crossjumping patch attached here. With CoreMark we see a 1% increase in performance when using Os. Other proprietary tests show ~0.5% decrease in code size.

The path however does not fix PR30905.
Comment 30 Steven Bosscher 2010-01-08 17:08:01 UTC
Comment on attachment 17995 [details]
Patch agains r148322, works pre-RA only

The ifcvt.c part of the patch does not work anymore, because it doesn't handle DEBUG_INSNs properly. I am working on an update.
Comment 31 Steven Bosscher 2010-01-09 21:27:00 UTC
Created attachment 19525 [details]
Patch agains r155731, works pre-RA only

Bootstrapped&tested on ia64-unknown-linux-gnu.
Comment 32 Rahul Kharche 2010-01-11 12:34:13 UTC
I will re-test on our port and report my findings, cheers!
Comment 33 Steven Bosscher 2010-04-23 07:38:37 UTC
*** Bug 43864 has been marked as a duplicate of this bug. ***