Bug 30905 - [4.3/4.4/4.5 Regression] Fails to cross-jump
Summary: [4.3/4.4/4.5 Regression] Fails to cross-jump
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.3.0
: P2 normal
Target Milestone: 4.3.5
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on: 20070
Blocks: 16996 33315 24001
  Show dependency treegraph
 
Reported: 2007-02-21 10:03 UTC by Richard Biener
Modified: 2010-03-21 12:03 UTC (History)
7 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 4.2.0
Known to fail: 4.4.0 4.4.1 4.4.2 4.5.0
Last reconfirmed: 2009-06-11 19:49:12


Attachments
Fix using run_fast_dce (1.79 KB, patch)
2008-01-10 20:16 UTC, Steven Bosscher
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2007-02-21 10:03:59 UTC
static int a[30];
static int b[30];
int gen_int(int);
void kernel ()
{
  int i;

  i = gen_int (1);

  if (i != 0)
    {
      a[0] = a[0] + (a[0] & 3);
      b[0] = b[0] + (b[0] | 3);
    }
  else
    {
      a[0] = a[0] + (a[0] & 3);
      b[0] = b[0] + (b[0] | 3);
    }
  if (i != 1)
    {
      a[1] = a[1] + (a[1] & 3);
      b[1] = b[1] + (b[1] | 3);
    }
  else
    {
      a[1] = a[1] + (a[1] & 3);
      b[1] = b[1] + (b[1] | 3);
    }
}

is optimized by mainline to straight line code without compares and jumps at
-O2.  dataflow branch retains the comparisons with i and the duplicate
instructions.
Comment 1 Steven Bosscher 2007-02-21 20:59:58 UTC
Confirmed, we almost never do cross-jumping on the dataflow-branch anymore: only after regmove.
Comment 2 Steven Bosscher 2007-02-21 22:09:32 UTC
On the trunk, *and* on the dataflow branch, we crossjump the code starting with "if (i != 1)" on the first cleanup_cfg iteration when it's called from rest_of_handle_stack_adjustments.  Trunk then goes on to crossjump the other blocks, but the df-branch stops because there is a set to the CC-reg in the way. That set has a REG_UNUSED flag on it.

What probably happens, is that flow on the trunk does some dce in the liveness update, and the df-branch does not.
Comment 3 Steven Bosscher 2007-05-08 23:15:40 UTC
This patch would fix it, but it's brute-force and it causes a ~1.5% slowdown.  Some form of DCE a little more delicate than this will be necessary to fix this bug, though.


Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c        (revision 124550)
+++ cfgcleanup.c        (working copy)
@@ -2286,10 +2286,10 @@ cleanup_cfg (int mode)
     {
       delete_unreachable_blocks (), changed = true;
       if (!(mode & CLEANUP_NO_INSN_DEL)
-         && (mode & CLEANUP_EXPENSIVE)
-         && !reload_completed)
+         && (mode & (CLEANUP_EXPENSIVE | CLEANUP_CROSSJUMP)))
        {
-         if (!delete_trivially_dead_insns (get_insns (), max_reg_num ()))
+         gcc_assert (df);
+         if (! run_fast_dce ())
            break;
        }
       else
@@ -2343,10 +2343,9 @@ static unsigned int
 rest_of_handle_jump2 (void)
 {
   delete_trivially_dead_insns (get_insns (), max_reg_num ());
+  delete_unreachable_blocks ();
   if (dump_file)
     dump_flow_info (dump_file, dump_flags);
-  cleanup_cfg ((optimize ? CLEANUP_EXPENSIVE : 0)
-              | (flag_thread_jumps ? CLEANUP_THREADING : 0));
   return 0;
 }
Comment 4 Richard Biener 2007-06-12 19:17:25 UTC
Now in mainline.
Comment 5 Andrew Pinski 2007-07-04 22:35:20 UTC
I can't reproduce this on the trunk,  unless I am making a mistake.  I think the cross jump is happening now.
Comment 6 Richard Biener 2007-07-05 08:41:11 UTC
For 4.1.x I get

kernel:
.LFB2:
        subq    $8, %rsp
.LCFI0:
        movl    $1, %edi
        call    gen_int
        movl    a(%rip), %eax
        movl    %eax, %edx
        andl    $3, %edx
        addl    %edx, %eax
        movl    %eax, a(%rip)
        movl    b(%rip), %eax
        movl    %eax, %edx
        orl     $3, %edx
        addl    %edx, %eax
        movl    %eax, b(%rip)
        movl    a+4(%rip), %eax
        movl    %eax, %edx
        andl    $3, %edx
        addl    %edx, %eax
        movl    %eax, a+4(%rip)
        movl    b+4(%rip), %eax
        movl    %eax, %edx
        orl     $3, %edx
        addl    %edx, %eax
        movl    %eax, b+4(%rip)
        addq    $8, %rsp
        ret

while with trunk we have

kernel:
.LFB2:
        subq    $8, %rsp
.LCFI0:
        movl    $1, %edi
        call    gen_int
        testl   %eax, %eax
        je      .L2
        movl    a(%rip), %edx
        movl    %edx, %eax
        andl    $3, %eax
        addl    %edx, %eax
        movl    b(%rip), %edx
        movl    %eax, a(%rip)
        movl    %edx, %eax
        orl     $3, %eax
        addl    %edx, %eax
        movl    %eax, b(%rip)
.L7:
        movl    a+4(%rip), %edx
        movl    %edx, %eax
        andl    $3, %eax
        addl    %edx, %eax
        movl    b+4(%rip), %edx
        movl    %eax, a+4(%rip)
        movl    %edx, %eax
        orl     $3, %eax
        addl    %edx, %eax
        movl    %eax, b+4(%rip)
        addq    $8, %rsp
        ret
        .p2align 4,,10
        .p2align 3
.L2:
        movl    a(%rip), %edx
        movl    %edx, %eax
        andl    $3, %eax
        addl    %edx, %eax
        movl    b(%rip), %edx
        movl    %eax, a(%rip)
        movl    %edx, %eax
        orl     $3, %eax
        addl    %edx, %eax
        movl    %eax, b(%rip)
        jmp     .L7
.LFE2:

so, no, this is not yet fixed.
Comment 7 Steven Bosscher 2007-10-16 13:34:13 UTC
Does not really "block" 24001, but the test case for that bug would be fixed if code hoisting would be implemented properly.
Comment 8 Steven Bosscher 2008-01-10 19:18:33 UTC
We start with a CFG that looks like this (all edges directed down):

ENTRY
|
2
|\
| \
3  5
|\  \
| \  \
7  4--6
 \   /
  \ /
   8
   |
  EXIT

where basic block 4 is a forwarder block.  Insns in blocks 6 and 7 match and are cross-jumped, to give a new CFG:

ENTRY
|
2
|\
| \
3  5
|\  \
| \  \
|  4--6
 \   /
  \ /
   7
   |
   8
   |
  EXIT

where basic blocks 4 *and* 6 are now forwarder blocks.  try_optimize_cfg then removes the redundant forwarders, which results in the following simpler CFG:

ENTRY
|
2
|\
3 5
|/
7
|
EXIT

where basic blocks 3 and 5 have matching insns that could be cross-jumped, except that there is a dead "(set (reg:CCZ 17 flags) (compare ...))" in the way that the delete_trivially_dead_insns() call in cleanup_cfg() can obviously not delete.  If that dead SET is deleted, blocks 3 and 5 are merged and we get the same code as previous releases.

In pre-DF gcc releases, a liveness update would run a liveness update with dead code elimination.  Copying that code from the GCC 4.2 release branch:

	  /* Cleaning up CFG introduces more opportunities for dead code
	     removal that in turn may introduce more opportunities for
	     cleaning up the CFG.  */
	  if (!update_life_info_in_dirty_blocks (UPDATE_LIFE_GLOBAL_RM_NOTES,
						 PROP_DEATH_NOTES
						 | PROP_SCAN_DEAD_CODE
						 | PROP_KILL_DEAD_CODE
						 | ((mode & CLEANUP_LOG_LINKS)
						    ? PROP_LOG_LINKS : 0)))

I don't believe we have to aggressively delete dead code on every cleanup_cfg() iteration.  We did plenty experiments with that on the DF branch, and running DCE all the time just loses in the cost/benefit trade-off.  As I've noted before, we should remove dead code when it is beneficial.

In this case, when we remove the forwarder edges and turn a conditional jump into a non-conditional jump, we should kill the code that computes the condition if the condition is dead at the end of the basic block.  We can do this if we have liveness information available during cleanup_cfg().
Comment 9 Steven Bosscher 2008-01-10 20:16:35 UTC
Created attachment 14915 [details]
Fix using run_fast_dce

I see no way around running run_fast_DCE.  But at least let's try to run it only when really necessary, and try to avoid unnecessary repetitive work if CLEANUP_CROSSJUMP is set.
Comment 10 Kenneth Zadeck 2008-01-11 13:15:25 UTC
stevens patch bootstrapped and regression tested on x86-86, ppc-32 and ia-64.

Comment 11 Richard Biener 2008-01-11 13:41:37 UTC
The patch is ok.
Comment 12 stevenb.gcc@gmail.com 2008-01-11 13:48:28 UTC
Subject: Re:  [4.3 Regression] Fails to cross-jump

Richi, could you commit it for me?
Comment 13 Richard Biener 2008-01-11 14:56:19 UTC
Subject: Bug 30905

Author: rguenth
Date: Fri Jan 11 14:55:34 2008
New Revision: 131468

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131468
Log:
2008-01-11  Steven Bosscher  <stevenb.gcc@gmail.com>

	PR rtl-optimization/30905
	* cfgcleanup.c: Include dce.h
	(crossjumps_occured): New global variable.
	(try_crossjump_bb): Exit loop after finding a fallthru edge.
	If something changed, set crossjumps_occured to true.
	(try_optimize_cfg): Clear crossjumps_occured at the beginning.
	Don't add/remove fake edges to exit here...
	(cleanup_cfg): ...but do it here, when crossjumping.
	Run a fast DCE when successful crossjumps occured in the latest
	iteration of try_optimize_cfg.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgcleanup.c

Comment 14 Steven Bosscher 2008-01-11 15:09:02 UTC
Whee, thanks Kenny and Richi!!!

Zapp...
Comment 15 Rahul Kharche 2009-06-11 17:38:12 UTC
GCC4.4 is still missing this fix. GCC-4.4.1 (20090507) on x86_64 produces the following with O2/O3

kernel:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$24, %esp
	movl	$1, (%esp)
	call	gen_int
	testl	%eax, %eax
	je	.L2
	movl	a, %edx
	movl	%edx, %ecx
	andl	$3, %ecx
	leal	(%ecx,%edx), %edx
	movl	%edx, a
	movl	b, %edx
	movl	%edx, %ecx
	orl	$3, %ecx
	leal	(%ecx,%edx), %edx
	movl	%edx, b
.L7:
	movl	a+4, %eax
	movl	%eax, %edx
	andl	$3, %edx
	leal	(%edx,%eax), %eax
	movl	%eax, a+4
	movl	b+4, %eax
	movl	%eax, %edx
	orl	$3, %edx
	leal	(%edx,%eax), %eax
	movl	%eax, b+4
	leave
	ret
	.p2align 4,,7
	.p2align 3
.L2:
	movl	a, %eax
	movl	%eax, %edx
	andl	$3, %edx
	leal	(%edx,%eax), %eax
	movl	%eax, a
	movl	b, %eax
	movl	%eax, %edx
	orl	$3, %edx
	leal	(%edx,%eax), %eax
	movl	%eax, b
	jmp	.L7

Any reason why this shouldn't go into 4.4?
Comment 16 Steven Bosscher 2009-06-11 19:48:02 UTC
The patch is in 4.4.  Apparently it doesn't work?  I'll have another look...
Comment 17 Steven Bosscher 2010-01-09 22:17:09 UTC
In GCC 4.4 only one crossjump happens. The second crossjump does not happen because the basic blocks are not identical. The register allocator has not allocated the registers in the same way:
					.L2:
        movl    a, %edx			        movl    a, %eax
        movl    %edx, %ecx		        movl    %eax, %edx
        andl    $3, %ecx		        andl    $3, %edx
        leal    (%ecx,%edx), %edx	        leal    (%edx,%eax), %eax
        movl    %edx, a			        movl    %eax, a
        movl    b, %edx			        movl    b, %eax
        movl    %edx, %ecx		        movl    %eax, %edx
        orl     $3, %ecx		        orl     $3, %edx
        leal    (%ecx,%edx), %edx	        leal    (%edx,%eax), %eax
        movl    %edx, b			        movl    %eax, b
					        jmp     .L7

The problem in this case is thus, that there is no pre-RA crossjumping pass anymore.

Works for me with my patch for PR20070.
Comment 18 Steven Bosscher 2010-02-12 21:11:42 UTC
As far as I'm concerned, this is WONTFIX for all affected compilers.
Comment 19 Steven Bosscher 2010-03-21 12:03:56 UTC
Cause here is better register allocation and lack of cross-jumping before register allocation. This will not be fixed. For GCC 4.6 we should add a cross-jumping patch (an improved version if this pass, anyway) before RA.