Bug 19464 - [3.3/3.4/4.0 Regression] gcse causes poor register allocation
Summary: [3.3/3.4/4.0 Regression] gcse causes poor register allocation
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.0.0
: P2 normal
Target Milestone: 4.0.0
Assignee: Steven Bosscher
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks: 14741
  Show dependency treegraph
 
Reported: 2005-01-15 20:58 UTC by Serge Belyshev
Modified: 2005-01-23 19:43 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work: 2.95.3 3.0.4
Known to fail: 3.3.5 3.4.4 4.0.0
Last reconfirmed: 2005-01-21 12:25:01


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Serge Belyshev 2005-01-15 20:58:16 UTC
GCC with gcse enabled generates very poor code for this fragment:

------------------------------------------------------------------------------
int r0, r1, r2, r3, r4, r5;

void f (int n)
{
  while (-- n)
    {
      r1 += r0;
      r2 += r1;
      r3 += r2;
      r4 += r3;
      r5 += r4;
      r0 += r5;
    }
}
------------------------------------------------------------------------------

This is code from 3.4.4 with -O2 -fomit-frame-pointer:
.L4:
	movl	16(%esp), %ebp
	movl	12(%esp), %esi
	movl	8(%esp), %ecx
	movl	4(%esp), %edx
	movl	(%esp), %eax
	addl	%edi, %ebp
	addl	%ebp, %esi
	movl	%ebp, 16(%esp)
	addl	%esi, %ecx
	movl	%esi, 12(%esp)
	addl	%ecx, %edx
	movl	%ecx, 8(%esp)
	addl	%edx, %eax
	decl	20(%esp)
	movl	%edx, 4(%esp)
	leal	(%edi,%eax), %ebx
	movl	%eax, (%esp)
	movl	%ebx, %edi
	jne	.L4

And this is with -O2 -fomit-frame-pointer -fno-gcse:
.L4:
	addl	%edi, %ebp
	addl	%ebp, %ebx
	addl	%ebx, %ecx
	addl	%ecx, %edx
	addl	%edx, %eax
	addl	%eax, %edi
	decl	%esi
	jne	.L4

Same applies for 4.0.0, only one should use '-O2 -fomit-frame-pointer
-fno-tree-loop-im -fno-gcse' to get decent code.

This is a regression and was introduced by this change:

2001-07-16  Daniel Berlin  <dan@cgsoftware.com>

	* gcse.c: Update comment at top. 
	Update comment on mem handling.
	mem_last_set, mem_first_set, mem_set_in_block: gone.
	Declaration of reg_set_info: gone.
	(oprs_unchanged_p): Don't use mem_*set_* anymore. They are
	pointless with load_killed_in_block_p (they are *more*
	conservative then it, not less, and less accurate).
	(oprs_not_set_p): Ditto.	
	(alloc_gcse_mem): Don't allocate mem_set_in_block
	(free_gcse_mem): Don't free it, either.
	(record_last_mem_set_info): Update comment in front, remove
	mem_*set_* stuff. Note the reason we don't handle stores directly
	here.
	(compute_hash_table): Update comments to reflect reality. Remove
	mem_*set_* references.
	(reset_opr_set_tables): Remove mem_*set_* references.
	(mark_call): Ditto.
	(mark_set): Ditto.  Also remove double sets of bitmaps for REG's.	
	(mark_clobber): Ditto (on both parts, we double set here too).
	(expr_killed_p): Remove mem_set_in_block test.
	(compute_transp): Remove mem_set_in_block test.
Comment 1 Andrew Pinski 2005-01-15 21:04:44 UTC
Confirmed, for 3.4 and 3.3, -fnew-ra fixes the problem so this is just the register allocator really being 
stupid.


With 4.0 and -fnew-ra, we just seg fault (I think this is why we should remove -fnew-ra on the mainline 
because it does not work on most programs at all).
Comment 2 Daniel Berlin 2005-01-15 21:44:40 UTC
Subject: Re:  New: [3.3/3.4/4.0 Regression]
	gcse causes poor register allocation


> This is a regression and was introduced by this change:

I have a very hard time believing this, since the patch below only
touched store motion.

> 
> 2001-07-16  Daniel Berlin  <dan@cgsoftware.com>
> 
> 	* gcse.c: Update comment at top. 
> 	Update comment on mem handling.
> 	mem_last_set, mem_first_set, mem_set_in_block: gone.
> 	Declaration of reg_set_info: gone.
> 	(oprs_unchanged_p): Don't use mem_*set_* anymore. They are
> 	pointless with load_killed_in_block_p (they are *more*
> 	conservative then it, not less, and less accurate).
> 	(oprs_not_set_p): Ditto.	
> 	(alloc_gcse_mem): Don't allocate mem_set_in_block
> 	(free_gcse_mem): Don't free it, either.
> 	(record_last_mem_set_info): Update comment in front, remove
> 	mem_*set_* stuff. Note the reason we don't handle stores directly
> 	here.
> 	(compute_hash_table): Update comments to reflect reality. Remove
> 	mem_*set_* references.
> 	(reset_opr_set_tables): Remove mem_*set_* references.
> 	(mark_call): Ditto.
> 	(mark_set): Ditto.  Also remove double sets of bitmaps for REG's.	
> 	(mark_clobber): Ditto (on both parts, we double set here too).
> 	(expr_killed_p): Remove mem_set_in_block test.
> 	(compute_transp): Remove mem_set_in_block test.
> 

Comment 3 Daniel Berlin 2005-01-15 21:48:44 UTC
Subject: Re:  [3.3/3.4/4.0 Regression] gcse
	causes poor register allocation

On Sat, 2005-01-15 at 21:44 +0000, dberlin at dberlin dot org wrote:
> ------- Additional Comments From dberlin at gcc dot gnu dot org  2005-01-15 21:44 -------
> Subject: Re:  New: [3.3/3.4/4.0 Regression]
> 	gcse causes poor register allocation
> 
> 
> > This is a regression and was introduced by this change:
> 
> I have a very hard time believing this, since the patch below only
> touched store motion.


Unless, of course, GCSE was completely broken for simple expressions
somehow before the patch.



Comment 4 Steven Bosscher 2005-01-21 12:25:01 UTC
This can be fixed at the tree level without any gcse or ra hacking: 
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01462.html 
Comment 5 Steven Bosscher 2005-01-21 12:51:12 UTC
With my patch I get almost perfect code for amd64: 
 
.globl f 
        .type   f, @function 
f: 
.LFB2: 
        decl    %edi 
        je      .L6 
        movl    r5(%rip), %r9d 
        movl    r4(%rip), %r8d 
        xorl    %r10d, %r10d 
        movl    r3(%rip), %esi 
        movl    r2(%rip), %ecx 
        movl    r0(%rip), %edx 
        movl    r1(%rip), %eax 
        .p2align 4,,7 
.L4: 
        addl    %edx, %eax 
        incl    %r10d 
        addl    %eax, %ecx 
        addl    %ecx, %esi 
        addl    %esi, %r8d 
        addl    %r8d, %r9d 
        addl    %r9d, %edx 
        cmpl    %r10d, %edi 
        jne     .L4 
        movl    %r9d, r5(%rip) 
        movl    %r8d, r4(%rip) 
        movl    %esi, r3(%rip) 
        movl    %ecx, r2(%rip) 
        movl    %edx, r0(%rip) 
        movl    %eax, r1(%rip) 
.L6: 
        rep ; ret 
 
Comment 6 Serge Belyshev 2005-01-23 00:42:16 UTC
(In reply to comment #4)
> This can be fixed at the tree level without any gcse or ra hacking: 
> http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01462.html 

GCC generates poor code for this function even with that patch applied:

int r[6];

void g (int n)
{
  while (-- n)
    {
      r [0] += r [5];
      r [1] += r [0];
      r [2] += r [1];
      r [3] += r [2];
      r [4] += r [3];
      r [5] += r [4];
    }
}
Comment 7 Andrew Pinski 2005-01-23 01:17:18 UTC
(In reply to comment #6)
> (In reply to comment #4)
> > This can be fixed at the tree level without any gcse or ra hacking: 
> > http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01462.html 
> 
> GCC generates poor code for this function even with that patch applied:

This looks like a different bug because with 3.3.3 we get the good code but not with 3.4.0.
Good code meaning:
.L5:
        addl    %edi, %ebp
        addl    %ebp, %esi
        addl    %esi, %ecx
        addl    %ecx, %edx
        addl    %edx, %eax
        addl    %eax, %edi
        decl    %ebx
        jne     .L5
Comment 8 Serge Belyshev 2005-01-23 01:43:30 UTC
(In reply to comment #7)
> This looks like a different bug because with 3.3.3 we get the good code but
not with 3.4.0.

this is now bug 19580
Comment 9 GCC Commits 2005-01-23 19:36:11 UTC
Subject: Bug 19464

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	steven@gcc.gnu.org	2005-01-23 19:36:01

Modified files:
	gcc            : ChangeLog tree-optimize.c 

Log message:
	PR rtl-optimization/19464
	* tree-optimize.c (init_tree_optimization_passes): Add one more
	copyrename pass just before out-of-ssa.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7249&r2=2.7250
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&r1=2.69&r2=2.70

Comment 10 Steven Bosscher 2005-01-23 19:43:39 UTC
.