GCC with gcse enabled generates very poor code for this fragment: ------------------------------------------------------------------------------ int r0, r1, r2, r3, r4, r5; void f (int n) { while (-- n) { r1 += r0; r2 += r1; r3 += r2; r4 += r3; r5 += r4; r0 += r5; } } ------------------------------------------------------------------------------ This is code from 3.4.4 with -O2 -fomit-frame-pointer: .L4: movl 16(%esp), %ebp movl 12(%esp), %esi movl 8(%esp), %ecx movl 4(%esp), %edx movl (%esp), %eax addl %edi, %ebp addl %ebp, %esi movl %ebp, 16(%esp) addl %esi, %ecx movl %esi, 12(%esp) addl %ecx, %edx movl %ecx, 8(%esp) addl %edx, %eax decl 20(%esp) movl %edx, 4(%esp) leal (%edi,%eax), %ebx movl %eax, (%esp) movl %ebx, %edi jne .L4 And this is with -O2 -fomit-frame-pointer -fno-gcse: .L4: addl %edi, %ebp addl %ebp, %ebx addl %ebx, %ecx addl %ecx, %edx addl %edx, %eax addl %eax, %edi decl %esi jne .L4 Same applies for 4.0.0, only one should use '-O2 -fomit-frame-pointer -fno-tree-loop-im -fno-gcse' to get decent code. This is a regression and was introduced by this change: 2001-07-16 Daniel Berlin <dan@cgsoftware.com> * gcse.c: Update comment at top. Update comment on mem handling. mem_last_set, mem_first_set, mem_set_in_block: gone. Declaration of reg_set_info: gone. (oprs_unchanged_p): Don't use mem_*set_* anymore. They are pointless with load_killed_in_block_p (they are *more* conservative then it, not less, and less accurate). (oprs_not_set_p): Ditto. (alloc_gcse_mem): Don't allocate mem_set_in_block (free_gcse_mem): Don't free it, either. (record_last_mem_set_info): Update comment in front, remove mem_*set_* stuff. Note the reason we don't handle stores directly here. (compute_hash_table): Update comments to reflect reality. Remove mem_*set_* references. (reset_opr_set_tables): Remove mem_*set_* references. (mark_call): Ditto. (mark_set): Ditto. Also remove double sets of bitmaps for REG's. (mark_clobber): Ditto (on both parts, we double set here too). (expr_killed_p): Remove mem_set_in_block test. (compute_transp): Remove mem_set_in_block test.
Confirmed, for 3.4 and 3.3, -fnew-ra fixes the problem so this is just the register allocator really being stupid. With 4.0 and -fnew-ra, we just seg fault (I think this is why we should remove -fnew-ra on the mainline because it does not work on most programs at all).
Subject: Re: New: [3.3/3.4/4.0 Regression] gcse causes poor register allocation > This is a regression and was introduced by this change: I have a very hard time believing this, since the patch below only touched store motion. > > 2001-07-16 Daniel Berlin <dan@cgsoftware.com> > > * gcse.c: Update comment at top. > Update comment on mem handling. > mem_last_set, mem_first_set, mem_set_in_block: gone. > Declaration of reg_set_info: gone. > (oprs_unchanged_p): Don't use mem_*set_* anymore. They are > pointless with load_killed_in_block_p (they are *more* > conservative then it, not less, and less accurate). > (oprs_not_set_p): Ditto. > (alloc_gcse_mem): Don't allocate mem_set_in_block > (free_gcse_mem): Don't free it, either. > (record_last_mem_set_info): Update comment in front, remove > mem_*set_* stuff. Note the reason we don't handle stores directly > here. > (compute_hash_table): Update comments to reflect reality. Remove > mem_*set_* references. > (reset_opr_set_tables): Remove mem_*set_* references. > (mark_call): Ditto. > (mark_set): Ditto. Also remove double sets of bitmaps for REG's. > (mark_clobber): Ditto (on both parts, we double set here too). > (expr_killed_p): Remove mem_set_in_block test. > (compute_transp): Remove mem_set_in_block test. >
Subject: Re: [3.3/3.4/4.0 Regression] gcse causes poor register allocation On Sat, 2005-01-15 at 21:44 +0000, dberlin at dberlin dot org wrote: > ------- Additional Comments From dberlin at gcc dot gnu dot org 2005-01-15 21:44 ------- > Subject: Re: New: [3.3/3.4/4.0 Regression] > gcse causes poor register allocation > > > > This is a regression and was introduced by this change: > > I have a very hard time believing this, since the patch below only > touched store motion. Unless, of course, GCSE was completely broken for simple expressions somehow before the patch.
This can be fixed at the tree level without any gcse or ra hacking: http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01462.html
With my patch I get almost perfect code for amd64: .globl f .type f, @function f: .LFB2: decl %edi je .L6 movl r5(%rip), %r9d movl r4(%rip), %r8d xorl %r10d, %r10d movl r3(%rip), %esi movl r2(%rip), %ecx movl r0(%rip), %edx movl r1(%rip), %eax .p2align 4,,7 .L4: addl %edx, %eax incl %r10d addl %eax, %ecx addl %ecx, %esi addl %esi, %r8d addl %r8d, %r9d addl %r9d, %edx cmpl %r10d, %edi jne .L4 movl %r9d, r5(%rip) movl %r8d, r4(%rip) movl %esi, r3(%rip) movl %ecx, r2(%rip) movl %edx, r0(%rip) movl %eax, r1(%rip) .L6: rep ; ret
(In reply to comment #4) > This can be fixed at the tree level without any gcse or ra hacking: > http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01462.html GCC generates poor code for this function even with that patch applied: int r[6]; void g (int n) { while (-- n) { r [0] += r [5]; r [1] += r [0]; r [2] += r [1]; r [3] += r [2]; r [4] += r [3]; r [5] += r [4]; } }
(In reply to comment #6) > (In reply to comment #4) > > This can be fixed at the tree level without any gcse or ra hacking: > > http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01462.html > > GCC generates poor code for this function even with that patch applied: This looks like a different bug because with 3.3.3 we get the good code but not with 3.4.0. Good code meaning: .L5: addl %edi, %ebp addl %ebp, %esi addl %esi, %ecx addl %ecx, %edx addl %edx, %eax addl %eax, %edi decl %ebx jne .L5
(In reply to comment #7) > This looks like a different bug because with 3.3.3 we get the good code but not with 3.4.0. this is now bug 19580
Subject: Bug 19464 CVSROOT: /cvs/gcc Module name: gcc Changes by: steven@gcc.gnu.org 2005-01-23 19:36:01 Modified files: gcc : ChangeLog tree-optimize.c Log message: PR rtl-optimization/19464 * tree-optimize.c (init_tree_optimization_passes): Add one more copyrename pass just before out-of-ssa. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.7249&r2=2.7250 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-optimize.c.diff?cvsroot=gcc&r1=2.69&r2=2.70
.