Bug 38533 - [4.3 regression] tree-ssa-reassoc.c increases register pressure several times
Summary: [4.3 regression] tree-ssa-reassoc.c increases register pressure several times
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.4.0
: P2 normal
Target Milestone: 4.4.0
Assignee: Jakub Jelinek
URL:
Keywords: missed-optimization
Depends on:
Blocks: 27855 28481
  Show dependency treegraph
 
Reported: 2008-12-15 17:53 UTC by Hiroshi Shimamoto
Modified: 2009-04-22 15:19 UTC (History)
5 users (show)

See Also:
Host:
Target: x86_64-unknown-linux-gnu
Build:
Known to work: 4.4.0
Known to fail: 4.3.3
Last reconfirmed: 2008-12-16 23:25:08


Attachments
gcc44-pr38533.patch (1.42 KB, patch)
2008-12-17 13:06 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hiroshi Shimamoto 2008-12-15 17:53:26 UTC
The stack usage of the code gcc-4.x generated looks inefficient on x86 and
x86_64. A simple test case is below;
========
#define copy_from_asm(x, addr, err)	\
asm volatile(				\
	"1:\tmovl %2, %1\n"		\
	"2:\n"				\
	".section .fixup,\"ax\"\n"	\
	"\txor %1,%1\n"			\
	"\tmov $1,%0\n"			\
	"\tjmp 2b\n"			\
	".previous\n"			\
	: "=r" (err), "=r" (x)		\
	: "m" (*(int*)(addr)), "0" (err))

#define copy_from(x, addr, err)	do {		\
	(err) = 0;				\
	copy_from_asm((x), (addr), (err));	\
} while (0)

#define copy(x, addr)	({		\
	int __err;			\
	copy_from((x), (addr), __err);	\
	__err;				\
})

int src[32];
int dst[32];

#define my_copy(x)	do { err |= copy(dst[x], &src[x]); } while (0)

int test(void)
{
	int err = 0;

	my_copy( 0); my_copy( 1); my_copy( 2); my_copy( 3);
	my_copy( 4); my_copy( 5); my_copy( 6); my_copy( 7);
	my_copy( 8); my_copy( 9); my_copy(10); my_copy(11);
	my_copy(12); my_copy(13); my_copy(14); my_copy(15);
	my_copy(16); my_copy(17); my_copy(18); my_copy(19);
	my_copy(20); my_copy(21); my_copy(22); my_copy(23);
	my_copy(24); my_copy(25); my_copy(26); my_copy(27);
	my_copy(28); my_copy(29); my_copy(30); my_copy(31);

	return err;
}
======

I compiled this test case with gcc-3.4.6, 4.2.4, 4.3.2 and 4.4-20081205,
and the compile option is "-g -Os -mno-red-zone".
The code size of objects are below;
$ size test.o.*
   text	   data	    bss	    dec	    hex	filename
    945	      0	      0	    945	    3b1	test.o.34
   1157	      0	      0	   1157	    485	test.o.42
   1133	      0	      0	   1133	    46d	test.o.43
   1201	      0	      0	   1201	    4b1	test.o.44

gcc-3.4.6 generates;
0000000000000000 <test>:
   0:	31 c9                	xor    %ecx,%ecx
   2:	8b 05 00 00 00 00    	mov    0x0(%rip),%eax        # 8 <test+0x8>
   8:	89 05 00 00 00 00    	mov    %eax,0x0(%rip)        # e <test+0xe>
   e:	31 c0                	xor    %eax,%eax
  10:	8b 15 00 00 00 00    	mov    0x0(%rip),%edx        # 16 <test+0x16>
  16:	09 c1                	or     %eax,%ecx
  18:	89 15 00 00 00 00    	mov    %edx,0x0(%rip)        # 1e <test+0x1e>
  1e:	31 c0                	xor    %eax,%eax
  20:	8b 15 00 00 00 00    	mov    0x0(%rip),%edx        # 26 <test+0x26>
  26:	09 c1                	or     %eax,%ecx

gcc-4.4 generates;
0000000000000000 <test>:
   0:	41 57                	push   %r15
   2:	31 c0                	xor    %eax,%eax
   4:	41 56                	push   %r14
   6:	41 55                	push   %r13
   8:	41 89 c5             	mov    %eax,%r13d
   b:	41 54                	push   %r12
   d:	55                   	push   %rbp
   e:	53                   	push   %rbx
   f:	48 83 ec 58          	sub    $0x58,%rsp
  13:	8b 15 00 00 00 00    	mov    0x0(%rip),%edx        # 19 <test+0x19>
  19:	89 15 00 00 00 00    	mov    %edx,0x0(%rip)        # 1f <test+0x1f>
  1f:	41 89 c6             	mov    %eax,%r14d
  22:	8b 15 00 00 00 00    	mov    0x0(%rip),%edx        # 28 <test+0x28>
  28:	89 15 00 00 00 00    	mov    %edx,0x0(%rip)        # 2e <test+0x2e>
  2e:	41 89 c4             	mov    %eax,%r12d
...
  bf:	31 d2                	xor    %edx,%edx
  c1:	44 8b 3d 00 00 00 00 	mov    0x0(%rip),%r15d        # c8 <test+0xc8>
  c8:	89 54 24 04          	mov    %edx,0x4(%rsp)
  cc:	44 89 3d 00 00 00 00 	mov    %r15d,0x0(%rip)        # d3 <test+0xd3>
  d3:	45 31 ff             	xor    %r15d,%r15d
  d6:	8b 15 00 00 00 00    	mov    0x0(%rip),%edx        # dc <test+0xdc>
  dc:	44 89 7c 24 54       	mov    %r15d,0x54(%rsp)
...
 26d:	0b 54 24 54          	or     0x54(%rsp),%edx
 271:	0b 54 24 50          	or     0x50(%rsp),%edx
 275:	0b 54 24 4c          	or     0x4c(%rsp),%edx
 279:	0b 54 24 48          	or     0x48(%rsp),%edx
...

On gcc-4.x, error values temporally stored to stack and at the last "or" all
stored data. This stack usage seems inefficient.
Comment 1 Jakub Jelinek 2008-12-16 13:49:06 UTC
The culprit is tree-ssa-reassoc.c, with -fno-tree-ssa-reassoc the generated code is comparable to 3.4.
For some reason it decides:
Transforming err_10 = __err_3 | __err_8;
 into err_10 = __err_8 | __err_3;
(no idea what advantages that has, but anyway) but instead of reinserting
the transformed stmt into the location where err_10 used to be set before,
it moves all the 30 err_A = err_B | __err_C stmts (where each stmt depends on the preceeding one) after the last asm volatile.  This means that register pressure jumps through the roof.  In this case (if the transformation actually makes sense) certainly the first transformed stmt can be inserted into the old definition spot and all the following stmts can be left untouched.
Comment 2 H.J. Lu 2008-12-16 14:58:43 UTC
Sounds similar to PR 32183.
Comment 3 Jakub Jelinek 2008-12-16 16:57:15 UTC
In particular, the problem is in linearize_expr_tree.  The |s are already perfectly linearized (in all but the innermost recursive linearize_expr_tree
binlhsisreassoc is 1 and binrhsisreassoc is 0, in the innermost both are 0),
yet:
1645  gsinow = gsi_for_stmt (stmt);
1646  gsilhs = gsi_for_stmt (SSA_NAME_DEF_STMT (binlhs));
1647  gsi_move_before (&gsilhs, &gsinow);
moves all of them to one spot.  Couldn't that moving be deferred till we actually know it is worthwhile to reassociate something, not blindly just in case something will be needed?
Comment 4 Daniel Berlin 2008-12-16 17:00:31 UTC
Subject: Re:  [4.2/4.3/4.4 regression] tree-ssa-reassoc.c increases register pressure several times

Yes, that looks like a bug.
There are also numerous ways in which the placement can be improved.
A few people had talked about rewriting it to be a balanced tree
placement, etc, so i've never bothered to do even the small cleanups.
If you want to change placement, go for it.
:)

On Tue, Dec 16, 2008 at 11:57 AM, jakub at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #3 from jakub at gcc dot gnu dot org  2008-12-16 16:57 -------
> In particular, the problem is in linearize_expr_tree.  The |s are already
> perfectly linearized (in all but the innermost recursive linearize_expr_tree
> binlhsisreassoc is 1 and binrhsisreassoc is 0, in the innermost both are 0),
> yet:
> 1645  gsinow = gsi_for_stmt (stmt);
> 1646  gsilhs = gsi_for_stmt (SSA_NAME_DEF_STMT (binlhs));
> 1647  gsi_move_before (&gsilhs, &gsinow);
> moves all of them to one spot.  Couldn't that moving be deferred till we
> actually know it is worthwhile to reassociate something, not blindly just in
> case something will be needed?
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38533
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
>
Comment 5 Jakub Jelinek 2008-12-17 13:06:12 UTC
Created attachment 16915 [details]
gcc44-pr38533.patch

Patch that cures this.  Bootstrapped/regtested on x86_64-linux, except for
gfortran.dg/reassoc_4.f failure where the undistribution doesn't happen any more for some reason.  Looking into it.
Comment 6 Jakub Jelinek 2008-12-17 16:04:29 UTC
The problem with the patch is that useless insns (visited with has_zero_uses on lhs) are now scattered in the bbs and so aren't removed immediately.
Comment 7 Jakub Jelinek 2008-12-18 07:56:05 UTC
Subject: Bug 38533

Author: jakub
Date: Thu Dec 18 07:54:43 2008
New Revision: 142807

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=142807
Log:
	PR middle-end/38533
	* tree-ssa-reassoc.c (remove_visited_stmt_chain): New function.
	(rewrite_expr_tree): Add moved argument, move stmts together if
	needed.  Call remove_visited_stmt_chain.
	(linearize_expr_tree): Don't move stmts here.
	(reassociate_bb): Call remove_visited_stmt_chain if num ops is 1.
	Adjust rewrite_expr_tree caller.

	* gcc.dg/tree-ssa/pr38533.c: New test.
	* gcc.c-torture/execute/pr38533.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr38533.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr38533.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-reassoc.c

Comment 8 Jakub Jelinek 2008-12-18 07:56:35 UTC
Fixed on the trunk.
Comment 9 Paolo Bonzini 2009-02-04 07:44:09 UTC
Did the patch cure PR28481 too?
Comment 10 Joseph S. Myers 2009-03-31 21:05:15 UTC
Closing 4.2 branch.
Comment 11 Richard Biener 2009-04-22 15:19:16 UTC
WONTFIX on the 4.3 branch.