Bug 47258 - [4.5/4.6/4.7 Regression] Extra instruction generated in 4.5.2
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.5.2
Importance: P3 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2011-01-11 13:37 UTC by Bingfeng Mei
Modified: 2012-02-02 08:38 UTC

See Also:
Host:
Target:
Build:
Known to work: 4.4.0
Known to fail: 4.5.2, 4.6.0, 4.7.0
Last reconfirmed:


Attachments
Preprocessed test case (4.92 KB, application/octet-stream)
2011-01-11 13:38 UTC, Bingfeng Mei

Description Bingfeng Mei 2011-01-11 13:37:14 UTC
I encountered a performance regression in 4.5.2 (and in 4.6) compared with 4.5.1.

The code is from CoreMark.

To reproduce, compile the attached .i file:

~/work/install-x86-452/bin/gcc core_matrix.i -O2 -S -o x86-452.s
...
.L5:
	movl	%r8d, %r10d
.L3:
	mov	%r9d, %r8d
	movswl	(%rcx,%rax), %r11d
	addq	$2, %rax
	movswl	(%rdx,%r8,2), %r8d
	addl	$1, %r9d
	imull	%r11d, %r8d
	addl	%r10d, %r8d
	cmpq	%rbx, %rax
	jne	.L5
...

~/work/install-x86-451/bin/gcc core_matrix.i -O2 -S -o x86-451.s
...
.L3:
	mov	%r9d, %r8d
	movswl	(%rcx,%rax), %r11d
	addq	$2, %rax
	movswl	(%rdx,%r8,2), %r8d
	addl	$1, %r9d
	imull	%r11d, %r8d
	addl	%r8d, %r10d
	cmpq	%rbx, %rax
	jne	.L3
...

The performance hit is even worse on our architecture because the zero-overhead
loop instruction cannot be used for the irregular loop that 4.5.2 produces.
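
Comparing the two listings, the 4.5.2 output accumulates into %r8d and then copies the result back to %r10d with an extra movl at .L5 on the back edge, while 4.5.1 adds directly into %r10d. For orientation, a loop of roughly the following shape would produce code like this. It is only a sketch: the typedef and function names are hypothetical and are not taken from the attached test case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedefs, loosely modelled on a 16-bit matrix kernel.
   The names used in the attached test case may differ. */
typedef int16_t mat_elem_t; /* 16-bit element, sign-extended on load */
typedef int32_t mat_acc_t;  /* 32-bit accumulator */

/* Multiply-accumulate over n elements.  Each iteration sign-extends two
   16-bit values, multiplies them, and adds the product to acc.  Ideally
   acc lives in a single register for the whole loop, so no copy is
   needed on the back edge. */
mat_acc_t dot_product(const mat_elem_t *a, const mat_elem_t *b, size_t n)
{
    mat_acc_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (mat_acc_t)a[i] * (mat_acc_t)b[i];
    return acc;
}
```

Compiling a function like this with -O2 -S and counting the instructions between the loop label and the conditional jump is one way to check whether a given compiler build shows the extra copy.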

The configuration used is:
../gcc-4.5.1/configure --prefix=/projects/firepath/tools/work/bmei/install-x86-451 --with-mpfr=/projects/firepath/tools/work/bmei/packages/mpfr/2.4.1/x86-64 --with-gmp=/projects/firepath/tools/work/bmei/packages/gmp/4.3.0/x86-64 --with-mpc=/projects/firepath/tools/work/bmei/packages/mpc/0.8.1/x86-64 --with-elf=/projects/firepath/tools/work/bmei/packages/libelf/x86-64 --disable-bootstrap --enable-languages=c --no-create --no-recursion


The difference between 4.5.1 and 4.5.2 seems to arise in the RTL expand pass.
Comment 1 Bingfeng Mei 2011-01-11 13:38:13 UTC
Created attachment 22944
Preprocessed test case
Comment 2 Bingfeng Mei 2011-01-11 16:16:28 UTC
After trying the patches one by one, I believe the misoptimization is caused by the following patch:

Index: tree-ssa-copyrename.c
===================================================================
RCS file: /cvs/dev/tools/src/fp_gcc/gcc/tree-ssa-copyrename.c,v
retrieving revision 1.1.2.5.2.1
retrieving revision 1.1.2.5.2.2
diff -u -r1.1.2.5.2.1 -r1.1.2.5.2.2
--- tree-ssa-copyrename.c	12 Apr 2010 13:15:43 -0000	1.1.2.5.2.1
+++ tree-ssa-copyrename.c	13 Dec 2010 05:51:45 -0000	1.1.2.5.2.2
@@ -225,11 +225,11 @@
       ign2 = false;
     }
 
-  /* Don't coalesce if the two variables aren't type compatible.  */
-  if (!types_compatible_p (TREE_TYPE (root1), TREE_TYPE (root2)))
+  /* Don't coalesce if the two variables are not of the same type.  */
+  if (TREE_TYPE (root1) != TREE_TYPE (root2))
     {
       if (debug)
-	fprintf (debug, " : Incompatible types.  No coalesce.\n");
+	fprintf (debug, " : Different types.  No coalesce.\n");
       return false;
     }
Comment 3 Richard Biener 2011-01-11 16:34:40 UTC
(In reply to comment #2)
> After tried patches one-by-one, I believe the misoptimization is down to the
> following patch.

Which is a correctness patch.  You can try dumbing it down somewhat with

if (TYPE_MAIN_VARIANT (TREE_TYPE (root1)) != TYPE_MAIN_VARIANT (TREE_TYPE (root2))
    || !types_compatible_p (TREE_TYPE (root1), TREE_TYPE (root2)))

and see if that helps.
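
To illustrate why the three checks differ, here is a toy model in C. It assumes that a typedef gets its own type node which shares a main variant with the underlying type. The struct and function names are hypothetical stand-ins and are not GCC's tree API.

```c
#include <stdbool.h>

/* Toy stand-in for a compiler type node.  This is NOT GCC's tree
   representation; all names here are made up for illustration. */
struct toy_type {
    const char *name;
    const struct toy_type *main_variant; /* a base type points to itself */
    unsigned bits;
    bool is_signed;
};

/* "int", a typedef of it, and an unrelated type of the same width. */
const struct toy_type toy_int      = { "int",      &toy_int,      32, true  };
const struct toy_type toy_myint    = { "myint_t",  &toy_int,      32, true  };
const struct toy_type toy_unsigned = { "unsigned", &toy_unsigned, 32, false };

/* Analogue of the strict check from the patch: identical node only. */
bool same_type_node(const struct toy_type *a, const struct toy_type *b)
{
    return a == b;
}

/* Analogue of a representation-based compatibility check. */
bool toy_compatible(const struct toy_type *a, const struct toy_type *b)
{
    return a->bits == b->bits && a->is_signed == b->is_signed;
}

/* Analogue of the relaxed check suggested above: coalescing is allowed
   when the main variants match and the types are compatible. */
bool may_coalesce_relaxed(const struct toy_type *a, const struct toy_type *b)
{
    return a->main_variant == b->main_variant && toy_compatible(a, b);
}
```

In this model, same_type_node(&toy_int, &toy_myint) is false while may_coalesce_relaxed(&toy_int, &toy_myint) is true, which is the kind of case where the strict pointer comparison would refuse to coalesce a variable of a typedef type with one of the underlying type.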

Comment 4 Richard Biener 2011-01-11 16:35:23 UTC
But we'll create bogus debug info for the typedef type decls then.
Comment 5 Bingfeng Mei 2011-01-13 15:49:23 UTC
It works, but I do not understand the debug info issue raised in your other comment.

Comment 6 Andrew Pinski 2011-12-15 02:03:43 UTC
Can you try this patch:
Index: tree-outof-ssa.c
===================================================================
--- tree-outof-ssa.c	(revision 67191)
+++ tree-outof-ssa.c	(revision 67192)
@@ -1021,6 +1021,9 @@ insert_backedge_copies (void)
   basic_block bb;
   gimple_stmt_iterator gsi;
 
+  /* Make sure that edges have updated to be marked for back edges. */
+  mark_dfs_back_edges ();
+
   FOR_EACH_BB (bb)
     {
       /* Mark block as possibly needing calculation of UIDs.  */
--- CUT ---
I did not create this patch; it came from http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01865.html

That means this is most likely already fixed on trunk.
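
For readers unfamiliar with back-edge marking, the idea can be sketched on a toy control-flow graph: run a depth-first search from the entry block and flag every edge whose destination is still on the DFS stack. The code below is a minimal illustration under that assumption. The types and function names are hypothetical, and it is not GCC's implementation of mark_dfs_back_edges.

```c
#include <stdbool.h>

#define MAX_NODES 16
#define MAX_EDGES 32

/* Toy CFG edge; dfs_back plays the role of a per-edge back-edge flag. */
struct toy_edge {
    int src, dest;
    bool dfs_back;
};

struct toy_cfg {
    int n_nodes; /* must not exceed MAX_NODES */
    int n_edges; /* must not exceed MAX_EDGES */
    struct toy_edge edges[MAX_EDGES];
};

enum visit_state { UNVISITED, ON_STACK, DONE };

/* Depth-first walk: an edge whose destination is still on the DFS stack
   closes a cycle and is marked as a back edge. */
static void toy_dfs(struct toy_cfg *g, int node, enum visit_state *state)
{
    state[node] = ON_STACK;
    for (int i = 0; i < g->n_edges; i++) {
        struct toy_edge *e = &g->edges[i];
        if (e->src != node)
            continue;
        if (state[e->dest] == ON_STACK)
            e->dfs_back = true;
        else if (state[e->dest] == UNVISITED)
            toy_dfs(g, e->dest, state);
    }
    state[node] = DONE;
}

/* Clear stale flags, recompute them from the entry node, and report
   whether any back edge was found. */
bool toy_mark_back_edges(struct toy_cfg *g, int entry)
{
    enum visit_state state[MAX_NODES] = { UNVISITED };
    bool found = false;

    for (int i = 0; i < g->n_edges; i++)
        g->edges[i].dfs_back = false;
    toy_dfs(g, entry, state);
    for (int i = 0; i < g->n_edges; i++)
        found = found || g->edges[i].dfs_back;
    return found;
}
```

The point the patch makes is the ordering: the flags are recomputed immediately before the code that consults them, so a consumer never reads flags left stale by earlier transformations of the graph.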
Comment 7 Bingfeng Mei 2011-12-15 10:18:06 UTC
Yes, the patch fixes the bug. Thanks.
Comment 8 Andrew Pinski 2012-02-02 08:38:55 UTC
Fixed for 4.7.0 by:
------------------------------------------------------------------------
r181476 | wschmidt | 2011-11-18 06:15:38 -0800 (Fri, 18 Nov 2011) | 6 lines

2011-11-18  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

        * tree-outof-ssa.c (insert_back_edge_copies):  Add call to
        mark_dfs_back_edges.