Bug 45416 - [4.5/4.6 Regression] Code size regression from 4.4 for ARM
Summary: [4.5/4.6 Regression] Code size regression from 4.4 for ARM
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.6.0
: P2 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: missed-optimization, patch
Depends on:
Blocks:
 
Reported: 2010-08-26 12:34 UTC by Bingfeng Mei
Modified: 2012-02-01 14:47 UTC (History)
6 users (show)

See Also:
Host: x86_64-unknown-linux
Target: arm
Build:
Known to work: 4.4.5, 4.7.0
Known to fail: 4.5.0, 4.5.1
Last reconfirmed: 2010-08-27 07:41:26


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Bingfeng Mei 2010-08-26 12:34:44 UTC
This is a performance/size regression between 4.6 (4.5) and 4.4. 

The C code:
int foo(long long a)
{
   if (a & (long long) 0x400)
      return 1;
   return 0;
}

Assemble code generated by 4.6 trunk:
foo:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	movq	%rsp, %rbp
	.cfi_offset 6, -16
	.cfi_def_cfa_register 6
	movq	%rdi, -8(%rbp)
	movq	-8(%rbp), %rax
	andl	$1024, %eax
	testq	%rax, %rax
	je	.L2
	movl	$1, %eax
	jmp	.L3
.L2:
	movl	$0, %eax
.L3:
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

Assemble code generated  by 4.4.0:
foo:
.LFB0:
	.cfi_startproc
	shrq	$10, %rdi
	movl	%edi, %eax
	andl	$1, %eax
	ret
	.cfi_endproc

After tree optimizations, both compilers produce different
but essentially same forms. RTL expander and later passes 
then go on to do different optimizations and generate very
different code.
Comment 1 Richard Biener 2010-08-26 12:43:26 UTC
It works for me.
Comment 2 Bingfeng Mei 2010-08-26 12:47:36 UTC
Sorry, I first observed this on our target.  Then I tried to reproduce on x86, but I forgot to turn on optimization flags. It does work for x86. Please delete this report. I will figure out what happen with my target. 
Comment 3 Bingfeng Mei 2010-08-26 12:55:57 UTC
I found I can reproduce the bug with ARM

ARM trunk -Os:
foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	mov	r2, #1024
	mov	r3, #0
	and	r2, r2, r0
	and	r3, r3, r1
	orrs	r1, r2, r3
	moveq	r0, #0
	movne	r0, #1
	mov	pc, lr
	.size	foo, .-foo
Arm 4.40 -Os:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	mov	r0, r0, lsr #10
	and	r0, r0, #1
	bx	lr


Comment 4 Mikael Pettersson 2010-08-26 14:01:56 UTC
(In reply to comment #3)
> I found I can reproduce the bug with ARM

I see this too on armv5tel-linux-gnueabi.  gcc-4.4.4 -Os generates 3 instructions for the body of foo(), 4.5.1 and 4.6.0 generate 8 instructions.
Comment 5 Mikael Pettersson 2010-08-26 21:13:05 UTC
The code size regression on ARM is caused by r146817, Matz' expand from SSA patch: http://gcc.gnu.org/ml/gcc-cvs/2009-04/msg01459.html

Here's the diff in the assembly code generated by a cross to armv5tel-linux-gnueabi, with -Os -S, for r146816 and r146817:

--- pr45316.s-r146816   2010-08-26 23:00:18.000000000 +0200
+++ pr45316.s-r146817   2010-08-26 23:03:24.000000000 +0200
@@ -17,8 +17,13 @@
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
-       mov     r0, r0, lsr #10
-       and     r0, r0, #1
+       mov     r2, #1024
+       mov     r3, #0
+       and     r2, r2, r0
+       and     r3, r3, r1
+       orrs    r1, r2, r3
+       moveq   r0, #0
+       movne   r0, #1
        bx      lr
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.5.0 20090426 (experimental)"

My cross was configured:
../gcc-4.5-r146817/configure --target=armv5tel-unknown-linux-gnueabi --with-arch=armv5te --with-tune=xscale --disable-plugin --disable-lto --disable-nls --disable-shared --disable-libmudflap --disable-multilib --enable-threads=posix --enable-checking=release --enable-languages=c
Comment 6 Richard Biener 2010-12-16 13:03:16 UTC
GCC 4.5.2 is being released, adjusting target milestone.
Comment 7 Richard Biener 2011-04-28 14:51:27 UTC
GCC 4.5.3 is being released, adjusting target milestone.
Comment 8 Bingfeng Mei 2011-04-28 15:22:26 UTC
I am currently on vacation until 4/5/2011. I may access my mail irregularly.
Cheers,
Bingfeng Mei
Comment 9 Andrew Pinski 2011-12-05 20:31:17 UTC
We produce better code with the trunk now:
	xorl	%eax, %eax
	testl	$1024, %edi
	setne	%al
	ret

But that is still worse than before with the and ...
Comment 10 Andrew Pinski 2011-12-05 20:40:59 UTC
  /* If this is an equality or inequality test of a single bit, we can
     do this by shifting the bit being tested to the low-order bit and
     masking the result with the constant 1.  If the condition was EQ,
     we xor it with 1.  This does not require an scc insn and is faster
     than an scc insn even if we have it.

     The code to make this transformation was moved into fold_single_bit_test,
     so we just call into the folder and expand its result.  */

  if ((code == NE || code == EQ)
      && TREE_CODE (arg0) == BIT_AND_EXPR && integer_zerop (arg1)
      && integer_pow2p (TREE_OPERAND (arg0, 1))
      && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))

This needs to be updated for the out-of-ssa expand.
Comment 11 Andrew Pinski 2011-12-05 20:51:58 UTC
I have a patch:
Index: expr.c
===================================================================
--- expr.c	(revision 182017)
+++ expr.c	(working copy)
@@ -10563,15 +10563,24 @@ do_store_flag (sepops ops, rtx target, e
      so we just call into the folder and expand its result.  */
 
   if ((code == NE || code == EQ)
-      && TREE_CODE (arg0) == BIT_AND_EXPR && integer_zerop (arg1)
-      && integer_pow2p (TREE_OPERAND (arg0, 1))
+      && TREE_CODE (arg0) == SSA_NAME && integer_zerop (arg1)
       && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
     {
-      tree type = lang_hooks.types.type_for_mode (mode, unsignedp);
-      return expand_expr (fold_single_bit_test (loc,
-						code == NE ? NE_EXPR : EQ_EXPR,
-						arg0, arg1, type),
-			  target, VOIDmode, EXPAND_NORMAL);
+      gimple srcstmt = get_gimple_for_ssa_name (arg0);
+      if (srcstmt
+	  && TREE_CODE (gimple_assign_rhs_code (srcstmt)) == BIT_AND_EXPR
+	  && integer_pow2p (gimple_assign_rhs2 (srcstmt))
+	{
+	  enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
+	  tree type = lang_hooks.types.type_for_mode (mode, unsignedp);
+	  tree temp = fold_build2 (loc, BIT_AND_EXPR, type,
+				   gimple_assign_rhs1 (srcstmt),
+				   gimple_assign_rhs2 (srcstmt));
+	  return expand_expr (fold_single_bit_test (loc,
+			      			    tcode,
+			      			    arg0, arg1, type),
+			      target, VOIDmode, EXPAND_NORMAL);
+	}
     }
 
   if (! get_subtarget (target)
Comment 12 Andrew Pinski 2011-12-05 20:56:14 UTC
(In reply to comment #11)
> I have a patch:
With a couple of fixes above that.
Comment 13 Andrew Pinski 2011-12-07 19:23:13 UTC
Author: pinskia
Date: Wed Dec  7 19:23:10 2011
New Revision: 182084

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182084
Log:
2011-12-07  Andrew Pinski  <apinski@cavium.com>

	PR middle-end/45416
	* expr.c (do_store_flag): Rewrite code that looks for BIT_AND_EXPR for
	SSA-expand.

2011-12-07  Andrew Pinski  <apinski@cavium.com>

	PR middle-end/45416
	* gcc.dg/pr45416.c: New testcase.


Added:
    trunk/gcc/testsuite/gcc.dg/pr45416.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/expr.c
    trunk/gcc/testsuite/ChangeLog
Comment 14 Andrew Pinski 2011-12-07 19:24:22 UTC
Fixed on the trunk.  If someone wants to backport the patch, that is ok with me.
Comment 15 Georg-Johann Lay 2011-12-09 13:01:39 UTC
Appears that the patch leads to code bloat on AVR at least.

It is general wrong to assume that it's cheap to perform x >> n without caring for the underlying architecture.
Comment 16 Ramana Radhakrishnan 2012-01-23 17:59:56 UTC
Author: ramana
Date: Mon Jan 23 17:59:51 2012
New Revision: 183446

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183446
Log:


2012-01-23  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>

       PR middle-end/45416
        * gcc.dg/pr45416.c: Skip if Thumb1. Handle ubfx.


Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/pr45416.c
Comment 17 Richard Earnshaw 2012-02-01 14:47:09 UTC
fixed.  Not a wrong-code bug, so not backporting.