Bug 40487 - Extra zero extensions produced for ARM.
Summary: Extra zero extensions produced for ARM.
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.5.0
: P3 normal
Target Milestone: 4.5.0
Assignee: Ramana Radhakrishnan
URL:
Keywords:
Depends on:
Blocks: 16996
  Show dependency treegraph
 
Reported: 2009-06-18 12:50 UTC by Ramana Radhakrishnan
Modified: 2009-07-15 16:20 UTC (History)
2 users (show)

See Also:
Host: x86_64-linux
Target: arm-eabi
Build: x86_64-linux
Known to work:
Known to fail:
Last reconfirmed: 2009-06-23 12:38:42


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ramana Radhakrishnan 2009-06-18 12:50:21 UTC
A colleague at ARM found this a couple of days back. 

With trunk as of a few days back configured for arm-none-eabi for cortex-a8


typedef unsigned short ushort;
typedef unsigned char uchar;

ushort foo(uchar data, uchar data1, uchar data2)
{
  uchar x = (uchar)(data);
  x ^= (x + 5); 
  x ^= (x << 2); 
  x ^= (x << 1); 
  return x;
}


foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	add	r3, r0, #5
	eor	r0, r3, r0
	uxtb	r0, r0   //redundant
	eor	r0, r0, r0, lsl #2
	uxtb	r0, r0   // redundant
	eor	r0, r0, r0, lsl #1
	uxtb	r0, r0
	bx	lr
Comment 1 Ramana Radhakrishnan 2009-06-18 12:58:54 UTC
I'm not sure about the best way of fixing this without looking at bigger trees at expand time or for combine to be able to do something smart about this one. Essentially you fold the previous zero extension with the current operation because the current operation doesn't care about the higher order bits and there is a zero extension afterwards that puts it into the right shape.


x = zextb (y)
z = x ^ (x << #n)
w = zextb (z)

into 

x = y ^ (y << #n)
w = zextb (x)




Comment 2 Steven Bosscher 2009-06-18 14:00:12 UTC
Why does the zero-bits machinery in combine not make these redundant extensions go away?
Comment 3 Steven Bosscher 2009-06-22 16:36:53 UTC
Is this related to bug 39715?
Comment 4 Richard Earnshaw 2009-06-22 17:00:23 UTC
(In reply to comment #3)
> Is this related to bug 39715?
> 

Maybe.
Comment 5 Steven Bosscher 2009-06-22 17:58:32 UTC
Compiling with gcc 4.4.1 with options "-Os -mtune=cortex-a8" I get this:

	.cpu arm7tdmi
	.fpu softvfp
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 4
	.eabi_attribute 18, 4
	.file	"PR40487.c"
	.text
	.align	2
	.global	foo
	.type	foo, %function
foo:
	@ Function supports interworking.
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	add	r3, r0, #5
	eor	r3, r3, r0
	and	r3, r3, #255
	eor	r3, r3, r3, asl #2
	and	r3, r3, #255
	eor	r3, r3, r3, asl #1
	and	r0, r3, #255
	bx	lr
	.size	foo, .-foo
	.ident	"GCC: (GNU) 4.4.1 20090622 (prerelease) [gcc-4_4-branch revision 148809]"


which isn't a whole lot better, is it...
Comment 6 Steven Bosscher 2009-06-22 18:25:02 UTC
I get the same code with 4.5-today as the code of comment #5.  I configured for --target=arm-eabi.  Should I configure differently to see the shifts instead of ands?


Comment 7 Steven Bosscher 2009-06-22 18:25:26 UTC
see the uxtbs instead of the ands, that is...
Comment 8 Ramana Radhakrishnan 2009-06-22 22:57:46 UTC
(In reply to comment #5)
> Compiling with gcc 4.4.1 with options "-Os -mtune=cortex-a8" I get this:

Try with -mcpu=cortex-a8 . -mtune=cortex-a8 doesn't choose the cpu for that ,  insn selection for the arm port happens with mcpu and tuning and costs are controlled by mtune.
And no it isn't a whole load better :)


I had configured the toolchain with --with-cpu=cortex-a8.

Thanks,
Ramana
Comment 9 Ramana Radhakrishnan 2009-06-23 09:16:04 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Is this related to bug 39715?
> > 
> 
> Maybe.
> 

39715 appears to be strictly a 4.5 missed optimization, but from comment #5 it appears as though this is different and I can verify that the same problem exists for 4.3 and trunk.



Comment 10 Steven Bosscher 2009-06-23 09:50:22 UTC
Yes, this bug is indeed not related to bug 39715.

I have also verified that the SEE pass (sign-extend elimination, but also should handle zero-extend) fails to handle this case.  And that pass doesn't exist anymore on the trunk anyway.

I wonder if we can optimize this in GIMPLE already...
Comment 11 Richard Earnshaw 2009-07-14 14:53:51 UTC
The following define_split works for this specific case, but it needs to be made more generic (handling IOR and HImode variants).

It also needs reworking for big-endian -- that needs (subreg...3).

(define_split
  [(set (match_operand:SI 0 "s_register_operand" "")
	(xor:SI (and:SI (ashift:SI (match_operand:SI 1 "s_register_operand" "")
				   (match_operand:SI 2 "const_int_operand" ""))
			(match_operand:SI 3 "const_int_operand" ""))
		(zero_extend:SI (subreg:QI (match_dup 1) 0))))]
  "TARGET_32BIT && INTVAL (operands[3]) == (255 & (255 << (INTVAL (operands[2]))))"
  [(set (match_dup 0) (xor:SI (ashift:SI (match_dup 1) (match_dup 2))
			      (match_dup 1)))
   (set (match_dup 0) (and:SI (match_dup 0) (const_int 255)))]
  "")
Comment 12 Richard Earnshaw 2009-07-15 10:31:41 UTC
Fixed with:

http://gcc.gnu.org/ml/gcc-patches/2009-07/msg00848.html
Comment 13 Paolo Bonzini 2009-07-15 16:20:39 UTC
For the record, it's actually somewhat related to PR39726 (a m68k pessimization), not PR39715.  However, because of the way combine canonicalizes the resulting expression, the patch for that bug does not fix the testcase.