Bug 17731 - sub-optimal code generated for left shift
Summary: sub-optimal code generated for left shift
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.0.0
Importance: P2 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
 
Reported: 2004-09-29 17:58 UTC by Tom Tromey
Modified: 2011-05-22 15:30 UTC
CC: 2 users

Last reconfirmed: 2011-05-22 17:27:32


Description Tom Tromey 2004-09-29 17:58:03 UTC
Consider this Java method:

  public int shift (int x, int y)
  {
    return x << y;
  }

On x86, with -O3, this becomes:

_ZN1t5shiftEii:
.LFB2:
	pushl	%ebp
.LCFI0:
	movl	%esp, %ebp
.LCFI1:
	movl	16(%ebp), %ecx
	movl	12(%ebp), %eax
	popl	%ebp
	andl	$31, %ecx
	sall	%cl, %eax
	ret


However, on x86 (starting with the 80286), the shift count is masked
to 5 bits by sall, so the "andl" instruction is redundant.
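
For concreteness, here is a minimal sketch, assuming an x86 target and
GCC-style inline asm, that demonstrates the hardware masking (the
helper sall_raw is hypothetical, not part of this bug's test case):

#include <stdio.h>

/* sall uses only the low 5 bits of %cl, so shifting by 33 behaves
   like shifting by 1.  x86-specific illustration only. */
static int sall_raw (int x, int count)
{
  int r = x;
  __asm__ ("sall %%cl, %0" : "+r" (r) : "c" (count));
  return r;
}

int main (void)
{
  printf ("%d\n", sall_raw (1, 33));  /* prints 2, i.e. 1 << 1 */
  return 0;
}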

I haven't looked to see exactly where this buglet might lie.
Comment 1 Andrew Pinski 2004-09-29 18:05:17 UTC
But note that on most other targets, x << y where y > 31 is undefined,
and so is the tree code for left shift.
Comment 2 Tom Tromey 2004-09-29 18:09:32 UTC
Yes, that's fine.
The Java front end generates trees like <shift op <and op 31>>.
I would like the extra "and" to be optimized out somewhere
along the way, on a target-specific basis.
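
In C terms, that tree shape corresponds to something like the
following (an illustrative paraphrase, not actual gcj output):

/* C rendering of <shift op <and op 31>>: the explicit mask gives
   Java's defined semantics for x << y on every target. */
int shift (int x, int y)
{
  return x << (y & 31);
}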

Comment 3 Andrew Pinski 2004-09-29 18:11:04 UTC
on PPC:
__ZN4test5shiftEii:
        rlwinm r5,r5,0,27,31
        slw r3,r4,r5
        blr

Which is correct, as r5 could be greater than 31, and shifting by more
than 31 is undefined (really only 4 bits are used, but it could do
something different than that).

So if this were to be done correctly, we would need a target hook for the Java front-end.

Note the only reason I bring this up is that we just had a bootstrap
failure where we depended on the shift count being masked by the
machine rather than in the code.
Comment 4 Andrew Pinski 2004-09-29 18:14:08 UTC
Then the easy way to do that would be to add a pattern in the machine
description for
(ashift:SI () (and:SI () (const_int 31 [0x1f])))
and let combine do its job.  But note we are masking out 4 bits and
not 5, so we would still get different answers than we expect.
Comment 5 Andrew Pinski 2004-09-29 18:16:14 UTC
Forget that thing about 4 vs 5 bits, I cannot count.
Confirmed.
Comment 6 Richard Henderson 2004-09-29 19:39:45 UTC
This will be done properly if SHIFT_COUNT_TRUNCATED is true.

The problem is that SHIFT_COUNT_TRUNCATED applies to *all* shifts at
the given mode size, which is not true for x86, since HImode shifts
are still masked to 31, not 15.
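
A small demonstration of that point, again assuming an x86 target and
GCC-style inline asm (shlw_raw is a hypothetical helper): the 16-bit
shift still masks its count mod 32, not mod 16.

#include <stdio.h>

/* shlw also masks the count to 5 bits (mod 32), not 4 (mod 16), so
   a count of 16 clears a 16-bit value instead of leaving it
   unchanged.  x86-specific illustration only. */
static unsigned short shlw_raw (unsigned short x, int count)
{
  unsigned short r = x;
  __asm__ ("shlw %%cl, %0" : "+r" (r) : "c" (count));
  return r;
}

int main (void)
{
  printf ("%u\n", (unsigned) shlw_raw (1, 16));  /* prints 0, not 1 */
  return 0;
}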

Ideally we'd extend Richard Sandiford's new target.shift_truncation_mask
such that it can replace SHIFT_COUNT_TRUNCATED.

In the short term, Java could examine shift_truncation_mask and avoid
creating the AND at the tree level in the first place.
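
Modeled in plain C, that short-term idea might look like the sketch
below; shift_truncation_mask_si is a stand-in for querying the real
targetm.shift_truncation_mask hook for SImode, not actual gcj code.

/* Stand-in for the target hook: returns the mask the hardware
   applies to shift counts, or 0 if it applies none.  The value 31
   assumes an x86-like SImode shifter. */
static unsigned shift_truncation_mask_si (void)
{
  return 31;
}

/* Emit the explicit AND only when the hardware mask does not already
   provide Java's "& 31" semantics. */
static int emit_java_int_shift (int x, int y)
{
  if (shift_truncation_mask_si () != 31)
    y &= 31;
  return x << y;
}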

Not sure whether "middle-end" or "rtl-opt" is a better category, but this
is definitely not target-specific and thus "target" doesn't apply.
Comment 7 Tom Tromey 2004-09-29 21:03:51 UTC
Thanks for the response.
I'm not too concerned in the short run -- we've had this bug for a
long time.

I neglected to mention that in Java a long shift (64-bit, i.e.
long << int) uses "& 63", as one would expect.  I didn't look to see
what kind of code this generates or whether it can be further
optimized.
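
For reference, the C analogue of that 64-bit case would be something
along these lines (assuming long long is the 64-bit type):

/* Java's long << int masks the count with "& 63". */
long long shift64 (long long x, int y)
{
  return x << (y & 63);
}

Whether that AND can be dropped on a 32-bit target depends on how the
double-word shift is expanded, so the answer is less clear-cut than in
the 32-bit case.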
Comment 8 Steven Bosscher 2011-05-22 15:30:58 UTC
(In reply to comment #2)
> Yes, that's fine.
> The Java front end generates trees like <shift op <and op 31>>.
> I would like the extra "and" to be optimized out somewhere
> along the way, on a target-specific basis.

Fixed by tree-ssa:

$ cat t.c
int shift (int x, int y)
{
  int _y = y & 31;
  return x << _y;
}

$ ./cc1 -quiet -m32 -O2 t.c -fdump-tree-optimized
$ cat t.s 
	.file	"t.c"
	.text
	.p2align 4,,15
	.globl	shift
	.type	shift, @function
shift:
.LFB0:
	.cfi_startproc
	movl	4(%esp), %eax
	movl	8(%esp), %ecx
	sall	%cl, %eax
	ret
	.cfi_endproc
.LFE0:
	.size	shift, .-shift
	.ident	"GCC: (GNU) 4.6.0 20110312 (experimental) [trunk revision 170907]"
	.section	.note.GNU-stack,"",@progbits