52134 – Does not fold (x * 4) & -4

Bug 52134 - Does not fold (x * 4) & -4

Summary: Does not fold (x * 4) & -4

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	4.7.0

Importance:	P3 enhancement
Target Milestone:	4.8.0
Assignee:	Richard Biener

URL:
Keywords:	missed-optimization, TREE

Depends on:
Blocks:	19986
	Show dependency tree / graph

Reported:	2012-02-06 10:50 UTC by Richard Biener
Modified:	2012-03-14 09:39 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2012-02-06 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Richard Biener 2012-02-06 10:50:20 UTC

We appearantly do not fold

((sizetype) MAX_EXPR <R1b, 0> + 2) * 4

BIT_AND_EXPR

-4

as it happens for TYPE_SIZE_UNIT of a struct for gnat.dg/opt9.adb.

This causes an issue with trying to compute the extent/padding of
a trailing bitfield in such a struct when you do

size_diffop (TYPE_SIZE_UNIT (DECL_CONTEXT (field)),
             DECL_FIELD_OFFSET (repr));

where repr is the first field of a bitfield group and field is the last
field of a bitfield group.

The above expression computes non-constant

((ssizetype) (((sizetype) MAX_EXPR <R1b, 0> + 2) * 4) & -4) - (ssizetype) ((sizetype) MAX_EXPR <R1b, 0> * 4)

Not sure if such an expression cannot be reliably required to be constant
in ada though.

Certainly this looks like a missed folding (not sure how often this
odd BIT_AND_EXPR for sizes happen in Ada).

Comment 1 Eric Botcazou 2012-02-06 14:08:13 UTC

> The above expression computes non-constant
> 
> ((ssizetype) (((sizetype) MAX_EXPR <R1b, 0> + 2) * 4) & -4) - (ssizetype)
> ((sizetype) MAX_EXPR <R1b, 0> * 4)
> 
> Not sure if such an expression cannot be reliably required to be constant
> in ada though.

Reassociation/simplification of size expressions can generally be done at will.

> Certainly this looks like a missed folding (not sure how often this
> odd BIT_AND_EXPR for sizes happen in Ada).

Quite a lot, but it's created by the folder itself (round_up).

Comment 2 Andrew Pinski 2012-02-06 19:45:41 UTC

Short testcase:
unsigned f(unsigned t)
{
  return (t*4)&-4;
}
int f1(int t)
{
  return (t*4)&-4;
}

Both should be optimized to just t*4.  In fact we do it on the RTL level.
*4 is changed to <<a 2 on the RTL level and simplified with the & when doing combine.

Comment 3 Richard Biener 2012-03-13 11:42:41 UTC

I have a patch.

Comment 4 Richard Biener 2012-03-13 13:47:43 UTC

Author: rguenth
Date: Tue Mar 13 13:47:35 2012
New Revision: 185334

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185334
Log:
2012-03-13  Richard Guenther  <rguenther@suse.de>

	PR middle-end/52134
	* fold-const.c (fold_binary_loc): Fold (X * Y) & -(1 << CST) to X * Y
	if Y is a constant multiple of 1 << CST.

	* gcc.dg/pr52134.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/pr52134.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fold-const.c
    trunk/gcc/testsuite/ChangeLog

Comment 5 Richard Biener 2012-03-13 13:54:19 UTC

Fixed.

Comment 6 Andrew Pinski 2012-03-13 22:08:12 UTC

CCP could also remove the &:
Visiting statement:
D.1713_2 = t_1(D) * 4;
which is likely CONSTANT
Lattice value changed to CONSTANT Lattice value changed to CONSTANT 0x00000000000000000 (0x000000000fffffffc).  Adding SSA edges to worklist.

Visiting statement:
D.1712_3 = D.1713_2 & 4294967292;
which is likely CONSTANT
Lattice value changed to CONSTANT Lattice value changed to CONSTANT 0x00000000000000000 (0x000000000fffffffc).  Adding SSA edges to worklist.

..
Visiting statement:
D.1710_2 = t_1(D) * 4;
which is likely CONSTANT
Lattice value changed to CONSTANT Lattice value changed to CONSTANT 0x00000000000000000 (0xfffffffffffffffffffffffffffffffc).  Adding SSA edges to worklist.

Visiting statement:
D.1709_3 = D.1710_2 & -4;
which is likely CONSTANT
Lattice value changed to CONSTANT Lattice value changed to CONSTANT 0x00000000000000000 (0xfffffffffffffffffffffffffffffffc).  Adding SSA edges to worklist.


See how the lattice's already have its last 3 bits unset.  In fact I think we should only do this in the ccp/vrp passes to remove the & rather than fold.

Comment 7 Eric Botcazou 2012-03-13 22:17:35 UTC

> See how the lattice's already have its last 3 bits unset.  In fact I think we
> should only do this in the ccp/vrp passes to remove the & rather than fold.

For size calculations (TYPE_IS_SIZETYPE) the earlier you fold, the better, so that you don't have to drag huge expressions for variable-sized types in the FE.

Comment 8 rguenther@suse.de 2012-03-14 09:39:31 UTC

On Tue, 13 Mar 2012, pinskia at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52134
> 
> --- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-03-13 22:08:12 UTC ---
> CCP could also remove the &:
> Visiting statement:
> D.1713_2 = t_1(D) * 4;
> which is likely CONSTANT
> Lattice value changed to CONSTANT Lattice value changed to CONSTANT
> 0x00000000000000000 (0x000000000fffffffc).  Adding SSA edges to worklist.
> 
> Visiting statement:
> D.1712_3 = D.1713_2 & 4294967292;
> which is likely CONSTANT
> Lattice value changed to CONSTANT Lattice value changed to CONSTANT
> 0x00000000000000000 (0x000000000fffffffc).  Adding SSA edges to worklist.
> 
> ..
> Visiting statement:
> D.1710_2 = t_1(D) * 4;
> which is likely CONSTANT
> Lattice value changed to CONSTANT Lattice value changed to CONSTANT
> 0x00000000000000000 (0xfffffffffffffffffffffffffffffffc).  Adding SSA edges to
> worklist.
> 
> Visiting statement:
> D.1709_3 = D.1710_2 & -4;
> which is likely CONSTANT
> Lattice value changed to CONSTANT Lattice value changed to CONSTANT
> 0x00000000000000000 (0xfffffffffffffffffffffffffffffffc).  Adding SSA edges to
> worklist.
> 
> 
> See how the lattice's already have its last 3 bits unset.  In fact I think we
> should only do this in the ccp/vrp passes to remove the & rather than fold.

Yeah, CCP and VRP can do this as well.  In fact folding bit-operations
with the bit-CCP lattice is an obvious improvement, best done in
ccp_fold_stmt which is called at lattice substitution time.

Richard.