Re: [Patch ping] Strength reduction


On Wed, 2012-06-20 at 13:11 +0200, Richard Guenther wrote:
> On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Pro forma ping. :)
> 
> ;)
> 
> I notice (with all of these functions)
> 
> +unsigned
> +negate_cost (enum machine_mode mode, bool speed)
> +{
> +  static unsigned costs[NUM_MACHINE_MODES];
> +  rtx seq;
> +  unsigned cost;
> +
> +  if (costs[mode])
> +    return costs[mode];
> +
> +  start_sequence ();
> +  force_operand (gen_rtx_fmt_e (NEG, mode,
> +				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
> +		 NULL_RTX);
> +  seq = get_insns ();
> +  end_sequence ();
> +
> +  cost = seq_cost (seq, speed);
> +  if (!cost)
> +    cost = 1;
> 
> that the cost[] array is independent of the speed argument.  Thus whatever
> comes first determines the cost.  Odd, and probably not good.  A fix
> would be appreciated (even for the current code ...) - simply make the
> array costs[NUM_MACHINE_MODES][2].
> 
> As for the renaming - can you name the functions consistently?  Thus
> the above would be negate_reg_cost?  And maybe rename the other
> FIXME function, too?

I agree with all this.  I'll prepare all the cost model changes as a
separate preliminaries patch.
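
For the record, here's roughly what I have in mind for negate_cost
(renamed per your suggestion; untested sketch, and the other *_cost
functions would get the same treatment):

  unsigned
  negate_reg_cost (enum machine_mode mode, bool speed)
  {
    /* Cache the cost per mode _and_ per speed/size setting.  */
    static unsigned costs[NUM_MACHINE_MODES][2];
    rtx seq;
    unsigned cost;

    if (costs[mode][speed])
      return costs[mode][speed];

    start_sequence ();
    force_operand (gen_rtx_fmt_e (NEG, mode,
                                  gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
                   NULL_RTX);
    seq = get_insns ();
    end_sequence ();

    cost = seq_cost (seq, speed);
    if (!cost)
      cost = 1;

    costs[mode][speed] = cost;
    return cost;
  }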

> 
> Index: gcc/tree-ssa-strength-reduction.c
> ===================================================================
> --- gcc/tree-ssa-strength-reduction.c	(revision 0)
> +++ gcc/tree-ssa-strength-reduction.c	(revision 0)
> @@ -0,0 +1,1611 @@
> +/* Straight-line strength reduction.
> +   Copyright (C) 2012  Free Software Foundation, Inc.
> 
> I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;)
> So, please name it gimple-ssa-strength-reduction.c.

Will do.  Vive la revolution? ;)

> 
> +  /* Access to the statement for subsequent modification.  Cached to
> +     save compile time.  */
> +  gimple_stmt_iterator cand_gsi;
> 
> this is an iterator for cand_stmt?  Then caching it is no longer necessary
> as the iterator is the stmt itself after recent infrastructure changes.

Oh yeah, I remember seeing that go by.  Nice.  Will change.
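
So the cand_gsi field goes away and the replacement code can just do
something like this (new_stmt standing in for whatever the replacement
statement is):

  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
  gsi_replace (&gsi, new_stmt, false);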

> 
> +/* Hash table embodying a mapping from statements to candidates.  */
> +static htab_t stmt_cand_map;
> ...
> +static hashval_t
> +stmt_cand_hash (const void *p)
> +{
> +  return htab_hash_pointer (((const_slsr_cand_t) p)->cand_stmt);
> +}
> 
> use a pointer-map instead.
> 
> +/* Callback to produce a hash value for a candidate chain header.  */
> +
> +static hashval_t
> +base_cand_hash (const void *p)
> +{
> +  tree ssa_name = ((const_cand_chain_t) p)->base_name;
> +
> +  if (TREE_CODE (ssa_name) != SSA_NAME)
> +    return (hashval_t) 0;
> +
> +  return (hashval_t) SSA_NAME_VERSION (ssa_name);
> +}
> 
> does it ever happen that ssa_name is not an SSA_NAME?  

Not in this patch, but when I introduce CAND_REF in a later patch it
could happen since the base field of a CAND_REF is a MEM_REF.  It's a
safety valve in case of misuse.  I'll think about this some more.

> I'm not sure the memory savings of a hashtable over simply using a
> fixed-size (num_ssa_names) array indexed by SSA_NAME_VERSION, pointing
> to the chain, are worth it here?

That's reasonable.  I'll do that.
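
I.e., something along these lines (sketch; names provisional):

  /* One candidate chain header per SSA name, indexed by version.  */
  static cand_chain_t *base_cand_map;
  ...
  base_cand_map = XCNEWVEC (cand_chain_t, num_ssa_names);
  ...
  chain = base_cand_map[SSA_NAME_VERSION (c->base_name)];
  ...
  XDELETEVEC (base_cand_map);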

> 
> +  node = (cand_chain_t) pool_alloc (chain_pool);
> +  node->base_name = c->base_name;
> 
> If you never free pool entries it's more efficient to use an obstack.
> alloc-pool only pays off if you get freed item re-use.

OK.  I'll change both cand_pool and chain_pool to obstacks.
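
Roughly (sketch, shown for the chain case):

  static struct obstack cand_obstack;
  static struct obstack chain_obstack;
  ...
  gcc_obstack_init (&chain_obstack);
  ...
  node = (cand_chain_t) obstack_alloc (&chain_obstack, sizeof (*node));
  node->base_name = c->base_name;
  ...
  obstack_free (&chain_obstack, NULL);   /* When the pass finishes.  */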

> 
> +  switch (gimple_assign_rhs_code (gs))
> +    {
> +    case MULT_EXPR:
> +      rhs2 = gimple_assign_rhs2 (gs);
> +
> +      if (TREE_CODE (rhs2) == INTEGER_CST)
> +	return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
> +
> +      if (TREE_CODE (rhs1) == INTEGER_CST)
> +	return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed);
> 
> In theory all commutative statements should have constant operands only
> at rhs2 ...

I'm glad I'm not the only one who thought that was the theory. ;)  I
wasn't sure, and I've seen violations of this come up in practice.
Should I assert when that happens instead, and track down the offending
optimizations?

> 
> Also you do not verify that the constant fits in a HOST_WIDE_INT - but maybe
> you do not care?  Thus, I'd do
> 
>    if (host_integerp (rhs2, 0))
>      return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
> 
> or make multiply_by[_const?]_cost take a double-int instead.  Likewise below
> for add.

Ok.  Name change looks good also, I'll include that in the cost model
changes.

> 
> +    case MODIFY_EXPR:
> +      /* Be suspicious of assigning costs to copies that may well go away.  */
> +      return 0;
> 
> MODIFY_EXPR is never a gimple_assign_rhs_code.  Simple copies have
> a code of SSA_NAME for example.  But as you assert if you get to an
> unhandled code I wonder why you needed the above ...

I'll remove this, and document that we are deliberately not touching
copies (which was my original intent).

> 
> +static slsr_cand_t
> +base_cand_from_table (tree base_in)
> +{
> +  slsr_cand mapping_key;
> +
> +  gimple def = SSA_NAME_DEF_STMT (base_in);
> +  if (!def)
> +    return (slsr_cand_t) NULL;
> +
> +  mapping_key.cand_stmt = def;
> +  return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key);
> 
> isn't that reachable via the base-name -> chain mapping for base_in?

Maybe so.  I need to think about it some more (it got evicted from my
mental cache).  It could be I created this before adding the chains
stuff and never cleaned up.

> 
> +  if (TREE_CODE (rhs2) == SSA_NAME && operand_equal_p (rhs1, rhs2, 0))
> +    return;
> 
> SSA_NAMEs can be compared by pointer equality, thus the above is
> equivalent to
> 
>   if (TREE_CODE (rhs2) == SSA_NAME && rhs1 == rhs2)
> 
> or even just
> 
>   if (rhs1 == rhs2)
> 
> applies elsewhere as well.
> 

Ok.  I promise I'll remember this eventually...

> +/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
> +   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
> +   unchanged.  */
> +/* ??? - Should this be moved to double-int.c?  */
> 
> I think so.

Ok, I'll include this in the preliminaries patch.

> 
> +static bool
> +double_int_multiple_of (double_int product, double_int factor,
> +			bool unsigned_p, double_int *multiple)
> +{
> +  double_int remainder;
> +  double_int quotient = double_int_divmod (product, factor, unsigned_p,
> +					   TRUNC_DIV_EXPR, &remainder);
> +  if (double_int_zero_p (remainder))
> +    {
> +      *multiple = quotient;
> +      return true;
> +    }
> +
> +  return false;
> +}
> 
> 
> In general I find a lot of code of similar structure; factoring bits out
> may make it easier to read.  Obvious candidates include:
> 
> +      /* Add the interpretation to the statement-candidate mapping.  */
> +      slot = htab_find_slot (stmt_cand_map, c, INSERT);
> +      gcc_assert (!*slot);
> +      *slot = c;
> 
> into a function add_cand_for_stmt ()
> 
> +      if (TREE_CODE (rhs1) != SSA_NAME || TREE_CODE (rhs2) != INTEGER_CST)
> +	return;
> 
> doing some simple checks of this kind in the function walking all statements
> and passing in the operands and operation code.

Sounds good.  You should have seen it before I already did one round of
factoring... :)
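
Right -- for the first one, something like (modulo the switch to a
pointer-map):

  /* Add candidate C to the statement-candidate mapping.  */

  static void
  add_cand_for_stmt (slsr_cand_t c)
  {
    void **slot = htab_find_slot (stmt_cand_map, c, INSERT);
    gcc_assert (!*slot);
    *slot = c;
  }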

> 
> +/* Return TRUE if GS is a statement that defines an SSA name from
> +   a NOP_EXPR and is legal for us to combine an add and multiply
> +   across.  This is legitimate for casts from a signed type to
> +   a signed or unsigned type of the same or larger size.  It is not
> +   legitimate to convert any unsigned type to a signed type, or
> +   to an unsigned type of a different size.
> +
> +   The reasoning here is that signed integer overflow is undefined,
> +   so any program that was expecting overflow that no longer occurs
> +   is not correct.  Unsigned integers, however, have wrap semantics,
> +   and it is reasonable for programs to assume an overflowing add
> +   will wrap.
> +
> +   With -fwrapv, signed integers also have wrap semantics, so widening
> +   casts are not allowed then either.  */
> +
> +static bool
> +legal_cast_p (gimple gs)
> +{
> +  tree lhs, rhs, lhs_type, rhs_type;
> +  unsigned lhs_size, rhs_size, lhs_uns, rhs_uns;
> +
> +  if (!is_gimple_assign (gs)
> +      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
> 
> That's always true if the code is NOP_EXPR
> 
> +      || gimple_assign_rhs_code (gs) != NOP_EXPR)
> 
> CONVERT_EXPR_CODE_P ()
> 
> +    return false;
> +
> +  lhs = gimple_assign_lhs (gs);
> +  rhs = gimple_assign_rhs1 (gs);
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> 
> Likewise.
> 
> +  lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs));
> +  rhs_type = TREE_TYPE (SSA_NAME_VAR (rhs));
> 
> Get the type from the SSA_NAMEs themselves, no need to lookup
> SSA_NAME_VAR.  If it happens you ICE that way you are looking
> at released SSA names ...
> 
> +  lhs_size = TYPE_PRECISION (lhs_type);
> +  rhs_size = TYPE_PRECISION (rhs_type);
> +  lhs_uns = TYPE_UNSIGNED (lhs_type);
> +  rhs_uns = TYPE_UNSIGNED (rhs_type);
> +
> +  if (lhs_size < rhs_size
> +      || (rhs_uns && !lhs_uns)
> +      || (rhs_uns && lhs_uns && rhs_size != lhs_size)
> +      || (!rhs_uns && flag_wrapv && rhs_size != lhs_size))
> +    return false;
> 
> So we basically check whether the lhs can represent all values of the rhs?
> So ...
> 
>     if (lhs_size > rhs_size)
>       return true;
> 
> is missing?  Because you'd reject a widening of an unsigned int to an
> unsigned long.

That rejection is intentional.  Assuming 4-byte int and 8-byte long, the
unsigned int would wrap at 32 bits, so allowing the widening could
change semantics.  I've actually encountered problems like this all over
libiberty and other GCC support libraries.
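
A concrete (made-up) example, with 32-bit unsigned int and 64-bit
unsigned long:

  unsigned int  x = 0xFFFFFFFF;
  unsigned int  y = x + 1;              /* wraps to 0 */
  unsigned long z = (unsigned long) y;  /* 0 */
  unsigned long w = z * 4;              /* 0 */

If we combined the add and multiply across the widening cast and
computed w as (unsigned long) x * 4 + 4, we'd get 0x400000000 instead
of 0.  So the widening has to be rejected.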

> 
> As for your handling of flag_wrapv and the unsigned flag, please try
> to use the TYPE_OVERFLOW_UNDEFINED/TYPE_OVERFLOW_WRAPS
> type properties instead for the parts that care about overflow behavior
> and not value-representation.
> 
> +  return true;
> +}
> 

OK, I'll look into those and let you know if I have questions.
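
At first glance, the overflow-behavior parts of the test collapse to
something like this (sketch; I believe TYPE_OVERFLOW_WRAPS covers both
the unsigned and -fwrapv cases):

  if (lhs_size < rhs_size
      || (rhs_uns && !lhs_uns)
      || (TYPE_OVERFLOW_WRAPS (rhs_type) && rhs_size != lhs_size))
    return false;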

> I have a sense of deja-vu, having seen similar code elsewhere (widening
> shift/multiply support in the vector pattern recognizer maybe).  While I may
> understand what the textual description says, including an example would
> explain things better.  For example
> 
> "Return TRUE if GS is a statement that defines an SSA name from
>  a conversion and is legal for us to associate with an add or multiply.
>  Thus transform name = (T) name'; name * X; to name' * X"
> 
> But I suppose I got the semantics wrong (and thus the example).  Can you
> clarify?

OK, I'll try.  This was definitely the hardest part to get right.
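
Something like this, perhaps (types and names chosen only for
illustration):

  Given

    int i = m + 5;
    long j = (long) i;
    long k = j * 4;

  it is legal to give k a candidate based on m, i.e. to treat it as
  ((long) m) * 4 + 20, because long can represent every value of int
  and signed overflow in computing i is undefined anyway.  The same
  widening from unsigned int to unsigned long is not legal, for the
  wrap reason above.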

> 
> Otherwise the pass looks quite good.  It looks, though, as if it will optimize
> cross-basic-block strength-reduction opportunities, eventually causing
> cross-BB register lifetime increases?  Or is this not an issue?

This is true, and it's made somewhat worse by later optimizations.  I
attempted to reduce the scope where possible by choosing the most
closely dominating candidate as a basis, thinking this would help limit
lifetimes.  However, later optimizations fold up the various adds so
that all candidates in a chain end up directly depending on the
"ultimate basis," resulting in one longer lifetime.

It's something to keep an eye on.  If it appears to be an issue, we may
need to look into some sort of distance heuristic to limit which
candidates can be used as a basis.  It's not a big deal to keep a
register live across a single conditional branch, but reusing a value
from a control-equivalent block when there's an intervening loop is not
so good.  (On the other hand, it's just another opportunity for
intelligent register spill design. :)

As I indicated, though, I don't think this is a problem limited to this
pass; it's a general issue in many places in the middle end.

> 
> Thanks for working on this,
> Richard.
> 

And thanks very much for the review!

Bill

