This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
- From: Richard Biener <richard.guenther@gmail.com>
- To: Jeff Law <law@redhat.com>
- Cc: gcc-patches <gcc-patches@gcc.gnu.org>
- Date: Wed, 17 Feb 2016 15:13:08 +0100
- Subject: Re: [RFC] [P2] [PR tree-optimization/33562] Lowering more complex assignments.
On Wed, Feb 17, 2016 at 3:02 PM, Jeff Law <law@redhat.com> wrote:
> On 02/17/2016 03:48 AM, Richard Biener wrote:
>
>>> I instrumented a bootstrap -- the improved DSE finds ~20k additional DSE
>>> opportunities during a GCC bootstrap that could not be found by the
>>> current DSE.  Yes, 20k additional statements deleted by tree DSE.  Yow!
>>
>>
>> Well, DCE can also do quite some DSE, and it runs after DSE - did those
>> 20k additional DSE opportunities affect the overall end result?
>
> I haven't looked at that yet. I just got the instrumentation data last
> night.
>
>
>>> Of those additional opportunities > 99% are for sizes of 64 bytes or
>>> smaller.  Thus we can pack those into 1 or 2 bitmap elements, depending
>>> on the starting offset.  So the bitmap side will be efficient with no
>>> real searching if we choose our PARAM value wisely.
>>
>>
>> So then please use a uint64_t or even uint32_t mask.  Which means
>> a fixed-size SBITMAP (32 bits) if you'd like to use the bitmap interface.
>
> I actually prefer the standard bitmap interface as it seamlessly handles
> differences in the starting offset for the writes.
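[The trade-off being discussed can be sketched outside of GCC: a fixed 64-bit mask tracks one live byte per bit, but every store has to be expressed relative to the tracked ref's starting offset, which is the bookkeeping the bitmap interface hides. A minimal standalone sketch, with hypothetical helper names that are not GCC internals:]

```c
#include <stdint.h>

/* One bit per tracked byte: bit N set means byte N of the watched
   store is still live (not yet overwritten by a later store).  */
uint64_t
byte_range_mask (unsigned start, unsigned len)
{
  uint64_t bits = len >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << len) - 1;
  return bits << start;
}

/* A later store to bytes [start, start+len) kills those bytes.  */
uint64_t
clear_written (uint64_t live, unsigned start, unsigned len)
{
  return live & ~byte_range_mask (start, len);
}
```

[For example, a 16-byte store is fully dead once two later 8-byte stores at offsets 0 and 8 have cleared every bit. A real implementation would rebase each store's offset against the tracked ref's own offset and punt when start + len exceeds 64 - exactly the cases the generic bitmap handles for free.]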
>
>>
>> Can you share your work-in-progress patch?
>
> Easy 'nuff.  This will bootstrap and regression test.  Was planning to spend
> today generating some additional testcases from new cases it catches and
> looking at impacts on code generation.
>
> I'm particularly interested in any impact on the zero-sized object clobbers.
> I'd like to remove the bits which filter those out.
>
> It feels like there's some refactoring that ought to happen in this code,
> both in the mostly duplicated test that a particular ref is "interesting"
> and in the mostly duplicated code to extract a ref from a mem* call or
> an assignment.
>
> jeff
>
>
> commit d49afd895524df98c5e53280b1c77f4b61a45ba3
> Author: Jeff Law <law@tor.usersys.redhat.com>
> Date: Tue Feb 16 13:44:20 2016 -0500
>
> Checkpoint
>
> Checkpoint
>
> Another checkpoint
>
> Checkpoint
>
> diff --git a/gcc/params.def b/gcc/params.def
> index c0494fa..5aa146b 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -520,6 +520,11 @@ DEFPARAM(PARAM_IV_ALWAYS_PRUNE_CAND_SET_BOUND,
>     "If number of candidates in the set is smaller, we always try to remove unused ivs during its optimization.",
> 10, 0, 0)
>
> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
> + "dse-max-object-size",
> +	 "Maximum size (in bytes) of objects tracked by dead store elimination.",
> + 64, 0, 0)
> +
> DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
> "scev-max-expr-size",
> "Bound on size of expressions used in the scalar evolutions
> analyzer.",
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> index 87a2638..3155741 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> @@ -10,4 +10,4 @@ int f(void)
> return g(&t);
> }
>
> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> index e2cd403..e6d027f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> @@ -8,4 +8,4 @@ int f(void)
> __imag__ t = 2;
> }
>
> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
> index 594c20c..ae48ddd 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
> @@ -11,4 +11,4 @@ foo ()
> }
>
> /* We should eliminate the first assignment. */
> -/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" } } */
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 372a0be..97a091b 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see
> #include "tree-dfa.h"
> #include "domwalk.h"
> #include "tree-cfgcleanup.h"
> +#include "params.h"
>
> /* This file implements dead store elimination.
>
> @@ -68,6 +69,58 @@ along with GCC; see the file COPYING3. If not see
> remove their dead edges eventually. */
> static bitmap need_eh_cleanup;
>
> +/* Clear any bytes written by STMT from the bitmap LIVE_BYTES.  The base
> +   address written by STMT must match the one found in REF, which must
> +   have its base address previously initialized.
> +
> +   This routine must be conservative.  If we don't know the offset or
> +   actual size written, assume nothing was written.  */
> +
> +static void
> +clear_bytes_written_by (bitmap live_bytes, gimple *stmt, ao_ref *ref)
> +{
> +  ao_ref write;
> +  write.base = NULL;
> +
> +  /* It's advantageous to handle certain mem* functions.  */
> +  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> +    {
> +      switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
> +        {
> +        case BUILT_IN_MEMCPY:
> +        case BUILT_IN_MEMMOVE:
> +        case BUILT_IN_MEMSET:
> +          {
> +            tree size = NULL_TREE;
> +            if (gimple_call_num_args (stmt) == 3)
> +              size = gimple_call_arg (stmt, 2);
> +            tree ptr = gimple_call_arg (stmt, 0);
> +            ao_ref_init_from_ptr_and_size (&write, ptr, size);
> +            ao_ref_base (&write);
> +          }
> +        default:
> +          break;
> +        }
> +    }
> +  else if (is_gimple_assign (stmt))
> +    {
> +      ao_ref_init (&write, gimple_assign_lhs (stmt));
> +      ao_ref_base (&write);
> +    }
> +
> +  /* Verify we have the same base memory address and that the write
> +     has a known size.  If so, then clear the appropriate bytes in
> +     the LIVE_BYTES bitmap.  */
> +  if (write.base
> +      && write.base == ref->base
> +      && write.size == write.max_size
> +      && (write.size % BITS_PER_UNIT) == 0
> +      && (write.offset % BITS_PER_UNIT) == 0
> +      && write.max_size != -1)
> +    bitmap_clear_range (live_bytes,
> +                        write.offset / BITS_PER_UNIT,
> +                        write.size / BITS_PER_UNIT);
> +}
>
> /* A helper of dse_optimize_stmt.
> Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
> @@ -79,9 +132,33 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
> {
>    gimple *temp;
>    unsigned cnt = 0;
> +  bitmap live_bytes = NULL;
>
>    *use_stmt = NULL;
>
> +  /* REF is a memory write.  Go ahead and get its base, size, extent
> +     information and encode the bytes written into LIVE_BYTES.  We can
> +     handle any case where we have a known base and maximum size.
> +
> +     However, experimentation has shown that bit level tracking is not
> +     useful in practice, so we only track at the byte level.
> +
> +     Furthermore, experimentation has shown that 99% of the cases
> +     that require byte tracking are 64 bytes or less.  Tracking 64
> +     bytes also happens to fit nicely into a bitmap element.  */
> +  if (ao_ref_base (ref)
> +      && ref->max_size
> +      && (ref->offset % BITS_PER_UNIT) == 0
> +      && (ref->max_size % BITS_PER_UNIT) == 0
> +      && ref->max_size <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)
> +      && ref->max_size != -1)
> +    {
> +      live_bytes = BITMAP_ALLOC (NULL);
> +      bitmap_set_range (live_bytes,
> +                        ref->offset / BITS_PER_UNIT,
> +                        ref->max_size / BITS_PER_UNIT);
> +    }
> +
> /* Find the first dominated statement that clobbers (part of) the
> memory stmt stores to with no intermediate statement that may use
> part of the memory stmt stores. That is, find a store that may
> @@ -177,11 +254,18 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
>              temp = stmt;
>              break;
>            }
> +
> +      if (live_bytes && temp)
> +        clear_bytes_written_by (live_bytes, temp, ref);
>      }
> -  /* Continue walking until we reach a kill.  */
> -  while (!stmt_kills_ref_p (temp, ref));
> +  /* Continue walking until we reach a full kill as a single statement
> +     or there are no more live bytes.  */
> +  while (!stmt_kills_ref_p (temp, ref)
> +         && !(live_bytes && bitmap_empty_p (live_bytes)));
Just a quick comment - the above means you only handle partial stores
with no intervening uses.  You don't handle, say
struct S { struct R { int x; int y; } r; int z; } s;
s = { {1, 2}, 3 };
s.r.x = 1;
s.r.y = 2;
struct R r = s.r;
s.z = 3;
where s = { {1, 2}, 3} is still dead.
That is, you don't make use of the live_bytes in the ref_maybe_used_by_stmt_p
check (you can skip uses of only dead bytes).
Not sure if it makes a difference in practice (compared to the cost it
would take).
Rather than building ao_refs in clear_bytes_written_by just use
get_ref_base_and_extent directly.
You don't handle stuff like
s[i] = { 1, 2 };
s[i].x = 1;
s[i].y = 1;
either btw.
Richard.
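[Richard's struct example can be modeled with the same byte-level bookkeeping the patch introduces: assuming 4-byte ints and no padding, the aggregate store writes bytes 0..11, the two field stores kill bytes 0..7 before the load of s.r, so that load only reads bytes that are already dead in the original store, and s.z then kills the rest. A minimal standalone model of that walk, with hypothetical helper names that are not GCC code:]

```c
#include <stdint.h>

/* Bit N set: byte N of the watched aggregate store is still live.  */
uint64_t
byte_range_mask (unsigned start, unsigned len)
{
  uint64_t bits = len >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << len) - 1;
  return bits << start;
}

/* Walk the statements following "s = { {1, 2}, 3 };" (bytes 0..11 live).  */
uint64_t
remaining_live_bytes (void)
{
  uint64_t live = byte_range_mask (0, 12);  /* s = { {1, 2}, 3 };  */
  live &= ~byte_range_mask (0, 4);          /* s.r.x = 1;  */
  live &= ~byte_range_mask (4, 4);          /* s.r.y = 2;  */
  /* struct R r = s.r; reads bytes 0..7 -- none of them are still
     live here, so this use need not terminate the walk.  */
  if (live & byte_range_mask (0, 8))
    return live;                            /* use of live bytes: punt */
  live &= ~byte_range_mask (8, 4);          /* s.z = 3;  */
  return live;                              /* empty: aggregate store dead */
}
```

[The current patch stops at the intervening load regardless, which is why the aggregate store survives; checking the use against live_bytes, as suggested, would let the walk continue past it.]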
>    *use_stmt = temp;
> +  if (live_bytes)
> +    BITMAP_FREE (live_bytes);
>
>    return true;
>  }
>