This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Tweak BB analysis for dr_analyze_innermost


On Wed, Jun 28, 2017 at 3:36 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> dr_analyze_innermost had a "struct loop *nest" parameter that acted
> like a boolean.  This was added in r179161, with the idea that a
> null nest selected BB-level analysis rather than loop analysis.
>
> The handling seemed strange though.  If the DR was part of a loop,
> we still tried to express the base and offset values as IVs, potentially
> giving a nonzero step.  If that failed for any reason, we'd revert to
> using the original base and offset, just as we would if we hadn't asked
> for an IV in the first place.
>
> It seems more natural to use the !in_loop handling whenever nest is null
> and always set the step to zero.  This actually enables one more SLP
> opportunity in bb-slp-pr65935.c.
>
> I checked out r179161 and tried the patch there.  The test case added
> in that revision still passes, so I don't think there was any particular
> need to check simple_iv.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

I have a few additional comments for consideration.  I remembered code
in vect_compute_data_ref_alignment explicitely looking at DR_STEP in
BB mode:

  /* Similarly we can only use base and misalignment information relative to
     an innermost loop if the misalignment stays the same throughout the
     execution of the loop.  As above, this is the case if the stride of
     the dataref evenly divides by the vector size.  */
  else
    {
      tree step = DR_STEP (dr);
      unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;

      if (tree_fits_shwi_p (step)
          && ((tree_to_shwi (step) * vf)
              % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "step doesn't divide the vector-size.\n");
          misalign = NULL_TREE;
        }
    }

so I guess with the change we may end up with worse (misalign is
always NULL?) or bogus (with DR_STEP == 0 the test is always false)
alignment analysis results for BB vectorization?  I guess worse
or we had the issue before if DR_STEP was just not analyzable.

The testcase that now gets vectorized tried to use the asm to prevent
vectorization but that of course doesn't work for BB vectorization.
Using volatile stores might.  I guess the chance that we miscompile
both loops in a way that we still pass the testcase (even the compare
loop could be BB vectorized I guess) is unlikely.

The patch is ok but I guess we need to keep an eye on BB vectorization
results for targets w/o unaligned vector loads/stores for the above
alignment issue?

Thanks,
Richard.

> Richard
>
>
> 2017-06-28  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
>         parameter with a "loop" parameter and use it instead of the
>         loop containing DR_STMT.  Don't check simple_iv when doing
>         BB analysis.  Describe the two analysis modes in the comment.
>
> gcc/testsuite/
>         * gcc.dg/vect/bb-slp-pr65935.c: Expect SLP to be used in main
>         as well.
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2017-06-28 14:33:41.294720044 +0100
> +++ gcc/tree-data-ref.c 2017-06-28 14:35:30.475155670 +0100
> @@ -749,15 +749,29 @@ canonicalize_base_object_address (tree a
>    return build_fold_addr_expr (TREE_OPERAND (addr, 0));
>  }
>
> -/* Analyzes the behavior of the memory reference DR in the innermost loop or
> -   basic block that contains it.  Returns true if analysis succeed or false
> -   otherwise.  */
> +/* Analyze the behavior of memory reference DR.  There are two modes:
> +
> +   - BB analysis.  In this case we simply split the address into base,
> +     init and offset components, without reference to any containing loop.
> +     The resulting base and offset are general expressions and they can
> +     vary arbitrarily from one iteration of the containing loop to the next.
> +     The step is always zero.
> +
> +   - loop analysis.  In this case we analyze the reference both wrt LOOP
> +     and on the basis that the reference occurs (is "used") in LOOP;
> +     see the comment above analyze_scalar_evolution_in_loop for more
> +     information about this distinction.  The base, init, offset and
> +     step fields are all invariant in LOOP.
> +
> +   Perform BB analysis if LOOP is null, or if LOOP is the function's
> +   dummy outermost loop.  In other cases perform loop analysis.
> +
> +   Return true if the analysis succeeded and store the results in DR if so.
> +   BB analysis can only fail for bitfield or reversed-storage accesses.  */
>
>  bool
> -dr_analyze_innermost (struct data_reference *dr, struct loop *nest)
> +dr_analyze_innermost (struct data_reference *dr, struct loop *loop)
>  {
> -  gimple *stmt = DR_STMT (dr);
> -  struct loop *loop = loop_containing_stmt (stmt);
>    tree ref = DR_REF (dr);
>    HOST_WIDE_INT pbitsize, pbitpos;
>    tree base, poffset;
> @@ -806,22 +820,11 @@ dr_analyze_innermost (struct data_refere
>
>    if (in_loop)
>      {
> -      if (!simple_iv (loop, loop_containing_stmt (stmt), base, &base_iv,
> -                      nest ? true : false))
> +      if (!simple_iv (loop, loop, base, &base_iv, true))
>          {
> -          if (nest)
> -            {
> -              if (dump_file && (dump_flags & TDF_DETAILS))
> -                fprintf (dump_file, "failed: evolution of base is not"
> -                                    " affine.\n");
> -              return false;
> -            }
> -          else
> -            {
> -              base_iv.base = base;
> -              base_iv.step = ssize_int (0);
> -              base_iv.no_overflow = true;
> -            }
> +         if (dump_file && (dump_flags & TDF_DETAILS))
> +           fprintf (dump_file, "failed: evolution of base is not affine.\n");
> +         return false;
>          }
>      }
>    else
> @@ -843,22 +846,11 @@ dr_analyze_innermost (struct data_refere
>            offset_iv.base = poffset;
>            offset_iv.step = ssize_int (0);
>          }
> -      else if (!simple_iv (loop, loop_containing_stmt (stmt),
> -                           poffset, &offset_iv,
> -                          nest ? true : false))
> +      else if (!simple_iv (loop, loop, poffset, &offset_iv, true))
>          {
> -          if (nest)
> -            {
> -              if (dump_file && (dump_flags & TDF_DETAILS))
> -                fprintf (dump_file, "failed: evolution of offset is not"
> -                                    " affine.\n");
> -              return false;
> -            }
> -          else
> -            {
> -              offset_iv.base = poffset;
> -              offset_iv.step = ssize_int (0);
> -            }
> +         if (dump_file && (dump_flags & TDF_DETAILS))
> +           fprintf (dump_file, "failed: evolution of offset is not affine.\n");
> +         return false;
>          }
>      }
>
> @@ -1077,7 +1069,7 @@ create_data_ref (loop_p nest, loop_p loo
>    DR_REF (dr) = memref;
>    DR_IS_READ (dr) = is_read;
>
> -  dr_analyze_innermost (dr, nest);
> +  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);
>    dr_analyze_indices (dr, nest, loop);
>    dr_analyze_alias (dr);
>
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c  2017-06-28 14:33:41.294720044 +0100
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c  2017-06-28 14:34:09.357523025 +0100
> @@ -57,4 +57,6 @@ int main()
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */
> +/* We should also be able to use 2-lane SLP to initialize the real and
> +   imaginary components in the first loop of main.  */
> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp1" } } */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]