Bug 68786 - Aligned masked store is generated for unaligned pointer
Summary: Aligned masked store is generated for unaligned pointer
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 6.0
Importance: P3 normal
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2015-12-08 09:24 UTC by Ilya Enkovich
Modified: 2015-12-09 14:14 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-12-08 00:00:00


Attachments
gcc6-pr68786.patch (1.74 KB, patch)
2015-12-08 16:57 UTC, Jakub Jelinek

Description Ilya Enkovich 2015-12-08 09:24:30 UTC
Here is a testcase:

double *a;
int b;

void
test (void)
{
  for (; b; b++)
    if (b < 7)
      a[b] = 1.0;
}

Produced assembler for that loop when compiled for AVX-512:

>gcc  -O2 -ftree-vectorize -march=skylake-avx512 small.i -S

.L4:
        vpcmpd  $2, %zmm2, %zmm0, %k1
        addl    $1, %r8d
        vpaddd  %zmm3, %zmm0, %zmm0
        vmovupd %zmm1, (%rsi){%k1}
        kshiftrw        $8, %k1, %k1
        vmovapd %zmm1, 64(%rsi){%k1}
        subq    $-128, %rsi
        cmpl    %edx, %r8d
        jb      .L4

We have two stores using the same base.  One of them is unaligned and the other is aligned.

The difference comes from the GIMPLE.  Here is the vectorized loop:

  <bb 6>:
  # vect_vec_iv_.11_71 = PHI <vect_cst__69(5), vect_vec_iv_.11_72(6)>
  # ivtmp.26_87 = PHI <0(5), ivtmp.26_6(6)>
  # ivtmp.28_7 = PHI <ivtmp.28_24(5), ivtmp.28_8(6)>
  vectp.15_82 = (vector(8) double *) ivtmp.28_7;
  vect_vec_iv_.11_72 = vect_vec_iv_.11_71 + { 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16 };
  mask__24.12_74 = vect_vec_iv_.11_71 <= { 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6 };
  mask_patt_28.14_76 = [vec_unpack_lo_expr] mask__24.12_74;
  mask_patt_28.14_77 = [vec_unpack_hi_expr] mask__24.12_74;
  MASK_STORE (vectp.15_82, 0B, mask_patt_28.14_76, { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 });
  _25 = ivtmp.28_7 + 64;
  _88 = (vector(8) double *) _25;
  MASK_STORE (_88, 0B, mask_patt_28.14_77, { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 });
  ivtmp.26_6 = ivtmp.26_87 + 1;
  ivtmp.28_8 = ivtmp.28_7 + 128;
  if (ivtmp.26_6 < bnd.8_31)
    goto <bb 6>;

The pointers used for the masked stores have different SSA_NAME_PTR_INFO.

For vectp.15_82 we have

{pt = {anything = 0, nonlocal = 1, escaped = 1, ipa_escaped = 0, null = 0, vars_contains_nonlocal = 0, vars_contains_escaped = 0, vars_contains_escaped_heap = 0, vars = 0x7ffff7c13160}, align = 8, misalign = 0}

For _88 we have

{pt = {anything = 0, nonlocal = 1, escaped = 1, ipa_escaped = 0, null = 0, vars_contains_nonlocal = 0, vars_contains_escaped = 0, vars_contains_escaped_heap = 0, vars = 0x7ffff7c13160}, align = 0, misalign = 0}

The zero alignment here for _88 causes TYPE_ALIGN to be used for the second MASK_STORE.  TYPE_ALIGN for a vector type is its size, and therefore we get an incorrectly aligned memory access.
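
For reference, a rough sketch (paraphrased, not verbatim GCC code) of the fallback that bites here: when nothing useful is recorded in SSA_NAME_PTR_INFO (align == 0), get_object_alignment_2 falls back to the TYPE_ALIGN of the reference type, which for the vector MEM_REF built at expansion time is 512 bits, hence the aligned vmovapd:

  /* Paraphrase of the MEM_REF handling in get_object_alignment_2
     (builtins.c); names and structure are simplified for illustration.  */
  unsigned int ptr_align;
  unsigned HOST_WIDE_INT ptr_bitpos;
  bool known = get_pointer_alignment_1 (TREE_OPERAND (exp, 0),
                                        &ptr_align, &ptr_bitpos);
  unsigned int align = MAX (ptr_align, BITS_PER_UNIT);
  if (!known && TYPE_ALIGN (TREE_TYPE (exp)) > align)
    /* Nothing is known about the pointer, so the access is assumed to be
       naturally aligned for its type, i.e. 512 bits for a V8DF MEM_REF.  */
    align = TYPE_ALIGN (TREE_TYPE (exp));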
Comment 1 Ilya Enkovich 2015-12-08 11:03:23 UTC
Looks like an old bug.  We can see the same problem on AVX2 with an older compiler.  Here is a test for GCC 5:

double *a;
long long c;
int *d;

void
test (void)
{
  int b;
  for (b = 0; b < 1024; b++)
    {
      d[b] = 1;
      if (c > a[b])
        a[b] = 1.0;
    }
}

>gcc  -O2 -ftree-vectorize -march=haswell small.i -S -fdump-rtl-final

In small.i.final dump:

(insn:TI 91 85 89 5 (set (mem:V4DF (plus:DI (reg:DI 0 ax [orig:151 ivtmp.36 ] [151])
                (const_int 32 [0x20])) [3 MEM[(double *)_18]+0 S32 A256])
        (unspec:V4DF [
                (reg:V4DI 22 xmm1 [169])
                (reg:V4DF 24 xmm3 [179])
                (mem:V4DF (plus:DI (reg:DI 0 ax [orig:151 ivtmp.36 ] [151])
                        (const_int 32 [0x20])) [3 MEM[(double *)_18]+0 S32 A256])
            ] UNSPEC_MASKMOV)) small.i:13 4457 {avx_maskstorepd256}
     (expr_list:REG_DEAD (reg:V4DI 22 xmm1 [169])
        (nil)))
(insn:TI 89 91 93 5 (set (mem:V4DF (reg:DI 0 ax [orig:151 ivtmp.36 ] [151]) [3 MEM[(double *)vectp_pretmp.18_75]+0 S32 A64])
        (unspec:V4DF [
                (reg:V4DI 23 xmm2 [173])
                (reg:V4DF 24 xmm3 [179])
                (mem:V4DF (reg:DI 0 ax [orig:151 ivtmp.36 ] [151]) [3 MEM[(double *)vectp_pretmp.18_75]+0 S32 A64])
            ] UNSPEC_MASKMOV)) small.i:13 4457 {avx_maskstorepd256}
     (expr_list:REG_DEAD (reg:V4DI 23 xmm2 [173])
        (nil)))

Again the two stores get different alignment.  For AVX2 it doesn't matter because there are no separate aligned and unaligned variants of maskmov.  For regular stores the alignment is correct.  The oldest compiler I tried was from 20140625.
Comment 2 Richard Biener 2015-12-08 13:16:43 UTC
The zero comes from bump_vector_ptr I think; that is a missed optimization.  So the real issue is that we don't properly use an unaligned type, likely when we expand:

static void
expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
{
  struct expand_operand ops[3];
  tree type, lhs, rhs, maskt;
  rtx mem, reg, mask;

  maskt = gimple_call_arg (stmt, 2);
  rhs = gimple_call_arg (stmt, 3);
  type = TREE_TYPE (rhs);
  lhs = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
                     gimple_call_arg (stmt, 1));


It just uses 'type' from the rhs, but that is just the value being stored.  The type needs to be adjusted with build_aligned_type () for the alignment of the access.
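
A minimal sketch of that adjustment (illustrative only, not existing GCC code; 'access_align' is a hypothetical parameter standing in for whatever access alignment, in bits, can be recovered at expansion time, which is exactly the information that is currently missing):

  /* Hypothetical helper, not present in GCC: build the MEM_REF for a
     MASK_STORE with a type whose alignment matches the real access, so the
     expander emits an unaligned store unless the pointer is known to be
     sufficiently aligned.  */
  static tree
  build_mask_store_ref (gcall *stmt, unsigned int access_align)
  {
    tree type = TREE_TYPE (gimple_call_arg (stmt, 3));
    if (access_align < TYPE_ALIGN (type))
      type = build_aligned_type (type, access_align);
    return fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
                        gimple_call_arg (stmt, 1));
  }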
Comment 3 Jakub Jelinek 2015-12-08 15:00:27 UTC
For normal vectorized stores, the alignment is preserved through the MEM_REF/TARGET_MEM_REF type, e.g.
5991			  TREE_TYPE (data_ref)
5992			    = build_aligned_type (TREE_TYPE (data_ref),
5993						  align * BITS_PER_UNIT);
but for masked loads/stores we don't have that available; all we record is the pointer (what we put as TREE_OPERAND (mem_ref, 0)), which can have an arbitrary type, and the aliasing pointer (always a zero constant).
The alignment info we record through:
          set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
                                  misalign);
Unfortunately, it seems that this way we rely on the optimizers not to lose the alignment in the points-to info; otherwise get_object_alignment_2 assumes the pointer is fully aligned.
In *.vect it is still preserved:
  # PT = nonlocal escaped
  # ALIGN = 8, MISALIGN = 0
  vectp_pretmp.14_77 = vectp_pretmp.14_74 + 32;
  # USE = anything
  # CLB = anything
  MASK_STORE (vectp_pretmp.14_77, 0B, mask__ifc__37.13_72, vect_cst__73);
but then comes ivopts and turns that into:
  _36 = ivtmp.32_7 + 32;
  # PT = nonlocal escaped
  _37 = (vector(4) double *) _36;
  # PT = nonlocal escaped
  # ALIGN = 8, MISALIGN = 0
  vectp_pretmp.14_77 = _37;
  # USE = anything
  # CLB = anything
  MASK_STORE (vectp_pretmp.14_77, 0B, mask__ifc__37.13_72, vect_cst__73);
and then dom3 turns this into:
  _36 = ivtmp.32_7 + 32;
  # PT = nonlocal escaped
  _37 = (vector(4) double *) _36;
  # PT = nonlocal escaped
  # ALIGN = 8, MISALIGN = 0
  vectp_pretmp.14_77 = _37;
  # USE = anything
  # CLB = anything
  MASK_STORE (_37, 0B, mask__36.12_70, { 1.0e+0, 1.0e+0, 1.0e+0, 1.0e+0 });
and the points-to/alignment info now remains only on an SSA_NAME that has zero uses.

So, I think we need to store the original alignment from vectorization time somewhere else, so that preserving that info is no longer just an optimization.
As the second argument of the MASK_{LOAD,STORE} ifn is always 0 and all that matters is its type, perhaps we could stick the alignment into that argument?  I.e. call get_object_alignment for the original load/store during tree-if-conversion.c, and in the vectorizer store:
  misalign ? misalign & -misalign : align
as an INTEGER_CST with the type we are using on the 0 pointer now, and then during expansion use build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), 0)
instead of gimple_call_arg (stmt, 1) and use the actual value for build_aligned_type on the rhs type.  Richard, are you ok with that representation?
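
A minimal sketch of the vectorizer side of that idea (illustrative only; ref_type is the alias pointer type previously used for the constant-0 second argument, dataref_ptr, vec_mask and vec_rhs stand in for the corresponding locals of vectorizable_mask_load_store, and align/misalign are assumed to be in bytes as in that function):

  /* Sketch, not the final patch: encode the guaranteed alignment as the
     value of the second MASK_STORE argument instead of a plain 0.  */
  tree ptr_arg = build_int_cst (ref_type,
                                misalign ? misalign & -misalign : align);
  gcall *call = gimple_build_call_internal (IFN_MASK_STORE, 4,
                                            dataref_ptr, ptr_arg,
                                            vec_mask, vec_rhs);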

And then of course there is the optimization question: it is bad that the points-to / alignment info is lost by what ivopts and the following optimization passes do.  Should we have some late ccp pass that recomputes it, teach ivopts to add alignment/points-to info to the stuff it creates, or teach some optimizations that if they replace one pointer with another defined in the same bb and the old one has more precise points-to info (or similarly for integers and value ranges) they could stick that better info onto the other SSA_NAME?
Comment 4 rguenther@suse.de 2015-12-08 15:11:36 UTC
On Tue, 8 Dec 2015, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68786
> 
> --- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> [comment #3 quoted in full; snipped]

I think you need the alignment on the actual reference if it is needed
for correctness (or always assume unaligned instead of using an aligned
type).  Yes, the alias type pointer can be used as a vehicle for the
misalign info.

Sure, it's "bad" when we lose the info, but that can just happen...
(eventually we can track down offenders and try to preserve it)
Comment 5 Jakub Jelinek 2015-12-08 16:57:28 UTC
Created attachment 36963 [details]
gcc6-pr68786.patch

Untested fix.  Not adding any testcase, because the one included here is bad: Micha's loop splitting optimization is likely going to turn it into two separate loops, and to avoid undefined behavior one has to call test with b <= 0, in which case b is always < 7.
Comment 6 Jakub Jelinek 2015-12-09 13:42:38 UTC
Author: jakub
Date: Wed Dec  9 13:42:06 2015
New Revision: 231454

URL: https://gcc.gnu.org/viewcvs?rev=231454&root=gcc&view=rev
Log:
	PR tree-optimization/68786
	* tree-if-conv.c: Include builtins.h.
	(predicate_mem_writes): Put result of get_object_alignment (ref)
	into second argument's value.
	* tree-vect-stmts.c (vectorizable_mask_load_store): Put minimum
	pointer alignment into second argument's value.
	* tree-data-ref.c (get_references_in_stmt): Use value of second
	argument for build_aligned_type, and only the type to build
	a zero second argument for MEM_REF.
	* internal-fn.c (expand_mask_load_optab_fn,
	expand_mask_store_optab_fn): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/internal-fn.c
    trunk/gcc/tree-data-ref.c
    trunk/gcc/tree-if-conv.c
    trunk/gcc/tree-vect-stmts.c
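
A hedged sketch of the tree-if-conv.c part, paraphrased from the ChangeLog above (not the verbatim patch): when predicate_mem_writes replaces a predicated store with a MASK_STORE call, the second argument now carries get_object_alignment of the original reference instead of a plain 0, while keeping the alias pointer type:

  /* Sketch only; 'ref' is the scalar memory reference being if-converted,
     'mask' the store predicate, and 'stmt' the original assignment, as in
     the surrounding predicate_mem_writes code.  */
  tree addr = build_fold_addr_expr (ref);
  tree ptr_arg = build_int_cst (reference_alias_ptr_type (ref),
                                get_object_alignment (ref));
  gcall *call = gimple_build_call_internal (IFN_MASK_STORE, 4, addr,
                                            ptr_arg, mask,
                                            gimple_assign_rhs1 (stmt));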
Comment 7 Jakub Jelinek 2015-12-09 14:14:51 UTC
Fixed.