Bug 47397

Summary: Alignment of array element is not optimal in AVX mode due to use of TARGET_MEM_REFs
Product: gcc
Component: middle-end
Version: 4.6.0
Status: NEW
Severity: normal
Priority: P3
Target Milestone: ---
Reporter: H.J. Lu <hjl.tools>
Assignee: Not yet assigned to anyone <unassigned>
CC: areg.melikadamyan
Last reconfirmed: 2011-01-21 14:21:20

Description H.J. Lu 2011-01-21 13:57:30 UTC
In

---
double a[NUM], b[NUM];

void foo()
{
   int i;
   for (i = 0; i < NUM; i++)
   {
      b[i] = a[i+2] * 10.0;
   }
}
---

both "a" and "b" are aligned at 32byte/256bits. However, RTL dump from
"-O3 -mavx" shows that alignment of a[i+2] is 64bits instead of 128bits
as expected:

(insn 39 38 40 4 (set (mem:V4DF (plus:DI (reg/f:DI 95)
                (reg:DI 80 [ ivtmp.21 ])) [2 MEM[symbol: b, index: ivtmp.21_20,
offset: 16B]+0 S32 A64])
        (unspec:V4DF [
                (reg:V4DF 93)
            ] UNSPEC_MOVU)) align.c:64 -1
     (nil))
Comment 1 H.J. Lu 2011-01-21 14:05:21 UTC
The bug seems to be in get_object_alignment.
Comment 2 H.J. Lu 2011-01-21 14:11:59 UTC
This testcase can be compiled:

---
double a[1024], b[1024];

void foo()
{
   int i;
   for (i = 0; i < sizeof (a) / sizeof (a[0]); i++)
   {
      b[i] = a[i+2] * 10.0;
   }
}
---
Comment 3 Richard Biener 2011-01-21 14:15:49 UTC
Investigating.
Comment 4 Richard Biener 2011-01-21 14:21:20 UTC
This is because the index is unknown in both

  vect_var_.13_20 = MEM[symbol: a, index: ivtmp.24_12, offset: 16B];

and

  MEM[symbol: b, index: ivtmp.24_12, offset: 0B] = vect_var_.14_22;

we don't have a way to record that ivtmp.24_12 is {0, +, 32} for
RTL expansion, which means that IVOPTs is responsible for the
loss of information.  Before IVOPTs we have

  # ALIGN = 32, MISALIGN = 16
  # vect_pa.9_18 = PHI <vect_pa.9_19(4), vect_pa.12_17(2)>
  # ALIGN = 32, MISALIGN = 0
  # vect_pb.16_24 = PHI <vect_pb.16_25(4), vect_pb.19_23(2)>
  vect_var_.13_20 = MEM[(double[1024] *)vect_pa.9_18];
  MEM[(double[1024] *)vect_pb.16_24] = vect_var_.14_22;

so we know that the pointers we dereference are properly aligned.
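
To make the lost information concrete, here is a minimal standalone sketch
(plain C, not GCC internals; the variable names are illustrative) of what
knowing the step buys: a 256-bit-aligned base plus a 16-byte constant offset
and a 32-byte step puts every address at 16 mod 32, i.e. 128-bit aligned,
while an unknown step leaves only the element type's 64-bit alignment.

---
#include <stdio.h>

/* Alignment in bits guaranteed by a byte quantity: the largest
   power of two dividing it, times 8 bits per byte.  */
static unsigned long align_bits (unsigned long bytes)
{
  return (bytes & -bytes) * 8;
}

int main (void)
{
  unsigned long base_align = 256; /* DECL_ALIGN of 'a' in bits     */
  unsigned long offset = 16;      /* constant offset of the access */
  unsigned long step = 32;        /* the iv step from {0, +, 32}   */

  /* Step known: the derivable alignment is the smallest
     power-of-two contribution of base, offset and step.  */
  unsigned long align = base_align;
  if (align_bits (offset) < align)
    align = align_bits (offset);
  if (align_bits (step) < align)
    align = align_bits (step);
  printf ("step known:   %lu bits\n", align); /* 128 */

  /* Step unknown: the index could be any multiple of the element
     size, so only the 64-bit alignment of 'double' remains.  */
  printf ("step unknown: %d bits\n", 64);
  return 0;
}
---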
Comment 5 Richard Biener 2011-01-21 14:22:40 UTC
Thus, the real reason is that we lack alignment information on MEM_REFs/TARGET_MEM_REFs but only have pointer alignment information for now.
Comment 6 H.J. Lu 2011-01-21 14:45:31 UTC
(In reply to comment #5)
> Thus, the real reason is that we lack alignment information on
> MEM_REFs/TARGET_MEM_REFs but only have pointer alignment information for now.

I know it won't solve this bug.  But shouldn't we at least use alignment
of array element instead of pointer alignment?
Comment 7 Richard Biener 2011-01-21 15:35:29 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Thus, the real reason is that we lack alignment information on
> > MEM_REFs/TARGET_MEM_REFs but only have pointer alignment information for now.
> 
> I know it won't solve this bug.  But shouldn't we at least use alignment
> of array element instead of pointer alignment?

Well, part of it is that this would half-way re-introduce the type-alignment
stuff, deviating from the initial conservative correctness.

Thus, we run into

        align = MAX (TYPE_ALIGN (TREE_TYPE (exp)),
                     get_object_alignment (exp, BIGGEST_ALIGNMENT));

when deciding whether to use movmisalign_optab, but in
set_mem_attributes_minus_bitpos we do not fall back to TYPE_ALIGN at all
(which is conservative, by the initial design).

You might argue we want consistency here.

Note that this shouldn't be an issue for AVX/SSE as unaligned moves are
as fast as aligned ones if they are really aligned (at least I hope
this is true for AVX, it definitely is for SSE on recent archs - if not
the vectorizer cost model would need adjustment).
Comment 8 Richard Biener 2011-01-21 15:37:33 UTC
And before you start fiddling with set_mem_attributes_minus_bitpos, most of
the alignment-related pieces in it should simply re-use get_object_alignment
(a cleanup I didn't finish in time for 4.6).
Comment 9 H.J. Lu 2011-01-21 16:03:41 UTC
(In reply to comment #7)
> 
> Note that this shouldn't be an issue for AVX/SSE as unaligned moves are
> as fast as aligned ones if they are really aligned (at least I hope
> this is true for AVX, it definitely is for SSE on recent archs - if not
> the vectorizer cost model would need adjustment).

On SNB, we want to split 32-byte unaligned loads/stores.
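
(For reference, "splitting" here means loading the two 128-bit halves
separately and merging them with vinsertf128, the strategy GCC later exposed
as the -mavx256-split-unaligned-load/-store options.  An intrinsics sketch of
the idea, illustrative rather than what the compiler actually emits:

---
/* Compile with -mavx.  */
#include <immintrin.h>

/* A 32-byte unaligned load done as two 16-byte halves plus
   vinsertf128; on SNB this is cheaper than a single 32-byte
   vmovupd when the address is not 32-byte aligned.  */
__m256d
load256_split (const double *p)
{
  __m128d lo = _mm_loadu_pd (p);     /* bytes 0..15  */
  __m128d hi = _mm_loadu_pd (p + 2); /* bytes 16..31 */
  return _mm256_insertf128_pd (_mm256_castpd128_pd256 (lo), hi, 1);
}
---
)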
Comment 10 Richard Biener 2011-01-21 16:21:55 UTC
You can play with a cleanup patch I have lying around, queued for 4.7:

Index: emit-rtl.c
===================================================================
*** emit-rtl.c  (revision 169088)
--- emit-rtl.c  (working copy)
*************** set_mem_attributes_minus_bitpos (rtx ref
*** 1611,1654 ****
  
    /* We can set the alignment from the type if we are making an object,
       this is an INDIRECT_REF, or if TYPE_ALIGN_OK.  */
!   if (objectp || TREE_CODE (t) == INDIRECT_REF || TYPE_ALIGN_OK (type))
!     align = MAX (align, TYPE_ALIGN (type));
! 
!   else if (TREE_CODE (t) == MEM_REF)
      {
!       tree op0 = TREE_OPERAND (t, 0);
!       if (TREE_CODE (op0) == ADDR_EXPR
!         && (DECL_P (TREE_OPERAND (op0, 0))
!             || CONSTANT_CLASS_P (TREE_OPERAND (op0, 0))))
!       {
!         if (DECL_P (TREE_OPERAND (op0, 0)))
!           align = DECL_ALIGN (TREE_OPERAND (op0, 0));
!         else if (CONSTANT_CLASS_P (TREE_OPERAND (op0, 0)))
!           {
!             align = TYPE_ALIGN (TREE_TYPE (TREE_OPERAND (op0, 0)));
! #ifdef CONSTANT_ALIGNMENT
!             align = CONSTANT_ALIGNMENT (TREE_OPERAND (op0, 0), align);
! #endif
!           }
!         if (TREE_INT_CST_LOW (TREE_OPERAND (t, 1)) != 0)
!           {
!             unsigned HOST_WIDE_INT ioff
!               = TREE_INT_CST_LOW (TREE_OPERAND (t, 1));
!             unsigned HOST_WIDE_INT aoff = (ioff & -ioff) * BITS_PER_UNIT;
!             align = MIN (aoff, align);
!           }
!       }
!       else
        /* ??? This isn't fully correct, we can't set the alignment from the
           type in all cases.  */
        align = MAX (align, TYPE_ALIGN (type));
      }
  
-   else if (TREE_CODE (t) == TARGET_MEM_REF)
-     /* ??? This isn't fully correct, we can't set the alignment from the
-        type in all cases.  */
-     align = MAX (align, TYPE_ALIGN (type));
- 
    /* If the size is known, we can set that.  */
    if (TYPE_SIZE_UNIT (type) && host_integerp (TYPE_SIZE_UNIT (type), 1))
      size = GEN_INT (tree_low_cst (TYPE_SIZE_UNIT (type), 1));
--- 1611,1628 ----
  
    /* We can set the alignment from the type if we are making an object,
       this is an INDIRECT_REF, or if TYPE_ALIGN_OK.  */
!   if (!TYPE_P (t))
      {
!       align = MAX (align, get_object_alignment (t, BIGGEST_ALIGNMENT));
!       if (objectp || TYPE_ALIGN_OK (type))
!       align = MAX (align, TYPE_ALIGN (type));
!       else if (TREE_CODE (t) == TARGET_MEM_REF
!              || TREE_CODE (t) == MEM_REF)
        /* ??? This isn't fully correct, we can't set the alignment from the
           type in all cases.  */
        align = MAX (align, TYPE_ALIGN (type));
      }
  
    /* If the size is known, we can set that.  */
    if (TYPE_SIZE_UNIT (type) && host_integerp (TYPE_SIZE_UNIT (type), 1))
      size = GEN_INT (tree_low_cst (TYPE_SIZE_UNIT (type), 1));


but I wonder why we are not falling into the

-   else if (TREE_CODE (t) == TARGET_MEM_REF)
-     /* ??? This isn't fully correct, we can't set the alignment from the
-        type in all cases.  */
-     align = MAX (align, TYPE_ALIGN (type));

case right now, or what I am missing.

Ah, for the offsetted load the vectorizer uses a 64-bit aligned type.
So it either cannot compute the alignment or fails to use an aligned type.
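
As a closing aside, the (ioff & -ioff) * BITS_PER_UNIT expression in the
patch computes the alignment guaranteed by a constant byte offset alone, the
largest power of two dividing it.  A standalone demonstration, assuming
nothing beyond the patch itself:

---
#include <stdio.h>

#define BITS_PER_UNIT 8

int main (void)
{
  unsigned long offs[] = { 16, 24, 32, 48 };
  for (int i = 0; i < 4; i++)
    {
      unsigned long ioff = offs[i];
      /* Largest power of two dividing ioff, converted to bits.  */
      unsigned long aoff = (ioff & -ioff) * BITS_PER_UNIT;
      printf ("offset %2luB caps alignment at %3lu bits\n", ioff, aoff);
    }
  return 0;
}
---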