Bug 66598 - [5 Regression] With -O3 gcc incorrectly assumes aligned SSE instructions (e.g. movapd) can be used
Summary: [5 Regression] With -O3 gcc incorrectly assumes aligned SSE instructions (e.g...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.2
: P2 normal
Target Milestone: 6.0
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks: 78558
  Show dependency treegraph
 
Reported: 2015-06-19 13:25 UTC by Michael Lehn
Modified: 2017-10-10 14:57 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.8.5, 6.0
Known to fail: 4.9.3, 5.1.0
Last reconfirmed: 2015-06-22 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Lehn 2015-06-19 13:25:16 UTC
Compiled with gcc-4.9 and gcc-5.0 and -O3 the following code causes a "Segmentation fault: 11" on all my Intel machines with SSE:

-----------------------------------
double Q[4*64];
double P[5*64];

int
main()
{
   int i, j;
   double *p = P;
   double *q = Q;

   for (j=0; j<32; ++j) {
       for (i=0; i<4; ++i) {
           q[i] = p[i];
       }
       q += 4;
       p += 5;
   }

   return 0;
}
-----------------------------------

Looking at the assembly code the problem is in

-----------------------------------
L2:
       movapd  16(%rax), %xmm0
       addq    $40, %rax
       addq    $32, %rdx
       movapd  -40(%rax), %xmm1
       movaps  %xmm0, -16(%rdx)
       movaps  %xmm1, -32(%rdx)
       cmpq    %rcx, %rax
       jne     L2
-----------------------------------

So %rax contains the address of p.  But even if p=P is initially alined correctly on a 16-Byte address P+5 is not.  So movapd must not be used.  Changing the assembly code manually to

-----------------------------------
L2:
       movupd  16(%rax), %xmm0
       addq    $40, %rax
       addq    $32, %rdx
       movupd  -40(%rax), %xmm1
       movaps  %xmm0, -16(%rdx)
       movaps  %xmm1, -32(%rdx)
       cmpq    %rcx, %rax
       jne     L2
-----------------------------------
 
fixed the problem.


Cheers,

Michael
Comment 1 Mikael Pettersson 2015-06-20 08:09:58 UTC
I can reproduce the SEGV with current gcc-6, 5, and 4.9, but not with 4.8 or 4.7.
Dropping from -O3 to -O2 also prevents the SEGV.
Comment 2 Mikael Pettersson 2015-06-20 14:11:35 UTC
Started with r197189.
Comment 3 Richard Biener 2015-06-22 11:04:39 UTC
I believe I fixed this on trunk for PR66510.  Confirmed with GCC 5 though.
GCC 6 now generates

.L2:
        movupd  16(%rax), %xmm0
        addq    $40, %rax
        addq    $32, %rdx
        movupd  -40(%rax), %xmm1
        movaps  %xmm0, -16(%rdx)
        movaps  %xmm1, -32(%rdx)
        cmpq    $P+1240, %rax
        jne     .L2
        movupd  P+1256(%rip), %xmm0
        xorl    %eax, %eax
        movupd  P+1240(%rip), %xmm1
        movaps  %xmm0, Q+1008(%rip)
        movaps  %xmm1, Q+992(%rip)
        ret

might be a bit hard to backport the changes though.

Mine nevertheless.
Comment 4 Richard Biener 2015-11-26 12:54:48 UTC
The relevant hunk is the following but as it requires us to know the
final vectorization factor it requires us to re-order (and split)
the pipeline quite a bit.  r223670 has bits of the prerequesites but not all
and r224598 has the fix below (and some unrelated bits).

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c   (revision 230924)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -701,21 +701,22 @@ vect_compute_data_ref_alignment (struct
        }
     }
 
-  /* Similarly, if we're doing basic-block vectorization, we can only use
-     base and misalignment information relative to an innermost loop if the
-     misalignment stays the same throughout the execution of the loop.
-     As above, this is the case if the stride of the dataref evenly divides
-     by the vector size.  */
-  if (!loop)
+  /* Similarly we can only use base and misalignment information relative to
+     an innermost loop if the misalignment stays the same throughout the
+     execution of the loop.  As above, this is the case if the stride of
+     the dataref evenly divides by the vector size.  */
+  else
     {
       tree step = DR_STEP (dr);
-      HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
+      unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;
 
-      if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
+      if (tree_fits_shwi_p (step)
+         && ((tree_to_shwi (step) * vf)
+             % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                            "SLP: step doesn't divide the vector-size.\n");
+                            "step doesn't divide the vector-size.\n");
          misalign = NULL_TREE;
        }
     }
Comment 5 Richard Biener 2015-11-26 13:17:15 UTC
Oh, and it needs r223798.
Comment 6 Richard Biener 2016-08-03 11:48:57 UTC
GCC 4.9 branch is being closed
Comment 7 Michael Lehn 2016-08-03 11:49:46 UTC
Sehr geehrte Damen und Herren,

vielen Dank für Ihre Nachricht. Ich bin bis zum 30. September 2016 nicht im Büro und kann Ihre Nachricht nicht entgegennehmen. Ihre Mail wird nicht gelesen und nicht weiter geleitet.

Vielen Dank.

Mit freundlichen Grüßen
Michael Lehn

------------------------------------------------------------------------

I will be away until September 30 with limited internet access.  Your email will not be read.

Cheers,

Michael
Comment 8 Jakub Jelinek 2017-10-10 13:49:35 UTC
GCC 5 branch has been closed, should be fixed in GCC 6 and later.