66598 – [5 Regression] With -O3 gcc incorrectly assumes aligned SSE instructions (e.g. movapd) can be used

Bug 66598 - [5 Regression] With -O3 gcc incorrectly assumes aligned SSE instructions (e.g. movapd) can be used

Summary: [5 Regression] With -O3 gcc incorrectly assumes aligned SSE instructions (e.g...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.9.2

Importance:	P2 normal
Target Milestone:	6.0
Assignee:	Richard Biener

URL:
Keywords:	wrong-code

Depends on:
Blocks:	78558
	Show dependency tree / graph

Reported:	2015-06-19 13:25 UTC by Michael Lehn
Modified:	2017-10-10 14:57 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:	4.8.5, 6.0
Known to fail:	4.9.3, 5.1.0
Last reconfirmed:	2015-06-22 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Michael Lehn 2015-06-19 13:25:16 UTC

Compiled with gcc-4.9 and gcc-5.0 and -O3 the following code causes a "Segmentation fault: 11" on all my Intel machines with SSE:

-----------------------------------
double Q[4*64];
double P[5*64];

int
main()
{
   int i, j;
   double *p = P;
   double *q = Q;

   for (j=0; j<32; ++j) {
       for (i=0; i<4; ++i) {
           q[i] = p[i];
       }
       q += 4;
       p += 5;
   }

   return 0;
}
-----------------------------------

Looking at the assembly code the problem is in

-----------------------------------
L2:
       movapd  16(%rax), %xmm0
       addq    $40, %rax
       addq    $32, %rdx
       movapd  -40(%rax), %xmm1
       movaps  %xmm0, -16(%rdx)
       movaps  %xmm1, -32(%rdx)
       cmpq    %rcx, %rax
       jne     L2
-----------------------------------

So %rax contains the address of p.  But even if p=P is initially alined correctly on a 16-Byte address P+5 is not.  So movapd must not be used.  Changing the assembly code manually to

-----------------------------------
L2:
       movupd  16(%rax), %xmm0
       addq    $40, %rax
       addq    $32, %rdx
       movupd  -40(%rax), %xmm1
       movaps  %xmm0, -16(%rdx)
       movaps  %xmm1, -32(%rdx)
       cmpq    %rcx, %rax
       jne     L2
-----------------------------------
 
fixed the problem.


Cheers,

Michael

Comment 1 Mikael Pettersson 2015-06-20 08:09:58 UTC

I can reproduce the SEGV with current gcc-6, 5, and 4.9, but not with 4.8 or 4.7.
Dropping from -O3 to -O2 also prevents the SEGV.

Comment 2 Mikael Pettersson 2015-06-20 14:11:35 UTC

Started with r197189.

Comment 3 Richard Biener 2015-06-22 11:04:39 UTC

I believe I fixed this on trunk for PR66510.  Confirmed with GCC 5 though.
GCC 6 now generates

.L2:
        movupd  16(%rax), %xmm0
        addq    $40, %rax
        addq    $32, %rdx
        movupd  -40(%rax), %xmm1
        movaps  %xmm0, -16(%rdx)
        movaps  %xmm1, -32(%rdx)
        cmpq    $P+1240, %rax
        jne     .L2
        movupd  P+1256(%rip), %xmm0
        xorl    %eax, %eax
        movupd  P+1240(%rip), %xmm1
        movaps  %xmm0, Q+1008(%rip)
        movaps  %xmm1, Q+992(%rip)
        ret

might be a bit hard to backport the changes though.

Mine nevertheless.

Comment 4 Richard Biener 2015-11-26 12:54:48 UTC

The relevant hunk is the following but as it requires us to know the
final vectorization factor it requires us to re-order (and split)
the pipeline quite a bit.  r223670 has bits of the prerequesites but not all
and r224598 has the fix below (and some unrelated bits).

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c   (revision 230924)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -701,21 +701,22 @@ vect_compute_data_ref_alignment (struct
        }
     }
 
-  /* Similarly, if we're doing basic-block vectorization, we can only use
-     base and misalignment information relative to an innermost loop if the
-     misalignment stays the same throughout the execution of the loop.
-     As above, this is the case if the stride of the dataref evenly divides
-     by the vector size.  */
-  if (!loop)
+  /* Similarly we can only use base and misalignment information relative to
+     an innermost loop if the misalignment stays the same throughout the
+     execution of the loop.  As above, this is the case if the stride of
+     the dataref evenly divides by the vector size.  */
+  else
     {
       tree step = DR_STEP (dr);
-      HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
+      unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;
 
-      if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
+      if (tree_fits_shwi_p (step)
+         && ((tree_to_shwi (step) * vf)
+             % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                            "SLP: step doesn't divide the vector-size.\n");
+                            "step doesn't divide the vector-size.\n");
          misalign = NULL_TREE;
        }
     }

Comment 5 Richard Biener 2015-11-26 13:17:15 UTC

Oh, and it needs r223798.

Comment 6 Richard Biener 2016-08-03 11:48:57 UTC

GCC 4.9 branch is being closed

Comment 7 Michael Lehn 2016-08-03 11:49:46 UTC

Sehr geehrte Damen und Herren,

vielen Dank für Ihre Nachricht. Ich bin bis zum 30. September 2016 nicht im Büro und kann Ihre Nachricht nicht entgegennehmen. Ihre Mail wird nicht gelesen und nicht weiter geleitet.

Vielen Dank.

Mit freundlichen Grüßen
Michael Lehn

------------------------------------------------------------------------

I will be away until September 30 with limited internet access.  Your email will not be read.

Cheers,

Michael

Comment 8 Jakub Jelinek 2017-10-10 13:49:35 UTC

GCC 5 branch has been closed, should be fixed in GCC 6 and later.