Bug 77550 - [6/7 Regression] std::deque with -O3 has infinite std::distance
Summary: [6/7 Regression] std::deque with -O3 has infinite std::distance
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 6.2.0
: P3 normal
Target Milestone: 6.3
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2016-09-10 12:14 UTC by Daniel Cooke
Modified: 2016-10-25 05:10 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work: 5.3.0
Known to fail: 6.2.1, 7.0
Last reconfirmed: 2016-09-10 00:00:00


Attachments
unreduced testcase (103.07 KB, application/x-xz)
2016-09-10 12:31 UTC, Markus Trippelsdorf
Details
somewhat reduced testcase (2.69 KB, text/plain)
2016-09-12 05:36 UTC, Markus Trippelsdorf
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Cooke 2016-09-10 12:14:33 UTC
The following program, compiled with -O3, never returns:

#include <string>
#include <deque>
#include <iterator>
#include <iostream>

#include <boost/iterator/filter_iterator.hpp>

struct Foo
{
    std::string bar, s = "";
    char a = '\0';
};

int main()
{
    const std::deque<Foo> foos(14, {""});
    const std::string test {};
    const auto p = [test] (const auto& foo) { return foo.bar == test; };
    using boost::make_filter_iterator;
    const auto begin = make_filter_iterator(p, std::cbegin(foos), std::cend(foos));
    const auto end   = make_filter_iterator(p, std::cend(foos), std::cend(foos));
    std::cout << std::distance(begin, end) << std::endl;
}

Observations:

 - GCC with optimisations -O2 or less returns as expected.
 - Clang (3.8) returns the correct answer with any optimisation level.
 - Changing std::deque to std::vector or std::list results in expected behaviour.
 - The 14 is critical; anything less and the problem disappears.
 - The sizeof(Foo) is important; removing s or a makes the problem go away.
 - Capturing test by reference, or just comparing to a constant expression (e.g. foo.bar == " ") results in normal behaviour.
 - There are no compiler warnings (with -Wall -Wextra -pedantic).
 - Valgrind reports no errors.
 - Use fsanitize=undefined and the problem goes away.

From: http://stackoverflow.com/q/39424753/2970186
Comment 1 Markus Trippelsdorf 2016-09-10 12:29:34 UTC
-fno-tree-slp-vectorize "fixes" the issue.
Comment 2 Markus Trippelsdorf 2016-09-10 12:31:24 UTC
Created attachment 39596 [details]
unreduced testcase
Comment 3 Daniel Cooke 2016-09-10 12:32:44 UTC
-fno-strict-aliasing also resolves.
Comment 4 Daniel Cooke 2016-09-10 13:34:00 UTC
@Jonathan Wakely: Why do you not consider this critical?
Comment 5 Markus Trippelsdorf 2016-09-10 13:36:37 UTC
(In reply to Daniel Cooke from comment #4)
> @Jonathan Wakely: Why do you not consider this critical?

We don't use this field. So normal is just the default.
Comment 6 Jonathan Wakely 2016-09-10 14:02:05 UTC
And if we did use it, it would be the severity as determined by the GCC project, not according to the bug reporter. Bug reporters tend to consider their own bugs of higher importance than everyone else's bugs :)
Comment 7 Markus Trippelsdorf 2016-09-10 14:20:11 UTC
Well, the best solution would be to simply disable the field for users.
This what other bugzillas do.
Comment 8 Markus Trippelsdorf 2016-09-10 14:29:53 UTC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77551
Comment 9 Bernd Edlinger 2016-09-11 18:27:22 UTC
I'm unable to reduce the test case...


The deque_iterator has these members:

      _Elt_pointer _M_cur;
      _Elt_pointer _M_first;
      _Elt_pointer _M_last;
      _Map_pointer _M_node;

the first 3 elements have alias set 12
while _M_node has alias set 13.

when all 4 elements are assigned that becomes
2 vector statements, but all use alias set 12.

but _M_node is read back in the loop using alias set 13.
and thus it seems _M_node is constant.

It can be seen first in expand.

_M_node is read without non-vectorized:

(insn 290 289 291 44 (set (reg/f:DI 117 [ prephitmp_166 ])
        (mem/f/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -176 [0xffffffffffffff50])) [12 MEM[(struct _Deque_iterator *)&D.66017]._M_last+0 S8 A128])) -1
     (nil))
(insn 291 290 292 44 (set (reg/f:DI 149 [ _215 ])
        (mem/f/c:DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -168 [0xffffffffffffff58])) [13 MEM[(struct _Deque_iterator *)&D.66017]._M_node+0 S8 A64])) /var/tmp/gcc_test/usr/local/include/c++/7.0.0/bits/stl_deque.h:173 -1
     (nil))


_M_node is written using _M_last's alias set in insn 307:

(insn 305 304 306 46 (set (mem/c:V2DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -192 [0xffffffffffffff40])) [12 MEM[(struct Foo * *)&D.66017]+0 S16 A128])
        (reg:V2DI 237)) /var/tmp/gcc_test/usr/local/include/c++/7.0.0/bits/stl_deque.h:174 -1
     (nil))
(insn 306 305 307 46 (set (reg:V2DI 238)
        (vec_concat:V2DI (reg/f:DI 117 [ prephitmp_166 ])
            (reg:DI 90 [ _25 ]))) /var/tmp/gcc_test/usr/local/include/c++/7.0.0/bits/stl_deque.h:258 -1
     (nil))
(insn 307 306 308 46 (set (mem/c:V2DI (plus:DI (reg/f:DI 82 virtual-stack-vars)
                (const_int -176 [0xffffffffffffff50])) [12 MEM[(struct Foo * *)&D.66017 + 16B]+0 S16 A128])
        (reg:V2DI 238)) /var/tmp/gcc_test/usr/local/include/c++/7.0.0/bits/stl_deque.h:174 -1
     (nil))
Comment 10 Andrew Pinski 2016-09-11 18:36:02 UTC
(In reply to Bernd Edlinger from comment #9)
> I'm unable to reduce the test case...
> 
> 
> The deque_iterator has these members:
> 
>       _Elt_pointer _M_cur;
>       _Elt_pointer _M_first;
>       _Elt_pointer _M_last;
>       _Map_pointer _M_node;
> 
> the first 3 elements have alias set 12
> while _M_node has alias set 13.
> 
> when all 4 elements are assigned that becomes
> 2 vector statements, but all use alias set 12.

The vector stores should have become the aliasing set of the struct instead as suggested on the store widening patch.
Comment 11 Markus Trippelsdorf 2016-09-11 19:56:29 UTC
(In reply to Bernd Edlinger from comment #9)
> I'm unable to reduce the test case...

Creduce is running and hopefully I will have a small testcase tomorrow morning.


trippels@gcc2-power8 ~ % cat check.sh
#!/bin/sh
ulimit -t 2
clang++ -Wall -Werror -std=c++14 -O3 -fsanitize=undefined test.ii
  if ! test $? = 0; then
    exit 1
  fi
./a.out  2>&1 | grep "runtime error"
  if test $? = 0; then
    exit 1
  fi
./a.out  
  if ! test $? = 0; then
    exit 1
  fi
~/gcc_5/usr/local/bin/g++ -std=c++14 -O3 test.ii -w -Wfatal-errors
  if ! test $? = 0; then
    exit 1
  fi
./a.out 
  if ! test $? = 0; then
    exit 1
  fi
g++ -O2 test.ii -w -Wfatal-errors
  if ! test $? = 0; then
    exir 1
  fi
valgrind -q --error-exitcode=1 ./a.out
  if ! test $? = 0; then
    exit 1
  fi
g++ -O3 test.ii -w -Wfatal-errors
  if ! test $? = 0; then
    exit 1
  fi
./a.out 
  if ! test $? = 137; then
    exit 1
  fi
Comment 12 Bernd Edlinger 2016-09-11 19:59:41 UTC
(In reply to Markus Trippelsdorf from comment #11)
> (In reply to Bernd Edlinger from comment #9)
> > I'm unable to reduce the test case...
> 
> Creduce is running and hopefully I will have a small testcase tomorrow
> morning.
> 
> 

good luck.

here is a hacky patch, test case passes, but I don't know
if it is going design-wise in the right direction:


--- tree-vect-stmts.c	2016-09-11 19:53:01.841182370 +0200
+++ tree-vect-stmts.c	2016-09-11 19:34:15.691171637 +0200
@@ -5482,6 +5483,7 @@
   gimple *new_stmt;
   int vf;
   vec_load_store_type vls_type;
+  bool alias_p = false;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -5740,6 +5742,16 @@
       first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
       first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
       group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+      next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (first_stmt));
+      for (i = 1; i < group_size; i++)
+	{
+	  struct data_reference *next_dr
+	    = STMT_VINFO_DATA_REF (vinfo_for_stmt (next_stmt));
+	  if (get_alias_set (DR_REF (first_dr))
+	      != get_alias_set (DR_REF (next_dr)))
+	    alias_p = true;
+	  next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next_stmt));
+	}
 
       GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt))++;
 
@@ -6174,8 +6188,10 @@
 				      dataref_ptr,
 				      dataref_offset
 				      ? dataref_offset
-				      : build_int_cst (reference_alias_ptr_type
-						       (DR_REF (first_dr)), 0));
+				      : build_int_cst (alias_p
+						       ? ptr_type_node
+						       : reference_alias_ptr_type
+						         (DR_REF (first_dr)), 0));
 	      align = TYPE_ALIGN_UNIT (vectype);
 	      if (aligned_access_p (first_dr))
 		misalign = 0;
Comment 13 Markus Trippelsdorf 2016-09-12 05:36:28 UTC
Created attachment 39606 [details]
somewhat reduced testcase
Comment 14 Markus Trippelsdorf 2016-09-12 05:41:19 UTC
The reduced testcase needs libstdc++.so from trunk. Otherwise you'll get
undefined reference errors to _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_Alloc_hiderC1EPcOS3_ .
Comment 15 Richard Biener 2016-09-13 12:59:19 UTC
The patch should at least work (loads are similarly affected).  At some point we want some infrastructure in the middle-end to chose a better fallback (as Andrew says, if possible we can use the alias-set of the containing aggregate for example).
Comment 16 Bernd Edlinger 2016-09-13 13:49:00 UTC
(In reply to Richard Biener from comment #15)
> The patch should at least work (loads are similarly affected).  At some
> point we want some infrastructure in the middle-end to chose a better
> fallback (as Andrew says, if possible we can use the alias-set of the
> containing aggregate for example).

Good.  Then I go ahead and look for a way how to fix all of the different
load and store code path.  If the whole group does not agree in the
alias_set I would fall back to ptr_type_node, I think if I put that
code in an extra function it can be changed later to the alias-set
of the containing aggregate.
Comment 17 Bernd Edlinger 2016-09-21 13:04:31 UTC
Author: edlinger
Date: Wed Sep 21 13:03:59 2016
New Revision: 240313

URL: https://gcc.gnu.org/viewcvs?rev=240313&root=gcc&view=rev
Log:
gcc:
2016-09-21  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        PR tree-optimization/77550
        * tree-vect-stmts.c (create_array_ref): Change parameters.
        (get_group_alias_ptr_type): New function.
        (vectorizable_store, vectorizable_load): Use get_group_alias_ptr_type.

testsuite:
2016-09-21  Bernd Edlinger  <bernd.edlinger@hotmail.de>
        
        PR tree-optimization/77550
        * g++.dg/pr77550.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/pr77550.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-stmts.c
Comment 18 Bernd Edlinger 2016-10-24 20:21:38 UTC
Author: edlinger
Date: Mon Oct 24 20:21:06 2016
New Revision: 241495

URL: https://gcc.gnu.org/viewcvs?rev=241495&root=gcc&view=rev
Log:
2016-10-24  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        Backport from mainline
        2016-09-21  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        PR tree-optimization/77550
        * tree-vect-stmts.c (create_array_ref): Change parameters.
        (get_group_alias_ptr_type): New function.
        (vectorizable_store, vectorizable_load): Use get_group_alias_ptr_type.

        testsuite:
        2016-09-21  Bernd Edlinger  <bernd.edlinger@hotmail.de>
        
        PR tree-optimization/77550
        * g++.dg/pr77550.C: New test.

Added:
    branches/gcc-6-branch/gcc/testsuite/g++.dg/pr77550.C
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
    branches/gcc-6-branch/gcc/tree-vect-stmts.c
Comment 19 Bernd Edlinger 2016-10-25 05:10:23 UTC
fixed