Bug 85478 - ICE with single element vector
Summary: ICE with single element vector
Status: CLOSED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 8.0.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-20 08:50 UTC by Andreas Krebbel
Modified: 2018-06-04 10:55 UTC
CC List: 3 users

See Also:
Host: s390x-*-*
Target: s390x-*-*
Build: s390x-*-*
Known to work:
Known to fail:
Last reconfirmed: 2018-04-23 00:00:00


Attachments
Autoreduced testcase (409 bytes, text/plain)
2018-04-20 08:50 UTC, Andreas Krebbel

Description Andreas Krebbel 2018-04-20 08:50:32 UTC
Created attachment 43996
Autoreduced testcase

Compiling the attached testcase triggers an ICE:

cc1plus -march=arch12 -O3 -fpermissive t.cc

Performing interprocedural optimizations           
 <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <free-fnsummary> <whole-program> <profile_estimate> <icf> <devirt> <cp> <fnsummary> <inline> <pure-const> <free-fnsummary> <static-var> <single-use> <comdats>Assembling functions:
 <materialize-all-clones> s<ae>& s<ae>::operator=(const s<t>&) [with t = ab<float>; ae = ab<long double>]during GIMPLE pass: vect

t2.cc: In member function ‘s<ae>& s<ae>::operator=(const s<t>&) [with t = ab<float>; ae = ab<long double>]’:
t2.cc:39:8: internal compiler error: in exact_div, at poly-int.h:2139                                  
 s<ae> &s<ae>::operator=(const s<t> &g) {          
        ^~~~~
0x21e8941 poly_int<1u, poly_result<unsigned long, if_nonpoly<int, int, poly_int_traits<int>::is_poly>::type, poly_coeff_pair_traits<unsigned long, if_nonpoly<int, int, poly_int_traits<int>::is_poly>::type>::result_kind>::type> exact_div<1u, unsigned long, int>(poly_int_pod<1u, unsigned long> const&, int)
        /home/andreas/build/../gcc/gcc/poly-int.h:2139                                                 
0x21e8941 vect_grouped_store_supported(tree_node*, unsigned long)                                      
        /home/andreas/build/../gcc/gcc/tree-vect-data-refs.c:5150                                      
0x1ce5115 vect_analyze_loop_2                      
        /home/andreas/build/../gcc/gcc/tree-vect-loop.c:2495                                           
0x1ce5115 vect_analyze_loop(loop*, _loop_vec_info*)                                                    
        /home/andreas/build/../gcc/gcc/tree-vect-loop.c:2621                                           
0x1d03e13 vectorize_loops()                        
        /home/andreas/build/../gcc/gcc/tree-vectorizer.c:664
Comment 1 Andreas Krebbel 2018-04-20 08:55:32 UTC
The testcase ICEs since r253196:

S/390: Set the preferred mode for float vectors
    
    gcc/ChangeLog:
    
    2017-09-26  Andreas Krebbel  <krebbel@linux.vnet.ibm.com>
    
            * config/s390/s390.c (s390_preferred_simd_mode): Return V4SFmode
            for SFmode.


with:

during RTL pass: reload
t2.cc: In member function ‘dealii::FullMatrix<number>& dealii::FullMatrix<number>::operator=(const dealii::FullMatrix<number2>&) [with number2 = std::complex<float>; number = std::complex<long double>]’:
t2.cc:199:3: internal compiler error: Max. number of generated reload insns per insn is achieved (90)

   }
   ^
0x185f553 lra_constraints(bool)
        /home/andreas/gcc/gcc/lra-constraints.c:4756
0x1845459 lra(_IO_FILE*)
        /home/andreas/gcc/gcc/lra.c:2390
0x17f260b do_reload
        /home/andreas/gcc/gcc/ira.c:5440
0x17f260b execute
        /home/andreas/gcc/gcc/ira.c:5624
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.



With the poly-int patches, the ICE is already triggered during vectorization, which probably papers over the original ICE.

With the patch posted here, vectorization does not continue and no longer appears to reach that situation:

https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00758.html
Comment 2 Andreas Krebbel 2018-04-20 09:22:10 UTC
I've opened another bugzilla for a probably unrelated problem triggered by a testcase reduced from the same source file:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85481
Comment 3 Richard Biener 2018-04-20 09:38:15 UTC
Hmm, it doesn't seem to ICE for me (with a cross from x86_64-linux).  I configured with --target s390x-linux-gnu
Comment 4 Andreas Krebbel 2018-04-20 10:56:11 UTC
Indeed it does not appear to fail with a cross from x86. I've checked with r259518 on s390x as well as on x86. With an x86 cross no tree dump is generated after 012t.ompexp and the generated assembler file does not contain any code.

x86->s390x cross 012t.ompexp:
...
;; Function c::f<ab<float>*, ab<long double>*> (_ZN1c1fIP2abIfEPS1_IeEEEiT_T0_, funcdef_no=15, decl_uid=2862, cgraph_uid=9, symbol_order=9)

c::f<ab<float>*, ab<long double>*> (struct ab * g, struct ab * h)
{
  struct ab * i;
  struct ab D.2925;

  <bb 2> :
  if (i == g)
    goto <bb 4>; [INV]
  else
    goto <bb 3>; [INV]

  <bb 3> :
  ab<long double>::ab (&D.2925, MEM[(const struct ab &)i]);
  *h = D.2925;
  h = h + 16;
  i = i + 16;
  goto <bb 2>; [INV]

  <bb 4> :
  __builtin_unreachable ();

}



;; Function ab<long double>::ab (_ZN2abIeEC2ES_IfE, funcdef_no=6, decl_uid=2666, cgraph_uid=2, symbol_order=2)

ab<long double>::ab (struct ab * const this, struct ab g)
{
  complex double D.2939;

  <bb 2> :
  MEM[(struct  &)this] = {CLOBBER};
  D.2939 = ab<float>::m (&g);
  _1 = REALPART_EXPR <D.2939>;
  _2 = IMAGPART_EXPR <D.2939>;
  _3 = COMPLEX_EXPR <_1, _2>;
  this->n = _3;
  return;

}


s390x native 012t.ompexp:

;; Function c::f<ab<float>*, ab<long double>*> (_ZN1c1fIP2abIfEPS1_IgEEEiT_T0_, funcdef_no=15, decl_uid=2896, cgraph_uid=9, symbol_order=9)

c::f<ab<float>*, ab<long double>*> (struct ab * g, struct ab * h)
{
  struct ab * i;
  struct ab D.2959;

  <bb 2> :
  if (i == g)
    goto <bb 4>; [INV]
  else
    goto <bb 3>; [INV]

  <bb 3> :
  ab<long double>::ab (&D.2959, MEM[(const struct ab &)i]);
  *h = D.2959;
  D.2959 = {CLOBBER};
  h = h + 32;
  i = i + 16;
  goto <bb 2>; [INV]

  <bb 4> :
  __builtin_unreachable ();

}



;; Function ab<long double>::ab (_ZN2abIgEC2ES_IfE, funcdef_no=6, decl_uid=2700, cgraph_uid=2, symbol_order=2)

ab<long double>::ab (struct ab * const this, struct ab g)
{
  complex double D.2973;

  <bb 2> :
  MEM[(struct  &)this] = {CLOBBER};
  D.2973 = ab<float>::m (&g);
  _1 = REALPART_EXPR <D.2973>;
  _2 = (long double) _1;
  _3 = IMAGPART_EXPR <D.2973>;
  _4 = (long double) _3;
  _5 = COMPLEX_EXPR <_2, _4>;
  this->n = _5;
  return;

}
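
A possible reading of the two dumps: `h` advances by 16 bytes in the cross dump and by 32 bytes in the native dump, which would be consistent with `long double` being 8 bytes wide in the cross compiler and 16 bytes wide natively. The sketch below uses a hypothetical stand-in, `ab_model`, for the testcase's `ab` (the real definition is in the attachment and is not reproduced here) to show how such a stride follows from the `long double` width.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical stand-in for the testcase's `ab<T>`: a wrapper around one
// complex value. The real class lives in the attached testcase.
template <typename T>
struct ab_model {
  std::complex<T> n;
};

// Byte stride of a pointer increment over ab_model<T> objects.
template <typename T>
constexpr std::size_t stride() {
  return sizeof(ab_model<T>);
}

// A complex value stores a real and an imaginary part, so the stride is
// twice the width of the element type: 16 for an 8-byte long double,
// 32 for a 16-byte long double.
static_assert(stride<long double>() == 2 * sizeof(long double),
              "stride tracks the long double width");
```

Building this with two compilers that disagree on `sizeof(long double)` would give two different values for `stride<long double>()`, much like the `h + 16` and `h + 32` increments above.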
Comment 5 rguenther@suse.de 2018-04-20 11:18:03 UTC
On Fri, 20 Apr 2018, krebbel at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85478
> 
> --- Comment #4 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
> Indeed it does not appear to fail with a cross from x86. I've checked with
> r259518 on s390x as well as on x86. With an x86 cross no tree dump is generated
> after 012t.ompexp and the generated assembler file does not contain any code.

I do see generated code for the explicitly instantiated operator.

Something like

void foo(ab<float> *a, ab<long double> *b){ c::f<ab<float> *, ab<long double> *> (a,b); }
void bar(ab<float> x) { ab<__float128> a(x); }

should instantiate both functions below explicitly, but __float128
(taken from the demangling) doesn't do it, and using 'long double'
ends up with a different mangling (and code).

That said, cross and native shouldn't differ and tracking down the
reason would be interesting.
Comment 6 Andreas Krebbel 2018-04-20 17:00:37 UTC
The difference I have seen so far was triggered by building the cross with
"--without-headers". As a result the detected glibc version is 0.0:

config.log:

configure:28145: checking for target glibc version
configure:28169: result: 0.0

This in turn fails to set the proper default for the long double data type in configure:

if test $glibc_version_major -gt 2 \
  || ( test $glibc_version_major -eq 2 && test $glibc_version_minor -ge 4 ); then :
  gcc_cv_target_ldbl128=yes
else
  ...


Configuring the cross with --with-long-double-128 makes the first set of differences disappear. However, the testcase still doesn't ICE when compiled with the cross.

I will retry with a full cross. There appear to be more settings that depend on the glibc version.
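
To make the effect of the detected version concrete, here is a small sketch that transcribes only the shell condition quoted above into C++. The function name `glibc_test_passes` is hypothetical, and the elided `else` branch of the configure fragment is deliberately not modelled.

```cpp
// Transcription of the quoted configure test:
//   major > 2 || (major == 2 && minor >= 4)
// `glibc_test_passes` is a hypothetical name, not part of GCC's configure.
constexpr bool glibc_test_passes(int glibc_major, int glibc_minor) {
  return glibc_major > 2 || (glibc_major == 2 && glibc_minor >= 4);
}

// A detected version of 0.0 (as with --without-headers) fails the test,
// so the `gcc_cv_target_ldbl128=yes` branch is not taken.
static_assert(!glibc_test_passes(0, 0), "0.0 does not select the ldbl128 branch");
// glibc 2.4 and later pass the test.
static_assert(glibc_test_passes(2, 4), "2.4 selects the ldbl128 branch");
```

This only shows which branch of the quoted `if` is taken; what the `else` branch does is not visible in the excerpt.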
Comment 7 Andreas Krebbel 2018-04-23 08:49:43 UTC
The cross from comment #6 did not trigger the problem because I accidentally built it with --disable-checking. Dropping that option and adding --with-long-double-128 triggers the ICE on a full cross as well as on a cross without sysroot.
Comment 8 Andreas Krebbel 2018-04-23 09:10:48 UTC
The problem is similar to PR83753 but with a different call chain. Richard Sandiford fixed PR83753 by adding:

	  /* First cope with the degenerate case of a single-element
	     vector.  */
	  if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
	    *memory_access_type = VMAT_CONTIGUOUS;

to get_group_load_store_type.  This prevents vect_grouped_store_supported from being called for single element vectors. 

For this PR vect_grouped_store_supported is called from vect_analyze_loop_2. I don't know whether there is a better way to deal with it in the caller.

Regardless, I think vect_grouped_store_supported should return false for single element vectors as proposed in:

https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00758.html
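
As a rough illustration of why a single-element vector is a problem here, the sketch below models an exact-division helper that asserts divisibility, plus a support query with the early exit proposed above. All names are hypothetical, this is not GCC's poly-int code, and the divisor of 2 is an assumption (the backtrace only shows that the divisor is an `int`).

```cpp
#include <cassert>

// Simplified stand-in for an exact division helper: the caller promises
// that `a` is a multiple of `b`, and the promise is checked by an assert.
// This is NOT GCC's poly-int implementation.
unsigned long exact_div_model(unsigned long a, int b) {
  assert(a % static_cast<unsigned long>(b) == 0 && "division must be exact");
  return a / static_cast<unsigned long>(b);
}

// Hypothetical model of a support query with the proposed early exit.
// `nelt` is the number of vector elements.
bool grouped_store_supported_model(unsigned long nelt) {
  // Degenerate case: a single-element vector cannot be split in half.
  if (nelt == 1)
    return false;
  // Assumed divisor of 2; the backtrace only shows an `int` divisor.
  unsigned long half = exact_div_model(nelt, 2);
  // Assumption: pretend the target supports the permutation whenever the
  // vector can be halved. Real targets decide this differently.
  return half > 0;
}
```

Without the `nelt == 1` check, calling `grouped_store_supported_model(1)` would reach `exact_div_model(1, 2)` and trip the assert, and compiling with `NDEBUG` would silently hide it. That is at least consistent with the ICE not showing up in the --disable-checking build.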
Comment 9 Richard Biener 2018-04-23 10:29:25 UTC
Ok, confirmed.  The following fixes it:

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        (revision 259558)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -2492,6 +2492,7 @@ again:
       unsigned int size = STMT_VINFO_GROUP_SIZE (vinfo);
       tree vectype = STMT_VINFO_VECTYPE (vinfo);
       if (! vect_store_lanes_supported (vectype, size, false)
+         && ! known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U)
          && ! vect_grouped_store_supported (vectype, size))
        return false;
       FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (instance), j, node)

Andreas, can you test this?  It's pre-approved if you make it before RC1.
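
The patch relies on `&&` short-circuiting: once the single-element test is false, the last operand is never evaluated. A minimal sketch of that shape follows. The stubs and the counter are hypothetical and stand in for the real vectorizer queries.

```cpp
#include <cassert>

// Counts how often the grouped-store stub is reached.
int grouped_calls = 0;

// Hypothetical stub: pretend store-lanes is never available.
bool store_lanes_supported_stub(unsigned) { return false; }

// Hypothetical stub: would assert on an odd element count, like the ICE.
bool grouped_store_supported_stub(unsigned nelt) {
  ++grouped_calls;
  assert(nelt % 2 == 0 && "element count must be even");
  return true;
}

// Same shape as the patched condition: returns false when the store
// group would be rejected, true otherwise.
bool slp_store_ok(unsigned nelt) {
  if (!store_lanes_supported_stub(nelt)
      && nelt != 1  // single-element vectors stop the chain here
      && !grouped_store_supported_stub(nelt))
    return false;
  return true;
}
```

Calling `slp_store_ok(1)` returns true without touching `grouped_store_supported_stub`, while `slp_store_ok(4)` calls it once.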
Comment 10 Andreas Krebbel 2018-04-24 12:18:58 UTC
Author: krebbel
Date: Tue Apr 24 12:18:26 2018
New Revision: 259593

URL: https://gcc.gnu.org/viewcvs?rev=259593&root=gcc&view=rev
Log:
Fix PR85478

gcc/ChangeLog:

2018-04-24  Andreas Krebbel  <krebbel@linux.ibm.com>

	PR tree-optimization/85478
	* tree-vect-loop.c (vect_analyze_loop_2): Do not call
	vect_grouped_store_supported for single element vectors.

gcc/testsuite/ChangeLog:

2018-04-24  Andreas Krebbel  <krebbel@linux.ibm.com>

	PR tree-optimization/85478
	* g++.dg/pr85478.C: New test.


Added:
    trunk/gcc/testsuite/g++.dg/pr85478.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop.c
Comment 11 Jeffrey A. Law 2018-05-24 20:54:29 UTC
So is this BZ fixed, Andreas?  If so, let's close it.  I'll also take your patch out of my queue of things to review :-)
Comment 12 Richard Biener 2018-05-25 06:52:18 UTC
Fixed.
Comment 13 Andreas Krebbel 2018-06-04 10:55:11 UTC
Fixed.