Created attachment 43996 [details]
Autoreduced testcase

Compiling the attached testcase triggers an ICE:

cc1plus -march=arch12 -O3 -fpermissive t.cc

Performing interprocedural optimizations
 <*free_lang_data> <visibility> <build_ssa_passes> <opt_local_passes> <targetclone> <free-fnsummary> <whole-program> <profile_estimate> <icf> <devirt> <cp> <fnsummary> <inline> <pure-const> <free-fnsummary> <static-var> <single-use> <comdats>Assembling functions:
 <materialize-all-clones> s<ae>& s<ae>::operator=(const s<t>&) [with t = ab<float>; ae = ab<long double>]during GIMPLE pass: vect

t2.cc: In member function ‘s<ae>& s<ae>::operator=(const s<t>&) [with t = ab<float>; ae = ab<long double>]’:
t2.cc:39:8: internal compiler error: in exact_div, at poly-int.h:2139
 s<ae> &s<ae>::operator=(const s<t> &g) {
        ^~~~~
0x21e8941 poly_int<1u, poly_result<unsigned long, if_nonpoly<int, int, poly_int_traits<int>::is_poly>::type, poly_coeff_pair_traits<unsigned long, if_nonpoly<int, int, poly_int_traits<int>::is_poly>::type>::result_kind>::type> exact_div<1u, unsigned long, int>(poly_int_pod<1u, unsigned long> const&, int)
        /home/andreas/build/../gcc/gcc/poly-int.h:2139
0x21e8941 vect_grouped_store_supported(tree_node*, unsigned long)
        /home/andreas/build/../gcc/gcc/tree-vect-data-refs.c:5150
0x1ce5115 vect_analyze_loop_2
        /home/andreas/build/../gcc/gcc/tree-vect-loop.c:2495
0x1ce5115 vect_analyze_loop(loop*, _loop_vec_info*)
        /home/andreas/build/../gcc/gcc/tree-vect-loop.c:2621
0x1d03e13 vectorize_loops()
        /home/andreas/build/../gcc/gcc/tree-vectorizer.c:664
The testcase ICEs since r253196:

S/390: Set the preferred mode for float vectors

gcc/ChangeLog:

2017-09-26  Andreas Krebbel  <krebbel@linux.vnet.ibm.com>

        * config/s390/s390.c (s390_preferred_simd_mode): Return V4SFmode for
        SFmode.

with:

during RTL pass: reload
t2.cc: In member function ‘dealii::FullMatrix<number>& dealii::FullMatrix<number>::operator=(const dealii::FullMatrix<number2>&) [with number2 = std::complex<float>; number = std::complex<long double>]’:
t2.cc:199:3: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
   }
   ^
0x185f553 lra_constraints(bool)
        /home/andreas/gcc/gcc/lra-constraints.c:4756
0x1845459 lra(_IO_FILE*)
        /home/andreas/gcc/gcc/lra.c:2390
0x17f260b do_reload
        /home/andreas/gcc/gcc/ira.c:5440
0x17f260b execute
        /home/andreas/gcc/gcc/ira.c:5624
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

With the poly-int patches the ICE is already triggered during vectorization, which probably papers over the original ICE. With the patch posted here, vectorization does not continue and no longer appears to end up in that situation:
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00758.html
I've opened another bugzilla for a probably unrelated problem triggered by a testcase reduced from the same source file:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85481
Hmm, it doesn't seem to ICE for me (with a cross from x86_64-linux). I configured with --target s390x-linux-gnu
Indeed it does not appear to fail with a cross from x86. I've checked with r259518 on s390x as well as on x86. With an x86 cross no tree dump is generated after 012t.ompexp and the generated assembler file does not contain any code.

x86->s390x cross 012.ompexp:

...
;; Function c::f<ab<float>*, ab<long double>*> (_ZN1c1fIP2abIfEPS1_IeEEEiT_T0_, funcdef_no=15, decl_uid=2862, cgraph_uid=9, symbol_order=9)

c::f<ab<float>*, ab<long double>*> (struct ab * g, struct ab * h)
{
  struct ab * i;
  struct ab D.2925;

  <bb 2> :
  if (i == g)
    goto <bb 4>; [INV]
  else
    goto <bb 3>; [INV]

  <bb 3> :
  ab<long double>::ab (&D.2925, MEM[(const struct ab &)i]);
  *h = D.2925;
  h = h + 16;
  i = i + 16;
  goto <bb 2>; [INV]

  <bb 4> :
  __builtin_unreachable ();

}

;; Function ab<long double>::ab (_ZN2abIeEC2ES_IfE, funcdef_no=6, decl_uid=2666, cgraph_uid=2, symbol_order=2)

ab<long double>::ab (struct ab * const this, struct ab g)
{
  complex double D.2939;

  <bb 2> :
  MEM[(struct &)this] = {CLOBBER};
  D.2939 = ab<float>::m (&g);
  _1 = REALPART_EXPR <D.2939>;
  _2 = IMAGPART_EXPR <D.2939>;
  _3 = COMPLEX_EXPR <_1, _2>;
  this->n = _3;
  return;

}

s390x native 012.ompexp:

;; Function c::f<ab<float>*, ab<long double>*> (_ZN1c1fIP2abIfEPS1_IgEEEiT_T0_, funcdef_no=15, decl_uid=2896, cgraph_uid=9, symbol_order=9)

c::f<ab<float>*, ab<long double>*> (struct ab * g, struct ab * h)
{
  struct ab * i;
  struct ab D.2959;

  <bb 2> :
  if (i == g)
    goto <bb 4>; [INV]
  else
    goto <bb 3>; [INV]

  <bb 3> :
  ab<long double>::ab (&D.2959, MEM[(const struct ab &)i]);
  *h = D.2959;
  D.2959 = {CLOBBER};
  h = h + 32;
  i = i + 16;
  goto <bb 2>; [INV]

  <bb 4> :
  __builtin_unreachable ();

}

;; Function ab<long double>::ab (_ZN2abIgEC2ES_IfE, funcdef_no=6, decl_uid=2700, cgraph_uid=2, symbol_order=2)

ab<long double>::ab (struct ab * const this, struct ab g)
{
  complex double D.2973;

  <bb 2> :
  MEM[(struct &)this] = {CLOBBER};
  D.2973 = ab<float>::m (&g);
  _1 = REALPART_EXPR <D.2973>;
  _2 = (long double) _1;
  _3 = IMAGPART_EXPR <D.2973>;
  _4 = (long double) _3;
  _5 = COMPLEX_EXPR <_2, _4>;
  this->n = _5;
  return;

}
On Fri, 20 Apr 2018, krebbel at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85478
>
> --- Comment #4 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
> Indeed it does not appear to fail with a cross from x86. I've checked with
> r259518 on s390x as well as on x86. With an x86 cross no tree dump is generated
> after 012t.ompexp and the generated assembler file does not contain any code.

I do see generated code for the explicitly instantiated operator.

Something like

void foo(ab<float> *a, ab<long double> *b){
    c::f<ab<float> *, ab<long double> *> (a,b);
}
void bar(ab<float> x)
{
  ab<__float128> a(x);
}

should instantiate both functions below explicitly, but __float128 (taken from the demangling) doesn't do it and using 'long double' ends up with a different mangling (and code).

That said, cross and native shouldn't differ and tracking down the reason would be interesting.
The difference I have seen so far was triggered by building the cross with "--without-headers". As a result the detected glibc version is 0.0:

config.log:
configure:28145: checking for target glibc version
configure:28169: result: 0.0

This in turn fails to set the proper default for the long double data type in configure:

if test $glibc_version_major -gt 2 \
  || ( test $glibc_version_major -eq 2 && test $glibc_version_minor -ge 4 ); then :
  gcc_cv_target_ldbl128=yes
else
...

Configuring the cross with --with-long-double-128 makes the first set of differences disappear. However, the testcase still doesn't ICE when compiled with the cross.

I will retry with a full cross. There appear to be more settings depending on the glibc version.
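To make the effect of the 0.0 version concrete, here is a minimal sketch of the same comparison in C++. The function name `takes_ldbl128_branch` is hypothetical and not part of GCC; it only mirrors the shell test quoted from configure and says nothing about what the elided else branch does.

```cpp
#include <cassert>

// Hypothetical helper (not a GCC function): mirrors the quoted configure test
//   major > 2 || (major == 2 && minor >= 4)
// and reports whether the branch setting gcc_cv_target_ldbl128=yes is taken.
bool takes_ldbl128_branch(int glibc_major, int glibc_minor) {
    return glibc_major > 2 || (glibc_major == 2 && glibc_minor >= 4);
}
```

With the detected version 0.0, `takes_ldbl128_branch(0, 0)` is false, so the yes branch is skipped; any version from 2.4 on takes it. Passing --with-long-double-128 sidesteps the version detection altogether.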
The cross from comment #6 did not trigger the problem because I accidentally built it with --disable-checking. Dropping this and adding --with-long-double-128 triggers the ICE on a full cross as well as on a cross without sysroot.
The problem is similar to PR83753 but with a different call-chain. Richard Sandiford fixed it by adding:

  /* First cope with the degenerate case of a single-element vector.  */
  if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
    *memory_access_type = VMAT_CONTIGUOUS;

to get_group_load_store_type. This prevents vect_grouped_store_supported from being called for single element vectors.

For this PR vect_grouped_store_supported is called from vect_analyze_loop_2. I don't know if there is also a better way to deal with it in the caller?! But regardless I think vect_grouped_store_supported should return false for single element vectors as proposed in:
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00758.html
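The failure mode can be sketched outside of GCC. The code below is an illustration under stated assumptions, not the GCC implementation: `exact_div_sketch` and `grouped_store_supported_sketch` are hypothetical names, the divisor 2 is an assumption (the backtrace only shows that the divisor is an int), and all of the real permutation checks are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for exact_div: yields the quotient only when the
// division leaves no remainder. GCC's exact_div asserts instead, which is
// what surfaces as the ICE in poly-int.h.
std::optional<std::uint64_t> exact_div_sketch(std::uint64_t a, std::uint64_t b) {
    if (b == 0 || a % b != 0)
        return std::nullopt;
    return a / b;
}

// Hypothetical, heavily simplified callee: bail out for single-element
// vectors before the element count is ever halved (assumed divisor: 2).
bool grouped_store_supported_sketch(std::uint64_t nelt) {
    if (nelt == 1)
        return false;  // the early return proposed for single element vectors
    return exact_div_sketch(nelt, 2).has_value();
}
```

Without the `nelt == 1` check, halving an element count of 1 has no exact result, which matches the assertion seen in the backtrace. Returning false early keeps the callee safe regardless of which caller reaches it.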
Ok, confirmed. The following fixes it:

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        (revision 259558)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -2492,6 +2492,7 @@ again:
          unsigned int size = STMT_VINFO_GROUP_SIZE (vinfo);
          tree vectype = STMT_VINFO_VECTYPE (vinfo);
          if (! vect_store_lanes_supported (vectype, size, false)
+             && ! known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U)
              && ! vect_grouped_store_supported (vectype, size))
            return false;
          FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (instance), j, node)

Andreas, can you test this? It's pre-approved if you make it before RC1.
Author: krebbel
Date: Tue Apr 24 12:18:26 2018
New Revision: 259593

URL: https://gcc.gnu.org/viewcvs?rev=259593&root=gcc&view=rev
Log:
Fix PR85478

gcc/ChangeLog:

2018-04-24  Andreas Krebbel  <krebbel@linux.ibm.com>

        PR tree-optimization/85478
        * tree-vect-loop.c (vect_analyze_loop_2): Do not call
        vect_grouped_store_supported for single element vectors.

gcc/testsuite/ChangeLog:

2018-04-24  Andreas Krebbel  <krebbel@linux.ibm.com>

        PR tree-optimization/85478
        * g++.dg/pr85478.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/pr85478.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop.c
So is this BZ fixed, Andreas? If so, let's close it. I'll also take your patch out of my queue of things to review :-)
Fixed.