Bug 97085

Summary: [11 Regression] aarch64, SVE: ICE in gimple_expand_vec_cond_expr since r11-2610-ga1ee6d507b

Product: gcc
Component: tree-optimization
Version: 11.0
Target: aarch64
Status: RESOLVED FIXED
Severity: normal
Priority: P3
Keywords: ice-on-valid-code
Target Milestone: 11.0
Reporter: Alex Coplan <acoplan>
Assignee: Not yet assigned to anyone <unassigned>
CC: glisse, marxin, rguenth, zsojka
Known to fail: 11.0
Last reconfirmed: 2020-09-23 00:00:00

Description Alex Coplan 2020-09-17 12:54:18 UTC
For the following testcase:

int a, b, c, d;
short e, g;
unsigned short f;
void h() {
  for (; d; d++) {
    g = d;
    e = b == 0 ? 1 : a % b;
    c ^= (f = e) > (g == 5);
  }
}

AArch64 GCC ICEs at -O3 with -march=armv8.2-a+sve since r11-2610-ga1ee6d507b0c26466be519d177f5a08b22f63647:

Author: Marc Glisse <marc.glisse@inria.fr>
Date:   Fri Aug 7 17:49:04 2020

    Disable some VEC_COND_EXPR transformations after vector lowering

To reproduce:

$ aarch64-none-elf-gcc -c test.c -O3 -march=armv8.2-a+sve
during GIMPLE pass: isel
test.c: In function 'h':
test.c:4:6: internal compiler error: in gimple_expand_vec_cond_expr, at gimple-isel.cc:135
    4 | void h() {
      |      ^
0x102c7bd gimple_expand_vec_cond_expr
        /home/alecop01/toolchain/src/gcc/gcc/gimple-isel.cc:133
0x102cbe1 gimple_expand_vec_cond_exprs
        /home/alecop01/toolchain/src/gcc/gcc/gimple-isel.cc:191
0x102cbe1 execute
        /home/alecop01/toolchain/src/gcc/gcc/gimple-isel.cc:240
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Comment 1 Richard Biener 2020-09-23 13:27:07 UTC
Re-confirmed after my last batch of fixes.
Comment 2 Richard Biener 2020-09-23 13:45:12 UTC
#1  0x00000000015c8944 in gimple_expand_vec_cond_expr (gsi=0x7fffffffda50, 
    vec_cond_ssa_name_uses=0x7fffffffda20)
    at ../../src/trunk/gcc/gimple-isel.cc:133
133       gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
(gdb) p mode
$2 = E_VNx8BImode
(gdb) p cmp_op_mode
$3 = E_VNx8HImode
(gdb) p debug_gimple_stmt (stmt)
mask__31.28_102 = VEC_COND_EXPR <mask__23.26_97, { 0, ... }, { -1, ... }>;

So veclower got to see

  vect_cst__96 = { 5, ... };
  vect_cst__98 = { 1, ... };
  vect_cst__99 = { 0, ... };
...
  mask__23.26_97 = vect__50.25_95 == vect_cst__96;
  vect_patt_61.27_100 = VEC_COND_EXPR <mask__23.26_97, vect_cst__98, vect_cst__99>;
  mask__31.28_102 = vect_patt_61.27_100 == vect_cst__101;

and in VRP2 we fold this to

  mask__31.28_102 = VEC_COND_EXPR <mask__23.26_97, { 0, ... }, { -1, ... }>;

feeding

  vect_patt_63.29_105 = VEC_COND_EXPR <mask__31.28_102, { 1, ... }, { 0, ... }>;

so it seems to me we're failing to fold the mask__31.28_102 def to a
mask negation (it's of type vector([8,8]) <signed-boolean:2> mask__31.28).

At least we're not at all expecting to have a VEC_COND_EXPR where
the comparison feeding the mask has different operand modes than the
VEC_COND_EXPR result mode.  But we don't really expect a VEC_COND_EXPR
to have VECTOR_BOOLEAN_TYPE ...

So we could either handle this specially and ISEL it to a compare or a
mask inversion or avoid such folding.  For example

diff --git a/gcc/match.pd b/gcc/match.pd
index 7d63bb973cb..ff19b1cbeb4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3465,22 +3465,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (op (vec_cond:s @0 @1 @2))
   (vec_cond @0 (op! @1) (op! @2))))
 
-/* Sink binary operation to branches, but only if we can fold it.  */
+/* Sink binary operation to branches, but only if we can fold it.  Avoid
+   VEC_COND_EXPRs with boolean vector result.  */
 (for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
         rdiv trunc_div ceil_div floor_div round_div
         trunc_mod ceil_mod floor_mod round_mod min max)
-/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
- (simplify
-  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
-  (vec_cond @0 (op! @1 @3) (op! @2 @4)))
+ (if (!VECTOR_BOOLEAN_TYPE_P (type))
+  /* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
+  (simplify
+   (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
+   (vec_cond @0 (op! @1 @3) (op! @2 @4)))
 
-/* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
- (simplify
-  (op (vec_cond:s @0 @1 @2) @3)
-  (vec_cond @0 (op! @1 @3) (op! @2 @3)))
- (simplify
-  (op @3 (vec_cond:s @0 @1 @2))
-  (vec_cond @0 (op! @3 @1) (op! @3 @2))))
+  /* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
+  (simplify
+   (op (vec_cond:s @0 @1 @2) @3)
+   (vec_cond @0 (op! @1 @3) (op! @2 @3)))
+  (simplify
+   (op @3 (vec_cond:s @0 @1 @2))
+   (vec_cond @0 (op! @3 @1) (op! @3 @2)))))
 #endif

fixes this particular testcase (but there are more patterns producing
vec_cond exprs).  We'd also want to add verification if we do not want
VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPR.
Comment 3 Richard Biener 2020-09-23 13:53:47 UTC
The following also works, and we then expand from

  mask__31.28_102 = vect__50.25_95 != { 5, ... };
  vect_patt_63.29_105 = VEC_COND_EXPR <mask__31.28_102, { 1, ... }, { 0, ... }>;

diff --git a/gcc/match.pd b/gcc/match.pd
index 7d63bb973cb..e6dcdd0b855 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3521,6 +3521,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (optimize_vectors_before_lowering_p () && types_match (@0, @1))
   (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
 
+/* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
+   types are compatible.  */
+(simplify
+ (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2)
+ (if (VECTOR_BOOLEAN_TYPE_P (type)
+      && types_match (type, TREE_TYPE (@0)))
+  (if (integer_zerop (@1) && integer_all_onesp (@2))
+   (bit_not @0)
+   (if (integer_all_onesp (@1) && integer_zerop (@2))
+    @0))))
+
 /* Simplification moved from fold_cond_expr_with_comparison.  It may also
    be extended.  */
 /* This pattern implements two kinds simplification:
Comment 4 Marc Glisse 2020-09-23 15:13:57 UTC
I would be happy with a revert of that patch, if the ARM backend gets fixed, but indeed a missed optimization should not cause an ICE.

(In reply to Richard Biener from comment #2)
> At least we're not at all expecting to have a VEC_COND_EXPR where
> the comparison feeding the mask has different operand modes than the
> VEC_COND_EXPR result mode.

Ah, I see why that might cause trouble, although I think supporting this makes sense, when the modes have the same size or when we use AVX512-style bool vectors.

> We'd also want to add verification if we do not want
> VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPR.

I understand that a VEC_COND_EXPR which outputs a vector of bool (à la AVX512) is a different thing from a VEC_COND_EXPR which outputs a "true" vector, but VEC_COND_EXPR still looks like the right tree code to represent both.

(In reply to Richard Biener from comment #3)
> +/* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> +   types are compatible.  */
> +(simplify
> + (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2)
> + (if (VECTOR_BOOLEAN_TYPE_P (type)
> +      && types_match (type, TREE_TYPE (@0)))
> +  (if (integer_zerop (@1) && integer_all_onesp (@2))
> +   (bit_not @0)
> +   (if (integer_all_onesp (@1) && integer_zerop (@2))
> +    @0))))

Is the test VECTOR_BOOLEAN_TYPE_P necessary?

(sorry, I may not be very reactive these days)
Comment 5 Richard Biener 2020-09-24 06:47:29 UTC
(In reply to Marc Glisse from comment #4)
> I would be happy with a revert of that patch, if the ARM backend gets fixed,
> but indeed a missed optimization should not cause an ICE.

Not sure what the ARM backend issue is.

> (In reply to Richard Biener from comment #2)
> > At least we're not at all expecting to have a VEC_COND_EXPR where
> > the comparison feeding the mask has different operand modes than the
> > VEC_COND_EXPR result mode.
> 
> Ah, I see why that might cause trouble, although I think supporting this
> makes sense, when the modes have the same size or when we use AVX512-style
> bool vectors.

Well, VEC_COND_EXPR (as some other VEC_ tree codes) are special in that
we are (try to...) be careful to only synthesize ones supported "directly"
by the target.  For the mask vectors (VECTOR_BOOLEAN_TYPE_P, in the
AVX512/SVE case) I don't think the targets support ?: natively but they
have bitwise instructions for this case.  That means we could 'simply'
expand mask x ? y : z as (y & x) | (z & ~x) I guess [requires x and y,z
to be of the same type of course].  I wondered whether we ever
need to translate between, say, vector<bool:1> and vector<bool:2>
where lowering ?: this way would require '[un]packing' one of the vectors.

> > We'd also want to add verification if we do not want
> > VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPR.
> 
> I understand that a VEC_COND_EXPR which outputs a vector of bool (à la
> AVX512) is a different thing from a VEC_COND_EXPR which outputs a "true"
> vector, but VEC_COND_EXPR still looks like the right tree code to represent
> both.

True, unless you go to bitwise ops.  For scalar flag ? bool_a : bool_b
?: isn't the natural representation either - at least I'm not aware
of any pattern transforming (a & b) | (c & ~b) to b ? a : c for
precision one integer types ;)

> (In reply to Richard Biener from comment #3)
> > +/* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> > +   types are compatible.  */
> > +(simplify
> > + (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2)
> > + (if (VECTOR_BOOLEAN_TYPE_P (type)
> > +      && types_match (type, TREE_TYPE (@0)))
> > +  (if (integer_zerop (@1) && integer_all_onesp (@2))
> > +   (bit_not @0)
> > +   (if (integer_all_onesp (@1) && integer_zerop (@2))
> > +    @0))))
> 
> Is the test VECTOR_BOOLEAN_TYPE_P necessary?

Hmm, since the first operand of a VEC_COND_EXPR has to be VECTOR_BOOLEAN_TYPE_P
the types_match may be enough indeed.  Note types_match will treat
vector<bool:8> and vector<char> as compatible but only the first one
will be a VECTOR_BOOLEAN_TYPE_P.  We already have half of this in fold-const
btw:

      /* Convert A ? 1 : 0 to simply A.  */
      if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
                                 : (integer_onep (op1)
                                    && !VECTOR_TYPE_P (type)))
          && integer_zerop (op2)
          /* If we try to convert OP0 to our type, the
             call to fold will try to move the conversion inside
             a COND, which will recurse.  In that case, the COND_EXPR
             is probably the best choice, so leave it alone.  */
          && type == TREE_TYPE (arg0))
        return pedantic_non_lvalue_loc (loc, arg0);

the other half is guarded with !VECTOR_TYPE_P

      /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
         over COND_EXPR in cases such as floating point comparisons.  */
      if (integer_zerop (op1)
          && code == COND_EXPR
          && integer_onep (op2)
          && !VECTOR_TYPE_P (type)
          && truth_value_p (TREE_CODE (arg0)))
        return pedantic_non_lvalue_loc (loc,
                                    fold_convert_loc (loc, type,
                                              invert_truthvalue_loc (loc,
                                                                     arg0)));

I'm going to test the original proposed simplification to fix the ICE
for now.

> (sorry, I may not be very reactive these days)
Comment 6 Marc Glisse 2020-09-24 07:21:39 UTC
(In reply to Richard Biener from comment #5)
> (In reply to Marc Glisse from comment #4)
> > I would be happy with a revert of that patch, if the ARM backend gets fixed,
> > but indeed a missed optimization should not cause an ICE.
> 
> Not sure what the ARM backend issue is.

PR 96528

> Well, VEC_COND_EXPR (as some other VEC_ tree codes) are special in that
> we are (try to...) be careful to only synthesize ones supported "directly"
> by the target.

After vector lowering, yes. But before that, the front-end can produce vec_cond_expr for vector types that are not supported. Ah, you probably meant synthesize them from optimization passes, ok.

> For the mask vectors (VECTOR_BOOLEAN_TYPE_P, in the
> AVX512/SVE case) I don't think the targets support ?: natively but they
> have bitwise instructions for this case.  That means we could 'simply'
> expand mask x ? y : z as (y & x) | (z & ~x) I guess [requires x and y,z
> to be of the same type of course].  I wondered whether we ever
> need to translate between, say, vector<bool:1> and vector<bool:2>
> where lowering ?: this way would require '[un]packing' one of the vectors.

I still need to go back to the introduction of those types to understand why vector<bool:2> exists at all...

> True, unless you go to bitwise ops.  For scalar flag ? bool_a : bool_b
> ?: isn't the natural representation either - at least I'm not aware
> of any pattern transforming (a & b) | (c & ~b) to b ? a : c for
> precision one integer types ;)

There are PRs asking for this transformation (and for transformations that this one would enable).
Comment 7 Richard Biener 2020-09-24 07:48:13 UTC
(In reply to Marc Glisse from comment #6)
> (In reply to Richard Biener from comment #5)
> > (In reply to Marc Glisse from comment #4)
> > > I would be happy with a revert of that patch, if the ARM backend gets fixed,
> > > but indeed a missed optimization should not cause an ICE.
> > 
> > Not sure what the ARM backend issue is.
> 
> PR 96528
> 
> > Well, VEC_COND_EXPR (as some other VEC_ tree codes) are special in that
> > we are (try to...) be careful to only synthesize ones supported "directly"
> > by the target.
> 
> After vector lowering, yes. But before that, the front-end can produce
> vec_cond_expr for vector types that are not supported. Ah, you probably
> meant synthesize them from optimization passes, ok.

Yeah.  I'm also not sure whether the user can synthesize VEC_COND_EXPR
for say AVX512 vector masks - you cannot even declare those "vector types",
instead the intrinsic header uses

typedef unsigned char  __mmask8;
typedef unsigned short __mmask16;

maybe for SVE this is different, but the SVE header is "hidden" ;)

/* NOTE: This implementation of arm_sve.h is intentionally short.  It does
   not define the SVE types and intrinsic functions directly in C and C++
   code, but instead uses the following pragma to tell GCC to insert the
   necessary type and function definitions itself.  The net effect is the
   same, and the file is a complete implementation of arm_sve.h.  */
#pragma GCC aarch64 "arm_sve.h"

> > For the mask vectors (VECTOR_BOOLEAN_TYPE_P, in the
> > AVX512/SVE case) I don't think the targets support ?: natively but they
> > have bitwise instructions for this case.  That means we could 'simply'
> > expand mask x ? y : z as (y & x) | (z & ~x) I guess [requires x and y,z
> > to be of the same type of course].  I wondered whether we ever
> > need to translate between, say, vector<bool:1> and vector<bool:2>
> > where lowering ?: this way would require '[un]packing' one of the vectors.
> 
> I still need to go back to the introduction of those types to understand why
> vector<bool:2> exists at all...
> 
> > True, unless you go to bitwise ops.  For scalar flag ? bool_a : bool_b
> > ?: isn't the natural representation either - at least I'm not aware
> > of any pattern transforming (a & b) | (c & ~b) to b ? a : c for
> > precision one integer types ;)
> 
> There are PRs asking for this transformation (and for transformations that
> this one would enable).

heh ;)
Comment 8 GCC Commits 2020-09-24 08:20:49 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:10843f8303509fcba880c6c05c08e4b4ccd24f36

commit r11-3426-g10843f8303509fcba880c6c05c08e4b4ccd24f36
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Sep 24 10:14:33 2020 +0200

    tree-optimization/97085 - fold some trivial bool vector ?:
    
    The following avoids the ICE in the testcase by doing some additional
    simplification of VEC_COND_EXPRs for VECTOR_BOOLEAN_TYPE_P which
    we don't really expect, esp. when they are not classical vectors,
    thus AVX512 or SVE masks.
    
    2020-09-24  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/97085
            * match.pd (mask ? { false,..} : { true, ..} -> ~mask): New.
    
            * gcc.dg/vect/pr97085.c: New testcase.
Comment 9 Richard Biener 2020-09-24 08:20:58 UTC
Fixed.
Comment 10 Richard Biener 2020-09-24 11:25:39 UTC
*** Bug 97192 has been marked as a duplicate of this bug. ***
Comment 11 GCC Commits 2020-09-24 11:29:12 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a8d5c28233f95e3474ee8cbc4d341cbb43ab7bb6

commit r11-3431-ga8d5c28233f95e3474ee8cbc4d341cbb43ab7bb6
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Sep 24 13:27:49 2020 +0200

    target/97192 - new testcase for fixed PR
    
    This adds another testcase for the PR97085 fix.
    
    2020-09-24  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/97085
            * gcc.dg/pr97192.c: New testcase.