Summary: | [11 Regression] aarch64, SVE: ICE in gimple_expand_vec_cond_expr since r11-2610-ga1ee6d507b | ||
---|---|---|---|
Product: | gcc | Reporter: | Alex Coplan <acoplan> |
Component: | tree-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | glisse, marxin, rguenth, zsojka |
Priority: | P3 | Keywords: | ice-on-valid-code |
Version: | 11.0 | ||
Target Milestone: | 11.0 | ||
Host: | Target: | aarch64 | |
Build: | Known to work: | ||
Known to fail: | 11.0 | Last reconfirmed: | 2020-09-23 00:00:00 |
Description
Alex Coplan
2020-09-17 12:54:18 UTC
Re-confirmed after my last batch of fixes. #1 0x00000000015c8944 in gimple_expand_vec_cond_expr (gsi=0x7fffffffda50, vec_cond_ssa_name_uses=0x7fffffffda20) at ../../src/trunk/gcc/gimple-isel.cc:133 133 gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode)) (gdb) p mode $2 = E_VNx8BImode (gdb) p cmp_op_mode $3 = E_VNx8HImode (gdb) p debug_gimple_stmt (stmt) mask__31.28_102 = VEC_COND_EXPR <mask__23.26_97, { 0, ... }, { -1, ... }>; So veclower got to see vect_cst__96 = { 5, ... }; vect_cst__98 = { 1, ... }; vect_cst__99 = { 0, ... }; ... mask__23.26_97 = vect__50.25_95 == vect_cst__96; vect_patt_61.27_100 = VEC_COND_EXPR <mask__23.26_97, vect_cst__98, vect_cst__99>; mask__31.28_102 = vect_patt_61.27_100 == vect_cst__101; and in VRP2 we fold this to mask__31.28_102 = VEC_COND_EXPR <mask__23.26_97, { 0, ... }, { -1, ... }>; feeding vect_patt_63.29_105 = VEC_COND_EXPR <mask__31.28_102, { 1, ... }, { 0, ... }>; so it seems to me we're missing to fold the mask__31.28_102 def to a mask negation (it's of type vector([8,8]) <signed-boolean:2> mask__31.28). At least we're not at all expecting to have a VEC_COND_EXPR where the comparison feeding the mask has different operand modes than the VEC_COND_EXPR result mode. But we don't really expect a VEC_COND_EXPR to have VECTOR_BOOLEAN_TYPE ... So we could either handle this specially and ISEL it to a compare or a mask inversion or avoid such folding. For example diff --git a/gcc/match.pd b/gcc/match.pd index 7d63bb973cb..ff19b1cbeb4 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3465,22 +3465,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (op (vec_cond:s @0 @1 @2)) (vec_cond @0 (op! @1) (op! @2)))) -/* Sink binary operation to branches, but only if we can fold it. */ +/* Sink binary operation to branches, but only if we can fold it. Avoid + VEC_COND_EXPRs with boolean vector result. */ (for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor rdiv trunc_div ceil_div floor_div round_div trunc_mod ceil_mod floor_mod round_mod min max) -/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ - (simplify - (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) - (vec_cond @0 (op! @1 @3) (op! @2 @4))) + (if (!VECTOR_BOOLEAN_TYPE_P (type)) + /* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ + (simplify + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) + (vec_cond @0 (op! @1 @3) (op! @2 @4))) -/* (c ? a : b) op d --> c ? (a op d) : (b op d) */ - (simplify - (op (vec_cond:s @0 @1 @2) @3) - (vec_cond @0 (op! @1 @3) (op! @2 @3))) - (simplify - (op @3 (vec_cond:s @0 @1 @2)) - (vec_cond @0 (op! @3 @1) (op! @3 @2)))) + /* (c ? a : b) op d --> c ? (a op d) : (b op d) */ + (simplify + (op (vec_cond:s @0 @1 @2) @3) + (vec_cond @0 (op! @1 @3) (op! @2 @3))) + (simplify + (op @3 (vec_cond:s @0 @1 @2)) + (vec_cond @0 (op! @3 @1) (op! @3 @2))))) #endif fixes this particular testcase (but there are more patterns producing vec_cond exprs). We'd also want to add verification if we do not want VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPR. The following also works, and we then expand from mask__31.28_102 = vect__50.25_95 != { 5, ... }; vect_patt_63.29_105 = VEC_COND_EXPR <mask__31.28_102, { 1, ... }, { 0, ... }>; diff --git a/gcc/match.pd b/gcc/match.pd index 7d63bb973cb..e6dcdd0b855 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3521,6 +3521,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (optimize_vectors_before_lowering_p () && types_match (@0, @1)) (vec_cond (bit_and (bit_not @0) @1) @2 @3))) +/* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask + types are compatible. */ +(simplify + (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2) + (if (VECTOR_BOOLEAN_TYPE_P (type) + && types_match (type, TREE_TYPE (@0))) + (if (integer_zerop (@1) && integer_all_onesp (@2)) + (bit_not @0) + (if (integer_all_onesp (@1) && integer_zerop (@2)) + @0)))) + /* Simplification moved from fold_cond_expr_with_comparison. It may also be extended. */ /* This pattern implements two kinds simplification: I would be happy with a revert of that patch, if the ARM backend gets fixed, but indeed a missed optimization should not cause an ICE. (In reply to Richard Biener from comment #2) > At least we're not at all expecting to have a VEC_COND_EXPR where > the comparison feeding the mask has different operand modes than the > VEC_COND_EXPR result mode. Ah, I see why that might cause trouble, although I think supporting this makes sense, when the modes have the same size or when we use AVX512-style bool vectors. > We'd also want to add verification if we do not want > VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPR. I understand that a VEC_COND_EXPR which outputs a vector of bool (à la AVX512) is a different thing from a VEC_COND_EXPR which outputs a "true" vector, but VEC_COND_EXPR still looks like the right tree code to represent both. (In reply to Richard Biener from comment #3) > +/* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask > + types are compatible. */ > +(simplify > + (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2) > + (if (VECTOR_BOOLEAN_TYPE_P (type) > + && types_match (type, TREE_TYPE (@0))) > + (if (integer_zerop (@1) && integer_all_onesp (@2)) > + (bit_not @0) > + (if (integer_all_onesp (@1) && integer_zerop (@2)) > + @0)))) Is the test VECTOR_BOOLEAN_TYPE_P necessary? (sorry, I may not be very reactive these days) (In reply to Marc Glisse from comment #4) > I would be happy with a revert of that patch, if the ARM backend gets fixed, > but indeed a missed optimization should not cause an ICE. Not sure what the ARM backend issue is. > (In reply to Richard Biener from comment #2) > > At least we're not at all expecting to have a VEC_COND_EXPR where > > the comparison feeding the mask has different operand modes than the > > VEC_COND_EXPR result mode. > > Ah, I see why that might cause trouble, although I think supporting this > makes sense, when the modes have the same size or when we use AVX512-style > bool vectors. Well, VEC_COND_EXPR (as some other VEC_ tree codes) are special in that we are (try to...) be careful to only synthesize ones supported "directly" by the target. For the mask vectors (VECTOR_BOOLEAN_TYPE_P, in the AVX512/SVE case) I don't think the targets support ?: natively but they have bitwise instructions for this case. That means we could 'simply' expand mask x ? y : z as (y & x) | (z & ~x) I guess [requires x and y,z to be of the same type of course]. I wondered whether we ever need to translate between, say, vector<bool:1> and vector<bool:2> where lowering ?: this way would require '[un]packing' one of the vectors. > > We'd also want to add verification if we do not want > > VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPR. > > I understand that a VEC_COND_EXPR which outputs a vector of bool (à la > AVX512) is a different thing from a VEC_COND_EXPR which outputs a "true" > vector, but VEC_COND_EXPR still looks like the right tree code to represent > both. True, unless you go to bitwise ops. For scalar flag ? bool_a : bool_b ?: isn't the natural representation either - at least I'm not aware of any pattern transforming (a & b) | (c & ~b) to b ? a : c for precision one integer types ;) > (In reply to Richard Biener from comment #3) > > +/* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask > > + types are compatible. */ > > +(simplify > > + (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2) > > + (if (VECTOR_BOOLEAN_TYPE_P (type) > > + && types_match (type, TREE_TYPE (@0))) > > + (if (integer_zerop (@1) && integer_all_onesp (@2)) > > + (bit_not @0) > > + (if (integer_all_onesp (@1) && integer_zerop (@2)) > > + @0)))) > > Is the test VECTOR_BOOLEAN_TYPE_P necessary? Hmm, since the first operand of a VEC_COND_EXPR has to be VECTOR_BOOLEAN_TYPE_P the types_match may be enough indeed. Note types_match will treat vector<bool:8> and vector<char> as compatible but only the first one will be a VECTOR_BOOLEAN_TYPE_P. We already have half of this in fold-const btw: /* Convert A ? 1 : 0 to simply A. */ if ((code == VEC_COND_EXPR ? integer_all_onesp (op1) : (integer_onep (op1) && !VECTOR_TYPE_P (type))) && integer_zerop (op2) /* If we try to convert OP0 to our type, the call to fold will try to move the conversion inside a COND, which will recurse. In that case, the COND_EXPR is probably the best choice, so leave it alone. */ && type == TREE_TYPE (arg0)) return pedantic_non_lvalue_loc (loc, arg0); the other half is guarded with !VECTOR_TYPE_P /* Convert A ? 0 : 1 to !A. This prefers the use of NOT_EXPR over COND_EXPR in cases such as floating point comparisons. */ if (integer_zerop (op1) && code == COND_EXPR && integer_onep (op2) && !VECTOR_TYPE_P (type) && truth_value_p (TREE_CODE (arg0))) return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, invert_truthvalue_loc (loc, arg0))); I'm going to test the original proposed simplification to fix the ICE for now. > (sorry, I may not be very reactive these days) (In reply to Richard Biener from comment #5) > (In reply to Marc Glisse from comment #4) > > I would be happy with a revert of that patch, if the ARM backend gets fixed, > > but indeed a missed optimization should not cause an ICE. > > Not sure what the ARM backend issue is. PR 96528 > Well, VEC_COND_EXPR (as some other VEC_ tree codes) are special in that > we are (try to...) be careful to only synthesize ones supported "directly" > by the target. After vector lowering, yes. But before that, the front-end can produce vec_cond_expr for vector types that are not supported. Ah, you probably meant synthesize them from optimization passes, ok. > For the mask vectors (VECTOR_BOOLEAN_TYPE_P, in the > AVX512/SVE case) I don't think the targets support ?: natively but they > have bitwise instructions for this case. That means we could 'simply' > expand mask x ? y : z as (y & x) | (z & ~x) I guess [requires x and y,z > to be of the same type of course]. I wondered whether we ever > need to translate between, say, vector<bool:1> and vector<bool:2> > where lowering ?: this way would require '[un]packing' one of the vectors. I still need to go back to the introduction of those types to understand why vector<bool:2> exists at all... > True, unless you go to bitwise ops. For scalar flag ? bool_a : bool_b > ?: isn't the natural representation either - at least I'm not aware > of any pattern transforming (a & b) | (c & ~b) to b ? a : c for > precision one integer types ;) There are PRs asking for this transformation (and for transformations that this one would enable). (In reply to Marc Glisse from comment #6) > (In reply to Richard Biener from comment #5) > > (In reply to Marc Glisse from comment #4) > > > I would be happy with a revert of that patch, if the ARM backend gets fixed, > > > but indeed a missed optimization should not cause an ICE. > > > > Not sure what the ARM backend issue is. > > PR 96528 > > > Well, VEC_COND_EXPR (as some other VEC_ tree codes) are special in that > > we are (try to...) be careful to only synthesize ones supported "directly" > > by the target. > > After vector lowering, yes. But before that, the front-end can produce > vec_cond_expr for vector types that are not supported. Ah, you probably > meant synthesize them from optimization passes, ok. Yeah. I'm also not sure whether the user can synthesize VEC_COND_EXPR for say AVX512 vector masks - you cannot even declare those "vector types", instead the intrinsic header uses typedef unsigned char __mmask8; typedef unsigned short __mmask16; maybe for SVE this is different, but the SVE header is "hidden" ;) /* NOTE: This implementation of arm_sve.h is intentionally short. It does not define the SVE types and intrinsic functions directly in C and C++ code, but instead uses the following pragma to tell GCC to insert the necessary type and function definitions itself. The net effect is the same, and the file is a complete implementation of arm_sve.h. */ #pragma GCC aarch64 "arm_sve.h" > > For the mask vectors (VECTOR_BOOLEAN_TYPE_P, in the > > AVX512/SVE case) I don't think the targets support ?: natively but they > > have bitwise instructions for this case. That means we could 'simply' > > expand mask x ? y : z as (y & x) | (z & ~x) I guess [requires x and y,z > > to be of the same type of course]. I wondered whether we ever > > need to translate between, say, vector<bool:1> and vector<bool:2> > > where lowering ?: this way would require '[un]packing' one of the vectors. > > I still need to go back to the introduction of those types to understand why > vector<bool:2> exists at all... > > > True, unless you go to bitwise ops. For scalar flag ? bool_a : bool_b > > ?: isn't the natural representation either - at least I'm not aware > > of any pattern transforming (a & b) | (c & ~b) to b ? a : c for > > precision one integer types ;) > > There are PRs asking for this transformation (and for transformations that > this one would enable). heh ;) The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:10843f8303509fcba880c6c05c08e4b4ccd24f36 commit r11-3426-g10843f8303509fcba880c6c05c08e4b4ccd24f36 Author: Richard Biener <rguenther@suse.de> Date: Thu Sep 24 10:14:33 2020 +0200 tree-optimization/97085 - fold some trivial bool vector ?: The following aovids the ICE in the testcase by doing some additional simplification of VEC_COND_EXPRs for VECTOR_BOOLEAN_TYPE_P which we don't really expect, esp. when they are not classical vectors, thus AVX512 or SVE masks. 2020-09-24 Richard Biener <rguenther@suse.de> PR tree-optimization/97085 * match.pd (mask ? { false,..} : { true, ..} -> ~mask): New. * gcc.dg/vect/pr97085.c: New testcase. Fixed. *** Bug 97192 has been marked as a duplicate of this bug. *** The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:a8d5c28233f95e3474ee8cbc4d341cbb43ab7bb6 commit r11-3431-ga8d5c28233f95e3474ee8cbc4d341cbb43ab7bb6 Author: Richard Biener <rguenther@suse.de> Date: Thu Sep 24 13:27:49 2020 +0200 target/97192 - new testcase for fixed PR This adds another testcase for the PR97085 fix. 2020-09-24 Richard Biener <rguenther@suse.de> PR tree-optimization/97085 * gcc.dg/pr97192.c: New testcase. |