Cat test.c, refer to https://godbolt.org/z/fx9cMq --- void swap(short *p, int cnt) { cnt = 1000; while (cnt-- > 0) { *p = ((*p << 8) & 0xFF00) | ((*p >> 8) & 0x00FF); ++p; } } ---- test.c:3:16: missed: couldn't vectorize loop test.c:4:8: missed: not vectorized: compilation time alias: load_dst_23 = MEM[(short int *)p_21]; *p_21 = _8; but they have the same address, and of course they alias. --- <bb 3> [local count: 1063004409]: # p_21 = PHI <p_16(5), p_12(D)(2)> # cnt_24 = PHI <cnt_14(5), 999(2)> # ivtmp_26 = PHI <ivtmp_2(5), 1000(2)> load_dst_23 = MEM[(short int *)p_21]; bswapdst_11 = load_dst_23 r>> 8; _8 = (short int) bswapdst_11; *p_21 = _8; p_16 = p_21 + 2; cnt_14 = cnt_24 + -1; ivtmp_2 = ivtmp_26 - 1; if (ivtmp_2 != 0) ---- So does vectorizer assume there won't be 2 DRs with same reference? Successfully vectorized with below hack --- modified gcc/tree-vect-data-refs.c @@ -3302,6 +3302,10 @@ vect_compile_time_alias (dr_vec_info *a, dr_vec_info *b, const_length_a += access_size_a; const_length_b += access_size_b; + /* It's ok for the same reference. */ + if (known_eq (const_length_a, const_length_b)) + return 0; + if (ranges_known_overlap_p (offset_a, const_length_a, offset_b, const_length_b)) return 1; ---
(compute_affine_dependence stmt_a: load_dst_23 = MEM[(short int *)p_21]; stmt_b: *p_21 = _8; ) -> dependence analysis failed x.c:1:6: missed: versioning for alias required: can't determine dependence between MEM[(short int *)p_21] and *p_21 consider run-time aliasing test between MEM[(short int *)p_21] and *p_21 so the issue is that dependence analysis fails. A dependence distance of zero would be OK. Above we see two different base objects, one accesses the ref as 'short int' and one as 'unsigned short int' (but with short int alias type). We're doing bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a && full_seq.start_b + full_seq.length == num_dimensions_b && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) && types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) && (!loop_nest.exists () || (object_address_invariant_in_loop_p (loop_nest[0], base_a)))); and "fail" the types_compatible_p test which for non-aggregate types is probably too strict and could be relaxed to at least consider signed/unsigned type variants as the same. Maybe for non-aggregate types just compare the mode. Richard, you refactored this code last? For example diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index 394470af757..1853f4b4a07 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -3272,8 +3272,11 @@ initialize_data_dependence_relation (struct data_reference *a, && full_seq.start_b + full_seq.length == num_dimensions_b && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) - && types_compatible_p (TREE_TYPE (base_a), - TREE_TYPE (base_b)) + && ((!AGGREGATE_TYPE_P (TREE_TYPE (base_a)) + && !AGGREGATE_TYPE_P (TREE_TYPE (base_b)) + && TYPE_MODE (TREE_TYPE (base_a)) == TYPE_MODE (TREE_TYPE (base_b))) + || types_compatible_p (TREE_TYPE (base_a), + TREE_TYPE (base_b))) && (!loop_nest.exists () || (object_address_invariant_in_loop_p (loop_nest[0], base_a)))); makes this vectorized but in the end the same_base_p check is supposed to verify whether access functions generated are comparable.
Mine.
While GCC 9 vectorizes this case (and thus the vectorization failure is a regression) dependence analysis isn't presented with the problematical access but instead we see _1 = *p_22; _5 = (unsigned short) _1; bswapdst_10 = _5 r>> 8; _8 = (short int) bswapdst_10; *p_22 = _8; where the problematical access is created by the bswap pass which detects 16 bit bswap implementation found at: _8 = _4 | _7; creates the replacement load but then fails half-way, not emitting a bswap!? That's a bug worth fixing IMHO (either do all or none of the transform). I'm nevertheless testing a patch to improve dependence analysis.
(In reply to Richard Biener from comment #3) > While GCC 9 vectorizes this case (and thus the vectorization failure is a > regression) dependence analysis isn't presented with the problematical access > but instead we see > > _1 = *p_22; > _5 = (unsigned short) _1; > bswapdst_10 = _5 r>> 8; > _8 = (short int) bswapdst_10; > *p_22 = _8; > > where the problematical access is created by the bswap pass which > detects > > 16 bit bswap implementation found at: _8 = _4 | _7; > > creates the replacement load but then fails half-way, not emitting a > bswap!? That's a bug worth fixing IMHO (either do all or none of the > transform). Oh, so it produces load_dst_11 = MEM[(short int *)p_22]; bswapdst_10 = load_dst_11 r>> 8; _8 = (short int) bswapdst_10; *p_22 = _8; it's just pattern recog that turns the rotate back to a shift sequence. So indeed dependence analysis is what should be fixed.
(In reply to Richard Biener from comment #1) > Richard, you refactored this code last? Sorry, forgot about this PR. I'm a bit rusty on this, but how important is the type/mode check for non-aggregates? Isn't the operand_equal_p test enough? If not, should the check be based on size rather than mode for non-aggregates?
On Thu, 14 Jan 2021, rsandifo at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98674 > > --- Comment #5 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> --- > (In reply to Richard Biener from comment #1) > > Richard, you refactored this code last? > Sorry, forgot about this PR. I'm a bit rusty on this, but how > important is the type/mode check for non-aggregates? Isn't the > operand_equal_p test enough? > > If not, should the check be based on size rather than mode > for non-aggregates? Yeah, I'm currently testing a patch, will post it and CC you.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:2182274f510c180ea92a4f826a0f6cf5f1f55b66 commit r11-6670-g2182274f510c180ea92a4f826a0f6cf5f1f55b66 Author: Richard Biener <rguenther@suse.de> Date: Thu Jan 14 14:08:41 2021 +0100 tree-optimization/98674 - improve dependence analysis This improves dependence analysis on refs that access the same array but with different typed but same sized accesses. That's obviously safe for the case of types that cannot have any access function based off them. For the testcase this is signed short vs. unsigned short. 2021-01-14 Richard Biener <rguenther@suse.de> PR tree-optimization/98674 * tree-data-ref.c (base_supports_access_fn_components_p): New. (initialize_data_dependence_relation): For two bases without possible access fns resort to type size equality when determining shape compatibility. * gcc.dg/vect/pr98674.c: New testcase.
Fixed on trunk sofar.
Fixed in GCC11.
But eventually needs backporting.
GCC 10.3 is being released, retargeting bugs to GCC 10.4.
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Fixed in GCC 11.