Bug 98674 - [10 Regression] vectorizer failed for compilation time alias
Summary: [10 Regression] vectorizer failed for compilation time alias
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 11.0
: P2 normal
Target Milestone: 10.4
Assignee: Richard Biener
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
  Show dependency treegraph
 
Reported: 2021-01-14 10:19 UTC by Hongtao.liu
Modified: 2021-05-04 12:31 UTC (History)
3 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target:
Build:
Known to work: 11.0, 9.3.0
Known to fail: 10.2.1
Last reconfirmed: 2021-01-14 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Hongtao.liu 2021-01-14 10:19:49 UTC
Cat test.c, refer to https://godbolt.org/z/fx9cMq
---
void swap(short *p, int cnt) {
    cnt = 1000;
    while (cnt-- > 0) {
        *p = ((*p << 8) & 0xFF00) |
             ((*p >> 8) & 0x00FF);
        ++p;
    }
}
----


test.c:3:16: missed: couldn't vectorize loop
test.c:4:8: missed: not vectorized: compilation time alias: load_dst_23 = MEM[(short int *)p_21];
*p_21 = _8;

but they have the same address, and of course they alias.

---
  <bb 3> [local count: 1063004409]:
  # p_21 = PHI <p_16(5), p_12(D)(2)>
  # cnt_24 = PHI <cnt_14(5), 999(2)>
  # ivtmp_26 = PHI <ivtmp_2(5), 1000(2)>
  load_dst_23 = MEM[(short int *)p_21];
  bswapdst_11 = load_dst_23 r>> 8;
  _8 = (short int) bswapdst_11;
  *p_21 = _8;
  p_16 = p_21 + 2;
  cnt_14 = cnt_24 + -1;
  ivtmp_2 = ivtmp_26 - 1;
  if (ivtmp_2 != 0)
----

So does vectorizer assume there won't be 2 DRs with same reference?

Successfully vectorized with below hack
---
modified   gcc/tree-vect-data-refs.c
@@ -3302,6 +3302,10 @@ vect_compile_time_alias (dr_vec_info *a, dr_vec_info *b,
   const_length_a += access_size_a;
   const_length_b += access_size_b;
 
+  /* It's ok for the same reference.  */
+  if (known_eq (const_length_a, const_length_b))
+    return 0;
+
   if (ranges_known_overlap_p (offset_a, const_length_a,
 			      offset_b, const_length_b))
     return 1;

---
Comment 1 Richard Biener 2021-01-14 10:55:39 UTC
(compute_affine_dependence
  stmt_a: load_dst_23 = MEM[(short int *)p_21];
  stmt_b: *p_21 = _8;
) -> dependence analysis failed
x.c:1:6: missed:   versioning for alias required: can't determine dependence between MEM[(short int *)p_21] and *p_21
consider run-time aliasing test between MEM[(short int *)p_21] and *p_21

so the issue is that dependence analysis fails.  A dependence distance of
zero would be OK.  Above we see two different base objects, one accesses
the ref as 'short int' and one as 'unsigned short int' (but with
short int alias type).  We're doing

  bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
                      && full_seq.start_b + full_seq.length == num_dimensions_b
                      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
                      && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
                      && types_compatible_p (TREE_TYPE (base_a),
                                             TREE_TYPE (base_b))
                      && (!loop_nest.exists ()
                          || (object_address_invariant_in_loop_p
                              (loop_nest[0], base_a))));

and "fail" the types_compatible_p test which for non-aggregate types is
probably too strict and could be relaxed to at least consider
signed/unsigned type variants as the same.  Maybe for non-aggregate
types just compare the mode.

Richard, you refactored this code last?

For example

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 394470af757..1853f4b4a07 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -3272,8 +3272,11 @@ initialize_data_dependence_relation (struct data_reference *a,
                      && full_seq.start_b + full_seq.length == num_dimensions_b
                      && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
                      && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
-                     && types_compatible_p (TREE_TYPE (base_a),
-                                            TREE_TYPE (base_b))
+                     && ((!AGGREGATE_TYPE_P (TREE_TYPE (base_a))
+                          && !AGGREGATE_TYPE_P (TREE_TYPE (base_b))
+                          && TYPE_MODE (TREE_TYPE (base_a)) == TYPE_MODE (TREE_TYPE (base_b)))
+                         || types_compatible_p (TREE_TYPE (base_a),
+                                                TREE_TYPE (base_b)))
                      && (!loop_nest.exists ()
                          || (object_address_invariant_in_loop_p
                              (loop_nest[0], base_a))));

makes this vectorized but in the end the same_base_p check is supposed to
verify whether access functions generated are comparable.
Comment 2 Richard Biener 2021-01-14 12:50:31 UTC
Mine.
Comment 3 Richard Biener 2021-01-14 13:19:42 UTC
While GCC 9 vectorizes this case (and thus the vectorization failure is a regression) dependence analysis isn't presented with the problematical access
but instead we see

  _1 = *p_22;
  _5 = (unsigned short) _1;
  bswapdst_10 = _5 r>> 8;
  _8 = (short int) bswapdst_10;
  *p_22 = _8;

where the problematical access is created by the bswap pass which
detects

16 bit bswap implementation found at: _8 = _4 | _7;

creates the replacement load but then fails half-way, not emitting a
bswap!?  That's a bug worth fixing IMHO (either do all or none of the
transform).

I'm nevertheless testing a patch to improve dependence analysis.
Comment 4 Richard Biener 2021-01-14 13:26:32 UTC
(In reply to Richard Biener from comment #3)
> While GCC 9 vectorizes this case (and thus the vectorization failure is a
> regression) dependence analysis isn't presented with the problematical access
> but instead we see
> 
>   _1 = *p_22;
>   _5 = (unsigned short) _1;
>   bswapdst_10 = _5 r>> 8;
>   _8 = (short int) bswapdst_10;
>   *p_22 = _8;
> 
> where the problematical access is created by the bswap pass which
> detects
> 
> 16 bit bswap implementation found at: _8 = _4 | _7;
> 
> creates the replacement load but then fails half-way, not emitting a
> bswap!?  That's a bug worth fixing IMHO (either do all or none of the
> transform).

Oh, so it produces

  load_dst_11 = MEM[(short int *)p_22];
  bswapdst_10 = load_dst_11 r>> 8;
  _8 = (short int) bswapdst_10;
  *p_22 = _8;

it's just pattern recog that turns the rotate back to a shift sequence.  So
indeed dependence analysis is what should be fixed.
Comment 5 rsandifo@gcc.gnu.org 2021-01-14 13:27:29 UTC
(In reply to Richard Biener from comment #1)
> Richard, you refactored this code last?
Sorry, forgot about this PR.  I'm a bit rusty on this, but how
important is the type/mode check for non-aggregates?  Isn't the
operand_equal_p test enough?

If not, should the check be based on size rather than mode
for non-aggregates?
Comment 6 rguenther@suse.de 2021-01-14 13:38:01 UTC
On Thu, 14 Jan 2021, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98674
> 
> --- Comment #5 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
> > Richard, you refactored this code last?
> Sorry, forgot about this PR.  I'm a bit rusty on this, but how
> important is the type/mode check for non-aggregates?  Isn't the
> operand_equal_p test enough?
> 
> If not, should the check be based on size rather than mode
> for non-aggregates?

Yeah, I'm currently testing a patch, will post it and CC you.
Comment 7 CVS Commits 2021-01-14 15:14:13 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:2182274f510c180ea92a4f826a0f6cf5f1f55b66

commit r11-6670-g2182274f510c180ea92a4f826a0f6cf5f1f55b66
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Jan 14 14:08:41 2021 +0100

    tree-optimization/98674 - improve dependence analysis
    
    This improves dependence analysis on refs that access the same
    array but with different typed but same sized accesses.  That's
    obviously safe for the case of types that cannot have any
    access function based off them.  For the testcase this is
    signed short vs. unsigned short.
    
    2021-01-14  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/98674
            * tree-data-ref.c (base_supports_access_fn_components_p): New.
            (initialize_data_dependence_relation): For two bases without
            possible access fns resort to type size equality when determining
            shape compatibility.
    
            * gcc.dg/vect/pr98674.c: New testcase.
Comment 8 Richard Biener 2021-01-14 15:14:39 UTC
Fixed on trunk sofar.
Comment 9 Hongtao.liu 2021-01-15 02:49:33 UTC
Fixed in GCC11.
Comment 10 Richard Biener 2021-01-15 07:56:13 UTC
But eventually needs backporting.
Comment 11 Richard Biener 2021-04-08 12:02:20 UTC
GCC 10.3 is being released, retargeting bugs to GCC 10.4.