Bug 97236 - [8 Regression] g:e93428a8b056aed83a7678 triggers vlc miscompile
Summary: [8 Regression] g:e93428a8b056aed83a7678 triggers vlc miscompile
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.2.0
: P2 normal
Target Milestone: 8.5
Assignee: Richard Biener
URL:
Keywords: wrong-code
: 85804 98932 98949 99375 (view as bug list)
Depends on:
Blocks: 97043
  Show dependency treegraph
 
Reported: 2020-09-29 07:12 UTC by Richard Biener
Modified: 2024-02-28 08:09 UTC (History)
9 users (show)

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 10.1.0, 11.0, 9.3.1
Known to fail: 10.2.0, 8.4.0, 9.3.0
Last reconfirmed: 2020-09-29 00:00:00


Attachments
Reduced test-case (444 bytes, text/plain)
2020-09-29 13:05 UTC, Martin Liška
Details
Another test-case (366 bytes, text/plain)
2020-09-29 13:05 UTC, Martin Liška
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2020-09-29 07:12:49 UTC
According to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=971027

No testcase or confirmation yet.
Comment 1 Martin Liška 2020-09-29 13:05:13 UTC
Created attachment 49287 [details]
Reduced test-case

Reduced test-case that started to fail on master with r11-205-gbc484e250990393e with -O3.
Comment 2 Martin Liška 2020-09-29 13:05:57 UTC
Created attachment 49288 [details]
Another test-case

Another test-case that started to fail with r10-1361-g9f962469cabc7fdc.
Comment 3 Richard Biener 2020-09-29 13:46:26 UTC
typedef unsigned char __uint8_t;
typedef __uint8_t uint8_t;
typedef struct plane_t {
  uint8_t *p_pixels;
  int i_lines;
  int i_pitch;
} plane_t;

typedef struct {
  plane_t p[5];
} picture_t;

#define N 4

void __attribute__((noipa))
picture_Clone(picture_t *picture, picture_t *res)
{
  for (int i = 0; i < N; i++) {
    res->p[i].p_pixels = picture->p[i].p_pixels;
    res->p[i].i_lines = picture->p[i].i_lines;
    res->p[i].i_pitch = picture->p[i].i_pitch;
  }
}

int
main()
{
  picture_t aaa, bbb;
  uint8_t pixels[10] = {1, 1, 1, 1, 1, 1, 1, 1};

  for (unsigned i = 0; i < N; i++)
    aaa.p[i].p_pixels = pixels;

  picture_Clone (&aaa, &bbb);

  uint8_t c;
  for (unsigned i = 0; i < N; i++)
    c += bbb.p[i].p_pixels[0];

  if (c != N)
    __builtin_abort ();
  return 0;
}

ends up with a NULL pointer in bb.p[1].p_pixels
Comment 4 Richard Biener 2020-09-29 14:14:02 UTC
So what goes wrong is the single-element interleaving code-gen for the pointer copy.  We have

t.c:18:21: note:   Detected single element interleaving picture_7(D)->p[i_18].p_pixels step 16

but for the store:

t.c:18:21: missed:   not consecutive access res_8(D)->p[i_18].p_pixels = _1;
t.c:18:21: note:   using strided accesses

...

t.c:18:21: note:   ==> examining statement: _1 = picture_7(D)->p[i_18].p_pixels;
t.c:18:21: note:   vect_model_load_cost: aligned.
t.c:18:21: note:   vect_model_load_cost: inside_cost = 24, prologue_cost = 0 .

and in group get-load-store type we handle it as (V1DI)

      if (!STMT_VINFO_STRIDED_P (first_stmt_info)
          && (can_overrun_p || !would_overrun_p)
          && compare_step_with_zero (vinfo, stmt_info) > 0)
        {
          /* First cope with the degenerate case of a single-element
             vector.  */
          if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
            *memory_access_type = VMAT_CONTIGUOUS;

looks like this needs to be conditional on gap == 0?  Adding that fixes
the testcase.  This was added by g:6737facbb3c53a1f0158b05e4116c161ed9bc319
Richard?  It looks like the !STMT_VINFO_STRIDED_P check might have supposed
to prevent this?  In vectorizable_load we're also doing

  if (memory_access_type == VMAT_GATHER_SCATTER
      || (!slp && memory_access_type == VMAT_CONTIGUOUS))
    grouped_load = false;

but vect_transform_grouped_load doesn't like this case, possibly because
there's nothing to "permute" (eliding that alone doesn't fix the code-gen
issue).
Comment 5 Richard Biener 2020-09-30 09:20:02 UTC
(In reply to Richard Biener from comment #4)
> So what goes wrong is the single-element interleaving code-gen for the
> pointer copy.  We have
> 
> t.c:18:21: note:   Detected single element interleaving
> picture_7(D)->p[i_18].p_pixels step 16
> 
> but for the store:
> 
> t.c:18:21: missed:   not consecutive access res_8(D)->p[i_18].p_pixels = _1;
> t.c:18:21: note:   using strided accesses
> 
> ...
> 
> t.c:18:21: note:   ==> examining statement: _1 =
> picture_7(D)->p[i_18].p_pixels;
> t.c:18:21: note:   vect_model_load_cost: aligned.
> t.c:18:21: note:   vect_model_load_cost: inside_cost = 24, prologue_cost = 0
> .
> 
> and in group get-load-store type we handle it as (V1DI)
> 
>       if (!STMT_VINFO_STRIDED_P (first_stmt_info)
>           && (can_overrun_p || !would_overrun_p)
>           && compare_step_with_zero (vinfo, stmt_info) > 0)
>         {
>           /* First cope with the degenerate case of a single-element
>              vector.  */
>           if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
>             *memory_access_type = VMAT_CONTIGUOUS;

So both doing && gap == 0 here and removing this special-case alltogether
passes bootstrap / regtest on x86_64.

I have no idea why the special case was needed in the first place?
Was the load-lanes code confused?  I think VMAT_ELEMENTWISE for
single-element vectors is a good enough match?  What's the advantage
of VMAT_CONTIGUOUS here?
Comment 6 GCC Commits 2020-10-01 13:58:39 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:04b99da898a9817e72fedb4063589648b7961ac5

commit r11-3594-g04b99da898a9817e72fedb4063589648b7961ac5
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Oct 1 15:12:35 2020 +0200

    tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS
    
    This avoids using VMAT_CONTIGUOUS with single-element interleaving
    when using V1mode vectors.  Instead keep VMAT_ELEMENTWISE but
    continue to avoid load-lanes and gathers.
    
    2020-10-01  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/97236
            * tree-vect-stmts.c (get_group_load_store_type): Keep
            VMAT_ELEMENTWISE for single-element vectors.
    
            * gcc.dg/vect/pr97236.c: New testcase.
Comment 7 Richard Biener 2020-10-01 13:59:28 UTC
Fixed on trunk sofar.
Comment 8 GCC Commits 2020-10-06 11:42:42 UTC
The releases/gcc-10 branch has been updated by Matthias Klose <doko@gcc.gnu.org>:

https://gcc.gnu.org/g:1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82

commit r10-8859-g1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82
Author: Matthias Klose <doko@ubuntu.com>
Date:   Tue Oct 6 13:41:37 2020 +0200

    Backport fix for PR/tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS
    
    This avoids using VMAT_CONTIGUOUS with single-element interleaving
    when using V1mode vectors.  Instead keep VMAT_ELEMENTWISE but
    continue to avoid load-lanes and gathers.
    
    2020-10-01  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/97236
            * tree-vect-stmts.c (get_group_load_store_type): Keep
            VMAT_ELEMENTWISE for single-element vectors.
    
            * gcc.dg/vect/pr97236.c: New testcase.
Comment 9 Richard Biener 2020-10-06 12:20:26 UTC
Fixed.
Comment 10 Richard Biener 2021-02-03 08:25:51 UTC
*** Bug 98932 has been marked as a duplicate of this bug. ***
Comment 11 Richard Biener 2021-02-03 08:26:47 UTC
Reopen for more backports.
Comment 12 GCC Commits 2021-02-05 15:43:43 UTC
The releases/gcc-9 branch has been updated by Kyrylo Tkachov <ktkachov@gcc.gnu.org>:

https://gcc.gnu.org/g:97b668f9a8c6ec565c278a60e7d1492a6932e409

commit r9-9224-g97b668f9a8c6ec565c278a60e7d1492a6932e409
Author: Matthias Klose <doko@ubuntu.com>
Date:   Tue Oct 6 13:41:37 2020 +0200

    Backport fix for PR/tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS
    
    This avoids using VMAT_CONTIGUOUS with single-element interleaving
    when using V1mode vectors.  Instead keep VMAT_ELEMENTWISE but
    continue to avoid load-lanes and gathers.
    
    2020-10-01  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/97236
            * tree-vect-stmts.c (get_group_load_store_type): Keep
            VMAT_ELEMENTWISE for single-element vectors.
    
            * gcc.dg/vect/pr97236.c: New testcase.
    
    (cherry picked from commit 1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82)
Comment 13 ktkachov 2021-02-05 15:44:42 UTC
*** Bug 98949 has been marked as a duplicate of this bug. ***
Comment 14 ktkachov 2021-02-05 15:45:36 UTC
Fixed for GCC 9.4
Comment 15 GCC Commits 2021-02-08 10:33:41 UTC
The releases/gcc-8 branch has been updated by Kyrylo Tkachov <ktkachov@gcc.gnu.org>:

https://gcc.gnu.org/g:de0ede7625f6c4d4bbd2caaf363032b0da80cf69

commit r8-10757-gde0ede7625f6c4d4bbd2caaf363032b0da80cf69
Author: Matthias Klose <doko@ubuntu.com>
Date:   Tue Oct 6 13:41:37 2020 +0200

    Backport fix for PR/tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS
    
    This avoids using VMAT_CONTIGUOUS with single-element interleaving
    when using V1mode vectors.  Instead keep VMAT_ELEMENTWISE but
    continue to avoid load-lanes and gathers.
    
    2020-10-01  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/97236
            * tree-vect-stmts.c (get_group_load_store_type): Keep
            VMAT_ELEMENTWISE for single-element vectors.
    
            * gcc.dg/vect/pr97236.c: New testcase.
    
    (cherry picked from commit 1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82)
Comment 16 Richard Biener 2021-02-08 10:43:19 UTC
Fixed.
Comment 17 Martin Liška 2021-03-04 08:06:47 UTC
*** Bug 99375 has been marked as a duplicate of this bug. ***
Comment 18 Richard Biener 2021-04-19 06:31:39 UTC
*** Bug 85804 has been marked as a duplicate of this bug. ***