According to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=971027 No testcase or confirmation yet.
Created attachment 49287 [details] Reduced test-case Reduced test-case that started to fail on master with r11-205-gbc484e250990393e with -O3.
Created attachment 49288 [details] Another test-case Another test-case that started to fail with r10-1361-g9f962469cabc7fdc.
typedef unsigned char __uint8_t; typedef __uint8_t uint8_t; typedef struct plane_t { uint8_t *p_pixels; int i_lines; int i_pitch; } plane_t; typedef struct { plane_t p[5]; } picture_t; #define N 4 void __attribute__((noipa)) picture_Clone(picture_t *picture, picture_t *res) { for (int i = 0; i < N; i++) { res->p[i].p_pixels = picture->p[i].p_pixels; res->p[i].i_lines = picture->p[i].i_lines; res->p[i].i_pitch = picture->p[i].i_pitch; } } int main() { picture_t aaa, bbb; uint8_t pixels[10] = {1, 1, 1, 1, 1, 1, 1, 1}; for (unsigned i = 0; i < N; i++) aaa.p[i].p_pixels = pixels; picture_Clone (&aaa, &bbb); uint8_t c; for (unsigned i = 0; i < N; i++) c += bbb.p[i].p_pixels[0]; if (c != N) __builtin_abort (); return 0; } ends up with a NULL pointer in bb.p[1].p_pixels
So what goes wrong is the single-element interleaving code-gen for the pointer copy. We have t.c:18:21: note: Detected single element interleaving picture_7(D)->p[i_18].p_pixels step 16 but for the store: t.c:18:21: missed: not consecutive access res_8(D)->p[i_18].p_pixels = _1; t.c:18:21: note: using strided accesses ... t.c:18:21: note: ==> examining statement: _1 = picture_7(D)->p[i_18].p_pixels; t.c:18:21: note: vect_model_load_cost: aligned. t.c:18:21: note: vect_model_load_cost: inside_cost = 24, prologue_cost = 0 . and in group get-load-store type we handle it as (V1DI) if (!STMT_VINFO_STRIDED_P (first_stmt_info) && (can_overrun_p || !would_overrun_p) && compare_step_with_zero (vinfo, stmt_info) > 0) { /* First cope with the degenerate case of a single-element vector. */ if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U)) *memory_access_type = VMAT_CONTIGUOUS; looks like this needs to be conditional on gap == 0? Adding that fixes the testcase. This was added by g:6737facbb3c53a1f0158b05e4116c161ed9bc319 Richard? It looks like the !STMT_VINFO_STRIDED_P check might have supposed to prevent this? In vectorizable_load we're also doing if (memory_access_type == VMAT_GATHER_SCATTER || (!slp && memory_access_type == VMAT_CONTIGUOUS)) grouped_load = false; but vect_transform_grouped_load doesn't like this case, possibly because there's nothing to "permute" (eliding that alone doesn't fix the code-gen issue).
(In reply to Richard Biener from comment #4) > So what goes wrong is the single-element interleaving code-gen for the > pointer copy. We have > > t.c:18:21: note: Detected single element interleaving > picture_7(D)->p[i_18].p_pixels step 16 > > but for the store: > > t.c:18:21: missed: not consecutive access res_8(D)->p[i_18].p_pixels = _1; > t.c:18:21: note: using strided accesses > > ... > > t.c:18:21: note: ==> examining statement: _1 = > picture_7(D)->p[i_18].p_pixels; > t.c:18:21: note: vect_model_load_cost: aligned. > t.c:18:21: note: vect_model_load_cost: inside_cost = 24, prologue_cost = 0 > . > > and in group get-load-store type we handle it as (V1DI) > > if (!STMT_VINFO_STRIDED_P (first_stmt_info) > && (can_overrun_p || !would_overrun_p) > && compare_step_with_zero (vinfo, stmt_info) > 0) > { > /* First cope with the degenerate case of a single-element > vector. */ > if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U)) > *memory_access_type = VMAT_CONTIGUOUS; So both doing && gap == 0 here and removing this special-case alltogether passes bootstrap / regtest on x86_64. I have no idea why the special case was needed in the first place? Was the load-lanes code confused? I think VMAT_ELEMENTWISE for single-element vectors is a good enough match? What's the advantage of VMAT_CONTIGUOUS here?
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:04b99da898a9817e72fedb4063589648b7961ac5 commit r11-3594-g04b99da898a9817e72fedb4063589648b7961ac5 Author: Richard Biener <rguenther@suse.de> Date: Thu Oct 1 15:12:35 2020 +0200 tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS This avoids using VMAT_CONTIGUOUS with single-element interleaving when using V1mode vectors. Instead keep VMAT_ELEMENTWISE but continue to avoid load-lanes and gathers. 2020-10-01 Richard Biener <rguenther@suse.de> PR tree-optimization/97236 * tree-vect-stmts.c (get_group_load_store_type): Keep VMAT_ELEMENTWISE for single-element vectors. * gcc.dg/vect/pr97236.c: New testcase.
Fixed on trunk sofar.
The releases/gcc-10 branch has been updated by Matthias Klose <doko@gcc.gnu.org>: https://gcc.gnu.org/g:1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82 commit r10-8859-g1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82 Author: Matthias Klose <doko@ubuntu.com> Date: Tue Oct 6 13:41:37 2020 +0200 Backport fix for PR/tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS This avoids using VMAT_CONTIGUOUS with single-element interleaving when using V1mode vectors. Instead keep VMAT_ELEMENTWISE but continue to avoid load-lanes and gathers. 2020-10-01 Richard Biener <rguenther@suse.de> PR tree-optimization/97236 * tree-vect-stmts.c (get_group_load_store_type): Keep VMAT_ELEMENTWISE for single-element vectors. * gcc.dg/vect/pr97236.c: New testcase.
Fixed.
*** Bug 98932 has been marked as a duplicate of this bug. ***
Reopen for more backports.
The releases/gcc-9 branch has been updated by Kyrylo Tkachov <ktkachov@gcc.gnu.org>: https://gcc.gnu.org/g:97b668f9a8c6ec565c278a60e7d1492a6932e409 commit r9-9224-g97b668f9a8c6ec565c278a60e7d1492a6932e409 Author: Matthias Klose <doko@ubuntu.com> Date: Tue Oct 6 13:41:37 2020 +0200 Backport fix for PR/tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS This avoids using VMAT_CONTIGUOUS with single-element interleaving when using V1mode vectors. Instead keep VMAT_ELEMENTWISE but continue to avoid load-lanes and gathers. 2020-10-01 Richard Biener <rguenther@suse.de> PR tree-optimization/97236 * tree-vect-stmts.c (get_group_load_store_type): Keep VMAT_ELEMENTWISE for single-element vectors. * gcc.dg/vect/pr97236.c: New testcase. (cherry picked from commit 1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82)
*** Bug 98949 has been marked as a duplicate of this bug. ***
Fixed for GCC 9.4
The releases/gcc-8 branch has been updated by Kyrylo Tkachov <ktkachov@gcc.gnu.org>: https://gcc.gnu.org/g:de0ede7625f6c4d4bbd2caaf363032b0da80cf69 commit r8-10757-gde0ede7625f6c4d4bbd2caaf363032b0da80cf69 Author: Matthias Klose <doko@ubuntu.com> Date: Tue Oct 6 13:41:37 2020 +0200 Backport fix for PR/tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS This avoids using VMAT_CONTIGUOUS with single-element interleaving when using V1mode vectors. Instead keep VMAT_ELEMENTWISE but continue to avoid load-lanes and gathers. 2020-10-01 Richard Biener <rguenther@suse.de> PR tree-optimization/97236 * tree-vect-stmts.c (get_group_load_store_type): Keep VMAT_ELEMENTWISE for single-element vectors. * gcc.dg/vect/pr97236.c: New testcase. (cherry picked from commit 1ab88985631dd2c5a5e3b5c0dce47cf8b6ed2f82)
*** Bug 99375 has been marked as a duplicate of this bug. ***
*** Bug 85804 has been marked as a duplicate of this bug. ***