116575 – [15 Regression] blender in SPEC2017 fails to use mask_load_lanes

Status:	ASSIGNED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	15.0

Importance:	P3 normal
Target Milestone:	15.0
Assignee:	Richard Biener

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	116578

Reported:	2024-09-03 05:59 UTC by Tamar Christina
Modified:	2024-09-06 07:40 UTC
CC List:	2 users

See Also:	116628
Host:
Target:	aarch64*
Build:
Known to work:
Known to fail:
Last reconfirmed:	2024-09-03 00:00:00

Description Tamar Christina 2024-09-03 05:59:19 UTC

Blender from SPEC2017 ICEs when compiled with -Ofast -flto -mcpu=neoverse-v1, failing with:

during GIMPLE pass: vect
blender/source/blender/editors/object/object_bake_api.c: In function 'write_internal_bake_pixels':
blender/source/blender/editors/object/object_bake_api.c:173:13: internal compiler error: in vect_analyze_slp, at tree-vect-slp.cc:4765
  173 | static bool write_internal_bake_pixels(
      |             ^
0x1c0a0f7 internal_error(char const*, ...)
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostic-global-context.cc:492
0x7bb0c7 fancy_abort(char const*, int, char const*)
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/diagnostic.cc:1658
0xfaf1bb vect_analyze_slp(vec_info*, unsigned int)
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:4765
0xf83a6b vect_analyze_loop_2
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-loop.cc:2862
0xf85123 vect_analyze_loop_1
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-loop.cc:3409
0xf85857 vect_analyze_loop(loop*, vec_info_shared*)
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-loop.cc:3567
0xfc3cef try_vectorize_loop_1
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vectorizer.cc:1068
0xfc3cef try_vectorize_loop
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vectorizer.cc:1184
0xfc4223 execute
	/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vectorizer.cc:1300

I am creating a reduced testcase and bisecting.

Comment 1 Tamar Christina 2024-09-03 06:21:46 UTC

```
int a;
float *b, *c;
void d() {
  char *e;
  for (; a; a++, b += 4, c += 4)
    if (*e++) {
      float *f = c;
      f[0] = b[0];
      f[1] = b[1];
      f[2] = b[2];
      f[3] = b[3];
    }
}
```

compiled with -mcpu=neoverse-v1 -Ofast reproduces the ICE

Comment 2 Andrew Pinski 2024-09-03 06:26:37 UTC

(In reply to Tamar Christina from comment #1)
> compiled with -mcpu=neoverse-v1 -Ofast reproduces the ICE

Here is a better testcase with no undefined behavior:
```
int a;
float *b, *c;
void d(char * __restrict e) {
  for (; a; a++, b += 4, c += 4)
    if (*e++) {
      float *f = c;
      f[0] = b[0];
      f[1] = b[1];
      f[2] = b[2];
      f[3] = b[3];
    }
}
```

Only `-O3 -march=armv9-a` is needed.
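
The ICE itself only needs the function above to be compiled, but once a compiler accepts it, a runtime check can confirm the vectorized copy is still correct. The following is a minimal sketch of such a check, not the testcase added to the testsuite: the trip count `N`, the mask pattern, and the helper name `check_d` are assumptions made for illustration. Note that the loop runs while `a` is nonzero and increments it, so `a` has to start at `-N`.

```c
#define N 37 /* hypothetical trip count, deliberately not a multiple of 4 */

int a;
float *b, *c;

/* The reduced testcase from this comment, unchanged. */
void d(char * __restrict e) {
  for (; a; a++, b += 4, c += 4)
    if (*e++) {
      float *f = c;
      f[0] = b[0];
      f[1] = b[1];
      f[2] = b[2];
      f[3] = b[3];
    }
}

/* Hypothetical driver: returns the number of mismatching elements. */
int check_d(void) {
  static float src[4 * N], dst[4 * N];
  static char mask[N];

  for (int i = 0; i < N; i++) {
    mask[i] = (i % 3) == 0;          /* every third group is copied */
    for (int k = 0; k < 4; k++) {
      src[4 * i + k] = (float)(4 * i + k);
      dst[4 * i + k] = -1.0f;        /* sentinel for "not written" */
    }
  }

  a = -N;                            /* loop counts up until a == 0 */
  b = src;
  c = dst;
  d(mask);

  int bad = 0;
  for (int i = 0; i < N; i++)
    for (int k = 0; k < 4; k++) {
      float expected = mask[i] ? src[4 * i + k] : -1.0f;
      if (dst[4 * i + k] != expected)
        bad++;
    }
  return bad;
}
```

Calling `check_d()` from `main` and aborting on a nonzero result would follow the usual pattern for runtime vectorizer tests.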

Comment 3 Richard Biener 2024-09-03 06:57:29 UTC

I will have a look.

Comment 4 Richard Biener 2024-09-03 07:21:46 UTC

OK, so single-lane SLP discovery fails where multi-lane discovery succeeded.
This is because we have a masked load feeding a masked store, and during
single-lane discovery the loads appear permuted.  We do not handle permuting
masked operations, so the single-lane discovery is rejected.

For now I'll avoid the situation and leave the actual fix for later.
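
To make the masked load feeding a masked store concrete, here is a scalar sketch of what a masked load-lanes / store-lanes pair does for the group of four accesses in the testcase. It is only an emulation under stated assumptions: the vector length `VL`, the helper names, and the choice to zero inactive elements are all hypothetical and are not taken from the GCC sources.

```c
#include <stddef.h>

#define VL 4     /* hypothetical vector length, in elements */
#define NLANES 4 /* group size: b[0..3] / f[0..3] in the testcase */

/* Emulated masked load-lanes: for each active element i, de-interleave the
   NLANES values at src[i * NLANES + k] into vec[k][i].  Inactive elements
   are not read from memory; zeroing them is an assumption of this sketch. */
static void emul_mask_load_lanes(float vec[NLANES][VL], const float *src,
                                 const unsigned char mask[VL]) {
  for (size_t i = 0; i < VL; i++)
    for (size_t k = 0; k < NLANES; k++)
      vec[k][i] = mask[i] ? src[i * NLANES + k] : 0.0f;
}

/* Emulated masked store-lanes: re-interleave vec[k][i] into
   dst[i * NLANES + k], writing only where the mask is set. */
static void emul_mask_store_lanes(float *dst, float vec[NLANES][VL],
                                  const unsigned char mask[VL]) {
  for (size_t i = 0; i < VL; i++)
    for (size_t k = 0; k < NLANES; k++)
      if (mask[i])
        dst[i * NLANES + k] = vec[k][i];
}

/* One vector iteration of the testcase loop: VL scalar iterations at once,
   with the condition *e++ turned into a per-element mask. */
void copy_block(float *dst, const float *src, const char *e) {
  unsigned char mask[VL];
  float vec[NLANES][VL];

  for (size_t i = 0; i < VL; i++)
    mask[i] = e[i] != 0;

  emul_mask_load_lanes(vec, src, mask);
  emul_mask_store_lanes(dst, vec, mask);
}
```

In this picture each `vec[k]` is one lane of the group. A single-lane SLP instance would look at only one `k`, and its load is then a strided subset of the interleaved memory rather than a contiguous access, which is the sense in which it appears permuted.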

Comment 5 GCC Commits 2024-09-03 09:31:29 UTC

The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ac6cd62a351a8f1f3637a2552c74eb5eb51cfdda

commit r15-3411-gac6cd62a351a8f1f3637a2552c74eb5eb51cfdda
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Sep 3 09:23:20 2024 +0200

    tree-optimization/116575 - avoid ICE with SLP mask_load_lane
    
    The following avoids performing re-discovery with single lanes in
    the attempt to force the use of mask_load_lane, as rediscovery will
    fail since a single lane of a mask load will appear permuted, which
    isn't supported.
    
            PR tree-optimization/116575
            * tree-vect-slp.cc (vect_analyze_slp): Properly compute
            the mask argument for vect_load/store_lanes_supported.
            When the load is masked for now avoid rediscovery.
    
            * gcc.dg/vect/pr116575.c: New testcase.

Comment 6 Richard Biener 2024-09-03 10:04:34 UTC

I've morphed this into the appropriate missed-optimization bug.