Bug 109505 - (t | 15) & svcntb() causes an OOM/ICE
Summary: (t | 15) & svcntb() causes an OOM/ICE
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 12.2.0
Importance: P3 normal
Target Milestone: ---
Assignee: Richard Biener
URL:
Keywords: compile-time-hog, memory-hog
Duplicates: 109794 (view as bug list)
Depends on:
Blocks:
 
Reported: 2023-04-13 19:21 UTC by Jose Dapena Paz
Modified: 2023-12-27 08:09 UTC (History)
CC List: 9 users

See Also:
Host:
Target: aarch64
Build:
Known to work: 11.4.1
Known to fail: 11.1.0, 13.0
Last reconfirmed: 2023-04-14 00:00:00


Attachments
evaluate_prg_hwy.ii (compressed with gzip) (701.29 KB, application/gzip)
2023-04-14 08:23 UTC, Jose Dapena Paz
gcc14-pr109505.patch (1.39 KB, patch)
2023-05-20 08:15 UTC, Jakub Jelinek

Description Jose Dapena Paz 2023-04-13 19:21:05 UTC
Steps to reproduce:
1. Install kas tool
2. Clone https://github.com/Igalia/meta-chromium
3. Kick off the checkout of repositories:
  kas checkout kas/chromium.yml:kas/commercial.yml
4. Kick off the build for raspberrypi4-64:

KAS_MACHINE=raspberrypi4-64 kas build kas/chromium.yml:kas/commercial.yml

Compilation will progress, but then fail on building Chromium:

FAILED: obj/third_party/distributed_point_functions/distributed_point_functions/evaluate_prg_hwy.o 
aarch64-poky-linux-g++  -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong   -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot -MMD -MF obj/third_party/distributed_point_functions/distributed_point_functions/evaluate_prg_hwy.o.d -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_OZONE=1 -DOFFICIAL_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DNO_UNWIND_TABLES -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DGLIB_VERSION_MAX_ALLOWED=GLIB_VERSION_2_56 -DGLIB_VERSION_MIN_REQUIRED=GLIB_VERSION_2_56 -DBASE_USE_PERFETTO_CLIENT_LIBRARY=1 -DGOOGLE_PROTOBUF_NO_RTTI -DGOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -DGOOGLE_PROTOBUF_INTERNAL_DONATE_STEAL_INLINE=0 -DHAVE_PTHREAD -I../chromium-114.0.5696.0 -Igen -I../chromium-114.0.5696.0/third_party/distributed_point_functions -I../chromium-114.0.5696.0/third_party/distributed_point_functions/code -Igen/third_party/distributed_point_functions -I../chromium-114.0.5696.0/third_party/perfetto/include -Igen/third_party/perfetto/build_config -Igen/third_party/perfetto -I../chromium-114.0.5696.0/third_party/protobuf/src -Igen/protoc_out -I../chromium-114.0.5696.0/third_party/abseil-cpp -I../chromium-114.0.5696.0/third_party/highway/src -I../chromium-114.0.5696.0/third_party/boringssl/src/include -fno-ident -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -fno-unwind-tables -fno-asynchronous-unwind-tables -fPIC -pipe -pthread -mbranch-protection=standard -O2 -fdata-sections -ffunction-sections -fno-omit-frame-pointer -gdwarf-4 -g1 -fvisibility=hidden -Wno-unused-local-typedefs -Wno-maybe-uninitialized -Wno-deprecated-declarations -Wno-comments -Wno-packed-not-aligned -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi 
-I/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot/usr/include/glib-2.0 -I/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot/usr/lib/glib-2.0/include -std=gnu++2a -fno-exceptions -fno-rtti -fvisibility-inlines-hidden -Wno-narrowing -Wno-class-memaccess   -feliminate-unused-debug-types -fmacro-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/chromium-114.0.5696.0=/usr/src/debug/chromium-dev/114.0.5696.0-r0  -fdebug-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/chromium-114.0.5696.0=/usr/src/debug/chromium-dev/114.0.5696.0-r0  -fmacro-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/build=/usr/src/debug/chromium-dev/114.0.5696.0-r0  -fdebug-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/build=/usr/src/debug/chromium-dev/114.0.5696.0-r0  -fdebug-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot=  -fmacro-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot=  -fdebug-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot-native=  -fvisibility-inlines-hidden -c ../chromium-114.0.5696.0/third_party/distributed_point_functions/code/dpf/internal/evaluate_prg_hwy.cc -o obj/third_party/distributed_point_functions/distributed_point_functions/evaluate_prg_hwy.o
{standard input}: Assembler messages:
{standard input}: Error: open CFI at the end of file; missing .cfi_endproc directive
aarch64-poky-linux-g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
ninja: build stopped: subcommand failed.
WARNING: exit code 1 from a shell command.

This happens after exhausting all the available memory in the system. Attaching gdb to GCC, I see that it fails to finish running pass_forwprop::execute, with this backtrace:

#0  (anonymous namespace)::pass_forwprop::execute (this=<optimized out>, fun=<optimized out>) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/tree-ssa-forwprop.cc:3636
#1  0x0000000000d0b653 in execute_one_pass (pass=0x2f96120) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/passes.cc:2638
#2  0x0000000000d0bea0 in execute_pass_list_1 (pass=0x2f96120) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/passes.cc:2738
#3  0x0000000000d0beb2 in execute_pass_list_1 (pass=0x2f951b0) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/passes.cc:2739
#4  0x0000000000d0bedd in execute_pass_list (fn=0x7fe3397a88a0, pass=<optimized out>) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/passes.cc:2749
#5  0x00000000009b7e28 in cgraph_node::expand (this=0x7fe331da2990) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/context.h:48
#6  cgraph_node::expand (this=0x7fe331da2990) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/cgraphunit.cc:1788
#7  0x00000000009b9387 in expand_all_functions () at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/cgraphunit.cc:1999
#8  symbol_table::compile (this=0x7fe33ffa3000) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/cgraphunit.cc:2349
#9  0x00000000009bb91c in symbol_table::compile (this=0x7fe33ffa3000) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/cgraphunit.cc:2262
#10 symbol_table::finalize_compilation_unit (this=0x7fe33ffa3000) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/cgraphunit.cc:2530
#11 0x0000000000ddcbca in compile_file () at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/toplev.cc:479
#12 0x00000000006c8a8e in do_compile (no_backend=false) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/toplev.cc:2144
#13 toplev::main (this=this@entry=0x7fffd81b45d6, argc=<optimized out>, argc@entry=140, argv=<optimized out>, argv@entry=0x7fffd81b4708) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/toplev.cc:2296
#14 0x00000000006ca1af in main (argc=140, argv=0x7fffd81b4708) at ../../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/gcc/main.cc:39

Apparently the loop never finishes: gsi gets members added and removed forever, until the OOM.
Comment 1 Andrew Pinski 2023-04-13 20:49:30 UTC
Can you provide the information as requested at https://gcc.gnu.org/bugs/ ?
Comment 2 Jose Dapena Paz 2023-04-14 08:21:06 UTC
Information collected:

### g++ -v

aarch64-poky-linux-g++ -v
Using built-in specs.
COLLECT_GCC=./home/dape/Development/yocto/meta-chromium/build/tmp/work/x86_64-linux/gcc-cross-aarch64/12.2.0-r0/recipe-sysroot-native/usr/bin/aarch64-poky-linux/aarch64-poky-linux-g++
COLLECT_LTO_WRAPPER=/home/dape/Development/yocto/meta-chromium/build/tmp/work/x86_64-linux/gcc-cross-aarch64/12.2.0-r0/image/home/dape/Development/yocto/meta-chromium/build/tmp/work/x86_64-linux/gcc-cross-aarch64/12.2.0-r0/recipe-sysroot-native/usr/bin/aarch64-poky-linux/../../libexec/aarch64-poky-linux/gcc/aarch64-poky-linux/12.2.0/lto-wrapper
Target: aarch64-poky-linux
Configured with: ../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/configure --build=x86_64-linux --host=x86_64-linux --target=aarch64-poky-linux --prefix=/host-native/usr --exec_prefix=/host-native/usr --bindir=/host-native/usr/bin/aarch64-poky-linux --sbindir=/host-native/usr/bin/aarch64-poky-linux --libexecdir=/host-native/usr/libexec/aarch64-poky-linux --datadir=/host-native/usr/share --sysconfdir=/host-native/etc --sharedstatedir=/host-native/com --localstatedir=/host-native/var --libdir=/host-native/usr/lib/aarch64-poky-linux --includedir=/host-native/usr/include --oldincludedir=/host-native/usr/include --infodir=/host-native/usr/share/info --mandir=/host-native/usr/share/man --disable-silent-rules --disable-dependency-tracking --with-libtool-sysroot=/host-native --enable-clocale=generic --with-gnu-ld --enable-shared --enable-languages=c,c++ --enable-threads=posix --disable-multilib --enable-default-pie --enable-c99 --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch --program-prefix=aarch64-poky-linux- --without-local-prefix --disable-install-libiberty --disable-libssp --enable-libitm --enable-lto --disable-bootstrap --with-system-zlib --with-linker-hash-style=sysv --enable-linker-build-id --with-ppl=no --with-cloog=no --enable-checking=release --enable-cheaders=c_global --without-isl --with-gxx-include-dir=/not/exist/usr/include/c++/12.2.0 --with-sysroot=/not/exist --with-build-sysroot=/host --enable-standard-branch-protection --enable-poison-system-directories=error --with-system-zlib --disable-static --disable-nls --with-glibc-version=2.28 --enable-initfini-array --enable-__cxa_atexit
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC) 

### Extracted information

#### GCC version

12.2.0

#### System type

Toolchain built by Yocto Langdale with the mentioned steps, on Ubuntu 22.10

#### GCC configure options

Configured with: ../../../../../../work-shared/gcc-12.2.0-r0/gcc-12.2.0/configure --build=x86_64-linux --host=x86_64-linux --target=aarch64-poky-linux --prefix=/host-native/usr --exec_prefix=/host-native/usr --bindir=/host-native/usr/bin/aarch64-poky-linux --sbindir=/host-native/usr/bin/aarch64-poky-linux --libexecdir=/host-native/usr/libexec/aarch64-poky-linux --datadir=/host-native/usr/share --sysconfdir=/host-native/etc --sharedstatedir=/host-native/com --localstatedir=/host-native/var --libdir=/host-native/usr/lib/aarch64-poky-linux --includedir=/host-native/usr/include --oldincludedir=/host-native/usr/include --infodir=/host-native/usr/share/info --mandir=/host-native/usr/share/man --disable-silent-rules --disable-dependency-tracking --with-libtool-sysroot=/host-native --enable-clocale=generic --with-gnu-ld --enable-shared --enable-languages=c,c++ --enable-threads=posix --disable-multilib --enable-default-pie --enable-c99 --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch --program-prefix=aarch64-poky-linux- --without-local-prefix --disable-install-libiberty --disable-libssp --enable-libitm --enable-lto --disable-bootstrap --with-system-zlib --with-linker-hash-style=sysv --enable-linker-build-id --with-ppl=no --with-cloog=no --enable-checking=release --enable-cheaders=c_global --without-isl --with-gxx-include-dir=/not/exist/usr/include/c++/12.2.0 --with-sysroot=/not/exist --with-build-sysroot=/host --enable-standard-branch-protection --enable-poison-system-directories=error --with-system-zlib --disable-static --disable-nls --with-glibc-version=2.28 --enable-initfini-array --enable-__cxa_atexit

#### Complete command line that triggers the bug

aarch64-poky-linux-g++  -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong   -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot -MMD -MF obj/third_party/distributed_point_functions/distributed_point_functions/evaluate_prg_hwy.o.d -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_OZONE=1 -DOFFICIAL_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DNO_UNWIND_TABLES -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -DGLIB_VERSION_MAX_ALLOWED=GLIB_VERSION_2_56 -DGLIB_VERSION_MIN_REQUIRED=GLIB_VERSION_2_56 -DBASE_USE_PERFETTO_CLIENT_LIBRARY=1 -DGOOGLE_PROTOBUF_NO_RTTI -DGOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -DGOOGLE_PROTOBUF_INTERNAL_DONATE_STEAL_INLINE=0 -DHAVE_PTHREAD -I../chromium-114.0.5696.0 -Igen -I../chromium-114.0.5696.0/third_party/distributed_point_functions -I../chromium-114.0.5696.0/third_party/distributed_point_functions/code -Igen/third_party/distributed_point_functions -I../chromium-114.0.5696.0/third_party/perfetto/include -Igen/third_party/perfetto/build_config -Igen/third_party/perfetto -I../chromium-114.0.5696.0/third_party/protobuf/src -Igen/protoc_out -I../chromium-114.0.5696.0/third_party/abseil-cpp -I../chromium-114.0.5696.0/third_party/highway/src -I../chromium-114.0.5696.0/third_party/boringssl/src/include -fno-ident -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -fno-unwind-tables -fno-asynchronous-unwind-tables -fPIC -pipe -pthread -mbranch-protection=standard -O2 -fdata-sections -ffunction-sections -fno-omit-frame-pointer -gdwarf-4 -g1 -fvisibility=hidden -Wno-unused-local-typedefs -Wno-maybe-uninitialized -Wno-deprecated-declarations -Wno-comments -Wno-packed-not-aligned -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi 
-I/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot/usr/include/glib-2.0 -I/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot/usr/lib/glib-2.0/include -std=gnu++2a -fno-exceptions -fno-rtti -fvisibility-inlines-hidden -Wno-narrowing -Wno-class-memaccess   -feliminate-unused-debug-types -fmacro-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/chromium-114.0.5696.0=/usr/src/debug/chromium-dev/114.0.5696.0-r0  -fdebug-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/chromium-114.0.5696.0=/usr/src/debug/chromium-dev/114.0.5696.0-r0  -fmacro-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/build=/usr/src/debug/chromium-dev/114.0.5696.0-r0  -fdebug-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/build=/usr/src/debug/chromium-dev/114.0.5696.0-r0  -fdebug-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot=  -fmacro-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot=  -fdebug-prefix-map=/home/dape/Development/yocto/meta-chromium/build/tmp/work/cortexa72-poky-linux/chromium-dev/114.0.5696.0-r0/recipe-sysroot-native=  -fvisibility-inlines-hidden -c ../chromium-114.0.5696.0/third_party/distributed_point_functions/code/dpf/internal/evaluate_prg_hwy.cc -o obj/third_party/distributed_point_functions/distributed_point_functions/evaluate_prg_hwy.o

#### Compiler output

{standard input}: Assembler messages:
{standard input}: Error: open CFI at the end of file; missing .cfi_endproc directive
aarch64-poky-linux-g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.

#### Preprocessed files

evaluate_prg_hwy.ii (attached)
Comment 3 Jose Dapena Paz 2023-04-14 08:23:06 UTC
Created attachment 54858 [details]
evaluate_prg_hwy.ii (compressed with gzip)
Comment 4 Andrew Pinski 2023-04-14 21:53:24 UTC
I am getting the feeling there is an infinite loop between two different foldings.
Maybe even caused by my r12-5430-g74faa9834a9ad2.

#5  0x0000000000d7e764 in gimple_build_with_ops_stat (num_ops=3, subcode=100, code=GIMPLE_ASSIGN) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple.cc:469
#6  gimple_build_assign_1 (op3=0x0, op2=0xffffef193810, op1=0xffffdaf27860, subcode=BIT_AND_EXPR, lhs=0xfff8ee62ed18) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple.cc:469
#7  gimple_build_assign (lhs=lhs@entry=0xfff8ee62ed18, subcode=subcode@entry=BIT_AND_EXPR, op1=0xffffdaf27860, op2=0xffffef193810, op3=op3@entry=0x0) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple.cc:496
#8  0x00000000015a2b40 in maybe_push_res_to_seq (res_op=0xffffffffe670, seq=0xfffffffff160, res=0xfff8ee62ed18) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple-match.h:303
#9  0x0000000001717e10 in gimple_simplify_BIT_AND_EXPR (res_op=res_op@entry=0xffffffffe700, seq=0xfffffffff160, valueize=0x12366c0 <fwprop_ssa_val(tree)>, type=0xfffff59f07e0, _p0=0xfff8ee62ecd0, _p1=0xffffef193810, code=...) at gimple-match.cc:194111
#10 0x0000000001601ab4 in gimple_simplify (res_op=res_op@entry=0xffffffffe700, seq=seq@entry=0xfffffffff160, valueize=valueize@entry=0x12366c0 <fwprop_ssa_val(tree)>, code=..., type=<optimized out>, _p0=<optimized out>, _p1=<optimized out>) at gimple-match.cc:211334
#11 0x00000000016038d0 in gimple_resimplify2 (seq=0xfffffffff160, res_op=0xffffffffe980, valueize=0x12366c0 <fwprop_ssa_val(tree)>) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple-match-head.cc:323
#12 0x0000000001678b4c in gimple_simplify_222 (res_op=res_op@entry=0xffffffffe980, seq=seq@entry=0xfffffffff160, valueize=valueize@entry=0x12366c0 <fwprop_ssa_val(tree)>, type=type@entry=0xfffff59f07e0, captures=captures@entry=0xffffffffe8f0, op=op@entry=BIT_IOR_EXPR, rop=rop@entry=BIT_AND_EXPR)
    at gimple-match.cc:54338
#13 0x00000000017af1a8 in gimple_simplify_BIT_IOR_EXPR (res_op=res_op@entry=0xffffffffe980, seq=0xfffffffff160, valueize=0x12366c0 <fwprop_ssa_val(tree)>, type=0xfffff59f07e0, _p0=0xfff8ee62ec40, _p1=0xfff8ee62ec88, code=...) at gimple-match.cc:115608
#14 0x0000000001601a6c in gimple_simplify (res_op=res_op@entry=0xffffffffe980, seq=seq@entry=0xfffffffff160, valueize=valueize@entry=0x12366c0 <fwprop_ssa_val(tree)>, code=..., type=<optimized out>, _p0=<optimized out>, _p1=<optimized out>) at gimple-match.cc:211248
#15 0x00000000016038d0 in gimple_resimplify2 (seq=0xfffffffff160, res_op=0xffffffffeb60, valueize=0x12366c0 <fwprop_ssa_val(tree)>) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple-match-head.cc:323
#16 0x0000000001717e98 in gimple_simplify_BIT_AND_EXPR (res_op=res_op@entry=0xffffffffeb60, seq=0xfffffffff160, valueize=0x12366c0 <fwprop_ssa_val(tree)>, type=0xfffff59f07e0, _p0=0xfff8ee62ebf8, _p1=0xffffef193810, code=...) at gimple-match.cc:194125
#17 0x0000000001601ab4 in gimple_simplify (res_op=res_op@entry=0xffffffffeb60, seq=seq@entry=0xfffffffff160, valueize=valueize@entry=0x12366c0 <fwprop_ssa_val(tree)>, code=..., type=<optimized out>, _p0=<optimized out>, _p1=<optimized out>) at gimple-match.cc:211334
#18 0x00000000016038d0 in gimple_resimplify2 (seq=0xfffffffff160, res_op=0xffffffffede0, valueize=0x12366c0 <fwprop_ssa_val(tree)>) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple-match-head.cc:323
#19 0x0000000001678b4c in gimple_simplify_222 (res_op=res_op@entry=0xffffffffede0, seq=seq@entry=0xfffffffff160, valueize=valueize@entry=0x12366c0 <fwprop_ssa_val(tree)>, type=type@entry=0xfffff59f07e0, captures=captures@entry=0xffffffffed50, op=op@entry=BIT_IOR_EXPR, rop=rop@entry=BIT_AND_EXPR)
    at gimple-match.cc:54338
#20 0x00000000017af1a8 in gimple_simplify_BIT_IOR_EXPR (res_op=res_op@entry=0xffffffffede0, seq=0xfffffffff160, valueize=0x12366c0 <fwprop_ssa_val(tree)>, type=0xfffff59f07e0, _p0=0xfff8ee62eb68, _p1=0xfff8ee62ebb0, code=...) at gimple-match.cc:115608
#21 0x0000000001601a6c in gimple_simplify (res_op=res_op@entry=0xffffffffede0, seq=seq@entry=0xfffffffff160, valueize=valueize@entry=0x12366c0 <fwprop_ssa_val(tree)>, code=..., type=<optimized out>, _p0=<optimized out>, _p1=<optimized out>) at gimple-match.cc:211248
#22 0x00000000016038d0 in gimple_resimplify2 (seq=0xfffffffff160, res_op=0xffffffffefc0, valueize=0x12366c0 <fwprop_ssa_val(tree)>) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple-match-head.cc:323
#23 0x0000000001717e98 in gimple_simplify_BIT_AND_EXPR (res_op=res_op@entry=0xffffffffefc0, seq=0xfffffffff160, valueize=0x12366c0 <fwprop_ssa_val(tree)>, type=0xfffff59f07e0, _p0=0xfff8ee62eb20, _p1=0xffffef193810, code=...) at gimple-match.cc:194125
#24 0x0000000001601ab4 in gimple_simplify (res_op=res_op@entry=0xffffffffefc0, seq=seq@entry=0xfffffffff160, valueize=valueize@entry=0x12366c0 <fwprop_ssa_val(tree)>, code=..., type=<optimized out>, _p0=<optimized out>, _p1=<optimized out>) at gimple-match.cc:211334
#25 0x00000000016038d0 in gimple_resimplify2 (seq=0xfffffffff160, res_op=0xfffffffff170, valueize=0x12366c0 <fwprop_ssa_val(tree)>) at /home/ubuntu/src/upstream-gcc-aarch64/gcc/gcc/gimple-match-head.cc:323
Comment 5 Andrew Pinski 2023-04-14 22:33:47 UTC
/* (X & Y) & Y -> X & Y
   (X | Y) | Y -> X | Y  */
(for op (bit_and bit_ior)
 (simplify
  (op:c (convert1?@2 (op:c @0 @@1)) (convert2? @1))
  @2))
...
/* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
   operands are another bit-wise operation with a common input.  If so,
   distribute the bit operations to save an operation and possibly two if
   constants are involved.  For example, convert
     (A | B) & (A | C) into A | (B & C)
   Further simplification will occur if B and C are constants.  */
(for op (bit_and bit_ior bit_xor)
     rop (bit_ior bit_and bit_and)
 (simplify
  (op (convert? (rop:c @@0 @1)) (convert? (rop:c @0 @2)))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@1))
       && tree_nop_conversion_p (type, TREE_TYPE (@2)))
   (rop (convert @0) (op (convert @1) (convert @2))))))
...
/* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
(simplify
  (bit_and (bit_ior @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
  (bit_ior (bit_and @0 @2) (bit_and @1 @2)))
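
The three patterns quoted above are all valid bitwise identities, so whatever goes wrong between them is a question of termination rather than correctness. A minimal sketch that checks them exhaustively on 8-bit values follows; the function names are hypothetical and are not taken from GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* (X & Y) & Y == X & Y   and   (X | Y) | Y == X | Y */
bool absorb_holds(unsigned x, unsigned y)
{
    return (((x & y) & y) == (x & y)) && (((x | y) | y) == (x | y));
}

/* (A | B) & (A | C) == A | (B & C) */
bool distribute_holds(unsigned a, unsigned b, unsigned c)
{
    return ((a | b) & (a | c)) == (a | (b & c));
}

/* (x | CST1) & CST2 == (x & CST2) | (CST1 & CST2) */
bool split_holds(unsigned x, unsigned cst1, unsigned cst2)
{
    return ((x | cst1) & cst2) == ((x & cst2) | (cst1 & cst2));
}

/* Check every combination of three 8-bit operands. */
bool all_identities_hold(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            for (unsigned c = 0; c < 256; c++)
                if (!absorb_holds(a, b) || !distribute_holds(a, b, c)
                    || !split_holds(a, b, c))
                    return false;
    return true;
}

/* Usage sketch: every identity is expected to hold. */
void identities_demo(void)
{
    assert(all_identities_hold());
}
```

Since each rewrite preserves the value, a pair of rewrites that undo each other can alternate indefinitely without ever producing a wrong result.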
Comment 6 Andrew Pinski 2023-04-14 22:40:29 UTC
_4347 = POLY_INT_CST [16, 16] & 15;
_4348 = _3441 | POLY_INT_CST [16, 16];

So this is with SVE.
Comment 7 Andrew Pinski 2023-04-14 22:52:25 UTC
Simple reduced testcase:
```
#include <arm_sve.h>

unsigned long f(unsigned long tt)
{
        unsigned long t=  svcntb();
        return (tt | 15) & t;
}
```

Confirmed.

Fails even in GCC 11.1.0.
It is due to match.pd's

/* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
(simplify
  (bit_and (bit_ior @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
  (bit_ior (bit_and @0 @2) (bit_and @1 @2)))

POLY_INT_CST is a CONSTANT_CLASS but does not simplify on the bit_and.
So maybe it should include a ! on the (bit_and @1 @2) .
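
To illustrate why a bit_and involving a POLY_INT_CST need not fold to another constant, here is a rough sketch that models a two-coefficient polynomial constant as c0 + c1 * x, with x only known at run time. The `poly2` type and helper names are assumptions made for this example, and the code is an illustration of the arithmetic only, not a description of GCC's constant folder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a two-coefficient polynomial constant:
   value = c0 + c1 * x, where x >= 0 is only known at run time.  */
struct poly2 {
    uint64_t c0;
    uint64_t c1;
};

/* Evaluate the polynomial for one concrete run-time x.  */
uint64_t poly2_eval(struct poly2 p, uint64_t x)
{
    return p.c0 + p.c1 * x;
}

/* Return true if (p & mask) matches a single polynomial a + b * x for
   every x in [0, max_x].  The candidate a, b is derived from x = 0 and
   x = 1; unsigned wrap-around in the subtraction is intentional.  */
bool and_is_linear(struct poly2 p, uint64_t mask, uint64_t max_x)
{
    uint64_t a = poly2_eval(p, 0) & mask;
    uint64_t b = (poly2_eval(p, 1) & mask) - a;
    for (uint64_t x = 0; x <= max_x; x++)
        if ((poly2_eval(p, x) & mask) != a + b * x)
            return false;
    return true;
}

/* Usage sketch: model the SVE byte count as 16 + 16 * x, x in [0, 15].  */
void poly2_demo(void)
{
    struct poly2 cntb = { 16, 16 };
    /* With mask 48 the results are 16, 32, 48, 0, ...: not linear in x.  */
    assert(!and_is_linear(cntb, 48, 15));
    /* With mask 15 the result is 0 for every x in range.  */
    assert(and_is_linear(cntb, 15, 15));
}
```

In this model the AND with 15 happens to be 0 for every x, but in general the result of an AND does not stay in the c0 + c1 * x form, so a rewrite that assumes two constants always fold is not safe for every constant kind.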
Comment 8 Andrew Pinski 2023-04-16 00:38:42 UTC
(In reply to Andrew Pinski from comment #7)
> Fails even in GCC 11.1.0.
> It is due to match.pd's
> 
> /* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
> (simplify
>   (bit_and (bit_ior @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
>   (bit_ior (bit_and @0 @2) (bit_and @1 @2)))
> 
> POLY_INT_CST is a CONSTANT_CLASS but does not simplify on the bit_and.
> So maybe it should include a ! on the (bit_and @1 @2) .

Which then will go into a loop with:
(A | B) & (A | C) into A | (B & C)
Comment 9 Richard Biener 2023-04-17 07:03:51 UTC
(In reply to Andrew Pinski from comment #8)
> (In reply to Andrew Pinski from comment #7)
> > Fails even in GCC 11.1.0.
> > It is due to match.pd's
> > 
> > /* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
> > (simplify
> >   (bit_and (bit_ior @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
> >   (bit_ior (bit_and @0 @2) (bit_and @1 @2)))
> > 
> > POLY_INT_CST is a CONSTANT_CLASS but does not simplify on the bit_and.
> > So maybe it should include a ! on the (bit_and @1 @2) .
> 
> Which then will go into a loop with:
> (A | B) & (A | C) into A | (B & C)

We expect these to always fold :/  If they don't for POLY_INTs then maybe
add a (if (!POLY_INT_CST_P (...))) guard.  There are many patterns that
are affected by this.
Comment 10 Richard Sandiford 2023-04-17 07:16:04 UTC
Might be a daft question, but which cases besides
INTEGER_CST are supposed to be captured by the CONSTANT_CLASS_P?
Comment 11 Andrew Pinski 2023-04-17 07:25:05 UTC
(In reply to rsandifo@gcc.gnu.org from comment #10)
> Might be a daft question, but which cases besides
> INTEGER_CST are supposed to be captured by the CONSTANT_CLASS_P?

For bit_and/bit_ior, VECTOR_CST (I would assume).
Comment 12 Richard Sandiford 2023-04-17 07:39:00 UTC
(In reply to Andrew Pinski from comment #11)
> For bit_and/bit_ior, VECTOR_CST (I would assume).
Ah, yeah.  But then I don't think a top-level POLY_INT_CST_P
cuts it.  We'd have the same problem with VECTOR_CSTs containing
POLY_INT_CSTs.
Comment 13 Jakub Jelinek 2023-04-17 08:01:20 UTC
(In reply to rsandifo@gcc.gnu.org from comment #12)
> (In reply to Andrew Pinski from comment #11)
> > For bit_and/bit_ior, VECTOR_CST (I would assume).
> Ah, yeah.  But then I don't think a top-level POLY_INT_CST_P
> cuts it.  We'd have the same problem with VECTOR_CSTs containing
> POLY_INT_CSTs.

Do we really support that?
Comment 14 rguenther@suse.de 2023-04-17 08:05:07 UTC
On Mon, 17 Apr 2023, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109505
> 
> --- Comment #10 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
> Might be a daft question, but which cases besides
> INTEGER_CST are supposed to be captured by the CONSTANT_CLASS_P?

For the bitops?  I suppose FIXED_CST, VECTOR_CST, COMPLEX_CST (for
_Complex int), basically all constants for which bitops are valid.
Comment 15 Richard Sandiford 2023-04-17 08:07:16 UTC
(In reply to Jakub Jelinek from comment #13)
> (In reply to rsandifo@gcc.gnu.org from comment #12)
> > (In reply to Andrew Pinski from comment #11)
> > > For bit_and/bit_ior, VECTOR_CST (I would assume).
> > Ah, yeah.  But then I don't think a top-level POLY_INT_CST_P
> > cuts it.  We'd have the same problem with VECTOR_CSTs containing
> > POLY_INT_CSTs.
> Do we really support that?
Sure.  At least AIUI, VECTOR_CST can contain whatever constants the
associated scalar supports.

A specific example is:

#include <arm_sve.h>
svint32_t f() { return svdup_s32(svcntw()); }

which gives:

  return { POLY_INT_CST [4, 4], ... };

https://godbolt.org/z/T1s3n5Pfx
Comment 16 Andrew Pinski 2023-05-09 23:47:04 UTC
*** Bug 109794 has been marked as a duplicate of this bug. ***
Comment 17 Sam James 2023-05-20 07:27:49 UTC
Is there by chance a workaround we can apply for this downstream (some flag)? It prevents building Chromium on arm64 for us w/ gcc unfortunately.
Comment 18 Jakub Jelinek 2023-05-20 08:15:51 UTC
Created attachment 55124 [details]
gcc14-pr109505.patch

I actually think it isn't that bad, we don't have that many, I've looked at
match.pd patterns which check for 2 CONSTANT_CLASS_P operands and then try
to combine them using some operation and counted just 10 spots which I think
need ! added.
Comment 19 GCC Commits 2023-05-21 11:37:47 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f211757f6fa9515e3fd1a4f66f1a8b48e500c9de

commit r14-1023-gf211757f6fa9515e3fd1a4f66f1a8b48e500c9de
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sun May 21 13:36:56 2023 +0200

    match.pd: Ensure (op CONSTANT_CLASS_P CONSTANT_CLASS_P) is simplified [PR109505]
    
    On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
    but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
    (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
    simplification actually relies on the (CST1 & CST2) simplification,
    otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
    running into
    /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
       operands are another bit-wise operation with a common input.  If so,
       distribute the bit operations to save an operation and possibly two if
       constants are involved.  For example, convert
         (A | B) & (A | C) into A | (B & C)
       Further simplification will occur if B and C are constants.  */
    simplification which simplifies that
    (x & CST2) | (CST1 & CST2) back to
    CST2 & (x | CST1).
    I went through all other places I could find where we have a simplification
    with 2 CONSTANT_CLASS_P operands and perform some operation on those two,
    while the other spots aren't that severe (just trade 2 operations for
    another 2 if the two constants don't simplify, rather than as in the above
    case trading 2 ops for 3), I still think all those spots really intend
    to optimize only if the 2 constants simplify.
    
    So, the following patch adds to those a ! modifier to ensure that,
    even at GENERIC that modifier means !EXPR_P which is exactly what we want
    IMHO.
    
    2023-05-21  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/109505
            * match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
            Combine successive equal operations with constants,
            (A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
            CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
            operands.
    
            * gcc.target/aarch64/sve/pr109505.c: New test.
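
The ping-pong described in the commit message can be modelled with a small state machine. This is a simplified sketch under stated assumptions: it tracks only the shape of one expression, the enum and function names are hypothetical, and the `require_fold` flag merely stands in for the effect the commit message attributes to the ! modifier. It is not GCC code and does not reproduce the memory growth seen in the real compiler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constant kinds: INT_K constants fold with each other,
   POLY_K stands for a constant that the folder leaves alone.  */
enum cst_kind { INT_K, POLY_K };

/* The shapes the expression can take in this model.  */
enum shape {
    AND_OF_IOR,   /* (x | CST1) & CST2          : 2 operations */
    IOR_OF_ANDS,  /* (x & CST2) | (CST1 & CST2) : 3 operations */
    IOR_FOLDED    /* (x & CST2) | CST3          : 2 operations */
};

struct expr {
    enum shape shape;
    enum cst_kind cst1, cst2;
};

/* CST1 & CST2 collapses to one constant only if both are plain integers.  */
bool csts_fold(const struct expr *e)
{
    return e->cst1 == INT_K && e->cst2 == INT_K;
}

/* Apply one rewrite step; return true if a rule fired.  When require_fold
   is set, the split rule fires only if CST1 & CST2 folds.  */
bool rewrite_step(struct expr *e, bool require_fold)
{
    switch (e->shape) {
    case AND_OF_IOR:
        if (csts_fold(e)) {
            e->shape = IOR_FOLDED;
            return true;
        }
        if (require_fold)
            return false;
        e->shape = IOR_OF_ANDS;   /* 2 ops traded for 3 */
        return true;
    case IOR_OF_ANDS:
        /* (A & C) | (B & C) -> (A | B) & C undoes the split.  */
        e->shape = AND_OF_IOR;
        return true;
    case IOR_FOLDED:
        return false;
    }
    return false;
}

/* Rewrite until no rule fires or max_steps rules have fired.  */
int rewrite_to_fixed_point(struct expr *e, bool require_fold, int max_steps)
{
    int steps = 0;
    assert(max_steps >= 0);
    while (steps < max_steps && rewrite_step(e, require_fold))
        steps++;
    return steps;
}

/* Usage sketch covering the three interesting cases.  */
void rewrite_demo(void)
{
    struct expr ints = { AND_OF_IOR, INT_K, INT_K };
    struct expr poly = { AND_OF_IOR, INT_K, POLY_K };
    struct expr guarded = { AND_OF_IOR, INT_K, POLY_K };

    assert(rewrite_to_fixed_point(&ints, false, 1000) == 1);
    assert(rewrite_to_fixed_point(&poly, false, 1000) == 1000);  /* hits the cap */
    assert(rewrite_to_fixed_point(&guarded, true, 1000) == 0);
}
```

With two integer constants the model reaches a fixed point after one step; with an unfoldable constant and no guard it only stops because of the step cap; with the guard the split rule never fires and the original two-operation form is kept.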
Comment 20 GCC Commits 2023-05-22 13:52:27 UTC
The releases/gcc-13 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:0feece18e6993d02f24a9381ddb5420bb4509554

commit r13-7365-g0feece18e6993d02f24a9381ddb5420bb4509554
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sun May 21 13:36:56 2023 +0200

    match.pd: Ensure (op CONSTANT_CLASS_P CONSTANT_CLASS_P) is simplified [PR109505]
    
    On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
    but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
    (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
    simplification actually relies on the (CST1 & CST2) simplification,
    otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
    running into
    /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
       operands are another bit-wise operation with a common input.  If so,
       distribute the bit operations to save an operation and possibly two if
       constants are involved.  For example, convert
         (A | B) & (A | C) into A | (B & C)
       Further simplification will occur if B and C are constants.  */
    simplification which simplifies that
    (x & CST2) | (CST1 & CST2) back to
    CST2 & (x | CST1).
    I went through all other places I could find where we have a simplification
    with 2 CONSTANT_CLASS_P operands and perform some operation on those two,
    while the other spots aren't that severe (just trade 2 operations for
    another 2 if the two constants don't simplify, rather than as in the above
    case trading 2 ops for 3), I still think all those spots really intend
    to optimize only if the 2 constants simplify.
    
    So, the following patch adds to those a ! modifier to ensure that,
    even at GENERIC that modifier means !EXPR_P which is exactly what we want
    IMHO.
    
    2023-05-21  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/109505
            * match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
            Combine successive equal operations with constants,
            (A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
            CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
            operands.
    
            * gcc.target/aarch64/sve/pr109505.c: New test.
    
    (cherry picked from commit f211757f6fa9515e3fd1a4f66f1a8b48e500c9de)
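The ping-pong described in the commit message above can be reproduced in a small toy model. This is only a sketch under stated assumptions, not GCC code: expressions are nested tuples, integers stand in for INTEGER_CST, the string "POLY" stands in for a POLY_INT_CST such as svcntb(), and the names distribute, factor and simplify are hypothetical.

```python
# Toy model, not GCC code: expressions are nested tuples ("op", lhs, rhs),
# integers stand for INTEGER_CST, and the string "POLY" stands for a
# POLY_INT_CST that cannot be folded together with an integer.

def is_const(e):
    return isinstance(e, int) or e == "POLY"

def fold_and(a, b):
    """Fold a & b when both are plain integers; otherwise build the node."""
    if isinstance(a, int) and isinstance(b, int):
        return a & b
    return ("&", a, b)

def distribute(e):
    """(x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2), without any guard."""
    if (isinstance(e, tuple) and e[0] == "&" and is_const(e[2])
            and isinstance(e[1], tuple) and e[1][0] == "|"
            and is_const(e[1][2])):
        x, cst1, cst2 = e[1][1], e[1][2], e[2]
        return ("|", ("&", x, cst2), fold_and(cst1, cst2))
    return None

def factor(e):
    """(A & C) | (B & C) -> (A | B) & C, the common-operand rule."""
    if (isinstance(e, tuple) and e[0] == "|"
            and isinstance(e[1], tuple) and e[1][0] == "&"
            and isinstance(e[2], tuple) and e[2][0] == "&"
            and e[1][2] == e[2][2]):
        return ("&", ("|", e[1][1], e[2][1]), e[1][2])
    return None

def simplify(e, max_steps=10):
    """Apply the rules until none matches; return (result, steps, hit_limit)."""
    for step in range(max_steps):
        new = distribute(e) or factor(e)
        if new is None:
            return e, step, False
        e = new
    return e, max_steps, True
```

With two integer constants, for example simplify(("&", ("|", "t", 15), 12)), the inner 15 & 12 folds and the rewrite stops after one step. With ("&", ("|", "t", 15), "POLY") the constant pair stays unfolded, factor undoes what distribute did, and the loop only ends because of the step limit.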
Comment 21 GCC Commits 2023-05-22 14:09:55 UTC
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:6ef4e2e11c653f1d51f9a304a8d1cf44a53b4ad7

commit r12-9634-g6ef4e2e11c653f1d51f9a304a8d1cf44a53b4ad7
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sun May 21 13:36:56 2023 +0200

    match.pd: Ensure (op CONSTANT_CLASS_P CONSTANT_CLASS_P) is simplified [PR109505]
    
    On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
    but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
    (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
    simplification actually relies on the (CST1 & CST2) simplification,
    otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
    running into
    /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
       operands are another bit-wise operation with a common input.  If so,
       distribute the bit operations to save an operation and possibly two if
       constants are involved.  For example, convert
         (A | B) & (A | C) into A | (B & C)
       Further simplification will occur if B and C are constants.  */
    simplification which simplifies that
    (x & CST2) | (CST1 & CST2) back to
    CST2 & (x | CST1).
    I went through all other places I could find where we have a simplification
    with 2 CONSTANT_CLASS_P operands and perform some operation on those two,
    while the other spots aren't that severe (just trade 2 operations for
    another 2 if the two constants don't simplify, rather than as in the above
    case trading 2 ops for 3), I still think all those spots really intend
    to optimize only if the 2 constants simplify.
    
    So, the following patch adds to those a ! modifier to ensure that,
    even at GENERIC that modifier means !EXPR_P which is exactly what we want
    IMHO.
    
    2023-05-21  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/109505
            * match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
            Combine successive equal operations with constants,
            (A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
            CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
            operands.
    
            * gcc.target/aarch64/sve/pr109505.c: New test.
    
    (cherry picked from commit f211757f6fa9515e3fd1a4f66f1a8b48e500c9de)
Comment 22 Jakub Jelinek 2023-05-22 14:11:09 UTC
Fixed for 12.4, 13.2 and 14.1.
Comment 23 tt_1 2023-06-01 08:57:49 UTC
Are there any plans to backport this fix to the gcc-11 branch as well? Seems it is affected, if you go by the known to fail list.
Comment 24 Jakub Jelinek 2023-06-01 10:13:00 UTC
The fix doesn't really work for 11/10:
../../gcc/match.pd:1546:36 error: forcing simplification to a leaf is not supported for GENERIC
  (bit_ior (bit_and @0 @2) (bit_and! @1 @2)))
                                   ^
So, we'd either have to guard those patterns with #ifndef GENERIC, but that would be quite a risky change this late on those branches, or perhaps we could do something like
#ifdef GENERIC
  (with { tree a = fold_binary (BIT_AND_EXPR, type, @1, @2); }
   (if (a && CONSTANT_CLASS_P (a))
    (bit_ior (bit_and @0 @2) { a; })))
#else
  (bit_ior (bit_and @0 @2) (bit_and! @1 @2)))
#endif
But that would have to be done for all the spots the patch changed.  I'm afraid I really don't have time to do that.
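The idea behind the GENERIC fallback in this comment (fold the two constants first, and only use the rewritten form if the result is a constant leaf) can be sketched in a toy model. This is an illustration under stated assumptions, not GCC code: integers stand in for INTEGER_CST, the string "POLY" stands in for a POLY_INT_CST, and fold_binary_and and distribute_guarded are hypothetical names.

```python
# Toy model, not GCC code. Integers stand for INTEGER_CST; "POLY" stands for
# a POLY_INT_CST that does not fold together with an integer.

def fold_binary_and(a, b):
    """Hypothetical stand-in for fold_binary (BIT_AND_EXPR, ...):
    return the folded constant, or None when nothing folds."""
    if isinstance(a, int) and isinstance(b, int):
        return a & b
    return None

def distribute_guarded(x, cst1, cst2):
    """Rewrite (x | cst1) & cst2 only when cst1 & cst2 folds to a leaf."""
    a = fold_binary_and(cst1, cst2)
    if a is None:
        return None  # pattern does not apply; the expression is left alone
    return ("|", ("&", x, cst2), a)
```

Here distribute_guarded("t", 15, 12) yields the distributed form with the folded constant, while distribute_guarded("t", 15, "POLY") declines to rewrite, so there is nothing for the common-operand rule to undo.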
Comment 25 Richard Biener 2023-06-02 06:56:41 UTC
I'm testing a backport.
Comment 26 GCC Commits 2023-06-02 07:46:08 UTC
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:bfa476528ceeac96865a48c49f3f1a15d566d209

commit r11-10840-gbfa476528ceeac96865a48c49f3f1a15d566d209
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Feb 23 13:47:01 2022 +0100

    middle-end/109505 - backport match.pd ! support for GENERIC
    
    The patch adds support for the ! modifier to GENERIC, backported
    from r12-7361-gfdc46830f1b793.
    
    2023-06-02  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/109505
            * doc/match-and-simplify.texi: Amend ! documentation.
            * genmatch.c (expr::gen_transform): Code-generate ! support
            for GENERIC.
            (parser::parse_expr): Allow ! for GENERIC.
Comment 27 GCC Commits 2023-06-02 07:46:13 UTC
The releases/gcc-11 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ca4a4cc0060cb8ae1a326d6dbfcd9459452e1574

commit r11-10841-gca4a4cc0060cb8ae1a326d6dbfcd9459452e1574
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Sun May 21 13:36:56 2023 +0200

    match.pd: Ensure (op CONSTANT_CLASS_P CONSTANT_CLASS_P) is simplified [PR109505]
    
    On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
    but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
    (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
    simplification actually relies on the (CST1 & CST2) simplification,
    otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
    running into
    /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
       operands are another bit-wise operation with a common input.  If so,
       distribute the bit operations to save an operation and possibly two if
       constants are involved.  For example, convert
         (A | B) & (A | C) into A | (B & C)
       Further simplification will occur if B and C are constants.  */
    simplification which simplifies that
    (x & CST2) | (CST1 & CST2) back to
    CST2 & (x | CST1).
    I went through all other places I could find where we have a simplification
    with 2 CONSTANT_CLASS_P operands and perform some operation on those two,
    while the other spots aren't that severe (just trade 2 operations for
    another 2 if the two constants don't simplify, rather than as in the above
    case trading 2 ops for 3), I still think all those spots really intend
    to optimize only if the 2 constants simplify.
    
    So, the following patch adds to those a ! modifier to ensure that,
    even at GENERIC that modifier means !EXPR_P which is exactly what we want
    IMHO.
    
    2023-05-21  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/109505
            * match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
            Combine successive equal operations with constants,
            (A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
            CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
            operands.
    
            * gcc.target/aarch64/sve/pr109505.c: New test.
    
    (cherry picked from commit f211757f6fa9515e3fd1a4f66f1a8b48e500c9de)
Comment 28 Richard Biener 2023-06-02 07:47:19 UTC
Fixed.