Commit r14-1493 caused -4% degradation on SPEC2017 fp bmk 503.bwaves_r at option -Ofast --param=vect-partial-vector-usage=2 on Power10, as a follow up of [1], I looked into it and confirmed it had nothing to do with existing load density heuristics. The gap is from the hotspot mat_times_vec_, perf showed that the different iv choices leading more latency for the length and further uses. By further checking, I think it exposed one issue that currently we only checks the addressing mode supported or not against the mode, it's without any other information like gimple statement. Unfortunately for IFNs len_{load,store} which are generated with lxvl/stxvl only supporting the addressing mode: base register (+ length register, which isn't even index register), but when determining the group cost with cand (determine_group_iv_cost_address), it's unable to consider this characteristic, as the current valid_mem_ref_p (valid_mem_ref_p-> memory_address_addr_space_p ->legitimate_address_p) only checking mode, address_space, constructed rtx. For V16QImode, the normal vector load/store do support addressing modes "base + offset (DQ-form)", "base + index" since power9, so ivopts would consider it's fine to use base + index addressing mode for LEN_{load,store} uses and the related cost of adopting the scalar (no address object based) candidate with step 16 for those LEN_{load,store} uses is zero. For example: | Group 1: | Type: POINTER ARGUMENT ADDRESS | Use 1.0: | At stmt: vect_434 = .LEN_LOAD (vectp_y.124_438, 64B, loop_len_436, 0); | At pos: vectp_y.124_438 | IV struct: | Type: vector(2) real(kind=8) * | Base: (vector(2) real(kind=8) *) vectp_y.125_195 | Step: 32 | Object: (void *) vectp_y.125_195 | Biv: N | Overflowness wrto loop niter: Overflow | Use 1.1: | At stmt: .LEN_STORE (vectp_y.173_213, 64B, loop_len_436, vect_211, 0); | At pos: vectp_y.173_213 | IV struct: | Type: vector(2) real(kind=8) * | Base: (vector(2) real(kind=8) *) vectp_y.125_195 | Step: 32 | Object: (void *) vectp_y.125_195 | Biv: N | Overflowness wrto loop niter: Overflow | Candidate 7: | Var befor: ivtmp.182 | Var after: ivtmp.182 | Incr POS: before exit test | IV struct: | Type: sizetype | Base: 0 | Step: 32 | Biv: N | Overflowness wrto loop niter: No-overflow Group 1: cand cost compl. inv.expr. inv.vars 1 8 2 NIL; 1 2 12 2 18; NIL; 3 8 2 NIL; 1 4 12 2 19; NIL; 5 12 2 19; NIL; 6 0 2 NIL; NIL; 7 0 2 NIL; 1 ==> zero cost 8 0 0 NIL; NIL; 9 0 0 NIL; NIL; 31 0 0 NIL; NIL; [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620305.html
Can you extend the current hook legitimate_address_p with one default value nullptr gimple* argument? When middle-end passes like ivopts want to query with the constructed address reference, it can pass the gimple statement as the context?
You mean add a default argument = nullptr to targethook legitimate_address_p to let powerPC do more middle-end address ivopts analysis? If yes, I am ok with that. Let's see Richard and Richi more comments. Thanks.
I think the main issue is that IVOPTs treats all of the memory access internal functions as calls and thus the pointer "argument" as address calculation only and thus costs it as such. To fix this IVOPTs needs to treat those internal functions as memory accesses and accordingly represent this to the target when building the fake RTL to cost (and there neither masks nor lens are considered). The GIMPLE in the end will have the address calculation split out (as if it could be CSEd between different references) to sth like _1 = &TARGET_MEM_REF <....>; .LEN_STORE (_1, ...); where &TARGET_MEM_REF <....> is then a LEA and the .LEN_STORE the simplest register indirect reference. Currently IVOPTs should get the LEA part right but it doesn't cost the simple register indirect reference. Of course in the end we'd like to anticipate merging the LEA with the .LEN_STORE (for the cases valid). Do I understand the issue is that only the last bit doesn't work but costing the LEA does?
ivopts does have code to treat ifn pointer arguments specially, see get_mem_type_for_internal_fn &co. But like Kewen says, it's still only based on the mode. Personally I'd prefer an internal_fn rather than a gimple* though. It can be useful to test these things on a possible future statement, not just on one that already exists.
(In reply to rsandifo@gcc.gnu.org from comment #5) > ivopts does have code to treat ifn pointer arguments specially, > see get_mem_type_for_internal_fn &co. But like Kewen says, > it's still only based on the mode. But the dump quoted says 'Type: POINTER ARGUMENT ADDRESS' which means that doesn't work as desired. That is, the use is classified wrongly and the above function only seems to be useful to get the mode of the memory access done by the internal function? USE_PTR_ADDRESS doesn't seem to be a very good match to capture constraints for a memory dereference? > Personally I'd prefer an internal_fn rather than a gimple* though. > It can be useful to test these things on a possible future statement, > not just on one that already exists. Well yes, passing down a gimple * looks like too generic here. Maybe use a code_helper so we can pass down MEM_REF for dereference context and nothing (ERROR_MARK_NODE?) for plain address use?
(In reply to Richard Biener from comment #6) > (In reply to rsandifo@gcc.gnu.org from comment #5) > > ivopts does have code to treat ifn pointer arguments specially, > > see get_mem_type_for_internal_fn &co. But like Kewen says, > > it's still only based on the mode. > > But the dump quoted says 'Type: POINTER ARGUMENT ADDRESS' which means > that doesn't work as desired. That is, the use is classified wrongly > and the above function only seems to be useful to get the mode of > the memory access done by the internal function? As Richard mentioned above, currently ivopts special-cases LEN and MASK load/store IFNs, USE_PTR_ADDRESS is the special use type for address-like use, currently it only handles the point arguments of these LEN and MASK load/store IFNs which will become an address when expanding the IFN statement. function find_address_like_use detects this kind of argument, determine_group_iv_cost and the others will treat USE_PTR_ADDRESS as the same as USE_REF_ADDRESS, excepting when rewriting it we will generate it with ADDR_EXPR (LEA). So costing doesn't distinguish USE_PTR_ADDRESS and USE_REF_ADDRESS. One thing seems to be improved is that unlike USE_REF_ADDRESS USE_PTR_ADDRESS uses will need ADDR_EXPR, it can generate something CSE'd, we can consider this during costing. > > USE_PTR_ADDRESS doesn't seem to be a very good match to capture > constraints for a memory dereference? > > > Personally I'd prefer an internal_fn rather than a gimple* though. > > It can be useful to test these things on a possible future statement, > > not just on one that already exists. > > Well yes, passing down a gimple * looks like too generic here. Maybe > use a code_helper so we can pass down MEM_REF for dereference context > and nothing (ERROR_MARK_NODE?) for plain address use? Thanks for the suggestions with internal_fn and code_helper! I thought gimple* can be generic but as the above comments I agree code_helper is better.
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>: https://gcc.gnu.org/g:165b1f6ad1d3969e2c23417797362d0528e65c79 commit r14-3092-g165b1f6ad1d3969e2c23417797362d0528e65c79 Author: Kewen Lin <linkw@linux.ibm.com> Date: Wed Aug 9 00:02:26 2023 -0500 targhooks: Extend legitimate_address_p with code_helper [PR110248] As PR110248 shows, some middle-end passes like IVOPTs can query the target hook legitimate_address_p with some artificially constructed rtx to determine whether some addressing modes are supported by target for some gimple statement. But for now the existing legitimate_address_p only checks the given mode, it's unable to distinguish some special cases unfortunately, for example, for LEN_LOAD ifn on Power port, we would expand it with lxvl hardware insn, which only supports one register to hold the address (the other register is holding the length), that is we don't support base (reg) + index (reg) addressing mode for sure. But hook legitimate_address_p only considers the given mode which would be some vector mode for LEN_LOAD ifn, and we do support base + index addressing mode for normal vector load and store insns, so the hook will return true for the query unexpectedly. This patch is to introduce one extra argument of type code_helper for hook legitimate_address_p, it makes targets able to handle some special case like what's described above. PR tree-optimization/110248 gcc/ChangeLog: * coretypes.h (class code_helper): Add forward declaration. * doc/tm.texi: Regenerate. * lra-constraints.cc (valid_address_p): Call target hook targetm.addr_space.legitimate_address_p with an extra parameter ERROR_MARK as its prototype changes. * recog.cc (memory_address_addr_space_p): Likewise. * reload.cc (strict_memory_address_addr_space_p): Likewise. * target.def (legitimate_address_p, addr_space.legitimate_address_p): Extend with one more argument of type code_helper, update the documentation accordingly. * targhooks.cc (default_legitimate_address_p): Adjust for the new code_helper argument. (default_addr_space_legitimate_address_p): Likewise. * targhooks.h (default_legitimate_address_p): Likewise. (default_addr_space_legitimate_address_p): Likewise. * config/aarch64/aarch64.cc (aarch64_legitimate_address_hook_p): Adjust with extra unnamed code_helper argument with default ERROR_MARK. * config/alpha/alpha.cc (alpha_legitimate_address_p): Likewise. * config/arc/arc.cc (arc_legitimate_address_p): Likewise. * config/arm/arm-protos.h (arm_legitimate_address_p): Likewise. (tree.h): New include for tree_code ERROR_MARK. * config/arm/arm.cc (arm_legitimate_address_p): Adjust with extra unnamed code_helper argument with default ERROR_MARK. * config/avr/avr.cc (avr_addr_space_legitimate_address_p): Likewise. * config/bfin/bfin.cc (bfin_legitimate_address_p): Likewise. * config/bpf/bpf.cc (bpf_legitimate_address_p): Likewise. * config/c6x/c6x.cc (c6x_legitimate_address_p): Likewise. * config/cris/cris-protos.h (cris_legitimate_address_p): Likewise. (tree.h): New include for tree_code ERROR_MARK. * config/cris/cris.cc (cris_legitimate_address_p): Adjust with extra unnamed code_helper argument with default ERROR_MARK. * config/csky/csky.cc (csky_legitimate_address_p): Likewise. * config/epiphany/epiphany.cc (epiphany_legitimate_address_p): Likewise. * config/frv/frv.cc (frv_legitimate_address_p): Likewise. * config/ft32/ft32.cc (ft32_addr_space_legitimate_address_p): Likewise. * config/gcn/gcn.cc (gcn_addr_space_legitimate_address_p): Likewise. * config/h8300/h8300.cc (h8300_legitimate_address_p): Likewise. * config/i386/i386.cc (ix86_legitimate_address_p): Likewise. * config/ia64/ia64.cc (ia64_legitimate_address_p): Likewise. * config/iq2000/iq2000.cc (iq2000_legitimate_address_p): Likewise. * config/lm32/lm32.cc (lm32_legitimate_address_p): Likewise. * config/loongarch/loongarch.cc (loongarch_legitimate_address_p): Likewise. * config/m32c/m32c.cc (m32c_legitimate_address_p): Likewise. (m32c_addr_space_legitimate_address_p): Likewise. * config/m32r/m32r.cc (m32r_legitimate_address_p): Likewise. * config/m68k/m68k.cc (m68k_legitimate_address_p): Likewise. * config/mcore/mcore.cc (mcore_legitimate_address_p): Likewise. * config/microblaze/microblaze-protos.h (tree.h): New include for tree_code ERROR_MARK. (microblaze_legitimate_address_p): Adjust with extra unnamed code_helper argument with default ERROR_MARK. * config/microblaze/microblaze.cc (microblaze_legitimate_address_p): Likewise. * config/mips/mips.cc (mips_legitimate_address_p): Likewise. * config/mmix/mmix.cc (mmix_legitimate_address_p): Likewise. * config/mn10300/mn10300.cc (mn10300_legitimate_address_p): Likewise. * config/moxie/moxie.cc (moxie_legitimate_address_p): Likewise. * config/msp430/msp430.cc (msp430_legitimate_address_p): Likewise. (msp430_addr_space_legitimate_address_p): Adjust with extra code_helper argument with default ERROR_MARK and adjust the call to function msp430_legitimate_address_p. * config/nds32/nds32.cc (nds32_legitimate_address_p): Adjust with extra unnamed code_helper argument with default ERROR_MARK. * config/nios2/nios2.cc (nios2_legitimate_address_p): Likewise. * config/nvptx/nvptx.cc (nvptx_legitimate_address_p): Likewise. * config/or1k/or1k.cc (or1k_legitimate_address_p): Likewise. * config/pa/pa.cc (pa_legitimate_address_p): Likewise. * config/pdp11/pdp11.cc (pdp11_legitimate_address_p): Likewise. * config/pru/pru.cc (pru_addr_space_legitimate_address_p): Likewise. * config/riscv/riscv.cc (riscv_legitimate_address_p): Likewise. * config/rl78/rl78-protos.h (rl78_as_legitimate_address): Likewise. (tree.h): New include for tree_code ERROR_MARK. * config/rl78/rl78.cc (rl78_as_legitimate_address): Adjust with extra unnamed code_helper argument with default ERROR_MARK. * config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Likewise. (rs6000_debug_legitimate_address_p): Adjust with extra code_helper argument and adjust the call to function rs6000_legitimate_address_p. * config/rx/rx.cc (rx_is_legitimate_address): Adjust with extra unnamed code_helper argument with default ERROR_MARK. * config/s390/s390.cc (s390_legitimate_address_p): Likewise. * config/sh/sh.cc (sh_legitimate_address_p): Likewise. * config/sparc/sparc.cc (sparc_legitimate_address_p): Likewise. * config/v850/v850.cc (v850_legitimate_address_p): Likewise. * config/vax/vax.cc (vax_legitimate_address_p): Likewise. * config/visium/visium.cc (visium_legitimate_address_p): Likewise. * config/xtensa/xtensa.cc (xtensa_legitimate_address_p): Likewise. * config/stormy16/stormy16-protos.h (xstormy16_legitimate_address_p): Likewise. (tree.h): New include for tree_code ERROR_MARK. * config/stormy16/stormy16.cc (xstormy16_legitimate_address_p): Adjust with extra unnamed code_helper argument with default ERROR_MARK.
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>: https://gcc.gnu.org/g:4a8e6fa8016f8daf184dec49f371ca71b1cb0f01 commit r14-3093-g4a8e6fa8016f8daf184dec49f371ca71b1cb0f01 Author: Kewen Lin <linkw@linux.ibm.com> Date: Wed Aug 9 00:41:52 2023 -0500 ivopts: Call valid_mem_ref_p with ifn [PR110248] As PR110248 shows, to get the expected query results for that internal functions LEN_{LOAD,STORE} is able to adopt some addressing modes, we need to pass down the related IFN code as well. This patch is to make IVOPTs pass down ifn code for USE_PTR_ADDRESS type uses, it adjusts the related functions {strict_,}memory_address_addr_space_p, and valid_mem_ref_p as well. PR tree-optimization/110248 gcc/ChangeLog: * recog.cc (memory_address_addr_space_p): Add one more argument ch of type code_helper and pass it to targetm.addr_space.legitimate_address_p instead of ERROR_MARK. (offsettable_address_addr_space_p): Update one function pointer with one more argument of type code_helper as its assignees memory_address_addr_space_p and strict_memory_address_addr_space_p have been adjusted, and adjust some call sites with ERROR_MARK. * recog.h (tree.h): New include header file for tree_code ERROR_MARK. (memory_address_addr_space_p): Adjust with one more unnamed argument of type code_helper with default ERROR_MARK. (strict_memory_address_addr_space_p): Likewise. * reload.cc (strict_memory_address_addr_space_p): Add one unnamed argument of type code_helper. * tree-ssa-address.cc (valid_mem_ref_p): Add one more argument ch of type code_helper and pass it to memory_address_addr_space_p. * tree-ssa-address.h (valid_mem_ref_p): Adjust the declaration with one more unnamed argument of type code_helper with default value ERROR_MARK. * tree-ssa-loop-ivopts.cc (get_address_cost): Use ERROR_MARK as code by default, change it with ifn code for USE_PTR_ADDRESS type use, and pass it to all valid_mem_ref_p calls.
The master branch has been updated by Kewen Lin <linkw@gcc.gnu.org>: https://gcc.gnu.org/g:0412f0e374de1f66e20c407e0b519324af3fd5b6 commit r14-3094-g0412f0e374de1f66e20c407e0b519324af3fd5b6 Author: Kewen Lin <linkw@linux.ibm.com> Date: Wed Aug 9 01:15:46 2023 -0500 rs6000: Teach legitimate_address_p about LEN_{LOAD,STORE} [PR110248] This patch is to teach rs6000_legitimate_address_p to handle the queried rtx constructed for LEN_{LOAD,STORE}, since lxvl and stxvl doesn't support x-form or ds-form, so consider it as not legitimate when outer code is PLUS. PR tree-optimization/110248 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Check if the given code is for ifn LEN_{LOAD,STORE}, if yes then make it not legitimate when outer code is PLUS.
Should be fixed on trunk, thanks all!
The second patch (that pulls in tree.h into recog.h) breaks building for me on an usual amd64 Linux system: configure '--with-pkgversion=basepoints/gcc-14-3093-g4a8e6fa8016, built at 1691996332' \ --prefix=/var/lib/laminar/run/gcc-local/82/toolchain-install \ --enable-werror-always \ --enable-languages=all \ --disable-multilib make V=1 all-gcc echo timestamp > s-preds-h TARGET_CPU_DEFAULT="" \ HEADERS="config/i386/i386-d.h" DEFINES="" \ /bin/bash ../../gcc/gcc/mkconfig.sh tm_d.h /var/lib/laminar/run/gcc-local/82/local-toolchain-install/bin/g++ -std=c++11 -c -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include \ -o build/genflags.o ../../gcc/gcc/genflags.cc /var/lib/laminar/run/gcc-local/82/local-toolchain-install/bin/g++ -std=c++11 -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -static-libstdc++ -static-libgcc -o build/genflags \ build/genflags.o build/rtl.o build/read-rtl.o build/ggc-none.o build/vec.o build/min-insn-modes.o build/gensupport.o build/print-rtl.o build/hash-table.o build/sort.o build/read-md.o build/errors.o ../build-x86_64-pc-linux-gnu/libiberty/libiberty.a /var/lib/laminar/run/gcc-local/82/local-toolchain-install/bin/g++ -std=c++11 -c -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include \ -o build/genconditions.o ../../gcc/gcc/genconditions.cc /var/lib/laminar/run/gcc-local/82/local-toolchain-install/bin/g++ -std=c++11 -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -static-libstdc++ -static-libgcc -o build/genconditions \ build/genconditions.o build/rtl.o build/read-rtl.o build/ggc-none.o build/vec.o build/min-insn-modes.o build/gensupport.o build/print-rtl.o build/hash-table.o build/sort.o build/read-md.o build/errors.o ../build-x86_64-pc-linux-gnu/libiberty/libiberty.a build/genconditions ../../gcc/gcc/common.md ../../gcc/gcc/config/i386/i386.md > tmp-condmd.cc /bin/bash ../../gcc/gcc/../move-if-change tmp-condmd.cc build/gencondmd.cc echo timestamp > s-conditions build/genpreds -c ../../gcc/gcc/common.md ../../gcc/gcc/config/i386/i386.md > tmp-constrs.h /bin/bash ../../gcc/gcc/../move-if-change tmp-constrs.h tm-constrs.h echo timestamp > s-constrs-h /var/lib/laminar/run/gcc-local/82/local-toolchain-install/bin/g++ -std=c++11 -c -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../../gcc/gcc -I../../gcc/gcc/build -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include \ -o build/gencondmd.o build/gencondmd.cc In file included from ../../gcc/gcc/tree.h:23, from ../../gcc/gcc/recog.h:24, from build/gencondmd.cc:40: ../../gcc/gcc/tree-core.h:145:10: fatal error: all-tree.def: No such file or directory 145 | #include "all-tree.def" | ^~~~~~~~~~~~~~ compilation terminated. make[1]: *** [Makefile:2929: build/gencondmd.o] Error 1 make[1]: Leaving directory '/var/lib/laminar/run/gcc-local/82/toolchain-build/gcc' make: *** [Makefile:4992: all-gcc] Error 2 (Cf. http://toolchain.lug-owl.de/laminar/jobs/gcc-local/82)