Robin Dapp [Mon, 13 May 2024 20:09:35 +0000 (22:09 +0200)]
RISC-V: Add vwsll combine helpers.
This patch enables the usage of vwsll in autovec context by adding the
necessary combine patterns and tests.
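As an illustration, a scalar loop of the following shape is the kind of source a widening shift-left pattern is meant to cover. This is only a sketch: the function name and signature are made up for the example, and whether GCC actually emits vwsll for it depends on the target flags (the instruction belongs to the Zvbb extension) and on the vectorizer's cost model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example function, not taken from the patch or its tests.
   Each 16-bit input is zero-extended to 32 bits before the shift, so bits
   shifted past bit 15 are kept instead of being lost.  */
void
widening_shl (uint32_t *restrict dst, const uint16_t *restrict src,
              const uint16_t *restrict amount, size_t n)
{
  assert (dst != NULL && src != NULL && amount != NULL);
  for (size_t i = 0; i < n; i++)
    /* Mask the shift amount so the C shift is always well defined.  */
    dst[i] = (uint32_t) src[i] << (amount[i] & 31);
}
```

The zero extension of the narrow operand followed by a shift in the wide type is the part that the combine patterns would have to recognize as a single widening operation.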
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vwsll_zext1_<mode>): New
pattern.
(*vwsll_zext2_<mode>): Ditto.
(*vwsll_zext1_scalar_<mode>): Ditto.
(*vwsll_zext1_trunc_<mode>): Ditto.
(*vwsll_zext2_trunc_<mode>): Ditto.
(*vwsll_zext1_trunc_scalar_<mode>): Ditto.
* config/riscv/vector-crypto.md: Make pattern similar to other
narrowing/widening patterns.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vwsll-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-template.h: New test.
Robin Dapp [Thu, 16 May 2024 10:43:43 +0000 (12:43 +0200)]
RISC-V: Split vwadd.wx and vwsub.wx and add helpers.
vwadd.wx and vwsub.wx have the same problem vfwadd.wf had. This patch
splits the insn pattern in the same way vfwadd.wf was split.
It also adds two patterns to recognize extended scalars. In practice
those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.
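A loop along the following lines shows the shape of code involved: a narrow scalar is sign-extended and added to every element of an already-wide vector. The function name is hypothetical, and the sketch makes no claim about which instruction a given GCC version selects for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example function.  The 16-bit scalar X is extended to
   32 bits once per element and added to the wide input.  The caller is
   assumed to pass values for which the signed addition cannot overflow.  */
void
widen_add_scalar (int32_t *restrict dst, const int32_t *restrict wide,
                  int16_t x, size_t n)
{
  assert (dst != NULL && wide != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = wide[i] + (int32_t) x;
}
```

The explicit cast of the scalar is the extension that an extended-scalar pattern could fold into the widening add instead of leaving it as a separate operation.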
gcc/ChangeLog:
* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and
add extended_scalar patterns.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx
tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.
Qing Zhao [Tue, 28 May 2024 18:39:31 +0000 (18:39 +0000)]
Add the 6th argument to .ACCESS_WITH_SIZE
to carry the TYPE of the flexible array.
Such information is needed during tree-object-size.cc.
We cannot use the result type or the type of the 1st argument
of the routine .ACCESS_WITH_SIZE to decide the element type
of the original array due to possible type casting in the
source code.
gcc/c/ChangeLog:
* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
argument to .ACCESS_WITH_SIZE.
gcc/ChangeLog:
* tree-object-size.cc (access_with_size_object_size): Use the type
of the 6th argument for the type of the element.
* internal-fn.cc (expand_ACCESS_WITH_SIZE): Update the comment with
the 6th argument.
Qing Zhao [Tue, 28 May 2024 18:37:14 +0000 (18:37 +0000)]
Use the .ACCESS_WITH_SIZE in bound sanitizer.
gcc/c-family/ChangeLog:
* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
Qing Zhao [Tue, 28 May 2024 18:36:00 +0000 (18:36 +0000)]
Use the .ACCESS_WITH_SIZE in builtin object size.
gcc/ChangeLog:
* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.
gcc/testsuite/ChangeLog:
* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
Qing Zhao [Tue, 28 May 2024 18:34:09 +0000 (18:34 +0000)]
Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.
Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
to a call to the internal function .ACCESS_WITH_SIZE.
(build_component_ref in c-typeck.cc)
This includes the case when the object is statically allocated and
initialized.
To make this work, the routine digest_init in c-typeck.cc
is updated to fold calls to .ACCESS_WITH_SIZE to its first argument
when require_constant is TRUE.
However, for the reference inside "offsetof", the "counted_by" attribute is
ignored since it's not useful at all.
(c_parser_postfix_expression in c/c-parser.cc)
In addition to "offsetof", the counted_by attribute is also ignored for
references inside the "typeof" and "alignof" operators.
When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
replace the call with its first argument.
* Convert every call to .ACCESS_WITH_SIZE to its first argument.
(expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
get the reference from the call to .ACCESS_WITH_SIZE.
(is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
gcc/c/ChangeLog:
* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for
.ACCESS_WITH_SIZE, use its first argument.
(lvalue_p): Accept call to .ACCESS_WITH_SIZE.
(digest_init): Fold call to .ACCESS_WITH_SIZE to its first
argument when require_constant is TRUE.
gcc/ChangeLog:
* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree.cc (is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.
Qing Zhao [Tue, 28 May 2024 18:30:05 +0000 (18:30 +0000)]
Provide counted_by attribute to flexible array member field
'counted_by (COUNT)'
The 'counted_by' attribute may be attached to the C99 flexible
array member of a structure. It indicates that the number of the
elements of the array is given by the field "COUNT" in the
same structure as the flexible array member.
GCC may use this information to improve detection of object size information
for such structures and provide better results in compile-time diagnostics
and runtime features like the array bound sanitizer and
the '__builtin_dynamic_object_size'.
For example, attaching 'counted_by (count)' to a member 'array' specifies
that 'array' is a flexible array member whose number of elements is given
by the field 'count' in the same structure.
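A declaration of the kind described above might look like the following sketch. The COUNTED_BY macro and the alloc_p helper are assumptions introduced for this example; the macro falls back to nothing on compilers that do not advertise the attribute, so the code stays plain C99 there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper macro: use the attribute only where the compiler
   reports support for it.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(field) __attribute__ ((counted_by (field)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(field)
#endif

struct P
{
  size_t count;                     /* number of elements in 'array' */
  char other;
  char array[] COUNTED_BY (count);  /* C99 flexible array member */
};

/* Hypothetical helper: allocate a struct P with room for N elements.
   'count' is assigned before 'array' is referenced for the first time.  */
struct P *
alloc_p (size_t n)
{
  assert (n <= SIZE_MAX - sizeof (struct P));
  struct P *p = malloc (sizeof (struct P) + n);
  if (p == NULL)
    return NULL;
  p->count = n;
  for (size_t i = 0; i < n; i++)
    p->array[i] = 0;
  return p;
}
```

The counter field has an integer type and lives in the same structure as the array, which is what the attribute requires.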
The field that represents the number of the elements should have an
integer type. Otherwise, the compiler reports an error and
ignores the attribute.
When the field that represents the number of the elements is assigned a
negative integer value, the compiler treats the value as zero.
An explicit 'counted_by' annotation defines a relationship between
two objects, 'p->array' and 'p->count', and the relationship between
this pair must satisfy the following requirements:
* 'p->count' must be initialized before the first reference to
'p->array';
* 'p->array' has _at least_ 'p->count' number of elements
available all the time. This relationship must hold even
after any of these related objects are updated during the
program.
It is the user's responsibility to make sure that the above requirements
hold at all times. Otherwise the compiler reports
warnings, and the results of the array bound
sanitizer and '__builtin_dynamic_object_size' are undefined.
One important feature of the attribute is that a reference to the
flexible array member field uses the latest value assigned to
the field that represents the number of the elements before that
reference. For example,
p->count = val1;
p->array[20] = 0; // ref1 to p->array
p->count = val2;
p->array[30] = 0; // ref2 to p->array
in the above, 'ref1' uses 'val1' as the number of the elements
in 'p->array', and 'ref2' uses 'val2' as the number of elements
in 'p->array'.
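One way to keep the relationship valid while the count changes is to order the updates so that the array never has fewer elements than the counter claims. The sketch below assumes a hypothetical structure 'struct buf', a hypothetical COUNTED_BY wrapper macro, and a hypothetical helper buf_resize; none of these names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper macro: expands to nothing where the attribute is
   not supported.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(field) __attribute__ ((counted_by (field)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(field)
#endif

struct buf
{
  size_t count;
  int array[] COUNTED_BY (count);
};

/* Hypothetical helper: resize P (or allocate when P is NULL) to hold
   NEW_COUNT elements.  Returns NULL on allocation failure, in which case
   P is still valid.  */
struct buf *
buf_resize (struct buf *p, size_t new_count)
{
  assert (new_count <= (SIZE_MAX - sizeof (struct buf)) / sizeof (int));

  /* Shrinking: lower the count before the storage gets smaller.  */
  if (p != NULL && new_count < p->count)
    p->count = new_count;

  struct buf *q = realloc (p, sizeof (struct buf) + new_count * sizeof (int));
  if (q == NULL)
    return NULL;

  /* Growing: raise the count only after the storage is available.  */
  q->count = new_count;
  return q;
}
```

Every reference to the array made after buf_resize returns then sees the count that was assigned last, matching the 'ref1'/'ref2' behaviour described above.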
gcc/c-family/ChangeLog:
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.
gcc/c/ChangeLog:
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.
(tagged_types_tu_compatible_p): Check counted_by attribute for
structure type.
gcc/ChangeLog:
* doc/extend.texi: Document attribute counted_by.
gcc/testsuite/ChangeLog:
* gcc.dg/flex-array-counted-by.c: New test.
* gcc.dg/flex-array-counted-by-7.c: New test.
* gcc.dg/flex-array-counted-by-8.c: New test.
where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.
Wrap input operand with truncate:SI to make machine modes consistent.
PR target/115297
gcc/ChangeLog:
* config/alpha/alpha.md (<any_divmod:code>si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.
Thomas Schwinge [Tue, 28 May 2024 21:20:29 +0000 (23:20 +0200)]
nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more
"sorry, unimplemented: global constructors not supported on this target".
For proper execution test results, this depends on
<https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
"ld: Global constructor/destructor support".
gcc/
* config/nvptx/nvptx.h: Configure global constructor, destructor
support.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may appear
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config/nvptx/crt0.c (__gbl_ctors): New weak function.
(__main): Invoke it.
* config/nvptx/gbl-ctors.c: New.
* config/nvptx/t-nvptx: Configure global constructor, destructor
support.
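For reference, the 'constructor' and 'destructor' function attributes named in the nvptx commit above are used as in the following minimal sketch. The function and variable names are made up for the example, and nothing here is specific to nvptx.

```c
#include <assert.h>

/* Set by the constructor before main starts, cleared by the destructor
   after main returns (or exit is called).  */
static int ready;

__attribute__ ((constructor))
static void
setup (void)
{
  ready = 1;
}

__attribute__ ((destructor))
static void
teardown (void)
{
  /* The constructor must already have run by the time we get here.  */
  assert (ready == 1);
  ready = 0;
}

/* Hypothetical accessor used by the rest of the program.  */
int
is_ready (void)
{
  return ready;
}
```

On a target with global constructor support, is_ready returns 1 when called from main, without main having called setup itself.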
Marc Poulhiès [Thu, 23 May 2024 09:57:54 +0000 (11:57 +0200)]
fix: valid compiler optimization may fail the test
cxa4001 may fail with "Exception not raised" when the compiler omits the
calls to To_Mapping, in accordance with 10.2.1(18/3):
"If a library unit is declared pure, then the implementation is
permitted to omit a call on a library-level subprogram of the library
unit if the results are not needed after the call"
Using the result of both To_Mapping calls prevents the compiler from
omitting them.
"The corrected test will be available on the ACAA web site
(http://www.ada-auth.org/), and will be issued with the Modified Tests List
version 2.6K, 3.1DD, and 4.1GG."
gcc/testsuite/ChangeLog:
* ada/acats/tests/cxa/cxa4001.a: Use function result.
Rainer Orth [Fri, 31 May 2024 09:29:19 +0000 (11:29 +0200)]
build: Include minor version in config.gcc unsupported message
It has been pointed out to me that when moving Solaris 11.3 from
config.gcc's obsolete to unsupported list, I'd forgotten to also move
the minor version info, leading to confusing
*** Configuration i386-pc-solaris2.11 not supported
instead of the correct
*** Configuration i386-pc-solaris2.11.3 not supported
Andrew Pinski [Thu, 30 May 2024 03:40:31 +0000 (20:40 -0700)]
Fix some opindex for some options [PR115022]
While looking at the index I noticed that some options had
`-` in the front for the index, which is wrong. I then
noticed that some targets had no index entry for `mcmodel=` or
used `-mcmodel` incorrectly.
This fixes both of those and regenerates the urls files; the
`-mcmodel=` option now has a URL associated with it.
[testsuite] conditionalize dg-additional-sources on target and type
added two additional args to dg-additional-files-options.
Unfortunately, this completely broke several testsuites like
ERROR: tcl error sourcing /vol/gcc/src/hg/master/local/libatomic/testsuite/../../gcc/testsuite/lib/gcc-dg.exp.
wrong # args: should be "dg-additional-files-options options source dest type"
since the patch forgot to adjust some of the callers.
This patch fixes that.
Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.
xtensa: Use epilogue_completed rather than cfun->machine->epilogue_done
In commit ad89d820bf, an "epilogue_done" member was added to the
machine_function structure, but it is sufficient to use the existing
"epilogue_completed" global variable.
xtensa: Use REG_P(), MEM_P(), etc. instead of comparing GET_CODE()
Instead of comparing directly, this patch replaces as much as possible with
macros that determine RTX code such as REG_P(), SUBREG_P() or MEM_P(), etc.
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_valid_move, constantpool_address_p,
xtensa_tls_symbol_p, gen_int_relational, xtensa_emit_move_sequence,
xtensa_copy_incoming_a7, xtensa_expand_block_move,
xtensa_expand_nonlocal_goto, xtensa_emit_call,
xtensa_legitimate_address_p, xtensa_legitimize_address,
xtensa_tls_referenced_p, print_operand, print_operand_address,
xtensa_output_literal):
Replace RTX code comparisons with their predicate macros such as
REG_P().
* config/xtensa/xtensa.h (CONSTANT_ADDRESS_P,
LEGITIMATE_PIC_OPERAND_P): Ditto.
* config/xtensa/xtensa.md (reload<mode>_literal, indirect_jump):
Ditto.
Martin Uecker [Fri, 24 May 2024 10:35:27 +0000 (12:35 +0200)]
C23: allow aliasing for types derived from structs with variable size
Previously, we set the aliasing set of structures with variable size
struct foo { int x[n]; char b; };
to zero. The reason is that such types can be compatible with different
structure types that are incompatible with each other.
struct foo { int x[2]; char b; };
struct foo { int x[3]; char b; };
But it is not enough to set the aliasing set to zero, because derived
types would then still end up in different equivalence classes even
though they might be compatible. Instead those types should be set
to structural equivalency. We also add checking assertions that
ensure that TYPE_CANONICAL is set correctly for all tagged types.
gcc/c/
* c-decl.cc (finish_struct): Do not set TYPE_CANONICAL for
structure or unions with variable size.
* c-objc-common.cc (c_get_alias_set): Do not set alias set to zero.
* c-typeck.cc (comptypes_verify): New function.
(comptypes,comptypes_same_p,comptypes_check_enum_int): Add assertion.
(comptypes_equiv_p): Add assertion that ensures that compatible
types have the same equivalence class.
(tagged_types_tu_compatible_p): Remove now unneeded special case.
gcc/testsuite/
* gcc.dg/gnu23-tag-alias-8.c: New test.
Martin Uecker [Sun, 19 May 2024 21:13:22 +0000 (23:13 +0200)]
C: allow aliasing of compatible types derived from enumeral types [PR115157]
Aliasing of enumeral types with the underlying integer is now allowed
by setting the aliasing set to zero. But this does not allow aliasing
of derived types which are compatible as required by ISO C. Instead,
initially set structural equality. Then set TYPE_CANONICAL and update
pointers and main variants when the type is completed (as done for
structures and unions in C23).
gcc/c/
* c-decl.cc (shadow_tag_warned,parse_xref_tag,start_enum,
finish_enum): Set SET_TYPE_STRUCTURAL_EQUALITY / TYPE_CANONICAL.
* c-objc-common.cc (get_alias_set): Remove special case.
(get_aka_type): Add special case.
gcc/c-family/
* c-attribs.cc (handle_hardbool_attribute): Set TYPE_CANONICAL
for hardbools.
gcc/
* godump.cc (go_output_typedef): Use TYPE_MAIN_VARIANT instead
of TYPE_CANONICAL.
gcc/testsuite/
* gcc.dg/enum-alias-1.c: New test.
* gcc.dg/enum-alias-2.c: New test.
* gcc.dg/enum-alias-3.c: New test.
* gcc.dg/enum-alias-4.c: New test.
The expansion now always goes through a clobberless form of the bswaphi
instruction. The instruction is conditionally converted to a rotate at
peephole2 pass. This significantly simplifies bswaphisi2_lowpart
insn pattern attributes.
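The conversion to a rotate is possible because swapping the two bytes of a 16-bit value is the same operation as rotating it by 8 bits. A small sketch of that equivalence in C, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Byte swap of a 16-bit value written as a rotate left by 8.  The operand
   is promoted to int, so the shifts cannot overflow; the cast drops the
   bits above bit 15.  */
uint16_t
bswap16_rotate (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}

/* Check the rotate form against the compiler builtin for one value.  */
void
check_bswap16_equivalence (uint16_t x)
{
  assert (bswap16_rotate (x) == __builtin_bswap16 (x));
}
```

Which of the two forms ends up in the generated code is a tuning decision made by the back end; the sketch only shows that both compute the same result.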
PR target/115102
gcc/ChangeLog:
* config/i386/i386.md (bswaphi2): Also enable for !TARGET_MOVBE.
(*bswaphi2): New insn pattern.
(bswaphisi2_lowpart): Rename from bswaphi_lowpart. Rewrite
insn RTX to match the expected form of the combine pass.
Remove rol{w} alternative and corresponding attributes.
(bswaphisi2_lowpart peephole2): New peephole2 pattern to
conditionally convert bswaphisi2_lowpart to rotlhi3_1_slp.
(bswapsi2): Update expander for rename.
(rotlhi3_1_slp splitter): Conditionally split to bswaphi2.
But I think this is testing the wrong natural size. If we exclude
paradoxical subregs (which will get an offset of zero regardless),
it's the inner register that is being split, so it should be the
inner register's natural size that we use.
This matters in the testcase because we have an SFmode lowpart
subreg into the last of three variable-sized vectors. The
SUBREG_BYTE is therefore equal to the size of two variable-sized
vectors. Dividing by the vector size gives a register offset of 2,
as expected, but dividing by the size of a scalar FPR would give
a variable offset.
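The arithmetic can be made concrete with a toy model in which sizes are plain integers. The helper below is hypothetical and much simpler than the real code, which works on variable-sized quantities; the 8-byte scalar register size used in the note that follows is likewise only an assumption for illustration.

```c
#include <assert.h>

/* Toy model: the register offset of a subreg is its byte offset divided
   by the natural size of one register.  */
unsigned
subreg_reg_offset (unsigned subreg_byte, unsigned natural_size)
{
  assert (natural_size != 0);
  return subreg_byte / natural_size;
}
```

With a SUBREG_BYTE of two vectors, dividing by the vector size gives 2 for every vector length (for example 32 / 16 and 64 / 32). Dividing the same byte offsets by an assumed 8-byte scalar size gives 4 and 8, so the result changes with the vector length, which is the variable offset the text refers to.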
I think something similar could happen for fixed-size targets if
REGMODE_NATURAL_SIZE is different for vectors and integers (say),
although that case would trade an ICE for an incorrect offset.
gcc/
PR rtl-optimization/115281
* ira-conflicts.cc (go_through_subreg): Use the natural size of
the inner mode rather than the outer mode.
gcc/testsuite/
PR rtl-optimization/115281
* gfortran.dg/pr115281.f90: New test.
* pair-fusion.h: Generic header code for load store pair fusion
that can be shared across different architectures.
* pair-fusion.cc: Generic source code implementation for
load store pair fusion that can be shared across different architectures.
* Makefile.in: Add new object file pair-fusion.o.
* config/aarch64/aarch64-ldp-fusion.cc: Delete generic code and move it
to pair-fusion.cc in the middle-end.
* config/aarch64/t-aarch64: Add header file dependency on pair-fusion.h.
Remove unnecessary header file dependency.
potentially with colorization of the connecting lines.
It adds a new template for typename T:
void text_art::dump<T> (const T&);
for using this to dump any object to stderr that supports a
make_dump_widget method, with similar templates for dumping to
a pretty_printer * and a FILE *.
It uses this within the analyzer to add two new families of dumping
methods: one for program states, e.g.:
I've already found both of these useful when debugging analyzer issues.
The patch uses the former to update the output of
-fdump-analyzer-exploded-nodes-2 and
-fdump-analyzer-exploded-nodes-3.
The older dumping functions within the analyzer are retained in case
they turn out to still be useful for debugging.
gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add text-art/tree-widget.o.
* doc/analyzer.texi: Rewrite discussion of dumping state to
cover the text_art::tree_widget-based dumps, with a more
interesting example.
* text-art/dump-widget-info.h: New file.
* text-art/dump.h: New file.
* text-art/selftests.cc (selftest::text_art_tests): Call
text_art_tree_widget_cc_tests.
* text-art/selftests.h (selftest::text_art_tree_widget_cc_tests):
New decl.
* text-art/theme.cc (ascii_theme::get_cppchar): Handle the various
cell_kind::TREE_*.
(unicode_theme::get_cppchar): Likewise.
* text-art/theme.h (enum class theme::cell_kind): Add
TREE_CHILD_NON_FINAL, TREE_CHILD_FINAL, TREE_X_CONNECTOR, and
TREE_Y_CONNECTOR.
* text-art/tree-widget.cc: New file.
gcc/analyzer/ChangeLog:
* call-details.cc: Define INCLUDE_VECTOR.
* call-info.cc: Likewise.
* call-summary.cc: Likewise.
* checker-event.cc: Likewise.
* checker-path.cc: Likewise.
* complexity.cc: Likewise.
* constraint-manager.cc: Likewise.
(bounded_range::make_dump_widget): New.
(bounded_ranges::add_to_dump_widget): New.
(equiv_class::make_dump_widget): New.
(constraint::make_dump_widget): New.
(bounded_ranges_constraint::make_dump_widget): New.
(constraint_manager::make_dump_widget): New.
* constraint-manager.h (bounded_range::make_dump_widget): New
decl.
(bounded_ranges::add_to_dump_widget): New decl.
(equiv_class::make_dump_widget): New decl.
(constraint::make_dump_widget): New decl.
(bounded_ranges_constraint::make_dump_widget): New decl.
(constraint_manager::make_dump_widget): New decl.
* diagnostic-manager.cc: Define INCLUDE_VECTOR.
* engine.cc: Likewise. Include "text-art/dump.h".
(setjmp_svalue::print_dump_widget_label): New.
(setjmp_svalue::add_dump_widget_children): New.
(exploded_graph::dump_exploded_nodes): Use text_art::dump_to_file
for -fdump-analyzer-exploded-nodes-2 and
-fdump-analyzer-exploded-nodes-3. Fix overlong line.
* feasible-graph.cc: Define INCLUDE_VECTOR.
* infinite-recursion.cc: Likewise.
* kf-analyzer.cc: Likewise.
* kf-lang-cp.cc: Likewise.
* kf.cc: Likewise.
* known-function-manager.cc: Likewise.
* pending-diagnostic.cc: Likewise.
* program-point.cc: Likewise.
* program-state.cc: Likewise. Include "text-art/tree-widget.h" and
"text-art/dump.h".
(sm_state_map::make_dump_widget): New.
(program_state::dump): New.
(program_state::make_dump_widget): New.
* program-state.h: Include "text-art/widget.h".
(sm_state_map::make_dump_widget): New decl.
(program_state::dump): New decl.
(program_state::make_dump_widget): New decl.
* ranges.cc: Define INCLUDE_VECTOR.
* record-layout.cc: Likewise.
* region-model-asm.cc: Likewise.
* region-model-manager.cc: Likewise.
* region-model-reachability.cc: Likewise.
* region-model.cc: Likewise. Include "text-art/tree-widget.h".
(region_to_value_map::make_dump_widget): New.
(region_model::dump): New.
(region_model::make_dump_widget): New.
(selftest::test_dump): Add test of dump_to_pp<region_model>.
* region-model.h: Include "text-art/widget.h" and
"text-art/dump.h".
(region_to_value_map::make_dump_widget): New decl.
(region_model::dump): New decl.
(region_model::make_dump_widget): New decl.
* region.cc: Define INCLUDE_VECTOR and include "text-art/dump.h".
(region::dump): New.
(region::make_dump_widget): New.
(region::add_dump_widget_children): New.
(frame_region::print_dump_widget_label): New.
(globals_region::print_dump_widget_label): New.
(code_region::print_dump_widget_label): New.
(function_region::print_dump_widget_label): New.
(label_region::print_dump_widget_label): New.
(stack_region::print_dump_widget_label): New.
(heap_region::print_dump_widget_label): New.
(root_region::print_dump_widget_label): New.
(thread_local_region::print_dump_widget_label): New.
(symbolic_region::print_dump_widget_label): New.
(symbolic_region::add_dump_widget_children): New.
(decl_region::print_dump_widget_label): New.
(field_region::print_dump_widget_label): New.
(element_region::print_dump_widget_label): New.
(element_region::add_dump_widget_children): New.
(offset_region::print_dump_widget_label): New.
(offset_region::add_dump_widget_children): New.
(sized_region::print_dump_widget_label): New.
(sized_region::add_dump_widget_children): New.
(cast_region::print_dump_widget_label): New.
(cast_region::add_dump_widget_children): New.
(heap_allocated_region::print_dump_widget_label): New.
(alloca_region::print_dump_widget_label): New.
(string_region::print_dump_widget_label): New.
(bit_range_region::print_dump_widget_label): New.
(var_arg_region::print_dump_widget_label): New.
(errno_region::print_dump_widget_label): New.
(private_region::print_dump_widget_label): New.
(unknown_region::print_dump_widget_label): New.
* region.h: Include "text-art/widget.h".
(region::dump): New decl.
(region::make_dump_widget): New decl.
(region::add_dump_widget_children): New decl.
(frame_region::print_dump_widget_label): New decl.
(globals_region::print_dump_widget_label): New decl.
(code_region::print_dump_widget_label): New decl.
(function_region::print_dump_widget_label): New decl.
(label_region::print_dump_widget_label): New decl.
(stack_region::print_dump_widget_label): New decl.
(heap_region::print_dump_widget_label): New decl.
(root_region::print_dump_widget_label): New decl.
(thread_local_region::print_dump_widget_label): New decl.
(symbolic_region::print_dump_widget_label): New decl.
(symbolic_region::add_dump_widget_children): New decl.
(decl_region::print_dump_widget_label): New decl.
(field_region::print_dump_widget_label): New decl.
(element_region::print_dump_widget_label): New decl.
(element_region::add_dump_widget_children): New decl.
(offset_region::print_dump_widget_label): New decl.
(offset_region::add_dump_widget_children): New decl.
(sized_region::print_dump_widget_label): New decl.
(sized_region::add_dump_widget_children): New decl.
(cast_region::print_dump_widget_label): New decl.
(cast_region::add_dump_widget_children): New decl.
(heap_allocated_region::print_dump_widget_label): New decl.
(alloca_region::print_dump_widget_label): New decl.
(string_region::print_dump_widget_label): New decl.
(bit_range_region::print_dump_widget_label): New decl.
(var_arg_region::print_dump_widget_label): New decl.
(errno_region::print_dump_widget_label): New decl.
(private_region::print_dump_widget_label): New decl.
(unknown_region::print_dump_widget_label): New decl.
* sm-fd.cc: Define INCLUDE_VECTOR.
* sm-file.cc: Likewise.
* sm-malloc.cc: Likewise.
* sm-pattern-test.cc: Likewise.
* sm-signal.cc: Likewise.
* sm-taint.cc: Likewise.
* sm.cc: Likewise.
* state-purge.cc: Likewise.
* store.cc: Likewise. Include "text-art/tree-widget.h".
(add_binding_to_tree_widget): New.
(binding_map::add_to_tree_widget): New.
(binding_cluster::make_dump_widget): New.
(store::make_dump_widget): New.
* store.h: Include "text-art/tree-widget.h".
(binding_map::add_to_tree_widget): New decl.
(binding_cluster::make_dump_widget): New decl.
(store::make_dump_widget): New decl.
* svalue.cc: Define INCLUDE_VECTOR. Include "make-unique.h" and
"text-art/dump.h".
(svalue::dump): New.
(svalue::make_dump_widget): New.
(region_svalue::print_dump_widget_label): New.
(region_svalue::add_dump_widget_children): New.
(constant_svalue::print_dump_widget_label): New.
(constant_svalue::add_dump_widget_children): New.
(unknown_svalue::print_dump_widget_label): New.
(unknown_svalue::add_dump_widget_children): New.
(poisoned_svalue::print_dump_widget_label): New.
(poisoned_svalue::add_dump_widget_children): New.
(initial_svalue::print_dump_widget_label): New.
(initial_svalue::add_dump_widget_children): New.
(unaryop_svalue::print_dump_widget_label): New.
(unaryop_svalue::add_dump_widget_children): New.
(binop_svalue::print_dump_widget_label): New.
(binop_svalue::add_dump_widget_children): New.
(sub_svalue::print_dump_widget_label): New.
(sub_svalue::add_dump_widget_children): New.
(repeated_svalue::print_dump_widget_label): New.
(repeated_svalue::add_dump_widget_children): New.
(bits_within_svalue::print_dump_widget_label): New.
(bits_within_svalue::add_dump_widget_children): New.
(widening_svalue::print_dump_widget_label): New.
(widening_svalue::add_dump_widget_children): New.
(placeholder_svalue::print_dump_widget_label): New.
(placeholder_svalue::add_dump_widget_children): New.
(unmergeable_svalue::print_dump_widget_label): New.
(unmergeable_svalue::add_dump_widget_children): New.
(compound_svalue::print_dump_widget_label): New.
(compound_svalue::add_dump_widget_children): New.
(conjured_svalue::print_dump_widget_label): New.
(conjured_svalue::add_dump_widget_children): New.
(asm_output_svalue::print_dump_widget_label): New.
(asm_output_svalue::add_dump_widget_children): New.
(const_fn_result_svalue::print_dump_widget_label): New.
(const_fn_result_svalue::add_dump_widget_children): New.
* svalue.h: Include "text-art/widget.h". Add "using
text_art::dump_widget_info".
(svalue::dump): New decl.
(svalue::make_dump_widget): New decl.
(svalue::print_dump_widget_label): New decl.
(svalue::print_dump_widget_label): New decl.
(svalue::add_dump_widget_children): New decl.
(region_svalue::print_dump_widget_label): New decl.
(region_svalue::add_dump_widget_children): New decl.
(constant_svalue::print_dump_widget_label): New decl.
(constant_svalue::add_dump_widget_children): New decl.
(unknown_svalue::print_dump_widget_label): New decl.
(unknown_svalue::add_dump_widget_children): New decl.
(poisoned_svalue::print_dump_widget_label): New decl.
(poisoned_svalue::add_dump_widget_children): New decl.
(initial_svalue::print_dump_widget_label): New decl.
(initial_svalue::add_dump_widget_children): New decl.
(unaryop_svalue::print_dump_widget_label): New decl.
(unaryop_svalue::add_dump_widget_children): New decl.
(binop_svalue::print_dump_widget_label): New decl.
(binop_svalue::add_dump_widget_children): New decl.
(sub_svalue::print_dump_widget_label): New decl.
(sub_svalue::add_dump_widget_children): New decl.
(repeated_svalue::print_dump_widget_label): New decl.
(repeated_svalue::add_dump_widget_children): New decl.
(bits_within_svalue::print_dump_widget_label): New decl.
(bits_within_svalue::add_dump_widget_children): New decl.
(widening_svalue::print_dump_widget_label): New decl.
(widening_svalue::add_dump_widget_children): New decl.
(placeholder_svalue::print_dump_widget_label): New decl.
(placeholder_svalue::add_dump_widget_children): New decl.
(unmergeable_svalue::print_dump_widget_label): New decl.
(unmergeable_svalue::add_dump_widget_children): New decl.
(compound_svalue::print_dump_widget_label): New decl.
(compound_svalue::add_dump_widget_children): New decl.
(conjured_svalue::print_dump_widget_label): New decl.
(conjured_svalue::add_dump_widget_children): New decl.
(asm_output_svalue::print_dump_widget_label): New decl.
(asm_output_svalue::add_dump_widget_children): New decl.
(const_fn_result_svalue::print_dump_widget_label): New decl.
(const_fn_result_svalue::add_dump_widget_children): New decl.
* trimmed-graph.cc: Define INCLUDE_VECTOR.
* varargs.cc: Likewise.
Tobias Burnus [Thu, 30 May 2024 11:21:43 +0000 (13:21 +0200)]
libgomp.texi: Impl. update for USM and missing 5.2 item
libgomp/ChangeLog:
* libgomp.texi (OpenMP 5.0 status): Mark 'requires' as done and
link to 'Offload-Target Specifics'.
(OpenMP 5.2 status): Add item about additional map-type modifiers
in 'declare mapper'.
Alexandre Oliva [Wed, 29 May 2024 05:52:18 +0000 (02:52 -0300)]
[testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*
Codegen changes caused add instruction count mismatches on
ppc-*-linux-gnu and other 32-bit ppc targets. At some point the
expected counts were adjusted for lp64, but ilp32 differences
remained, and published test results confirm it.
Alexandre Oliva [Thu, 30 May 2024 07:01:19 +0000 (04:01 -0300)]
[testsuite] conditionalize dg-additional-sources on target and type
g++.dg/vect/pr95401.cc has dg-additional-sources, and that fails when
check_vect_support_and_set_flags finds vector support lacking for
execution tests: tests decay to compile tests, and additional sources
are rejected by the compiler when compiling to a named output file.
At first I considered using some effective target to conditionalize
the additional sources. There was no support for target-specific
additional sources, so I added that.
But then, I found that adding an effective target to check whether the
test involves linking would just make for busy work in this case, and
so I went ahead and adjusted the handling of additional sources to
refrain from adding them on compile tests, reporting them as
unsupported.
That solves the problem without using the newly-added machinery for
per-target additional sources, but I figured since I'd implemented it
I might as well contribute it, since there might be other uses for it.
for gcc/ChangeLog
* doc/sourcebuild.texi (dg-additional-sources): Document
newly-added support for target selectors, and implicit discard
on non-linking tests that name the compiler output explicitly.
for gcc/testsuite/ChangeLog
* lib/gcc-defs.exp (dg-additional-sources): Support target
selectors. Make it cumulative.
(dg-additional-files-options): Take dest and type. Note
unsupported additional sources when not linking and naming the
compiler output. Adjust source dirname prepending to cope
with leading blanks.
* lib/g++.exp (g++_target_compile): Pass dest and type on to
dg-additional-files-options.
* lib/gcc.exp (gcc_target_compile): Likewise.
* lib/gdc.exp (gdc_target_compile): Likewise.
* lib/gfortran.exp (gfortran_target_compile): Likewise.
* lib/go.exp (go_target_compile): Likewise.
* lib/obj-c++.exp (obj-c++_target_compile): Likewise.
* lib/objc.exp (objc_target_compile): Likewise.
* lib/rust.exp (rust_target_compile): Likewise.
* lib/profopt.exp (profopt-execute): Likewise-ish.
Alexandre Oliva [Thu, 30 May 2024 07:01:15 +0000 (04:01 -0300)]
[libstdc++-v3] [rtems] enable filesystem support
mkdir, chdir and chmod functions are defined in librtemscpu, that
doesn't get linked in during libstdc++-v3 configure, but applications
use -qrtems for linking, which brings those symbols in, so it makes
sense to mark them as available so that the C++ filesystem APIs are
enabled.
for libstdc++-v3/ChangeLog
* configure.ac [*-*-rtems*]: Set chdir, chmod and mkdir as
available.
* configure: Rebuilt.
liuhongt [Tue, 27 Feb 2024 07:34:57 +0000 (15:34 +0800)]
Don't reduce estimated unrolled size for innermost loop.
For the innermost loop, after complete loop unrolling, it will most likely
not be possible to reduce the body size to 2/3. The current 2/3 reduction
makes some of the larger loops completely unrolled during
cunrolli, which then prevents them from being
vectorized. It also increases register pressure.
The patch moves the 2/3 reduction from estimated_unrolled_size to
try_unroll_loop_completely.
gcc/ChangeLog:
PR tree-optimization/112325
* tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Move the
2 / 3 loop body size reduction to ..
(try_unroll_loop_completely): .. here, add it for the check of
body size shrink, and the check of comparison against
param_max_completely_peeled_insns when
(!cunrolli || loop->inner).
(canonicalize_loop_induction_variables): Add new parameter
cunrolli and pass down.
(tree_unroll_loops_completely_1): Ditto.
(canonicalize_induction_variables): Pass cunrolli as false to
canonicalize_loop_induction_variables.
(tree_unroll_loops_completely): Set cunrolli to true at
beginning and set it to false after CHANGED is true.
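A minimal sketch of the size check described in the entry above, not the actual tree-ssa-loop-ivcanon.cc code. The names UnrollCandidate and size_for_limit_check are hypothetical; only the 2/3 factor and the (!cunrolli || loop->inner) condition are taken from the ChangeLog.

```cpp
#include <cassert>

// Hypothetical stand-in for the facts the check needs about a loop.
struct UnrollCandidate {
  unsigned estimated_insns;  // unrolled size before any 2/3 reduction
  bool has_inner_loop;       // corresponds to loop->inner being non-null
};

// Size to compare against the unrolling limit.  The 2/3 reduction models
// the simplification expected after unrolling.  Per the ChangeLog it is
// applied only when (!cunrolli || loop->inner), so an innermost loop seen
// by cunrolli keeps its full estimated size.
unsigned size_for_limit_check(const UnrollCandidate &c, bool cunrolli) {
  assert(c.estimated_insns <= 0x7fffffffu);  // keep "* 2" from overflowing
  if (!cunrolli || c.has_inner_loop)
    return c.estimated_insns * 2 / 3;
  return c.estimated_insns;
}
```

With an estimate of 300 insns, an innermost loop is checked at 300 during cunrolli but at 200 in the later pass, which is what keeps larger innermost loops available to the vectorizer.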
Alexandre Oliva [Thu, 30 May 2024 05:06:48 +0000 (02:06 -0300)]
[testsuite] conditionalize dg-additional-sources on target and type
g++.dg/vect/pr95401.cc has dg-additional-sources, and that fails when
check_vect_support_and_set_flags finds vector support lacking for
execution tests: tests decay to compile tests, and additional sources
are rejected by the compiler when compiling to a named output file.
At first I considered using some effective target to conditionalize
the additional sources. There was no support for target-specific
additional sources, so I added that.
But then, I found that adding an effective target to check whether the
test involves linking would just make for busy work in this case, and
so I went ahead and adjusted the handling of additional sources to
refrain from adding them on compile tests, reporting them as
unsupported.
That solves the problem without using the newly-added machinery for
per-target additional sources, but I figured since I'd implemented it
I might as well contribute it, since there might be other uses for it.
for gcc/ChangeLog
* doc/sourcebuild.texi (dg-additional-sources): Document
newly-added support for target selectors, and implicit discard
on non-linking tests that name the compiler output explicitly.
for gcc/testsuite/ChangeLog
* lib/gcc-defs.exp (dg-additional-sources): Support target
selectors. Make it cumulative.
(dg-additional-files-options): Take dest and type. Note
unsupported additional sources when not linking and naming the
compiler output. Adjust source dirname prepending to cope
with leading blanks.
* lib/g++.exp (g++_target_compile): Pass dest and type on to
dg-additional-files-options.
* lib/gcc.exp (gcc_target_compile): Likewise.
* lib/gdc.exp (gdc_target_compile): Likewise.
* lib/gfortran.exp (gfortran_target_compile): Likewise.
* lib/go.exp (go_target_compile): Likewise.
* lib/obj-c++.exp (obj-c++_target_compile): Likewise.
* lib/objc.exp (objc_target_compile): Likewise.
* lib/rust.exp (rust_target_compile): Likewise.
* lib/profopt.exp (profopt-execute): Likewise-ish.
Martin Uecker [Sat, 30 Mar 2024 18:49:48 +0000 (19:49 +0100)]
C23: fix aliasing for structures/unions with incomplete types
When incomplete structure/union types are completed later, compatibility
of struct types that contain pointers to such types changes. When forming
equivalence classes for TYPE_CANONICAL, we therefore need to be conservative
and treat all structs with the same tag which are pointer targets as
equivalent for purposes of determining equivalency of structure/union
types which contain such types as members. This avoids having to update
TYPE_CANONICAL of such structure/unions recursively. The pointer types
themselves are updated in c_update_type_canonical.
gcc/c/
* c-typeck.cc (comptypes_internal): Add flag to track
whether a struct is the target of a pointer.
(tagged_types_tu_compatible): When forming equivalence
classes, treat nested pointed-to structs as equivalent.
gcc/testsuite/
* gcc.dg/c23-tag-incomplete-alias-1.c: New test.
YunQiang Su [Tue, 28 May 2024 18:28:25 +0000 (02:28 +0800)]
MIPS16: Mark $2/$3 as clobbered if GP is used
PR target/84790.
The gp init sequence
li $2,%hi(_gp_disp)
addiu $3,$pc,%lo(_gp_disp)
sll $2,16
addu $2,$3
is generated directly in `mips_output_function_prologue`, and does
not appear in the RTL.
So the IRA/IPA passes are not aware that $2/$3 have been clobbered,
and the registers may be used across (local) function calls.
Let's mark $2/$3 as clobbered both:
- Just after the UNSPEC_GP RTL of a function;
- Just after a function call.
Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Origin-Patch-by: Felix Fietkau <nbd@nbd.name>
gcc
* config/mips/mips.cc (mips16_gp_pseudo_reg): Mark
MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
(mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.
[committed] [v2] More logical op simplifications in simplify-rtx.cc
does some simplifications, and `bseli.b $w1,$w0,255` is then found to be
the same as `or.v $w1,$w0,$w1`, so no bseli.b instruction is
generated.
Let's use 254 instead of 255 to test the generation of `bseli.b`.
gcc/testsuite
* gcc.target/mips/msa-builtins.c: Use 254 instead of 255 for
bseli.b, as `bseli.b $w0,$w1,255` is the same as `or.v $w0,$w0,$w1`.
This patch fixes libgm2/libm2iso/wraptime.cc:InitTM so that
it does not always return NULL. The incorrect autoconf macro
was used (inside InitTM) and the function short circuited
to return NULL. The fix is to use HAVE_SYS_TIME_H and use
AC_HEADER_TIME in libgm2/configure.ac.
libgm2/ChangeLog:
PR modula2/115276
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use AC_HEADER_TIME.
* libm2iso/wraptime.cc (InitTM): Check HAVE_SYS_TIME_H
before using struct tm to obtain the size.
gcc/testsuite/ChangeLog:
PR modula2/115276
* gm2/isolib/run/pass/testinittm.mod: New test.
Andrew Pinski [Mon, 27 May 2024 00:38:37 +0000 (17:38 -0700)]
match: Add support for `a ^ CST` to bitwise_inverted_equal_p [PR115224]
While looking into something else, I noticed that `a ^ CST` needed
special-casing in bitwise_inverted_equal_p, as its bitwise not simplifies
to `a ^ ~CST`.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
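The identity behind this special case can be checked directly. This is only an illustration of the arithmetic, not the match.pd implementation; the helper names are hypothetical, and xor_constants_are_inverse is my reading of what the new case has to test.

```cpp
#include <cassert>
#include <cstdint>

// ~(a ^ c) flips every bit of the XOR result, which is the same as
// flipping every bit of the constant first: ~(a ^ c) == a ^ ~c.
constexpr std::uint32_t not_of_xor(std::uint32_t a, std::uint32_t c) {
  return ~(a ^ c);
}

constexpr std::uint32_t xor_with_inverted_constant(std::uint32_t a,
                                                   std::uint32_t c) {
  return a ^ ~c;
}

// Assumed shape of the check: `a ^ c1` and `a ^ c2` are bitwise inverses
// of each other exactly when the two constants are.
constexpr bool xor_constants_are_inverse(std::uint32_t c1, std::uint32_t c2) {
  return c1 == static_cast<std::uint32_t>(~c2);
}

static_assert(not_of_xor(0x1234u, 0xffu) ==
                  xor_with_inverted_constant(0x1234u, 0xffu),
              "~(a ^ c) equals a ^ ~c");
```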
Andrew Pinski [Sun, 26 May 2024 06:29:48 +0000 (23:29 -0700)]
Match: Add maybe_bit_not instead of plain matching
While working on adding matching of negative expressions of `a - b`,
I noticed that we started to have "duplicated" patterns due to not having
a way to match maybe negative expressions. So I went back to what I did for
bit_not and decided to improve the situation there so for some patterns
where we had 2 operands of an expression where one could have been a bit_not,
add back maybe_bit_not.
This does not add maybe_bit_not in every place where bitwise_inverted_equal_p
is used, just the ones where 2 operands of an expression could be swapped.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd (bit_not_with_nop): Unconditionalize.
(maybe_cmp): Likewise.
(maybe_bit_not): New match pattern.
(`~X & X`): Use maybe_bit_not and add `:c` back.
(`~x ^ x`/`~x | x`): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
aarch64: Split aarch64_combinev16qi before RA [PR115258]
Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose
purpose is to put the two input data vectors into consecutive registers.
This aarch64_combinev16qi was then split after reload into individual
moves (from the first input to the first half of the output, and from
the second input to the second half of the output).
In the worst case, the RA might allocate things so that the destination
of the aarch64_combinev16qi is the second input followed by the first
input. In that case, the split form of aarch64_combinev16qi uses three
eors to swap the registers around.
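For reference, the three-EOR swap mentioned above looks like this on scalar values. It is a sketch of the general technique, not of the code aarch64_split_combinev16qi emits; xor_swap is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Swap two values in place using three XORs and no temporary.
// a and b must be distinct objects: XOR-swapping an object with itself
// would zero it.
void xor_swap(std::uint64_t &a, std::uint64_t &b) {
  assert(&a != &b);
  a ^= b;  // a = a0 ^ b0
  b ^= a;  // b = b0 ^ (a0 ^ b0) = a0
  a ^= b;  // a = (a0 ^ b0) ^ a0 = b0
}
```

Each step depends on the previous one, so the sequence is three serial instructions where an allocation in the right order would need none.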
This PR is about a test where this worst case occurred. And given the
insn description, that allocation doesn't seem unreasonable.
early-ra should (hopefully) mean that we're now better at allocating
subregs of vector registers. The upcoming RA subreg patches should
improve things further. The best fix for the PR therefore seems
to be to split the combination before RA, so that the RA can see
the underlying moves.
Perhaps it even makes sense to do this at expand time, avoiding the need
for aarch64_combinev16qi entirely. That deserves more experimentation
though.
gcc/
PR target/115258
* config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Allow
the split before reload.
* config/aarch64/aarch64.cc (aarch64_split_combinev16qi): Generalize
into a form that handles pseudo registers.
gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258.c: New test.
François Dumont [Thu, 16 May 2024 04:59:50 +0000 (06:59 +0200)]
libstdc++: Use RAII to replace try/catch blocks
Move _Guard into std::vector declaration and use it to guard all calls to
vector _M_allocate.
This gives the compiler more visibility into what is done with the pointers,
so it no longer raises the -Wfree-nonheap-object warning.
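A minimal sketch of the RAII idea, assuming a guard that owns freshly allocated storage until it is released. AllocGuard and allocate_and_fill are hypothetical names for illustration and are not the libstdc++ _Guard_alloc code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical guard: owns storage obtained from an allocator until
// release() is called.
template <typename Alloc>
class AllocGuard {
 public:
  using traits = std::allocator_traits<Alloc>;
  using pointer = typename traits::pointer;

  AllocGuard(Alloc &alloc, std::size_t n)
      : alloc_(alloc), ptr_(traits::allocate(alloc, n)), n_(n) {}

  // If constructing the elements throws, this runs during unwinding and
  // returns the storage, with no try/catch at the call site.
  ~AllocGuard() {
    if (ptr_)
      traits::deallocate(alloc_, ptr_, n_);
  }

  AllocGuard(const AllocGuard &) = delete;
  AllocGuard &operator=(const AllocGuard &) = delete;

  pointer get() const { return ptr_; }

  // Hand ownership to the caller once everything has succeeded.
  pointer release() {
    pointer p = ptr_;
    ptr_ = pointer();
    return p;
  }

 private:
  Alloc &alloc_;
  pointer ptr_;
  std::size_t n_;
};

// Allocate room for n ints and fill them.  The guard frees the storage
// if the fill step throws; on success the caller owns the result.
int *allocate_and_fill(std::allocator<int> &alloc, std::size_t n, int value) {
  assert(n > 0);
  AllocGuard<std::allocator<int>> guard(alloc, n);
  std::uninitialized_fill_n(guard.get(), n, value);
  return guard.release();
}
```

Because the deallocation sits in a destructor next to the allocation, the compiler sees the pairing of the two calls in one place.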
libstdc++-v3/ChangeLog:
* include/bits/vector.tcc (_Guard): Move all the nested duplicated class...
* include/bits/stl_vector.h (_Guard_alloc): ...here and rename.
(_M_allocate_and_copy): Use latter.
(_M_initialize_dispatch): Small code simplification.
(_M_range_initialize): Likewise and set _M_finish first from the result
of __uninitialized_fill_n_a that can throw.
Feng Xue [Thu, 16 May 2024 03:08:38 +0000 (11:08 +0800)]
vect: Unify bbs in loop_vec_info and bb_vec_info
Both derived classes have their own "bbs" field, which serve exactly the same
purpose of recording all basic blocks inside the corresponding vect region,
but the fields use different data types: one is a plain array,
the other an auto_vec. This difference causes duplicated code even when
handling the same things, mostly in tree-vect-patterns. One refinement is
to lift this field into the base class "vec_info" and, in each
derived-class constructor, set it to the contiguous memory area pointed to
by the old "bbs".
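The shape of that refactoring can be sketched as follows. All types here (Block, VecInfo, LoopInfo, RegionInfo) are hypothetical stand-ins, not the vectorizer's classes; the point is only that code written against the base class works for both kinds of storage.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Block = int;  // hypothetical stand-in for basic_block

// Base class: a pointer/count view of the blocks, whatever owns them.
struct VecInfo {
  Block *bbs = nullptr;
  std::size_t nbbs = 0;
};

// Owns a plain array, as the loop variant is described as doing.
struct LoopInfo : VecInfo {
  explicit LoopInfo(std::size_t n) : storage(new Block[n]()) {
    bbs = storage.get();
    nbbs = n;
  }
  std::unique_ptr<Block[]> storage;
};

// Owns a vector and points the base-class view at its contiguous data.
struct RegionInfo : VecInfo {
  explicit RegionInfo(std::vector<Block> blocks) : storage(std::move(blocks)) {
    bbs = storage.data();
    nbbs = storage.size();
  }
  RegionInfo(const RegionInfo &) = delete;  // a copy would alias storage
  RegionInfo &operator=(const RegionInfo &) = delete;
  std::vector<Block> storage;
};

// Shared code only needs the base class.
Block first_block(const VecInfo &info) {
  assert(info.nbbs > 0);
  return info.bbs[0];
}
```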
2024-05-16 Feng Xue <fxue@os.amperecomputing.com>
gcc/
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Move
initialization of bbs to explicit construction code. Adjust the
definition of nbbs.
(update_epilogue_loop_vinfo): Update nbbs for epilog vinfo.
* tree-vect-patterns.cc (vect_determine_precisions): Make
loop_vec_info and bb_vec_info share same code.
(vect_pattern_recog): Remove duplicated vect_pattern_recog_1 loop.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Access to bbs[0]
via base vec_info class.
(_bb_vec_info::_bb_vec_info): Initialize bbs and nbbs using data
fields of input auto_vec<> bbs.
(vect_slp_region): Use access to nbbs to replace original
bbs.length().
(vect_schedule_slp_node): Access to bbs[0] via base vec_info class.
* tree-vectorizer.cc (vec_info::vec_info): Add initialization of
bbs and nbbs.
(vec_info::insert_seq_on_entry): Access to bbs[0] via base vec_info
class.
* tree-vectorizer.h (vec_info): Add new fields bbs and nbbs.
(LOOP_VINFO_NBBS): New macro.
(BB_VINFO_BBS): Rename BB_VINFO_BB to BB_VINFO_BBS.
(BB_VINFO_NBBS): New macro.
(_loop_vec_info): Remove field bbs.
(_bb_vec_info): Rename field bbs.
Jason Merrill [Wed, 14 Feb 2024 22:18:17 +0000 (17:18 -0500)]
c++: pragma target and static init [PR109753]
#pragma target and optimize should also apply to implicitly-generated
functions like static initialization functions and defaulted special member
functions.
The handle_optimize_attribute change is necessary to avoid regressing
g++.dg/opt/pr105306.C; maybe_clone_body creates a cgraph_node for the ~B
alias before handle_optimize_attribute, and the alias never goes through
finalize_function, so we need to adjust semantic_interposition somewhere
else.
PR c++/109753
gcc/c-family/ChangeLog:
* c-attribs.cc (handle_optimize_attribute): Set
cgraph_node::semantic_interposition.
Jeff Law [Wed, 29 May 2024 13:41:55 +0000 (07:41 -0600)]
[to-be-committed] [RISC-V] Use pack to handle repeating constants
This patch utilizes zbkb to improve the code we generate for 64bit constants
when the high half is a duplicate of the low half.
Basically we generate the low half and use a pack instruction with that same
register repeated. ie
pack dest,src,src
That gives us a maximum sequence of 3 instructions and sometimes it will be
just 2 instructions (say if the low 32bits can be constructed with a single
addi or lui).
As with shadd, I'm abusing an RTL opcode. This time it's CONCAT. It's
reasonably close to what we're doing. Obviously it's just how we identify the
desire to generate a pack in the array of opcodes. We don't actually emit a
CONCAT.
Note that we don't care about the potential sign extension from bit 31. pack
will only look at bits 0..31 of each input (for rv64). So we go ahead and sign
extend before synthesizing the low part as that allows us to handle more cases
trivially.
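A small model of the sequence described above, assuming the rv64 Zbkb semantics of pack (low 32 bits of the first source in the low half, low 32 bits of the second source in the high half). The function names are hypothetical and this is not the riscv_build_integer code.

```cpp
#include <cassert>
#include <cstdint>

// True when the upper and lower 32-bit halves of value are identical.
bool has_repeating_halves(std::uint64_t value) {
  return (value >> 32) == (value & 0xffffffffu);
}

// Sign-extend a 32-bit value to 64 bits, as a register holding the
// synthesized low half would be.
std::uint64_t sign_extend32(std::uint32_t low) {
  std::uint64_t x = low;
  if (low & 0x80000000u)
    x |= 0xffffffff00000000ull;
  return x;
}

// Model of `pack rd, rs1, rs2` on rv64: only bits 0..31 of each input
// are used.
std::uint64_t pack(std::uint64_t rs1, std::uint64_t rs2) {
  return (rs1 & 0xffffffffu) | (rs2 << 32);
}

// Build the low half once, then pack the register with itself.
std::uint64_t synthesize_repeating(std::uint64_t value) {
  assert(has_repeating_halves(value));
  std::uint64_t low = sign_extend32(static_cast<std::uint32_t>(value));
  return pack(low, low);
}
```

The sign extension of the low half does not leak into the result because pack discards bits 32..63 of both inputs.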
I had my testsuite generator chew on random cases of a repeating constant
without any surprises. I don't see much point in including all those in the
testcase (after all there's 2**32 of them). I've got a set of 10 I'm
including. Nothing particularly interesting in them.
An enterprising developer that needs this improved without zbkb could probably
do so with a bit of work. First increase the cost by 1 unit. Second avoid
cases where bit 31 is set and restrict it to cases when we can still create
pseudos. On the codegen side, when encountering the CONCAT, generate the
appropriate shift of "X" into a temporary register, then IOR the temporary with
"X" into the new destination.
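The shift-and-IOR fallback can be modelled the same way. This sketch is my interpretation of why bit 31 has to be clear, namely that the sign-extended low half would otherwise fill the upper 32 bits with ones; the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 32-bit value to 64 bits, as a register holding the
// synthesized low half "X" would be.
std::uint64_t sign_extend32(std::uint32_t low) {
  std::uint64_t x = low;
  if (low & 0x80000000u)
    x |= 0xffffffff00000000ull;
  return x;
}

// The fallback sequence: shift X into a temporary, then IOR it with X.
std::uint64_t repeat_with_shift_or(std::uint64_t x) {
  std::uint64_t tmp = x << 32;
  return tmp | x;
}

// Does the fallback produce the constant whose halves both equal low?
bool shift_or_reproduces(std::uint32_t low) {
  std::uint64_t wanted = (static_cast<std::uint64_t>(low) << 32) | low;
  return repeat_with_shift_or(sign_extend32(low)) == wanted;
}
```

For a low half of 0x12345678 the sequence yields the repeating constant, while for 0x80000001 the extended sign bits survive the IOR and the upper half comes out as all ones.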
Anyway, I've tested this in my tester (though it doesn't turn on zbkb, yet).
I'll let the CI system chew on it overnight, but like mine, I don't think it
lights up zbkb. So it's unlikely to spit out anything interesting.
gcc/
* config/riscv/crypto.md (riscv_xpack_<X:mode>_<HX:mode>_2): Remove '*'
to allow it to be used via the gen_* interface.
* config/riscv/riscv.cc (riscv_build_integer): Identify when Zbkb
can be used to profitably synthesize repeating constants.
(riscv_move_integer): Codegen changes to generate those Zbkb sequences.
Jason Merrill [Thu, 16 May 2024 20:09:12 +0000 (16:09 -0400)]
c++: add module extensions
There is a trend in the broader C++ community to use a different extension
for module interface units, even though (in GCC) they are compiled in the
same way as other source files. Let's recognize these extensions as C++.
.ixx is the MSVC standard, while the .c*m are supported by Clang. libc++
standard headers use .cppm, as their other source files use .cpp.
Perhaps libstdc++ might use .ccm for parallel consistency?
One issue with .c++m is that libcpp/mkdeps.cc has been using it for the
phony dependencies to express module dependencies, so I'm changing mkdeps to
something less likely to be an actual file, ".c++-module".
gcc/cp/ChangeLog:
* lang-specs.h: Add module interface extensions.
gcc/ChangeLog:
* doc/invoke.texi: Update module extension docs.
libcpp/ChangeLog:
* mkdeps.cc (make_write): Change .c++m to .c++-module.
gcc/testsuite/ChangeLog:
* g++.dg/modules/dep-1_a.C
* g++.dg/modules/dep-1_b.C
* g++.dg/modules/dep-2.C: Change .c++m to .c++-module.
Tobias Burnus [Wed, 29 May 2024 13:29:06 +0000 (15:29 +0200)]
libgomp: Enable USM for AMD APUs and MI200 devices
If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true,
all GPUs on the system support unified shared memory. That's
the case for APUs and MI200 devices when XNACK is enabled.
XNACK can be enabled by setting HSA_XNACK=1 as env var for
supported devices; otherwise, if disabled, USM code will
use host fallback.
Tobias Burnus [Wed, 29 May 2024 13:14:38 +0000 (15:14 +0200)]
libgomp: Enable USM for some nvptx devices
A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared
memory is supported in hardware. This patch enables support for those -
if all installed nvptx devices have this feature (as the capabilities
are per device type).
This exposes a bug in gomp_copy_back_icvs, as it previously used
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in case of shared memory. But in this case,
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regards to '-1' for the
device number as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.
* libgomp.texi (nvptx): Update USM description.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
Claim support when requesting USM and all devices support
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM if the
devices support USM.
Oskari Pirhonen [Wed, 28 Feb 2024 01:13:30 +0000 (19:13 -0600)]
c-family: add hints for strerror
Add proper hints for implicit declaration of strerror.
The results could be confusing depending on the other included headers.
These example messages are from compiling a trivial program to print the
string for an errno value. It only includes stdio.h (cstdio for C++).
Before:
$ /tmp/gcc-master/bin/gcc test.c -o test_c
test.c: In function ‘main’:
test.c:4:20: warning: implicit declaration of function ‘strerror’; did you mean ‘perror’? [-Wimplicit-function-declaration]
4 | printf("%s\n", strerror(0));
| ^~~~~~~~
| perror
$ /tmp/gcc-master/bin/g++ test.cpp -o test_cpp
test.cpp: In function ‘int main()’:
test.cpp:4:20: error: ‘strerror’ was not declared in this scope; did you mean ‘stderr’?
4 | printf("%s\n", strerror(0));
| ^~~~~~~~
| stderr
After:
$ /tmp/gcc-known-headers/bin/gcc test.c -o test_c
test.c: In function ‘main’:
test.c:4:20: warning: implicit declaration of function ‘strerror’ [-Wimplicit-function-declaration]
4 | printf("%s\n", strerror(0));
| ^~~~~~~~
test.c:2:1: note: ‘strerror’ is defined in header ‘<string.h>’; this is probably fixable by adding ‘#include <string.h>’
1 | #include <stdio.h>
+++ |+#include <string.h>
2 |
$ /tmp/gcc-known-headers/bin/g++ test.cpp -o test_cpp
test.cpp: In function ‘int main()’:
test.cpp:4:20: error: ‘strerror’ was not declared in this scope
4 | printf("%s\n", strerror(0));
| ^~~~~~~~
test.cpp:2:1: note: ‘strerror’ is defined in header ‘<cstring>’; this is probably fixable by adding ‘#include <cstring>’
1 | #include <cstdio>
+++ |+#include <cstring>
2 |
Richard Biener [Mon, 27 May 2024 14:04:35 +0000 (16:04 +0200)]
tree-optimization/115252 - enhance peeling for gaps avoidance
Code generation for contiguous load vectorization can already deal
with generalized avoidance of loading from a gap. The following
extends detection of peeling for gaps requirement with that,
gets rid of the old special casing of a half load and makes sure
when we do access the gap we have peeling for gaps enabled.
PR tree-optimization/115252
* tree-vect-stmts.cc (get_group_load_store_type): Enhance
detecting the number of cases where we can avoid accessing a gap
during code generation.
(vectorizable_load): Remove old half-vector peeling for gap
avoidance which is now redundant. Add gap-aligned case where
it's OK to access the gap. Add assert that we have peeling for
gaps enabled when we access a gap.
Richard Biener [Wed, 29 May 2024 08:41:51 +0000 (10:41 +0200)]
tree-optimization/114435 - pcom left around copies confusing SLP
The following arranges for the pre-SLP vectorization scalar cleanup
to be run when predictive commoning was applied to a loop in the
function. This is similar to the complete unroll situation and
facilitating SLP vectorization. Avoiding the SSA copies in predictive
commoning itself isn't easy (and predcom also sometimes unrolls,
asking for scalar cleanup).
PR tree-optimization/114435
* tree-predcom.cc (tree_predictive_commoning): Queue
the next scalar cleanup sub-pipeline to be run when we
did something.
Hongyu Wang [Wed, 15 May 2024 03:24:34 +0000 (11:24 +0800)]
i386: Fix ix86_option override after change [PR 113719]
In ix86_override_options_after_change, calls to ix86_default_align
and ix86_recompute_optlev_based_flags will cause mismatched target
opt_set when doing cl_optimization_restore. Move them back to
ix86_option_override_internal to solve the issue.
gcc/ChangeLog:
PR target/113719
* config/i386/i386-options.cc (ix86_override_options_after_change):
Remove call to ix86_default_align and
ix86_recompute_optlev_based_flags.
(ix86_option_override_internal): Call ix86_default_align and
ix86_recompute_optlev_based_flags.
Patrick Palka [Wed, 29 May 2024 08:49:37 +0000 (04:49 -0400)]
c++: canonicity of fn types w/ instantiated eh specs [PR115223]
When propagating structural equality in build_cp_fntype_variant, we
should consider structural equality of the exception-less variant, not
of the given type which might use structural equality only because it
has a (complex) noexcept-spec that we're intending to replace, as in
maybe_instantiate_noexcept which calls build_exception_variant using
the deferred-noexcept function type. Otherwise we might pessimistically
use structural equality for a function type with a simple instantiated
noexcept-spec, leading to a LTO-triggered type verification failure if we
later use that (structural-equality) type as the canonical version of
some other variant.
PR c++/115223
gcc/cp/ChangeLog:
* tree.cc (build_cp_fntype_variant): Propagate structural
equality of the exception-less variant.
Rainer Orth [Wed, 29 May 2024 08:08:07 +0000 (10:08 +0200)]
libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]
Several of the 19_diagnostics/stacktrace tests FAIL on Solaris/SPARC (32
and 64-bit), Solaris/x86 (32-bit only), and several other targets:
FAIL: 19_diagnostics/stacktrace/current.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/current.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/output.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/output.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc -std=gnu++26 execution test
As it turns out, both the copy of libbacktrace in libstdc++ and the
testcases proper need to be compiled with -funwind-tables, as is done for
libbacktrace itself.
This isn't an issue on Linux/x86_64 and Solaris/amd64 since 64-bit x86
always defaults to -funwind-tables. 32-bit x86 does, too, when
-fomit-frame-pointer is enabled as on Linux/i686, but unlike
Solaris/i386.
So this patch always enables the option both for the libbacktrace copy
and the testcases.
Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.
PR libstdc++/115247
* include/experimental/bits/simd.h (__as_vector): Don't use
vector_size(8) on __i386__.
(__vec_shuffle): Never return MMX vectors, widen to 16 bytes
instead.
(concat): Fix padding calculation to pick up widening logic from
__as_vector.
liuhongt [Wed, 29 May 2024 03:14:26 +0000 (11:14 +0800)]
Align tight&hot loop without considering max skipping bytes.
When a hot loop is small enough to fit into one cache line, we should align
the loop with ceil_log2 (loop_size) without considering the maximum
skip bytes. This helps code prefetch.
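A sketch of the alignment choice, assuming a 64-byte cache line; tight_loop_alignment is a hypothetical helper and not the new ix86_align_loops function.

```cpp
#include <cassert>

// Smallest k such that (1u << k) >= x, for 1 <= x <= 2^31.
unsigned ceil_log2(unsigned x) {
  assert(x >= 1 && x <= (1u << 31));
  unsigned k = 0;
  while ((1u << k) < x)
    ++k;
  return k;
}

// Alignment in bytes for a hot loop of loop_size bytes that fits into one
// cache line (64 bytes assumed): round the size up to a power of two.
unsigned tight_loop_alignment(unsigned loop_size, unsigned cache_line = 64) {
  assert(loop_size >= 1 && loop_size <= cache_line);
  return 1u << ceil_log2(loop_size);
}
```

A 24-byte loop is aligned to 32 bytes here, so it always sits inside a single 32-byte block and cannot straddle a cache line boundary.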
gcc/ChangeLog:
* config/i386/i386.cc (ix86_avoid_jump_mispredicts): Change
gen_pad to gen_max_skip_align.
(ix86_align_loops): New function.
(ix86_reorg): Call ix86_align_loops.
* config/i386/i386.md (pad): Rename to ..
(max_skip_align): .. this, and accept 2 operands for align and
skip.
Haochen Jiang [Wed, 29 May 2024 03:13:55 +0000 (11:13 +0800)]
Adjust generic loop alignment from 16:11:8 to 16 for Intel processors
Previously, we used 16:11:8 in the generic tune for Intel processors, which
led to cross-cache-line issues and resulted in random performance
penalties, varying from commit to commit, in benchmarks with small loops.
Always aligning to 16 bytes solves the issue to some extent.
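To see how 16:11:8 can leave a small loop across a cache line, here is a model of the n:m:n2 rule as I read the -falign-loops documentation: align to n when that skips fewer than m bytes, otherwise fall back to n2. The helper names and the 64-byte line size are assumptions.

```cpp
#include <cassert>

// Bytes of padding needed to reach the next multiple of align from addr.
unsigned bytes_to_boundary(unsigned addr, unsigned align) {
  assert(align != 0);
  return (align - addr % align) % align;
}

// Padding inserted for an n:m:n2 setting: use the primary alignment n if
// it can be reached by skipping fewer than m bytes, else align to n2.
unsigned padding_for(unsigned addr, unsigned n, unsigned m, unsigned n2) {
  unsigned skip = bytes_to_boundary(addr, n);
  if (skip < m)
    return skip;
  return bytes_to_boundary(addr, n2);
}

// True if [start, start + size) spans two cache lines (64 bytes assumed).
bool crosses_cache_line(unsigned start, unsigned size, unsigned line = 64) {
  assert(size != 0);
  return start / line != (start + size - 1) / line;
}
```

Take a 12-byte loop that would start at address 49. Under 16:11:8 the 15 bytes needed to reach a 16-byte boundary exceed the limit, so it is padded by 7 to address 56 and then crosses the line at 64. With plain 16 it is padded to 64 and stays within one line. A few bytes of unrelated code movement can flip between the two outcomes, which fits the commit-to-commit noise described above.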
gcc/ChangeLog:
* config/i386/x86-tune-costs.h (generic_cost): Change from
16:11:8 to 16.
Kewen Lin [Wed, 29 May 2024 02:41:12 +0000 (21:41 -0500)]
testsuite, rs6000: Replace powerpc_vsx_ok with powerpc_vsx [PR114842]
As noted in PR114842, most of the test cases which require the
effective target check powerpc_vsx_ok actually care about
whether the VSX feature is enabled, and they should adopt the effective
target powerpc_vsx instead. Otherwise, when users specify
extra -mno-vsx, e.g. in RUNTESTFLAGS, powerpc_vsx_ok returns
true but the test is run without VSX and the test case
would fail. With the commit teaching powerpc_vsx to consider
current_compiler_flags, dg-{additional,}-options can be taken
into account when evaluating powerpc_vsx, so this patch also
moves dg-{additional,}-options lines before lines with
dg-require-effective-target to make them effective.
Andrew MacLeod [Wed, 22 May 2024 23:51:16 +0000 (19:51 -0400)]
Gori_on_edge tweaks.
FAST_VRP uses a non-ranger gori_on_edge routine which allows an optional
outgoing_edge_range object if one wanted to use switches. This is now
integrated with the gori () method of a range_query, and is no longer
needed.
* gimple-range-gori.cc (gori_on_edge): Always use static ranges
from the specified range_query.
* gimple-range-gori.h (gori_on_edge): Change prototype.
* gimple-range.cc (dom_ranger::maybe_push_edge): Change arguments
to call.
Kewen Lin [Wed, 29 May 2024 02:13:40 +0000 (21:13 -0500)]
rs6000: Don't clobber return value when eh_return called [PR114846]
As the associated test case in PR114846 shows, currently
with eh_return involved, some register restoring for EH
RETURN DATA in the epilogue can clobber the register holding
the return value. Following the existing handling in
some other targets, this patch makes the eh_return expander
call a new define_insn_and_split eh_return_internal which
directly calls rs6000_emit_epilogue with epilogue_type
EPILOGUE_TYPE_EH_RETURN, instead of the previous special
treatment of a normal return with crtl->calls_eh_return.
PR target/114846
gcc/ChangeLog:
* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
now, adjust the relevant handlings on it.
* config/rs6000/rs6000.md (eh_return expander): Append by calling
gen_eh_return_internal and emit_barrier.
(eh_return_internal): New define_insn_and_split, call function
rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
liuhongt [Mon, 19 Feb 2024 05:57:24 +0000 (13:57 +0800)]
Reduce cost of MEM (A + imm).
For MEM, rtx_cost iterates over each subrtx and adds up the costs,
so MEM (reg) costs 5 while MEM (reg + 4) costs 9, which is not
accurate for x86. Ideally address_cost should be used, but it reduces
the cost too much. So the current solution is to make a constant
displacement as cheap as possible.
gcc/ChangeLog:
PR target/67325
* config/i386/i386.cc (ix86_rtx_costs): Reduce cost of MEM (A
+ imm) to "cost of MEM (A)" + 1.
Andrew MacLeod [Wed, 22 May 2024 23:27:01 +0000 (19:27 -0400)]
More tweaks from gimple_outgoing_range changes.
The dom_ranger used for fast vrp no longer needs a local
gimple_outgoing_range object as it is now always available from the
range_query parent class.
The builtin_unreachable code for adjusting globals and removing the
builtin calls during the final VRP pass can now function with just
a range_query object rather than a specific ranger. This adjusts it to
use the extra methods in the range_query API.
This will now allow removal of builtin_unreachable calls even if there is no
active ranger with dependency info available.
* gimple-range.cc (dom_ranger::dom_ranger): Do not initialize m_out.
(dom_ranger::maybe_push_edge): Use gori () rather than m_out.
* gimple-range.h (dom_ranger::m_out): Remove.
* tree-vrp.cc (remove_unreachable::remove_unreachable): Use a
range-query rather than a gimple_ranger.
(remove_unreachable::remove): New.
(remove_unreachable::m_ranger): Change to a range_query.
(remove_unreachable::handle_early): If there is no dependency
information, do nothing.
(remove_unreachable::remove_and_update_globals): Do not update
globals if there is no dependency info to use.
- We always have a target_hash_table and bb_ticks because
init_resource_info is always called. These conditionals are
an ancient artifact: it's been quite a while since
resource.cc was used elsewhere than exclusively from reorg.cc
- In mark_target_live_regs, get rid of a now-redundant "if
(tinfo != NULL)" conditional and replace an "if (bb)" with a
gcc_assert.
A "git diff -wb" (ignore whitespace diff) is better at
showing the actual changes.
* resource.cc (free_resource_info, clear_hashed_info_for_insn): Don't
check for non-null target_hash_table and bb_ticks.
(mark_target_live_regs): Ditto. Replace check for non-NULL result from
BLOCK_FOR_INSN with a call to gcc_assert. Fold code conditioned on
tinfo != NULL.
resource.cc (mark_target_live_regs): Remove check for bb not found
No functional change.
A "git diff -wb" (ignore whitespace diff) shows that this
commit just removes a "if (b != -1)" after a "gcc_assert (b
!= -1)" and also removes the subsequent "else" clause.
* resource.cc (mark_target_live_regs): Remove redundant check for b
being -1, after gcc_assert.
resource.cc: Replace calls to find_basic_block with cfgrtl BLOCK_FOR_INSN
...and call compute_bb_for_insn in init_resource_info and
free_bb_for_insn in free_resource_info.
I put a gcc_unreachable in that else-clause for a failing
find_basic_block in mark_target_live_regs after the comment that says:
/* We didn't find the start of a basic block. Assume everything
in use. This should happen only extremely rarely. */
SET_HARD_REG_SET (res->regs);
and found that it fails not extremely rarely but extremely early in
the build (compiling libgcc).
That kind of pessimization leads to suboptimal delay-slot-filling.
Instead, do like many machine_dependent_reorg passes and call
compute_bb_for_insn as part of resource.cc initialization.
After this patch, there's a whole "if (b != -1)" conditional that's
dominated by a gcc_assert (b != -1). I separated that, as it's a NFC
whitespace patch that hampers patch readability.
Altogether this improved coremark performance for CRIS at -O2
-march=v10 by 0.36%.
* resource.cc: Include cfgrtl.h. Use BLOCK_FOR_INSN (insn)->index
instead of calling find_basic_block (insn). Assert for not -1.
(find_basic_block): Remove function.
(init_resource_info): Call compute_bb_for_insn.
(free_resource_info): Call free_bb_for_insn.
resource.cc (mark_target_live_regs): Don't look past target insn, PR115182
The PR115182 regression is that a delay-slot for a conditional branch
is no longer filled with an insn that has been "sunk" because of r15-518-g99b1daae18c095, for cris-elf with -O2 -march=v10.
There are still sufficient "nearby" dependency-less insns that the
delay-slot shouldn't be empty. In particular there's one candidate in
the loop, right after an off-ramp branch, off the loop: a move from
$r9 to $r3.
beq .L2
nop
move.d $r9,$r3
But, the resource.cc data-flow-analysis incorrectly says it collides
with registers "live" at that .L2 off-ramp. The off-ramp insns
(inlined from simple_rand) look like this (left-to-right direction):
.L2:
move.d $r12,[_seed.0]
move.d $r13,[_seed.0+4]
ret
movem [$sp+],$r8
So, a store of a long long to _seed, a return instruction and a
restoring multi-register-load of r0..r8 (all callee-saved registers)
in the delay-slot of the return insn. The return-value is kept in
$r10,$r11 so in total $r10..$r13 live plus the stack-pointer and
return-address registers. But, mark_target_live_regs says that
$r0..$r8 are also live because it *includes the registers live for the
return instruction*! While they "come alive" after the movem, they
certainly aren't live at the "off-ramp" .L2 label.
The problem is in mark_target_live_regs: it consults a hash-table
indexed by insn uid, where it tracks the currently live registers with
a "generation" count to handle when it moves around insn, filling
delay-slots. As a fall-back, it starts with registers live at the
start of each basic block, calculated by the comparatively modern df
machinery (except that it can fail finding out which basic block an
insn belongs to, at which times it includes all registers film at 11),
and tracks the semantics of insns up to each insn.
You'd think that's all that should be done, but then for some reason
it *also* looks at insns *after the target insn* up to a few branches,
and includes that in the set of live registers! This is the code in
mark_target_live_regs that starts with the call to
find_dead_or_set_registers. I couldn't make sense of it, so I looked
at its history, and I think I found the cause; it's a thinko or
possibly two thinkos. The original implementation, gcc-git-described
as r0-97-g9c7e297806a27f, later moved from reorg.c to resource.c in r0-20470-gca545bb569b756.
I believe the "extra" lookup was intended to counter flaws in the
reorg.c/resource.c register liveness analysis; to inspect insns along
the execution paths to exclude registers that, when looking at
subsequent insns, weren't live. That guess is backed by a sentence in
the updated (i.e. deleted) part of the function head comment for
mark_target_live_regs: "Next, scan forward from TARGET looking for
things set or clobbered before they are used. These are not live."
To me that sounds like flawed register-liveness data.
An epilogue expanded as RTX (i.e. not just assembly code emitted as
text) is introduced in basepoints/gcc-0-1334-gbdac5f5848fb, so before
that time, nobody would notice that saved registers were included as
live registers in delay-slots in "next-to-last" basic blocks.
Then in r0-24783-g96e9c98d59cc40, the intersection ("and") was changed
to a union ("or"), i.e. it added to the set of live registers instead
of thinning it out. In the gcc-patches archives, I see the patch
submission doesn't offer a C test-case and only has RTX snippets
(apparently for SPARC). The message does admit that the change goes
"against what the comments in the code say":
https://gcc.gnu.org/pipermail/gcc-patches/1999-November/021836.html
It looks like this was related to a bug with register liveness info
messed up when moving a "delay-slotted" insn from one slot to another.
But, I can't help thinking it's just papering over a register
liveness bug elsewhere.
I think, with a reliable "DF_LR_IN", the whole thing *after* tracking
from start-of-bb up to the target insn should be removed; thus.
This patch also removes the now-unused find_dead_or_set_registers
function.
At r15-518, it fixes the issue for CRIS and improves coremark scores
at -O2 -march=v10 a tiny bit (about 0.05%).
PR rtl-optimization/115182
* resource.cc (mark_target_live_regs): Don't look for
unconditional branches after the target to improve on the
register liveness.
(find_dead_or_set_registers): Remove unused function.
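The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not code from resource.cc: register sets are plain bitmasks, the names (RegSet, old_live, new_live, collides) are hypothetical, and the stack pointer is arbitrarily modelled as bit 14. It shows why unioning in the registers seen after the target rejects "move.d $r9,$r3" as a delay-slot candidate, while stopping at the target accepts it.

```cpp
#include <cstdint>

// Toy register set: bit N stands for register $rN (hypothetical model,
// not the HARD_REG_SET used by resource.cc).
using RegSet = std::uint32_t;

constexpr RegSet reg(int n) { return RegSet{1} << n; }

constexpr RegSet range(int lo, int hi) {
  RegSet s = 0;
  for (int i = lo; i <= hi; ++i)
    s |= reg(i);
  return s;
}

// Registers live at the .L2 off-ramp as listed in the message:
// $r10..$r13 plus the stack pointer (modelled as bit 14, an assumption;
// the return-address register is left out of this toy model).
constexpr RegSet live_at_target = range(10, 13) | reg(14);

// Registers picked up by scanning past the target: the movem in the
// delay-slot of the return insn restores $r0..$r8.
constexpr RegSet seen_after_target = range(0, 8);

// Old behaviour (simplified): union in what the forward scan found.
constexpr RegSet old_live() { return live_at_target | seen_after_target; }

// New behaviour: stop at the target insn.
constexpr RegSet new_live() { return live_at_target; }

// A delay-slot candidate is rejected if it writes a register that is
// considered live at the branch target.
constexpr bool collides(RegSet candidate_sets, RegSet live) {
  return (candidate_sets & live) != 0;
}

// "move.d $r9,$r3" writes $r3.
constexpr RegSet candidate_sets = reg(3);
```

In this model the candidate collides only with the over-approximated set, which matches the empty delay-slot (the nop) shown in the assembly above.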
Uros Bizjak [Tue, 28 May 2024 18:25:14 +0000 (20:25 +0200)]
i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 x86_32 targets
Use MOVD/PEXTRD and MOVD/PINSRD insn sequences to move DImode value
between XMM and GPR register sets for SSE4.1 x86_32 targets in order
to avoid spilling the value to stack.
to:
movd %eax, %xmm0
pinsrd $1, %edx, %xmm0
movq %xmm0, b
gcc/ChangeLog:
* config/i386/sync.md (atomic_loaddi_fpu): Use movd/pextrd
to move DImode value from XMM to GPR for TARGET_SSE4_1.
(atomic_storedi_fpu): Use movd/pinsrd to move DImode value
from GPR to XMM for TARGET_SSE4_1.
David Malcolm [Tue, 28 May 2024 19:55:28 +0000 (15:55 -0400)]
diagnostics: consolidate global state in diagnostic-color.cc
Simplify the table of default colors, avoiding the need to manually
add the strlen of each entry.
Consolidate the global state in diagnostic-color.cc into a
g_color_dict, adding selftests for the new class diagnostic_color_dict.
No functional change intended.
gcc/ChangeLog:
* diagnostic-color.cc: Define INCLUDE_VECTOR.
Include "label-text.h" and "selftest.h".
(struct color_cap): Replace with...
(struct color_default): ...this, adding "m_" prefixes to fields
and dropping "name_len" and "free_val" field.
(color_dict): Convert to...
(gcc_color_defaults): ...this, making const, dropping the trailing
strlen and "false" from each entry.
(class diagnostic_color_dict): New.
(g_color_dict): New.
(colorize_start): Reimplement in terms of g_color_dict.
(diagnostic_color_dict::get_entry_by_name): New, based on
colorize_start.
(diagnostic_color_dict::get_start_by_name): Likewise.
(diagnostic_color_dict::diagnostic_color_dict): New.
(parse_gcc_colors): Reimplement, moving body...
(diagnostic_color_dict::parse_envvar_value): ...here.
(colorize_init): Lazily create g_color_dict.
(selftest::test_empty_color_dict): New.
(selftest::test_default_color_dict): New.
(selftest::test_color_dict_envvar_parsing): New.
(selftest::diagnostic_color_cc_tests): New.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::diagnostic_color_cc_tests.
* selftest.h (selftest::diagnostic_color_cc_tests): New decl.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
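A minimal sketch of the idea, assuming a much simpler design than the one in diagnostic-color.cc: the class name ColorDict and its layout are hypothetical, and only the method names parse_envvar_value and get_start_by_name are borrowed from the ChangeLog above. It shows a defaults table whose entries carry no hand-written string lengths, and a parser for a GCC_COLORS-style value (colon-separated name=SGR items).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a colour dictionary; not GCC's class.
class ColorDict {
public:
  // Defaults are plain name/value pairs. std::string tracks the length,
  // so no entry needs a manually maintained strlen.
  ColorDict()
      : m_entries{{"error", "01;31"}, {"warning", "01;35"}, {"note", "01;36"}} {}

  // Parse a GCC_COLORS-style value such as "error=01;33:note=32".
  // Unknown names and malformed items are ignored in this sketch.
  void parse_envvar_value(const std::string &value) {
    std::size_t pos = 0;
    while (pos <= value.size()) {
      std::size_t end = value.find(':', pos);
      if (end == std::string::npos)
        end = value.size();
      const std::string item = value.substr(pos, end - pos);
      const std::size_t eq = item.find('=');
      if (eq != std::string::npos) {
        auto it = m_entries.find(item.substr(0, eq));
        if (it != m_entries.end())
          it->second = item.substr(eq + 1);
      }
      pos = end + 1;
    }
  }

  // Return the SGR start sequence for NAME, or "" if NAME is unknown.
  std::string get_start_by_name(const std::string &name) const {
    auto it = m_entries.find(name);
    if (it == m_entries.end())
      return "";
    return "\33[" + it->second + "m";
  }

private:
  std::map<std::string, std::string> m_entries;
};

// Usage: override one colour and leave the others at their defaults.
inline void color_dict_demo() {
  ColorDict dict;
  dict.parse_envvar_value("error=01;33");
  assert(dict.get_start_by_name("error") == "\33[01;33m");
  assert(dict.get_start_by_name("warning") == "\33[01;35m");
}
```

Keeping the state in one object rather than in file-level globals is what makes selftests like the ones listed above straightforward: each test can build its own dictionary and parse into it.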
Marek Polacek [Thu, 23 May 2024 19:49:42 +0000 (15:49 -0400)]
c++: extend -Wself-move for mem-init-list [PR109396]
We already warn for:
x = std::move (x);
which triggers:
warning: moving 'x' of type 'int' to itself [-Wself-move]
but bug 109396 reports that this doesn't work for a member-initializer-list:
X() : x(std::move (x))
so this patch amends that.
PR c++/109396
gcc/cp/ChangeLog:
* cp-tree.h (maybe_warn_self_move): Declare.
* init.cc (perform_member_init): Call maybe_warn_self_move.
* typeck.cc (maybe_warn_self_move): No longer static. Change the
return type to bool. Also warn when called from
a member-initializer-list. Drop the inform call.
Andrew MacLeod [Mon, 27 May 2024 15:00:57 +0000 (11:00 -0400)]
Do not invoke SCEV if it will use a different range query.
SCEV always uses the current range_query object.
Ranger's cache uses a global value_query when propagating cache values to
avoid re-invoking ranger during simple cache propagations.
When folding a PHI value, SCEV can be invoked, and since it always uses
the current range_query object, when ranger is active this causes the
undesired re-invoking of ranger during cache propagation.
This patch checks to see if the fold_using_range specified range_query
object is the same as the one SCEV uses, and does not invoke SCEV if
they do not match.
PR tree-optimization/115221
gcc/
* gimple-range-fold.cc (range_of_ssa_name_with_loop_info): Do
not invoke SCEV if range_query's do not match.
gcc/testsuite/
* gcc.dg/pr115221.c: New.
Marek Polacek [Wed, 22 May 2024 20:28:02 +0000 (16:28 -0400)]
c++: mark TARGET_EXPRs for function arguments eliding [PR114707]
Coming back to our discussion in
<https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649426.html>:
TARGET_EXPRs that initialize a function argument are not marked
TARGET_EXPR_ELIDING_P even though gimplify_arg drops such TARGET_EXPRs
on the floor. To work around it, I added a pset to
replace_placeholders_for_class_temp_r, but it would be best to just rely
on TARGET_EXPR_ELIDING_P.
PR c++/114707
gcc/cp/ChangeLog:
* call.cc (convert_for_arg_passing): Call set_target_expr_eliding.
* typeck2.cc (replace_placeholders_for_class_temp_r): Don't use pset.
(digest_nsdmi_init): Call cp_walk_tree_without_duplicates instead of
cp_walk_tree.
David Malcolm [Tue, 28 May 2024 17:04:25 +0000 (13:04 -0400)]
Fix bootstrap on AIX by adding c-family/c-type-mismatch.cc [PR115167]
PR bootstrap/115167 reports a bootstrap failure on AIX triggered by r15-636-g770657d02c986c whilst building f951 in stage 2, due to
the linker not being able to find symbols for:
vtable for range_label_for_type_mismatch
range_label_for_type_mismatch::get_text(unsigned int) const
The only users of the class range_label_for_type_mismatch are in the
C/C++ frontends, each of which supply their own implementation of:
i.e. we had a cluster of symbols that was disconnected from any
users on f951.
The above patch added a new range_label::get_effects vfunc to the
base class. My hunch is that we were getting away with not defining
the symbol for Fortran with AIX's linker before (since none of the
users are used), but adding the get_effects vfunc has somehow broken
things (possibly because there's an empty implementation in the base
class in the *header*).
The following patch moves all of the code in
gcc/gcc-rich-location.{cc,h,o} defining and using
range_label_for_type_mismatch to a new
gcc/c-family/c-type-mismatch.{cc,h,o}, to help the linker ignore this
cluster of symbols when it's disconnected from users.
I was able to reproduce the failure without the patch, and then
successfully bootstrap with this patch on powerpc-ibm-aix7.3.1.0
(cfarm119).
gcc/c-family/ChangeLog:
PR bootstrap/115167
* c-format.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* c-type-mismatch.cc: New file, taking material from
gcc-rich-location.cc.
* c-type-mismatch.h: New file, taking material from
gcc-rich-location.h.
* c-warn.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
gcc/c/ChangeLog:
PR bootstrap/115167
* c-objc-common.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* c-typeck.cc: Likewise.
gcc/cp/ChangeLog:
PR bootstrap/115167
* call.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* error.cc: Likewise.
* typeck.cc: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Lyut Nersisyan [Tue, 28 May 2024 15:17:50 +0000 (09:17 -0600)]
[to-be-committed] [RISC-V] Some basic patterns for zbkb code generation
And here's Lyut's basic Zbkb support. Essentially it's four new patterns for
packh, packw, pack plus a bridge pattern needed for packh.
packw is a bit ugly as we need to match a sign extension in an inconvenient
location. We pull it out so that the extension is exposed in a convenient
place for subsequent sign extension elimination.
We need a bridge pattern to get packh. Thankfully the bridge pattern is a
degenerate packh where one operand is x0, so it works as-is without splitting
and provides the bridge to the more general form of packh.
This patch also refines the condition for the constant reassociation patch to
avoid a few more cases that can be handled efficiently with other preexisting
patterns and one bugfix to avoid losing bits, particularly in the xor/ior case.
Lyut did the core work here. I think I did some minor cleanups and the bridge
pattern to work with gcc-15 and beyond.
This is a prerequisite for using zbkb in constant synthesis. It also stands on
its own. I know we've seen it trigger in spec without the constant synthesis
bits.
It's been through our internal CI and my tester. I'll obviously wait for the
upstream CI to finish before taking further action.
gcc/
* config/riscv/crypto.md: Add new combiner patterns to generate
pack, packh, packw instructions.
* config/riscv/iterators.md (HX): New iterator for half X mode.
* config/riscv/riscv.md (<optab>_shift_reverse<X:mode>): Tighten
cases to avoid. Do not lose bits for XOR/IOR.
gcc/testsuite
* gcc.target/riscv/pack32.c: New test.
* gcc.target/riscv/pack64.c: New test.
* gcc.target/riscv/packh32.c: New test.
* gcc.target/riscv/packh64.c: New test.
* gcc.target/riscv/packw.c: New test.
Co-authored-by: Jeffrey A Law <jlaw@ventanamicro.com>
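As a reference for what the new patterns have to match, the three instructions can be modelled in a few lines. This is a sketch for RV64 (XLEN = 64) written from the Zbkb instruction descriptions, not from the patch; the function names simply mirror the mnemonics.

```cpp
#include <cstdint>

// pack: low 32 bits of rs1 go to the low half of the result, low 32 bits
// of rs2 go to the high half.
constexpr std::uint64_t pack(std::uint64_t rs1, std::uint64_t rs2) {
  return (rs1 & 0xffffffffu) | (rs2 << 32);
}

// packh: low byte of rs1 in bits 7:0, low byte of rs2 in bits 15:8,
// remaining bits zero.
constexpr std::uint64_t packh(std::uint64_t rs1, std::uint64_t rs2) {
  return (rs1 & 0xff) | ((rs2 & 0xff) << 8);
}

// packw: low 16 bits of rs1 and rs2 packed into a 32-bit value, which is
// then sign-extended to 64 bits.
constexpr std::uint64_t packw(std::uint64_t rs1, std::uint64_t rs2) {
  const std::uint64_t lo32 = (rs1 & 0xffff) | ((rs2 & 0xffff) << 16);
  // Sign-extend from bit 31 using unsigned wrap-around arithmetic.
  return (lo32 ^ 0x80000000ull) - 0x80000000ull;
}
```

With the second operand equal to zero (x0), packh reduces to a zero-extension of the low byte, which is the degenerate form the bridge pattern relies on. The sign extension at the end of packw is the part the message calls inconvenient to match.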
Feng Xue [Thu, 23 May 2024 07:25:53 +0000 (15:25 +0800)]
vect: Use vect representative statement instead of original in pattern recog [PR115060]
Some utility functions (such as vect_look_through_possible_promotion) that
find a certain kind of direct or indirect SSA definition for a value may
return the original SSA name, not its pattern representative, even when a
pattern is involved. For example,
a = (T1) patt_b;
patt_b = (T2) c; // b = ...
patt_c = not-a-cast; // c = ...
Given 'a', the mentioned function will return 'c', instead of 'patt_c'. This
subtlety would make some pattern recog code that is unaware of it misuse the
original instead of the new pattern statement, which is inconsistent with
the processing logic of the pattern formation pass. This patch corrects the
issue by forcing another utility function (vect_get_internal_def) to return
the pattern statement information to the caller by default.
2024-05-23 Feng Xue <fxue@os.amperecomputing.com>
gcc/
PR tree-optimization/115060
* tree-vect-patterns.cc (vect_get_internal_def): Return statement for
vectorization.
(vect_widened_op_tree): Call vect_get_internal_def instead of look_def
to get statement information.
(vect_recog_widen_abd_pattern): No need to call vect_stmt_to_vectorize.
The dump scanning is supposed to check that we do not merge two
slightly different gathers into one SLP node, but since we now
SLP the store, scanning for "ectorizing stmts using SLP" is no
longer good. Instead the following makes us look for
"stmt 1 .* = .MASK", which is what the second lane of an SLP
node looks like. We have to handle both .MASK_GATHER_LOAD (for
targets with ifun mask gathers) and .MASK_LOAD (for ones without).
Tested on x86_64-linux with and without native gather and on GCN
where this now avoids a FAIL.
Richard Biener [Mon, 27 May 2024 08:41:02 +0000 (10:41 +0200)]
tree-optimization/115236 - more points-to *ANYTHING = x fixes
The stored-to ANYTHING handling has more holes, uncovered by treating
volatile accesses as ANYTHING. We fail to properly build the
pred and succ graphs, in particular we may not elide direct nodes
from receiving from STOREDANYTHING.
PR tree-optimization/115236
* tree-ssa-structalias.cc (build_pred_graph): Properly
handle *ANYTHING = X.
(build_succ_graph): Likewise. Do not elide direct nodes
from receiving from STOREDANYTHING.
Richard Biener [Tue, 28 May 2024 11:29:30 +0000 (13:29 +0200)]
Avoid pessimistic constraints for asm memory constraints
We process asm memory input/outputs with constraints to ESCAPED
but for this temporarily build an ADDR_EXPR. The issue is that
the used build_fold_addr_expr ends up wrapping the ADDR_EXPR in
a conversion which ends up producing &ANYTHING constraints which
is quite bad. The following uses get_constraint_for_address_of
instead, avoiding the temporary tree and the unhandled conversion.
This avoids a gcc.dg/tree-ssa/restrict-9.c FAIL with the fix
for PR115236.
* tree-ssa-structalias.cc (find_func_aliases): Use
get_constraint_for_address_of to build escape constraints
for asm inputs and outputs.
Richard Biener [Fri, 29 Sep 2023 13:12:54 +0000 (15:12 +0200)]
tree-optimization/115254 - don't account single-lane SLP against discovery limit
The following avoids accounting single-lane SLP to the discovery
limit. As the two testcases show, accounting it makes discovery fail,
and unfortunately not consistently across targets. The following
should fix two FAILs for GCN as a side-effect.
PR tree-optimization/115254
* tree-vect-slp.cc (vect_build_slp_tree): Only account
multi-lane SLP to limit.
* gcc.dg/vect/slp-cond-2-big-array.c: Expect 4 times SLP.
* gcc.dg/vect/slp-cond-2.c: Likewise.