gcc.gnu.org Git - gcc.git/log

tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

Now that we handle pt.null conservatively we can implement the missing
tracking of constant pool entries (aka STRING_CST) and handle
ptr-ptr compares using points-to info in ptrs_compare_unequal.

PR tree-optimization/13962
PR tree-optimization/96564
* tree-ssa-alias.h (pt_solution::const_pool): New flag.
* tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
compares.
(dump_points_to_solution): Dump the const_pool flag, fix guard
of flag dumping.
* gimple-pretty-print.cc (pp_points_to_solution): Likewise.
* tree-ssa-structalias.cc (find_what_var_points_to): Set
the const_pool flag for STRING.
(pt_solution_ior_into): Handle the const_pool flag.
(ipa_escaped_pt): Initialize it.

* gcc.dg/tree-ssa/alias-39.c: New testcase.
* g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
to manifest in transforms no longer vectorizing this testcase
for an ICE.

wrong code with points-to and volatile

The following fixes points-to analysis which ignores the fact that
volatile qualified refs can result in any pointer.

* tree-ssa-structalias.cc (get_constraint_for_1): For
volatile referenced or decls use ANYTHING.

* gcc.dg/tree-ssa/alias-38.c: New testcase.

Vect: Support loop len in vectorizable early exit

This patch adds early break auto-vectorization support for target which
use length on partial vectorization.  Consider this following example:

unsigned vect_a[802];
unsigned vect_b[802];

void test (unsigned x, int n)
{
  for (int i = 0; i < n; i++)
  {
    vect_b[i] = x + i;

    if (vect_a[i] > x)
      break;

    vect_a[i] = x;
  }
}

We use VCOND_MASK_LEN to simulate the generate (mask && i < len + bias).
And then the IR of RVV looks like below:

  ...
  _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]);
  _55 = (int) _87;
  ...
  mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67;
  vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \
    {0, ... }, _87, 0);
  if (vec_len_mask_72 != { 0, ... })
    goto <bb 6>; [5.50%]
  else
    goto <bb 7>; [94.50%]

The below tests are passed for this patch:
1. The riscv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

gcc/ChangeLog:

* tree-vect-loop.cc (vect_gen_loop_len_mask): New func to gen
the loop len mask.
* tree-vect-stmts.cc (vectorizable_early_exit): Invoke the
vect_gen_loop_len_mask for 1 or more stmt(s).
* tree-vectorizer.h (vect_gen_loop_len_mask): New func decl
for vect_gen_loop_len_mask.

Signed-off-by: Pan Li <pan2.li@intel.com>

Vect: Support new IFN SAT_ADD for unsigned vector int

For vectorize, we leverage the existing vect pattern recog to find
the pattern similar to scalar and let the vectorizer to perform
the rest part for standard name usadd<mode>3 in vector mode.
The riscv vector backend have insn "Vector Single-Width Saturating
Add and Subtract" which can be leveraged when expand the usadd<mode>3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
    ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

The below test suites are passed for this patch.
* The riscv fully regression tests.
* The x86 bootstrap tests.
* The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
func decl generated by match.pd match.
(vect_recog_sat_add_pattern): New func impl to recog the pattern
for unsigned SAT_ADD.

Signed-off-by: Pan Li <pan2.li@intel.com>

Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

This patch would like to add the middle-end presentation for the
saturation add.  Aka set the result of add to the max when overflow.
It will take the pattern similar as below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Take uint8_t as example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.

Given below example for the unsigned scalar integer uint64_t:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT
}

The below tests are passed for this patch:
1. The riscv fully regression tests.
3. The x86 bootstrap tests.
4. The x86 fully regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
to the return true switch case(s).
* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
* match.pd: Add unsigned SAT_ADD match(es).
* optabs.def (OPTAB_NL): Remove fixed-point limitation for
us/ssadd.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
extern func decl generated in match.pd match.
(match_saturation_arith): New func impl to match the saturation arith.
(math_opts_dom_walker::after_dom_children): Try match saturation
arith when IOR expr.

Signed-off-by: Pan Li <pan2.li@intel.com>

Revert "Revert: "Enable prange support.""

This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables prange
support again.

Cleanup prange sanity checks.

The pointers_handled_p() code was a temporary sanity check, and not
even a good one, since we have a cleaner way of checking type
mismatches with operand_check_p.  This patch removes all the code, and
adds an explicit type check for relational operators, which are the
main problem in PR114985.

Adding this check makes it clear where the type mismatch is happening
in IPA, even without prange.  I've added code to skip the range
folding if the types don't match what the operator expects.  In order
to reproduce the latent bug, just remove the operand_check_p calls.

Tested on x86-64 and ppc64le with and without prange support.

gcc/ChangeLog:

PR tree-optimization/114985
* gimple-range-op.cc: Remove pointers_handled_p.
* ipa-cp.cc (ipa_value_range_from_jfunc): Skip range folding if
operands don't match.
(propagate_vr_across_jump_function): Same.
* range-op-mixed.h: Remove pointers_handled_p and tweak
operand_check_p.
* range-op-ptr.cc (range_operator::pointers_handled_p): Remove.
(pointer_plus_operator::pointers_handled_p): Remove.
(class operator_pointer_diff): Remove pointers_handled_p.
(operator_pointer_diff::pointers_handled_p): Remove.
(operator_identity::pointers_handled_p): Remove.
(operator_cst::pointers_handled_p): Remove.
(operator_cast::pointers_handled_p): Remove.
(operator_min::pointers_handled_p): Remove.
(operator_max::pointers_handled_p): Remove.
(operator_addr_expr::pointers_handled_p): Remove.
(operator_bitwise_and::pointers_handled_p): Remove.
(operator_bitwise_or::pointers_handled_p): Remove.
(operator_equal::pointers_handled_p): Remove.
(operator_not_equal::pointers_handled_p): Remove.
(operator_lt::pointers_handled_p): Remove.
(operator_le::pointers_handled_p): Remove.
(operator_gt::pointers_handled_p): Remove.
(operator_ge::pointers_handled_p): Remove.
* range-op.cc (TRAP_ON_UNHANDLED_POINTER_OPERATORS): Remove.
(range_op_handler::lhs_op1_relation): Remove pointers_handled_p checks.
(range_op_handler::lhs_op2_relation): Same.
(range_op_handler::op1_op2_relation): Same.
* range-op.h: Remove RO_* declarations.

Use a boolean type when folding conditionals in simplify_using_ranges.

In adding some traps for PR114985 I noticed that the conditional
folding code in simplify_using_ranges was using the wrong type. This
cleans up the oversight.

gcc/ChangeLog:

PR tree-optimization/114985
* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Use
boolean type when folding conditionals.

RISC-V: testsuite: Drop march-string in cmpmemsi/cpymemsi tests

The tests cmpmemsi-1.c and cpymemsi-1.c are execution ("dg-do run")
tests, which does not have any restrictions for the enabled extensions.
Further, no other listed options are required.
Let's drop the options, so that the test can also be executed on
non-f and non-d targets. However, we need to set options to the
defaults without '-ansi', because the included test file uses the
'asm' keyword, which is not part of ANSI C.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: Drop options.
* gcc.target/riscv/cpymemsi-1.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

tree-optimization/79958 - make DSE track multiple paths

DSE currently gives up when the path we analyze forks.  This leads
to multiple missed dead store elimination PRs.  The following fixes
this by recursing for each path and maintaining the visited bitmap
to avoid visiting CFG re-merges multiple times.  The overall cost
is still limited by the same bound, it's just more likely we'll hit
the limit now.  The patch doesn't try to deal with byte tracking
once a path forks but drops info on the floor and only handling
fully dead stores in that case.

PR tree-optimization/79958
PR tree-optimization/109087
PR tree-optimization/100314
PR tree-optimization/114774
* tree-ssa-dse.cc (dse_classify_store): New forwarder.
(dse_classify_store): Add arguments cnt and visited, recurse
to track multiple paths when we end up with multiple defs.

* gcc.dg/tree-ssa/ssa-dse-48.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-49.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-50.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-51.c: Likewise.
* gcc.dg/graphite/pr80906.c: Avoid DSE of last data reference
in loop.
* g++.dg/ipa/devirt-24.C: Adjust for extra DSE.
* g++.dg/warn/Wuninitialized-pr107919-1.C: Use more important
-O2 optimization level, -O1 regresses.

ada: Remove obsolete reference in comment

gcc/ada/

* exp_ch7.adb (Attach_Object_To_Master_Node): Remove reference to a
transient object in comment.

ada: Reset scope of top level object declaration during unnesting

When unnesting, the compiler gathers elaboration code and wraps it with
a new dedicated procedure. While doing so, it resets the scopes of
entities that are wrapped to point to this new procedure. This change
also resets the scopes of N_Object_Declaration and
N_Object_Renaming_Declaration nodes only if an elaboration procedure
is needed.

gcc/ada/

* exp_ch7.adb (Reset_Scopes_To_Block_Elab_Proc): also reset scope
for object declarations.

ada: Redundant validity checks

In some cases with validity checking enabled via the -gnatVa option,
the compiler generates validity checks that can (obviously) never fail.
These include validity checks for (some) static expressions, and consecutive
identical checks generated for a single read of an object.

gcc/ada/

* checks.adb (Expr_Known_Valid): Return True for a static expression.
* exp_util.adb (Adjust_Condition): No validity check needed for a
condition if it is an expression for which a validity check has
already been generated.

ada: Exception on Indefinite_Vector aggregate with loop_parameter_specification

Constraint_Error is raised on evaluation of a container aggregate with
a loop_parameter_specification for the type Indefinite_Vector. This
happens due to the Aggregate aspect for type Indefinite_Vector specifying
the Empty_Vector constant for the type's Empty operation rather than
using the type's primitive Empty function. This problem shows up as
a recent regression relative to earlier compilers, evidently due to
recent fixes in the container aggregate area, which uncovered this
issue of the wrong specification in Ada.Containers.Indefinite_Vectors.
The compiler incorrectly initializes the aggregate object using the
Empty_Vector constant rather than invoking the New_Vector function
to allocate the vector object with the appropriate number of elements,
and subsequent calls to Replace_Element fail because the vector object
is empty.

In addition to correcting the Indefinite_Vectors generic package,
checking is added to give an error for an attempt to specify the
Empty operation as a constant rather than a function. (Also note
that another AdaCore package that needs a similar correction is
the VSS.Vector_Strings package.)

gcc/ada/

* libgnat/a-coinve.ads (type Vector): In the Aggregate aspect for
this type, the Empty operation is changed to denote the Empty
function, rather than the Empty_Vector constant.
* exp_aggr.adb (Expand_Container_Aggregate): Remove code for
handling the case where the Empty_Subp denotes a constant object,
which should never happen (and add an assertion that Empty_Subp
must denote a function).
* sem_ch13.adb (Valid_Empty): No longer allow the entity to be an
E_Constant, and require the (optional) parameter of an Empty
function to be of a signed integer type (rather than any integer
type).

ada: Implement new experimental attribute 'Super

This patch implements (under -gnatX0) the 'Super attribute which can be
applied to tagged objects in order to get a view conversion to their
immediate parent.

gcc/ada/

* doc/gnat_rm/implementation_defined_attributes.rst: Add entry for
Super attribute.
* accessibility.adb (Accessibility_Level): Add handling for Super.
* exp_attr.adb (Expand_N_Attribute_Reference): Add entry for
Super.
* sem_attr.adb (Analyze_Attribute): Create a case to handle the
semantic checking and expansion for Super.
(Eval_Attribute): Add entry for Super.
* sem_attr.ads: Add entry for Super.
* sem_util.adb (Is_Aliased_View, Is_Variable): Add case to handle
references to 'Super.
* snames.ads-tmpl: Register Name_Super and Attribute_Super.
* gnat_rm.texi: Regenerate.

ada: Fix reference to RM clause in comment

gcc/ada/

* sem_util.ads (Check_Function_Writable_Actuals): Fix comment.

ada: Fix missing length checks with case expressions

This fixes an issue where length checks were not generated when the
right-hand side of an assigment involved a case expression.

gcc/ada/

* sem_res.adb (Resolve_Case_Expression): Add length check
insertion.
* exp_ch4.adb (Expand_N_Case_Expression): Add handling of nodes
known to raise Constraint_Error.

ada: Fix standalone Windows builds of adaint.c

Define PATH_SEPARATOR and HOST_EXECUTABLE_SUFFIX in standalone MinGW
builds; the definitions normally come from GCC, and the defaults don't
work for native Windows.

gcc/ada/

* adaint.c: New defines for STANDALONE mode.

ada: Avoid checking parameters of protected procedures

The compiler triggers warnings on generated protected procedures
if the procedure does not have an explicit spec. Instead check
if the body was created for a protected procedure if the spec
is not present.

gcc/ada/

* sem_ch6.adb (Analyze_Subprogram_Body_Helper):
If the spec is not present for a subprogram body then
check if the body definiton was created for a protected
procedure.

ada: Ignore ghost nodes in call graph information for dispatching calls

When emitting call graph information, we already skipped calls to
ignored ghost entities, but this code was causing crashes (in production
builds) and assertion failures (in development builds), because the
ignored ghost entities are not fully decorated, e.g. when they come from
instances of generic units with default subprograms.

With this patch we skip call graph information for ignored ghost
entities when they are registered, both as explicit calls and as
tagged types that will come with internally generated dispatching
subprograms.

gcc/ada/

* exp_cg.adb (Generate_CG_Output): Remove code for ignored ghost
entities that applied to subprogram calls.
(Register_CG_Node): Skip ignored ghost entities, both calls
and tagged types, when they are registered.

ada: Fix reason code for length check

This patch fixes the reason code used by Apply_Selected_Length_Checks,
which was wrong in some cases when the check could be determined to
always fail at compile time.

gcc/ada/

* checks.adb (Apply_Selected_Length_Checks): Fix reason code.

ada: Propagate Program_Error from failed finalization of collection

This aligns finalization collections with finalization masters when it comes
to propagating an exception raised by the finalization of a specific object,
by always propagating Program_Error instead of the aforementioned exception.

gcc/ada/

* libgnat/s-finpri.adb (Raise_From_Controlled_Operation): New
declaration of imported procedure moved from...
(Finalize_Master): ...there.
(Finalize): Call Raise_From_Controlled_Operation instead of
Reraise_Occurrence to propagate the exception, if any.

ada: Improve recovery from illegal occurrence of 'Old in if_expression

Fix assertion failure in developer builds which happened when the THEN
expression contains an illegal occurrence of 'Old and the type of the
THEN expression is left as Any_Type, but there is no ELSE expression.

gcc/ada/

* sem_ch4.adb (Analyze_If_Expression): Add guard for
if_expression without an ELSE part.

ada: No need to follow New_Occurrence_Of with Set_Etype

Routine New_Occurrence_Of itself sets the Etype of its result; there is
no need to set it explicitly afterwards.

Code cleanup related to fix for attribute 'Old; semantics is unaffected.

gcc/ada/

* exp_ch13.adb (Expand_N_Free_Statement): After analysis, the
new temporary has the type of its Object_Definition and the new
occurrence of this temporary has this type as well; simplify.
* sem_util.adb
(Indirect_Temp_Value): Remove redundant call to Set_Etype;
simplify.
(Is_Access_Type_For_Indirect_Temp): Add missing body header.

ada: Fix detection of if_expressions that are known on entry

Fix a small glitch in routine Is_Known_On_Entry, which returned False
for all if_expressions, regardless whether their conditions or dependent
expressions are known on entry.

gcc/ada/

* sem_util.adb (Is_Known_On_Entry): Check whether condition and
dependent expressions of an if_expression are known on entry.

ada: Fix comments about Get_Ranged_Checks

Checks.Get_Ranged_Checks was onced named Range_Check, and a few
comments referred to it by that name before this commit. To avoid
confusion with Types.Range_Check, this commits fixes those comments.

gcc/ada/

* checks.ads: Fix comments.
* checks.adb: Likewise.

ada: Minor performance improvement for dynamically-allocated controlled objects

The values returned by Header_Alignment and Header_Size are known at compile
time and powers of two on almost all platforms, so inlining them by means of
an expression function improves the object code generated for alignment and
size calculations involving them.

gcc/ada/

* libgnat/s-finpri.ads: Add use type clause for Storage_Offset.
(Header_Alignment): Turn into an expression function.
(Header_Size): Likewise.
* libgnat/s-finpri.adb: Remove use type clause for Storage_Offset.
(Header_Alignment): Delete.
(Header_Size): Likewise.

ada: Fixup one more pattern of broken scope information

When an array's initialization contains a `others =>` clause with an
expression that involves finalization, the resulting scope information
is incorrect and can cause crashes with backend (i.e. gnat-llvm) that
also use unnesting. The observable symptom is a nested object
declaration (created by the compiler) within a loop wrapped in a
procedure created by the unnester that has incoherent scope information:
its Scope field points to the scope of the procedure (1 level too high)
and is contained in the entity chain of some entity nested in the
procedure (correct).

The correct solution would be to fix the scope information when it is
created, but this revealed too large of a task with many interaction
with existing code.

This change adds another pattern to the Fixup_Inner_Scopes procedure to
detect the problematic case and fix the scope, "after the facts".

gcc/ada/

* exp_ch7.adb (Unnest_Loop::Fixup_Inner_Scopes): detect a new
problematic pattern and fixup the scope accordingly.

ada: Fix typo in CUDA error message

Fix typo in error message; semantics is unaffected.

gcc/ada/

* gnat_cuda.adb (Remove_CUDA_Device_Entities): Fix typo.

ada: Fix latent alignment issue for dynamically-allocated controlled objects

Dynamically-allocated controlled objects are attached to a finalization
collection by means of a hidden header placed right before the object,
which means that the size effectively allocated must naturally account
for the size of this header. But the allocation must also account for
the alignment of this header in order to have it properly aligned.

gcc/ada/

* libgnat/s-finpri.ads (Header_Alignment): New function.
(Header_Size): Adjust description.
(Master_Node): Put Finalize_Address as first component.
(Collection_Node): Likewise.
* libgnat/s-finpri.adb (Header_Alignment): New function.
(Header_Size): Return the object size in storage units.
* libgnat/s-stposu.ads (Adjust_Controlled_Dereference): Replace
collection node with header in description.
* libgnat/s-stposu.adb (Adjust_Controlled_Dereference): Likewise.
(Allocate_Any_Controlled): Likewise. Pass the maximum of the
specified alignment and that of the header to the allocator.
(Deallocate_Any_Controlled): Likewise to the deallocator.

ada: Fix resolving tagged operations in array aggregates

In the Two_Pass_Aggregate_Expansion we were removing
all of the entity links in the Iterator_Specification
to avoid reusing the same Iterator_Definition in both
loops.

However this approach was also breaking the links to
calls with dot notation that had been transformed to
the regular call notation.

In order to circumvent this, explicitly create new
identifier definitions when copying the
Iterator_Specfications for both of the loops.

gcc/ada/

* exp_aggr.adb (Two_Pass_Aggregate_Expansion):
Explicitly create new Defining_Iterators for both
of the loops.

ada: Fix bogus error on function returning noncontrolling result in private part

This occurs in the additional case of RM 3.9.3(10) in Ada 2012, that is to
say the access controlling result, because the implementation does not use
the same (correct) conditions as in the original case.

This factors out these conditions and uses them in both cases, as well as
adjusts the wording of the message in the first case.

gcc/ada/

* sem_ch6.adb (Check_Private_Overriding): Implement the second part
of RM 3.9.3(10) consistently in both cases.

ada: Fix casing of CUDA in error messages

Error messages now capitalize CUDA.

gcc/ada/

* erroutc.adb (Set_Msg_Insertion_Reserved_Word): Fix casing for
CUDA appearing in error message strings.
(Set_Msg_Str): Likewise for CUDA being a part of a Name_Id.

ada: Fix crash with -gnatdJ and -gnatw_q

This commit makes the emission of -gnatw_q warnings pass node information
so as to handle the enclosing subprogram display of -gnatdJ instead of
crashing.

gcc/ada/

* exp_ch4.adb (Expand_Composite_Equality): Call Error_Msg_N
instead of Error_Msg.

ada: Follow up fixes for Put_Image/streaming regressions

A recent change to reduce duplication of compiler-generated Put_Image and
streaming subprograms introduced some regressions. The fix for one of them
was incomplete.

gcc/ada/

* exp_attr.adb (Build_And_Insert_Type_Attr_Subp): Further tweaking
of the point where a compiler-generated Put_Image or streaming
subprogram is to be inserted in the tree. If one such subprogram
calls another (as is often the case with, for example, Put_Image
procedures for composite type and for a component type thereof),
then we want to avoid use-before-definition problems that can
result from inserting the caller ahead of the callee.

ada: Implement per-finalization-collection spinlocks

This changes the implementation of finalization collections from using the
global task lock to using per-collection spinlocks. Spinlocks are a good
fit in this context because they are very cheap and therefore can be taken
with a fine granularity only around the portions of code implementing the
shuffling of pointers required by attachment and detachment actions.

gcc/ada/

* libgnat/s-finpri.ads (Lock_Type): New modular type.
(Collection_Node): Add Enclosing_Collection component.
(Finalization_Collection): Add Lock component.
* libgnat/s-finpri.adb: Add clauses for System.Atomic_Primitives.
(Attach_Object_To_Collection): Lock and unlock the collection.
Save a pointer to the enclosing collection in the node.
(Detach_Object_From_Collection): Lock and unlock the collection.
(Finalize): Likewise.
(Initialize): Initialize the lock.
(Lock_Collection): New procedure.
(Unlock_Collection): Likewise.

ada: Formal_Derived_Type'Size is not static

In deciding whether a Size attribute reference is static, the compiler could
get confused about whether an implicitly-declared subtype of a generic formal
type is itself a generic formal type, possibly resulting in an assertion
failure and then a bugbox.

gcc/ada/

* sem_attr.adb (Eval_Attribute): Expand existing checks for
generic formal types for which Is_Generic_Type returns False. In
that case, mark the attribute reference as nonstatic.

ada: Fix bug in maintaining dimension info

Copying a node does not automatically propagate its associated dimension
information (if any). This must be done explicitly.

gcc/ada/

* sem_util.adb (Copy_Node_With_Replacement): Add call to
Copy_Dimensions so that any dimension information associated with
the copied node is also associated with the resulting copy.

ada: Remove Aspect_Specifications field from N_Procedure_Specification

Sync Has_Aspect_Specifications_Flag with the actual flags in the AST.
Code cleanup; behavior is unaffected.

gcc/ada/

* gen_il-gen-gen_nodes.adb (N_Procedure_Specification): Remove
Aspect_Specifications field.

ada: Reuse existing expression when rewriting aspects to pragmas

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_ch13.adb (Analyze_Aspect_Specification): Consistently
reuse existing constant where possible.

ada: Cleanup reporting locations for Ada 2022 and GNAT extension aspects

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_ch13.adb (Analyze_Aspect_Specification): Consistently
reuse existing constant where possible.

ada: Fix alphabetic ordering of aspect identifiers

Code cleanup.

gcc/ada/

* aspects.ads (Aspect_Id): Fix ordering.

ada: Fix ordering of code for pragma Preelaborable_Initialization

Code cleanup.

gcc/ada/

* sem_prag.adb (Analyze_Pragma): Move case alternative to match
to alphabetic order.

ada: Fix casing in error messages

Error messages should not start with a capital letter.

gcc/ada/

* gnat_cuda.adb (Remove_CUDA_Device_Entities): Fix casing
(this primarily fixes a style, because the capitalization will
not be preserved by the error-reporting machinery anyway).
* sem_ch13.adb (Analyze_User_Aspect_Aspect_Specification): Fix
casing in error message.

ada: Fix docs and comments about pragmas for Boolean-valued aspects

Fix various inconsistencies in documentation and comments of
Boolean-valued aspects.

gcc/ada/

* doc/gnat_rm/implementation_defined_pragmas.rst: Fix
documentation.
* sem_prag.adb: Fix comments.
* gnat_rm.texi: Regenerate.

diagnostics: use unicode art for interprocedural depth

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: Update expected
output to use unicode for depth indication.
* gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c: Likewise.

gcc/ChangeLog:
* text-art/theme.cc (ascii_theme::get_cppchar): Add
cell_kind::INTERPROCEDURAL_*.
(unicode_theme::get_cppchar): Likewise.
* text-art/theme.h (theme::cell_kind): Likewise.
* tree-diagnostic-path.cc:
(thread_event_printer::print_swimlane_for_event_range): Use the
above to get characters for indicating interprocedural stack
depth activity, falling back to ascii.
(selftest::test_interprocedural_path_1): Test with both ascii
and unicode themes.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: add warning emoji to events with VERB_danger

Tweak the printing of -fdiagnostics-path-format=inline-events so that
any event with diagnostic_event::VERB_danger gains a warning emoji,
provided that the text art theme enables emoji support.

VERB_danger is set by the analyzer on the last event in a path, and so
this emoji appears at the end of all analyzer execution paths
highlighting the location of the problem.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: Update expected
output to include warning emoji.
* gcc.dg/analyzer/warning-emoji.c: New test.

gcc/ChangeLog:
* tree-diagnostic-path.cc: Include "text-art/theme.h".
(path_label::get_text): If the event has
diagnostic_event::VERB_danger, and the theme enables emojis, then
add a warning emoji between the event number and the event text.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: simplify output of purely intraprocedural execution paths

Diagnostic path printing was added in r10-5901-g4bc1899b2e883f. As of
that commit, with -fdiagnostics-path-format=inline-events (the default),
we print a vertical line to the left of the source line numbering,
visualizing the stack depth and interprocedural calls and returns as
indentation changes.

For cases where the events on a thread are purely interprocedural, this
line does nothing except take up space and complicate the output.

This patch adds logic to omit it for such cases, simpifying the output,
and, I believe, improving readability.

gcc/ChangeLog:
* diagnostic-path.h: Update leading comment to reflect
intraprocedural cases. Fix typo in comment.
* doc/invoke.texi: Update intraprocedural example.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/allocation-size-multiline-1.c: Update
expected results for purely intraprocedural path.
* c-c++-common/analyzer/allocation-size-multiline-2.c: Likewise.
* c-c++-common/analyzer/allocation-size-multiline-3.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-0.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-1.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-2.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-3.c: Likewise.
* c-c++-common/analyzer/malloc-macro-inline-events.c: Likewise.
Doing so for this file requires a rewrite since the paths
prefixing the "in expansion of macro" lines become the only thing
on their line and so are no longer pruned by multiline.exp logic
for pruning extra content on non-blank lines.
* c-c++-common/analyzer/malloc-paths-9-noexcept.c: Likewise.
* c-c++-common/analyzer/setjmp-2.c: Likewise.
* gcc.dg/analyzer/malloc-paths-9.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-multiline-2.c: Likewise.
* gcc.dg/plugin/diagnostic-test-paths-2.c: Likewise.

gcc/ChangeLog:
* tree-diagnostic-path.cc (per_thread_summary::interprocedural_p):
New.
(thread_event_printer::print_swimlane_for_event_range): Don't
indent and print the stack depth line if this thread's events are
purely intraprocedural.
(selftest::test_intraprocedural_path): Update expected output.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: handle SGR codes in line_label::m_display_width

gcc/ChangeLog:
* diagnostic-show-locus.cc: Define INCLUDE_VECTOR and include
"text-art/types.h".
(line_label::line_label): Drop "policy" argument. Use
styled_string::calc_canvas_width when computing m_display_width,
as this skips SGR codes.
(layout::print_any_labels): Update for line_label ctor change.
(selftest::test_one_liner_labels_utf8): Update expected text to
reflect that the labels can fit on one line if we don't get
confused by SGR colorization codes.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

RISC-V: Add Zvfbfwma extension to the -march= option

This patch would like to add new sub extension (aka Zvfbfwma) to the
-march= option. It introduces a new data type BF16.

1 In spec: "Zvfbfwma requires the Zvfbfmin extension and the Zfbfmin extension."
  1.1 In Embedded    Processor: Zvfbfwma -> Zvfbfmin -> Zve32f
  1.2 In Application Processor: Zvfbfwma -> Zvfbfmin -> V
  1.3 In both scenarios, there are: Zvfbfwma -> Zfbfmin

2 Zvfbfmin's information is in:
<https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1ddf65c5fc6ba7cf5826e1c02c569c923a541c09>

3 Zfbfmin's formation is in:
<https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=35224ead63732a3550ba4b1332c06e9dc7999c31>

4 Depending on different usage scenarios, the Zvfbfwma extension may
depend on 'V' or 'Zve32f'. This patch only implements dependencies in
scenario of Embedded Processor. This is consistent with the processing
strategy in Zvfbfmin. In scenario of Application Processor, it is
necessary to explicitly indicate the dependent 'V' extension.

5 You can locate more information about Zvfbfwma from below spec doc:
<https://github.com/riscv/riscv-bfloat16/releases/download/v59042fc71c31a9bcb2f1957621c960ed36fac401/riscv-bfloat16.pdf>

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfbfwma item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt:
(MASK_ZVFBFWMA): New macro.
(TARGET_ZVFBFWMA): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-37.c: New test.
* gcc.target/riscv/arch-38.c: New test.
* gcc.target/riscv/predef-36.c: New test.
* gcc.target/riscv/predef-37.c: New test.

Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial.

pshufb is available under TARGET_SSSE3, so
ix86_expand_vec_perm_const_1 must return true when TARGET_SSSE3.

With the patch under -march=x86-64-v2

v8qi
foo (v8qi a)
{
return a >> 5;
}

< pmovsxbw %xmm0, %xmm0
< psraw $5, %xmm0
< pshufb .LC0(%rip), %xmm0

vs.

> movdqa %xmm0, %xmm1
> pcmpeqd %xmm0, %xmm0
> pmovsxbw %xmm1, %xmm1
> psrlw $8, %xmm0
> psraw $5, %xmm1
> pand %xmm1, %xmm0
> packuswb %xmm0, %xmm0

Although there's a memory load from constant pool, but it should be
better when it's inside a loop. The load from constant pool can be
hoist out. it's 1 instruction vs 4 instructions.

< pshufb .LC0(%rip), %xmm0

vs.

> pcmpeqd %xmm0, %xmm0
> psrlw $8, %xmm0
> pand %xmm1, %xmm0
> packuswb %xmm0, %xmm0

gcc/ChangeLog:

PR target/114514
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
Set d.one_operand_p to true when TARGET_SSSE3.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114514-shufb.c: New test.

Optimize ashift >> 7 to vpcmpgtb for vector int8.

Since there is no corresponding instruction, the shift operation for
vector int8 is implemented using the instructions for vector int16,
but for some special shift counts, it can be transformed into vpcmpgtb.

gcc/ChangeLog:

PR target/114514
* config/i386/i386-expand.cc
(ix86_expand_vec_shift_qihi_constant): Optimize ashift >> 7 to
vpcmpgtb.
(ix86_expand_vecop_qihi_partial): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114514-shift.c: New test.

Daily bump.

Add missing hunk in recent change.

gcc/
* config/riscv/riscv-string.cc: Add missing hunk from last change.

analyzer: fix ICE seen with -fsanitize=undefined [PR114899]

gcc/analyzer/ChangeLog:
PR analyzer/114899
* access-diagram.cc
(written_svalue_spatial_item::get_label_string): Bulletproof
against SSA_NAME_VAR being null.

gcc/testsuite/ChangeLog:
PR analyzer/114899
* c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[v2,2/2] RISC-V: strcmp expansion: Use adjust_address() for address calculation

We have an arch-independent routine to generate an address with an offset.
Let's use that instead of doing the calculation in the backend.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (emit_strcmp_scalar_load_and_compare):
Use adjust_address() to calculate MEM-PLUS pattern.

[v2,1/2] RISC-V: Add cmpmemsi expansion

GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
        li      a4,0
        j       .L2
.L8:
        bgeu    a4,a7,.L7
.L2:
        add     a2,a0,a4
        add     a3,a1,a4
        lbu     a5,0(a2)
        lbu     a6,0(a3)
        addi    a4,a4,1
        li      a7,15    // missed hoisting
        subw    a5,a5,a6
        andi    a5,a5,0xff // useless
        beq     a5,zero,.L8
        lbu     a0,0(a2) // loading again!
        lbu     a5,0(a3) // loading again!
        subw    a0,a0,a5
        ret
.L7:
        li      a0,0
        ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
  synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
        ld      a5,0(a0)
        ld      a4,0(a1)
        bne     a5,a4,.L2
        ld      a5,8(a0)
        ld      a4,8(a1)
        slli    a5,a5,8
        slli    a4,a4,8
        bne     a5,a4,.L2
        li      a0,0
.L3:
        sext.w  a0,a0
        ret
.L2:
        rev8    a5,a5
        rev8    a4,a4
        sltu    a5,a5,a4
        neg     a5,a5
        ori     a0,a5,1
        j       .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements this improvements.

The tests consist of a execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
for zero_extendhi.
(do_load_from_addr): Add support for HI and SI/64 modes.
(do_load): Add helper for zero-extended loads.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

c++: ICE with reference NSDMI [PR114854]

Here we crash on a cp_gimplify_expr/TARGET_EXPR assert:

      /* A TARGET_EXPR that expresses direct-initialization should have been
         elided by cp_gimplify_init_expr.  */
      gcc_checking_assert (!TARGET_EXPR_DIRECT_INIT_P (*expr_p));

the TARGET_EXPR in question is created for the NSDMI in:

  class Vector { int m_size; };
  struct S {
    const Vector &vec{};
  };

where we first need to create a Vector{} temporary, and then bind the
vec reference to it.  The temporary is represented by a TARGET_EXPR
and it cannot be elided.  When we create an object of type S, we get

  D.2848 = {.vec=(const struct Vector &) &TARGET_EXPR <D.2840, {.m_size=0}>}

where the TARGET_EXPR is no longer direct-initializing anything.

Fixed by not setting TARGET_EXPR_DIRECT_INIT_P in convert_like_internal/ck_user.

PR c++/114854

gcc/cp/ChangeLog:

* call.cc (convert_like_internal) <case ck_user>: Don't set
TARGET_EXPR_DIRECT_INIT_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/nsdmi-aggr22.C: New test.

c++: DR 569, DR 1693: fun with semicolons [PR113760]

Prompted by c++/113760, I started looking into a bogus "extra ;"
warning in C++11.  It quickly turned out that if I want to fix
this for good, the fix will not be so small.

This patch touches on DR 569, an extra ; at namespace scope should
be allowed since C++11:

  struct S {
  };
  ; // pedwarn in C++98

It also touches on DR 1693, which allows superfluous semicolons in
class definitions since C++11:

  struct S {
    int a;
    ; // pedwarn in C++98
  };

Note that a single semicolon is valid after a member function definition:

  struct S {
    void foo () {}; // only warns with -Wextra-semi
  };

There's a new function maybe_warn_extra_semi to handle all of the above
in a single place.  So now they all get a fix-it hint.

-Wextra-semi turns on all "extra ;" diagnostics.  Currently, options
like -Wc++11-compat or -Wc++11-extensions are not considered.

DR 1693
PR c++/113760
DR 569

gcc/c-family/ChangeLog:

* c.opt (Wextra-semi): Initialize to -1.

gcc/cp/ChangeLog:

* parser.cc (extra_semi_kind): New.
(maybe_warn_extra_semi): New.
(cp_parser_declaration): Call maybe_warn_extra_semi.
(cp_parser_member_declaration): Likewise.

gcc/ChangeLog:

* doc/invoke.texi: Update -Wextra-semi documentation.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/semicolon1.C: New test.
* g++.dg/diagnostic/semicolon10.C: New test.
* g++.dg/diagnostic/semicolon11.C: New test.
* g++.dg/diagnostic/semicolon12.C: New test.
* g++.dg/diagnostic/semicolon13.C: New test.
* g++.dg/diagnostic/semicolon14.C: New test.
* g++.dg/diagnostic/semicolon15.C: New test.
* g++.dg/diagnostic/semicolon16.C: New test.
* g++.dg/diagnostic/semicolon17.C: New test.
* g++.dg/diagnostic/semicolon2.C: New test.
* g++.dg/diagnostic/semicolon3.C: New test.
* g++.dg/diagnostic/semicolon4.C: New test.
* g++.dg/diagnostic/semicolon5.C: New test.
* g++.dg/diagnostic/semicolon6.C: New test.
* g++.dg/diagnostic/semicolon7.C: New test.
* g++.dg/diagnostic/semicolon8.C: New test.
* g++.dg/diagnostic/semicolon9.C: New test.

c++: Optimize in maybe_clone_body aliases even when not at_eof [PR113208]

This patch reworks the cdtor alias optimization, such that we can create
aliases even when maybe_clone_body is called not at at_eof time, without trying
to repeat it in maybe_optimize_cdtor.

2024-05-15 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>

PR lto/113208
* cp-tree.h (maybe_optimize_cdtor): Remove.
* decl2.cc (tentative_decl_linkage): Call maybe_make_one_only
for implicit instantiations of maybe in charge ctors/dtors
declared inline.
(import_export_decl): Don't call maybe_optimize_cdtor.
(c_parse_final_cleanups): Formatting fixes.
* optimize.cc (can_alias_cdtor): Adjust condition, for
HAVE_COMDAT_GROUP && DECL_ONE_ONLY && DECL_WEAK return true even
if not DECL_INTERFACE_KNOWN.
(maybe_clone_body): Don't clear DECL_SAVED_TREE, instead set it
to void_node.
(maybe_clone_body): Remove.
* decl.cc (cxx_comdat_group): For DECL_CLONED_FUNCTION_P
functions if SUPPORTS_ONE_ONLY return DECL_COMDAT_GROUP if already
set.

* g++.dg/abi/comdat3.C: New test.
* g++.dg/abi/comdat4.C: New test.

combine: Fix up simplify_compare_const [PR115092]

The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -2147483648)
and then an optimization for when the second operand is power of 2
triggers. That optimization is fine for power of 2s which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong, those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).

The following patch just disables the optimization in that case.

2024-05-15 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.cc (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.

* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.

openmp: Diagnose using grainsize+num_tasks clauses together [PR115103]

I've noticed that while we diagnose many other OpenMP exclusive clauses,
we don't diagnose grainsize together with num_tasks on taskloop construct
in all of C, C++ and Fortran (the implementation simply ignored grainsize
in that case) and for Fortran also don't diagnose mixing nogroup clause
with reduction clause(s).

Fixed thusly.

2024-05-15 Jakub Jelinek <jakub@redhat.com>

PR c/115103
gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Diagnose grainsize
used together with num_tasks.
gcc/cp/
* semantics.cc (finish_omp_clauses): Diagnose grainsize
used together with num_tasks.
gcc/fortran/
* openmp.cc (resolve_omp_clauses): Diagnose grainsize
used together with num_tasks or nogroup used together with
reduction.
gcc/testsuite/
* c-c++-common/gomp/clause-dups-1.c: Add 2 further expected errors.
* gfortran.dg/gomp/pr115103.f90: New test.

tree-optimization/114589 - remove profile based sink heuristics

The following removes the profile based heuristic limiting sinking
and instead uses post-dominators to avoid sinking to places that
are executed under the same conditions as the earlier location which
the profile based heuristic should have guaranteed as well.

To avoid regressing this moves the empty-latch check to cover all
sink cases.

It also stream-lines the resulting select_best_block a bit but avoids
adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
starts execute failing with this on x86_64 with -m32 because the
(float)i * 9.9999...e-7 compute is sunk across a STOP causing it
to be no longer spilled and thus the compare failing due to excess
precision.  The patch adds -ffloat-store to avoid this, following
other similar testcases.

This change fixes the testcase in the PR only when using -fno-ivopts
as otherwise VRP is confused.

PR tree-optimization/114589
* tree-ssa-sink.cc (select_best_block): Remove profile-based
heuristics.  Instead reject sink locations that sink
to post-dominators.  Move empty latch check here from
statement_sink_location.  Also consider early_bb for the
loop depth check.
(statement_sink_location): Remove superfluous check.  Remove
empty latch check.
(pass_sink_code::execute): Compute/release post-dominators.

* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
excess precision when not spilling.
* g++.dg/tree-ssa/pr114589.C: New testcase.

middle-end/111422 - wrong stack var coalescing, handle PHIs

The gcc.c-torture/execute/pr111422.c testcase after installing the
sink pass improvement reveals that we also need to handle

_65 = &g + _58; _44 = &g + _43;
# _59 = PHI <_65, _44>
*_59 = 8;
g = {v} {CLOBBER(eos)};
...
n[0] = &f;
*_59 = 8;
g = {v} {CLOBBER(eos)};

where we fail to see the conflict between n and g after the first
clobber of g. Before the sinking improvement there was a conflict
recorded on a path where _65/_44 are unused, so the real conflict
was missed but the fake one avoided the miscompile.

The following handles PHI defs in add_scope_conflicts_2 which
fixes the issue.

PR middle-end/111422
* cfgexpand.cc (add_scope_conflicts_2): Handle PHIs
by recursing to their arguments.

PR modula2/115057 TextIO.ReadRestLine raises an exception when buffer is exceeded

TextIO.ReadRestLine will raise an "attempting to read beyond end of file"
exception if the buffer is exceeded.  This bug is caused by the
TextIO.ReadRestLine calling IOChan.Skip without a preceeding IOChan.Look.
The Look procedure will update the status result whereas
Skip always sets read result to allRight.

gcc/m2/ChangeLog:

PR modula2/115057
* gm2-libs-iso/TextIO.mod (ReadRestLine): Use ReadChar to
skip unwanted characters as this calls IOChan.Look and updates
the cid result status.  A Skip without a Look does not update
the status.  Skip always sets read result to allRight.
* gm2-libs-iso/TextUtil.def (SkipSpaces): Improve comments.
(CharAvailable): Improve comments.
* gm2-libs-iso/TextUtil.mod (SkipSpaces): Improve comments.
(CharAvailable): Improve comments.

gcc/testsuite/ChangeLog:

PR modula2/115057
* gm2/isolib/run/pass/testrestline.mod: New test.
* gm2/isolib/run/pass/testrestline2.mod: New test.
* gm2/isolib/run/pass/testrestline3.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: add test for DR 2855

Let

  int8_t x = 127;

This DR says that while

  x++;

invokes UB,

  ++x;

does not.  The resolution was to make the first one valid.  The
following test verifies that we don't report any errors in a constexpr
context.

DR 2855

gcc/testsuite/ChangeLog:

* g++.dg/DRs/dr2855.C: New test.

RISC-V: Test cbo.zero expansion for rv32

We had an issue when expanding via cmo-zero for RV32.
This was fixed upstream, but we don't have a RV32 test.
Therefore, this patch introduces such a test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

AArch64: Use UZP1 instead of INS

Use UZP1 instead of INS when combining low and high halves of vectors.
UZP1 has 3 operands which improves register allocation, and is faster on
some microarchitectures.

gcc:
* config/aarch64/aarch64-simd.md (aarch64_combine_internal<mode>):
Use UZP1 instead of INS.
(aarch64_combine_internal_be<mode>): Likewise.

gcc/testsuite:
* gcc.target/aarch64/ldp_stp_16.c: Update to check for UZP1.
* gcc.target/aarch64/pr109072_1.c: Likewise.
* gcc.target/aarch64/vec-init-14.c: Likewise.
* gcc.target/aarch64/vec-init-9.c: Likewise.

Avoid pointer compares on TYPE_MAIN_VARIANT in TBAA

while building more testcases for ipa-icf I noticed that there are two places
in aliasing code where we still compare TYPE_MAIN_VARIANT for pointer equality.
This is not good idea for LTO since type merging may not happen for example
when in one unit pointed to type is forward declared while in other it is fully
defined. We have same_type_for_tbaa for that.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* alias.cc (reference_alias_ptr_type_1): Use view_converted_memref_p.
* alias.h (view_converted_memref_p): Declare.
* tree-ssa-alias.cc (view_converted_memref_p): Export.
(ao_compare::compare_ao_refs): Use same_type_for_tbaa.

testsuite: Require lto-plugin in gcc.dg/ipa/ipa-icf-38.c [PR85656]

gcc.dg/ipa/ipa-icf-38.c currently FAILs on Solaris (SPARC and x86, 32
and 64-bit):

FAIL: gcc.dg/ipa/ipa-icf-38.c scan-ltrans-tree-dump-not optimized "Function bar"

As it turns out, this only happens when the Solaris linker is used; with
GNU ld the test PASSes just fine. In fact, that happens because gld
supports the lto-plugin while ld does not: in a Solaris build with gld,
the test FAILs the same way as with ld when -fno-use-linker-plugin is
passed, so this patch requires linker_plugin.

Tested on i386-pc-solaris2.11 (ld and gld) and x86_64-pc-linux-gnu.

2024-05-15 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
PR ipa/85656
* gcc.dg/ipa/ipa-icf-38.c: Require linker_plugin.

libstdc++: Fix data race in std::basic_ios::fill() [PR77704]

The lazy caching in std::basic_ios::fill() updates a mutable member
without synchronization, which can cause a data race if two threads both
call fill() on the same stream object when _M_fill_init is false.

To avoid this we can just cache the _M_fill member and set _M_fill_init
early in std::basic_ios::init, instead of doing it lazily. As explained
by the comment in init, there's a good reason for doing it lazily. When
char_type is neither char nor wchar_t, the locale might not have a
std::ctype<char_type> facet, so getting the fill character would throw
an exception. The current lazy init allows using unformatted I/O with
such a stream, because the fill character is never needed and so it
doesn't matter if the locale doesn't have a ctype<char_type> facet. We
can maintain this property by only setting the fill character in
std::basic_ios::init if the ctype facet is present at that time. If
fill() is called later and the fill character wasn't set by init, we can
get it from the stream's current locale at the point when fill() is
called (and not try to cache it without synchronization). If the stream
hasn't been imbued with a locale that includes the facet when we need
the fill() character, then throw bad_cast at that point.

This causes a change in behaviour for the following program:

  std::ostringstream out;
  out.imbue(loc);
  auto fill = out.fill();

Previously the fill character would have been set when fill() is called,
and so would have used the new locale. This commit changes it so that
the fill character is set on construction and isn't affected by the new
locale being imbued later. This new behaviour seems to be what the
standard requires, and matches MSVC.

The new 27_io/basic_ios/fill/char/fill.cc test verifies that it's still
possible to use a std::basic_ios without the ctype<char_type> facet
being present at construction.

libstdc++-v3/ChangeLog:

PR libstdc++/77704
* include/bits/basic_ios.h (basic_ios::fill()): Do not modify
_M_fill and _M_fill_init in a const member function.
(basic_ios::fill(char_type)): Use _M_fill directly instead of
calling fill(). Set _M_fill_init to true.
* include/bits/basic_ios.tcc (basic_ios::init): Set _M_fill and
_M_fill_init here instead.
* testsuite/27_io/basic_ios/fill/char/1.cc: New test.
* testsuite/27_io/basic_ios/fill/wchar_t/1.cc: New test.

testsuite: i386: Fix g++.target/i386/pr97054.C on Solaris

g++.target/i386/pr97054.C currently FAILs on 64-bit Solaris/x86:

FAIL: g++.target/i386/pr97054.C  -std=gnu++14 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C  -std=gnu++14 compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C  -std=gnu++17 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C  -std=gnu++17 compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C  -std=gnu++2a (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C  -std=gnu++2a compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C  -std=gnu++98 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C  -std=gnu++98 compilation failed to produce executable

Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/g++.target/i386/pr97054.C:49:20: error: frame pointer required, but reserved

Since Solaris/x86 defaults to -fno-omit-frame-pointer, this patch
explicitly builds with -fomit-frame-pointer as is the default on other
x86 targets.

Tested on i386-pc-solaris2.11 (32 and 64-bit) and x86_64-pc-linux-gnu.

2024-05-15  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
* g++.target/i386/pr97054.C (dg-options): Add -fomit-frame-pointer.

RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight

The current implementation of riscv_block_move_straight() emits a couple
of loads/stores with with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the emitted
instructions (e.g. unaligned accesses or overlapping accesses).

Since the current implementation will always request less than XLEN bytes
to be handled by the by-pieces infrastructure, it is impossible that
overlapping memory accesses can ever be emitted (the by-pieces code does
not know of any previous instructions that were emitted by the backend).

This patch changes the implementation of riscv_block_move_straight()
such, that it utilizes the by-pieces framework if the remaining data
is less than 2*XLEN bytes, which is sufficient to enable overlapping
memory accesses (if the requirements for them are given).

The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted
by the by-pieces infrastructure, which emits alternating load/store
sequences.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight):
Hand over up to 2xXLEN bytes to move_by_pieces().

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
by-pieces.
* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
by-pieces.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: add tests for overlapping mem ops

A recent patch added the field overlap_op_by_pieces to the struct
riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook. This hook is used by the by-pieces infrastructure to decide
if overlapping memory accesses should be emitted.

The changes in the expansion can be seen in the adjustments of the
cpymem test cases. These tests also reveal a limitation in the
RISC-V cpymem expansion that prevents this optimization as only
by-pieces cpymem expansions emit overlapping memory accesses.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: Allow unaligned accesses in cpymemsi expansion

The RISC-V cpymemsi expansion is called, whenever the by-pieces
infrastructure will not take care of the builtin expansion.
The code emitted by the by-pieces infrastructure may emits code,
that includes unaligned accesses if riscv_slow_unaligned_access_p
is false.

The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion. And that's what this patch
is doing.

The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.

The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
parameter align.
(riscv_adjust_block_mem): Replace parameter length by align.
(riscv_block_move_loop): Add parameter align.
(riscv_expand_block_move_scalar): Set alignment properly if the
target has fast unaligned access.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: Add test cases for cpymem expansion

We have two mechanisms in the RISC-V backend that expand
cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).

As a rule-of-thumb, by-pieces emits alternating load/store sequences
and the setmem expansion in the backend emits a sequence of loads
followed by a sequence of stores.

Let's add some test cases to document the current behaviour
and to have tests to identify regressions.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: New test.
* gcc.target/riscv/cpymem-32.c: New test.
* gcc.target/riscv/cpymem-64-ooo.c: New test.
* gcc.target/riscv/cpymem-64.c: New test.

[prange] Default pointers_handled_p() to false.

The pointers_handled_p() method is an internal range-op helper to help
catch dispatch type mismatches for pointer operands.  This is what
caught the IPA mismatch in PR114985.

This method is only a temporary measure to catch any incompatibilities
in the current pointer range-op entries.  This patch returns true for
any *new* entries in the range-op table, as the current ones are
already fleshed out.  This keeps us from having to implement this
boilerplate function for any new range-op entries.

PR tree-optimization/114995
* range-op-ptr.cc (range_operator::pointers_handled_p): Default to true.

libstdc++: Rewrite std::variant comparisons without macros

libstdc++-v3/ChangeLog:

* include/std/variant (__detail::__variant::__compare): New
function template.
(operator==, operator!=, operator<, operator>, operator<=)
(operator>=): Replace macro definition with handwritten function
calling __detail::__variant::__compare.
(operator<=>): Call __detail::__variant::__compare.

libstdc++: Give std::memory_order a fixed underlying type [PR89624]

Prior to C++20 this enum type doesn't have a fixed underlying type,
which means it can be modified by -fshort-enums, which then means the
HLE bits are outside the range of valid values for the type.

As it has a fixed type of int in C++20 and later, do the same for
earlier standards too. This is technically a change for C++17 down,
because the implicit underlying type (without -fshort-enums) was
unsigned before. I doubt it matters in practice. That incompatibility
already exists between C++17 and C++20 and nobody has noticed or
complained. Now at least the underlying type will be int for all -std
modes.

libstdc++-v3/ChangeLog:

PR libstdc++/89624
* include/bits/atomic_base.h (memory_order): Use int as
underlying type.
* testsuite/29_atomics/atomic/89624.cc: New test.

tree-cfg: Move the returns_twice check to be last statement only [PR114301]

When I was checking to making sure that all of the bugs dealing
with the case where gimple_can_duplicate_bb_p would return false was fixed,
I noticed that the code which was checking if a call statement was
returns_twice was checking all call statements rather than just the
last statement. Since calling gimple_call_flags has a small non-zero
overhead due to a few string comparison, removing the uses of it
can have a small performance improvement. In the case of returns_twice
functions calls, will always end the basic-block due to the check in
stmt_can_terminate_bb_p (and others). So checking only the last statement
is a small optimization and will be safe.

Bootstrapped and tested pon x86_64-linux-gnu with no regressions.

PR tree-optimization/114301
gcc/ChangeLog:

* tree-cfg.cc (gimple_can_duplicate_bb_p): Check returns_twice
only on the last call statement rather than all.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[committed] Fix rv32 issues with recent zicboz work

I should have double-checked the CI system before pushing Christoph's patches
for memset-zero.  While I thought I'd checked CI state, I must have been
looking at the wrong patch from Christoph.

Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.

The test would need a revamp for rv32 as the expected output is all rv64 code
using "sd" instructions.  I'm just not vested deeply enough into rv32 to adjust
the test to work in that environment though it should be fairly trivial to copy
the test and provide new expected output if someone cares enough.

Verified this fixes the rv32 failures in my tester:
> New tests that FAIL (6 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  (test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  (test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  (test for excess errors)

And after the ICE is fixed, these are eliminated by only running the test for
rv64:

> New tests that FAIL (3 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1   check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2   check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g   check-function-bodies clear_buf_123

gcc/
* config/riscv/riscv-string.cc
(riscv_expand_block_clear_zicboz_zic64b): Handle rv32 correctly.

gcc/testsuite

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Don't run on rv32.

x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

Hi All

We've introduced a new subroutine in ix86_expand_vec_perm_const_1
to optimize vector shifting for the V16QI type on x86.
This patch uses a three-instruction sequence psrlw, psllw, and por
to handle specific vector shuffle operations more efficiently.
The change aims to improve assembly code generation for configurations
supporting SSE2.

Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?

Best
Levy

gcc/ChangeLog:

PR target/107563
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): Call expand_vec_perm_psrlw_psllw_por.

gcc/testsuite/ChangeLog:

PR target/107563
* g++.target/i386/pr107563-a.C: New test.
* g++.target/i386/pr107563-b.C: New test.

c++: lvalueness of non-dependent assignment expr [PR114994]

r14-4111-g6e92a6a2a72d3b made us check non-dependent simple assignment
expressions ahead of time and give them a type, as was already done for
compound assignments. Unlike for compound assignments however, if a
simple assignment resolves to an operator overload we represent it as a
(typed) MODOP_EXPR instead of a CALL_EXPR to the selected overload.
(I reckoned this was at worst a pessimization -- we'll just have to repeat
overload resolution at instantiatiation time.)

But this turns out to break the below testcase ultimately because
MODOP_EXPR (of non-reference type) is always treated as an lvalue
according to lvalue_kind, which is incorrect for the MODOP_EXPR
representing x=42.

We can fix this by representing such class assignment expressions as
CALL_EXPRs as well, but this turns out to require some tweaking of our
-Wparentheses warning logic and may introduce other fallout making it
unsuitable for backporting.

So this patch instead fixes lvalue_kind to consider the type of a
MODOP_EXPR representing a class assignment.

PR c++/114994

gcc/cp/ChangeLog:

* tree.cc (lvalue_kind) <case MODOP_EXPR>: For a class
assignment, consider the result type.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent32.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

[to-be-committed,RISC-V] Remove redundant AND in shift-add sequence

So this patch allows us to eliminate an redundant AND in some shift-add
style sequences.   I think the testcase was reduced from xz by the RAU
team, but I'm not highly confident of that.

Specifically the AND is masking off the upper 32 bits of the un-shifted
value and there's an outer SIGN_EXTEND from SI to DI.  However in the
RTL it's working on the post-shifted value, so the constant is left
shifted, so we have to account for that in the pattern's condition.

We can just drop the AND in this case.  So instead we do a 64bit shift,
then a sign extending ADD utilizing the low part of that 64bit shift result.

This has run through Ventana's CI as well as my own.  I'll wait for it
to run through the larger CI system before pushing.

Jeff

gcc/
* config/riscv/riscv.md: Add pattern for sign extended shift-add
sequence with a masked input.

gcc/testsuite

* gcc.target/riscv/shift-add-2.c: New test.

Daily bump.

c++: ICE in build_deduction_guide for invalid template [PR105760]

We currently ICE upon the following invalid snippet because we fail to
properly handle tsubst_arg_types returning error_mark_node in
build_deduction_guide.

== cut ==
template<class... Ts, class>
struct A { A(Ts...); };
A a;
== cut ==

This patch fixes this, and has been successfully tested on x86_64-pc-linux-gnu.

PR c++/105760

gcc/cp/ChangeLog:

* pt.cc (build_deduction_guide): Check for error_mark_node
result from tsubst_arg_types.

gcc/testsuite/ChangeLog:

* g++.dg/parse/error66.C: New test.

c++ comment adjustments for 114935

gcc/cp/ChangeLog:

* decl.cc (wrap_cleanups_r): Clarify comment.
* init.cc (build_vec_init): Update comment.

pru: Implement TARGET_CLASS_LIKELY_SPILLED_P to fix PR115013

Commit r15-436-g44e7855e did not fix PR115013 for PRU because
SMALL_REGISTER_CLASS_P is not returning an accurate value for the PRU
backend.

Word mode for PRU backend is defined as 8-bit, yet all ALU operations
are preferred in 32-bit mode.  Thus checking whether a register class
contains a single word_mode register would not classify the actually
single SImode register classes as small.  This affected the
multiplication source and destination register classes.

Fix by implementing TARGET_CLASS_LIKELY_SPILLED_P to treat all register
classes with SImode or smaller size as likely spilled.  This in turn
corrects the behaviour of SMALL_REGISTER_CLASS_P for PRU.

PR rtl-optimization/115013

gcc/ChangeLog:

* config/pru/pru.cc (pru_class_likely_spilled_p): Implement
to mark classes containing one SImode register as likely
spilled.
(TARGET_CLASS_LIKELY_SPILLED_P): Define.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

RISC-V: avoid LUI based const materialization ... [part of PR/106265]

... if the constant can be represented as sum of two S12 values.
The two S12 values could instead be fused with subsequent ADD insn.
The helps
- avoid an additional LUI insn
- side benefits of not clobbering a reg

e.g.
                            w/o patch             w/ patch
long                  |                     |
plus(unsigned long i) | li a5,4096     |
{                     | addi a5,a5,-2032 | addi a0, a0, 2047
   return i + 2064;   | add a0,a0,a5    | addi a0, a0, 17
}                     | ret         | ret

NOTE: In theory not having const in a standalone reg might seem less
      CSE friendly, but for workloads in consideration these mat are
      from very late LRA reloads and follow on GCSE is not doing much
      currently.

The real benefit however is seen in base+offset computation for array
accesses and especially for stack accesses which are finalized late in
optim pipeline, during LRA register allocation. Often the finalized
offsets trigger LRA reloads resulting in mind boggling repetition of
exact same insn sequence including LUI based constant materialization.

This shaves off 290 billion dynamic instrustions (QEMU icounts) in
SPEC 2017 Cactu benchmark which is over 10% of workload. In the rest of
suite, there additional 10 billion shaved, with both gains and losses
in indiv workloads as is usual with compiler changes.

500.perlbench_r-0 |  1,214,534,029,025 | 1,212,887,959,387 |
500.perlbench_r-1 |    740,383,419,739 |   739,280,308,163 |
500.perlbench_r-2 |    692,074,638,817 |   691,118,734,547 |
502.gcc_r-0       |    190,820,141,435 |   190,857,065,988 |
502.gcc_r-1       |    225,747,660,839 |   225,809,444,357 | <- -0.02%
502.gcc_r-2       |    220,370,089,641 |   220,406,367,876 | <- -0.03%
502.gcc_r-3       |    179,111,460,458 |   179,135,609,723 | <- -0.02%
502.gcc_r-4       |    219,301,546,340 |   219,320,416,956 | <- -0.01%
503.bwaves_r-0    |    278,733,324,691 |   278,733,323,575 | <- -0.01%
503.bwaves_r-1    |    442,397,521,282 |   442,397,519,616 |
503.bwaves_r-2    |    344,112,218,206 |   344,112,216,760 |
503.bwaves_r-3    |    417,561,469,153 |   417,561,467,597 |
505.mcf_r         |    669,319,257,525 |   669,318,763,084 |
507.cactuBSSN_r   |  2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
508.namd_r        |  1,855,884,342,110 | 1,855,881,110,934 |
510.parest_r      |  1,654,525,521,053 | 1,654,402,859,174 |
511.povray_r      |  2,990,146,655,619 | 2,990,060,324,589 |
519.lbm_r         |  1,158,337,294,525 | 1,158,337,294,529 |
520.omnetpp_r     |  1,021,765,791,283 | 1,026,165,661,394 |
521.wrf_r         |  1,715,955,652,503 | 1,714,352,737,385 |
523.xalancbmk_r   |    849,846,008,075 |   849,836,851,752 |
525.x264_r-0      |    277,801,762,763 |   277,488,776,427 |
525.x264_r-1      |    927,281,789,540 |   926,751,516,742 |
525.x264_r-2      |    915,352,631,375 |   914,667,785,953 |
526.blender_r     |  1,652,839,180,887 | 1,653,260,825,512 |
527.cam4_r        |  1,487,053,494,925 | 1,484,526,670,770 |
531.deepsjeng_r   |  1,641,969,526,837 | 1,642,126,598,866 |
538.imagick_r     |  2,098,016,546,691 | 2,097,997,929,125 |
541.leela_r       |  1,983,557,323,877 | 1,983,531,314,526 |
544.nab_r         |  1,516,061,611,233 | 1,516,061,407,715 |
548.exchange2_r   |  2,072,594,330,215 | 2,072,591,648,318 |
549.fotonik3d_r   |  1,001,499,307,366 | 1,001,478,944,189 |
554.roms_r        |  1,028,799,739,111 | 1,028,780,904,061 |
557.xz_r-0        |    363,827,039,684 |   363,057,014,260 |
557.xz_r-1        |    906,649,112,601 |   905,928,888,732 |
557.xz_r-2        |    509,023,898,187 |   508,140,356,932 |
997.specrand_fr   |        402,535,577 |       403,052,561 |
999.specrand_ir   |        402,535,577 |       403,052,561 |

This should still be considered damage control as the real/deeper fix
would be to reduce number of LRA reloads or CSE/anchor those during
LRA constraint sub-pass (re)runs (thats a different PR/114729.

Implementation Details (for posterity)
--------------------------------------
- basic idea is to have a splitter selected via a new predicate for constant
   being possible sum of two S12 and provide the transform.
   This is however a 2 -> 2 transform which combine can't handle.
   So we specify it using a define_insn_and_split.

- the initial loose "i" constraint caused LRA to accept invalid insns thus
   needing a tighter new constraint as well.

- An additional fallback alternate with catch-all "r" register
   constraint also needed to allow any "reloads" that LRA might
   require for ADDI with const larger than S12.

Testing
--------
This is testsuite clean (rv64 only).
I'll rely on post-commit CI multlib run for any possible fallout for
other setups such as rv32.

|                                               |         gcc |          g++ |     gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/  lp64d/ medlow |  41 /    17 |    8 /     3 |    7 /     2 |
| rv64imafdc_zba_zbb_zbs_zicond/  lp64d/ medlow |  41 /    17 |    8 /     3 |    7 /     2 |

I also threw this into a buildroot run, it obviously boots Linux to
userspace. bloat-o-meter on glibc and kernel show overall decrease in
staic instruction counts with some minor spot increases.
These are generally in the category of

- LUI + ADDI are 2 byte each vs. two ADD being 4 byte each.
- Knock on effects due to inlining changes.
- Sometimes the slightly shorter 2-insn seq in a mult-exit function
   can cause in-place epilogue duplication (vs. a jump back).
   This is slightly larger but more efficient in execution.
In summary nothing to fret about.

| linux/scripts/bloat-o-meter build-gcc-240131/target/lib/libc.so.6 \
         build-gcc-240131-new-splitter-1-variant/target/lib/libc.so.6
|
| add/remove: 0/0 grow/shrink: 21/49 up/down: 520/-3056 (-2536)
| Function                                     old     new   delta
| getnameinfo                                 2756    2892    +136
...
| tempnam                                      136     144      +8
| padzero                                      276     284      +8
...
| __GI___register_printf_specifier             284     280      -4
| __EI_xdr_array                               468     464      -4
| try_file_lock                                268     260      -8
| pthread_create@GLIBC_2                      3520    3508     -12
| __pthread_create_2_1                        3520    3508     -12
...
| _nss_files_setnetgrent                       932     904     -28
| _nss_dns_gethostbyaddr2_r                   1524    1480     -44
| build_trtable                               3312    3208    -104
| printf_positional                          25000   22580   -2420
| Total: Before=2107999, After=2105463, chg -0.12%

Caveat:
------
Jeff noted during v2 review that the operand0 constraint !riscv_reg_frame_related
could potentially cause issues with hard reg cprop in future. If that
trips things up we will have to loosen the constraint while dialing down
the const range to (-2048 to 2032) as opposed to fll S12 range of
(-2048 to 2047) to keep stack regs aligned.

gcc/ChangeLog:
* config/riscv/riscv.h: New macros to check for sum of two S12
range.
* config/riscv/constraints.md: New constraint.
* config/riscv/predicates.md: New Predicate.
* config/riscv/riscv.md: New splitter.
* config/riscv/riscv.cc (riscv_reg_frame_related): New helper.
* config/riscv/riscv-protos.h: New helper prototype.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/sum-of-two-s12-const-1.c: New test: checks
for new patterns output.
* gcc.target/riscv/sum-of-two-s12-const-2.c: Ditto.
* gcc.target/riscv/sum-of-two-s12-const-3.c: New test: should not
ICE.

Tested-by: Edwin Lu <ewlu@rivosinc.com> # pre-commit-CI #1520
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

tree-optimization/99954 - redo loop distribution memcpy recognition fix

The following revisits the fix for PR99954 which was observed as
causing missed memcpy recognition and instead using memmove for
non-aliasing copies.  While the original fix mitigated bogus
recognition of memcpy the root cause was not properly identified.
The root cause is dr_analyze_indices "failing" to handle union
references and leaving the DRs indices in a state that's not correctly
handled by dr_may_alias.  The following mitigates this there
appropriately, restoring memcpy recognition for non-aliasing copies.

This makes us run into a latent issue in ptr_deref_may_alias_decl_p
when the pointer is something like &MEM[0].a in which case we fail
to handle non-SSA name pointers.  Add code similar to what we have
in ptr_derefs_may_alias_p.

PR tree-optimization/99954
* tree-data-ref.cc (dr_may_alias_p): For bases that are
not completely analyzed fall back to TBAA and points-to.
* tree-loop-distribution.cc
(loop_distribution::classify_builtin_ldst): When there
is no dependence again classify as memcpy.
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.

* gcc.dg/tree-ssa/ldist-40.c: New testcase.

[PATCH 3/3] RISC-V: Add memset-zero expansion to cbo.zero

The Zicboz extension offers the cbo.zero instruction, which can be used
to clean a memory region corresponding to a cache block.
The Zic64b extension defines the cache block size to 64 byte.
If both extensions are available, it is possible to use cbo.zero
to clear memory, if the alignment and size constraints are met.
This patch implements this.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_clear): New prototype.
* config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b):
New function to expand a block-clear with cbo.zero.
(riscv_expand_block_clear): New RISC-V block-clear expansion function.
* config/riscv/riscv.md (setmem<mode>): New setmem expansion.

[PATCH 2/3] RISC-V: testsuite: Make cmo tests LTO safe

Let's add '\t' to the instruction match pattern to avoid false positive
matches when compiling with -flto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbom-1.c: Add \t to test pattern.
* gcc.target/riscv/cmo-zicbom-2.c: Likewise.
* gcc.target/riscv/cmo-zicbop-1.c: Likewise.
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
* gcc.target/riscv/cmo-zicboz-1.c: Likewise.
* gcc.target/riscv/cmo-zicboz-2.c: Likewise.

[1/3] expr: Export clear_by_pieces()

Make clear_by_pieces() available to other parts of the compiler,
similar to store_by_pieces().

gcc/ChangeLog:

* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.

[testsuite] Fix gcc.dg/pr115066.c fail on aarch64

On aarch64, I get this failure:
...
FAIL: gcc.dg/pr115066.c scan-assembler \\.byte\\t0xb\\t# Define macro strx
...

This happens because we expect to match:
...
        .byte   0xb     # Define macro strx
...
but instead we get:
...
        .byte   0xb     // Define macro strx
...

Fix this by not explicitly matching the comment marker.

Tested on aarch64 and x86_64.

gcc/testsuite/ChangeLog:

2024-05-14  Tom de Vries  <tdevries@suse.de>

* gcc.dg/pr115066.c: Don't match comment marker.

testsuite: analyzer: Fix fd-glibc-byte-stream-connection-server.c on Solaris [PR107750]

gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c currently FAILs
on Solaris:

FAIL: gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c (test for
excess errors)

Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c:91:3:
error: implicit declaration of function 'memset'
[-Wimplicit-function-declaration]

Solaris <sys/select.h> has

but no declaration of memset. While one can argue that this should be
fixed, it's easy enough to just include <string.h> instead, which is
what this patch does.

Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.

2024-05-14 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
PR analyzer/107750
* gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c:
Include <string.h>.

libstdc++: Guard dynamic_cast use in src/c++23/print.cc [PR115015]

Do not use dynamic_cast unconditionally, in case libstdc++ is built with
-fno-rtti.

libstdc++-v3/ChangeLog:

PR libstdc++/115015
* src/c++23/print.cc (__open_terminal(streambuf*)) [!__cpp_rtti]:
Do not use dynamic_cast.

libstdc++: Document when std::string::shrink_to_fit was added

This section can be misread to say that shrink_to_fit is available from
GCC 3.4, but it was added later.

libstdc++-v3/ChangeLog:

* doc/xml/manual/strings.xml: Clarify that GCC 4.5 added
std::string::shrink_to_fit.
* doc/html/manual/strings.html: Regenerate.

[debug] Fix dwarf v4 .debug_macro.dwo

Consider a hello world, compiled with -gsplit-dwarf and dwarf version 4, and
-g3:
...
$ gcc -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -save-temps -dA
...

In section .debug_macro.dwo, we have:
...
.Ldebug_macro0:
        .value  0x4     # DWARF macro version number
        .byte   0x2     # Flags: 32-bit, lineptr present
        .long   .Lskeleton_debug_line0
        .byte   0x3     # Start new file
        .uleb128 0      # Included from line number 0
        .uleb128 0x1    # file /data/vries/hello.c
        .byte   0x5     # Define macro strp
        .uleb128 0      # At line number 0
        .uleb128 0x1d0  # The macro: "__STDC__ 1"
...

Given that we use a DW_MACRO_define_strp, we'd expect 0x1d0 to be an
offset into a .debug_str.dwo section.

But in fact, 0x1d0 is an index into the string offset table in
section .debug_str_offsets.dwo:
...
        .long   0x34f0  # indexed string 0x1d0: __STDC__ 1
...

Add asserts that catch this inconsistency, and fix this by using
DW_MACRO_define_strx instead.

Tested on x86_64.

gcc/ChangeLog:

2024-05-14  Tom de Vries  <tdevries@suse.de>

PR debug/115066
* dwarf2out.cc (output_macinfo_op): Fix DW_MACRO_define_strx/strp
choice for v4 .debug_macro.dwo.  Add asserts to check that choice.

gcc/testsuite/ChangeLog:

2024-05-14  Tom de Vries  <tdevries@suse.de>

PR debug/115066
* gcc.dg/pr115066.c: New test.

Reduce recursive inlining of always_inline functions

this patch tames down inliner on (mutiply) self-recursive always_inline functions.
While we already have caps on recursive inlning, the testcase combines early inliner
and late inliner to get very wide recursive inlining tree. The basic idea is to
ignore DISREGARD_INLINE_LIMITS when deciding on inlining self recursive functions
(so we cut on function being large) and clear the flag once it is detected.

I did not include the testcase since it still produces a lot of code and would
slow down testing. It also outputs many inlining failed messages that is not
very nice, but it is hard to detect self recursin cycles in full generality
when indirect calls and other tricks may happen.

gcc/ChangeLog:

PR ipa/113291

* ipa-inline.cc (enum can_inline_edge_by_limits_flags): New enum.
(can_inline_edge_by_limits_p): Take flags instead of multiple bools; add flag
for forcing inlinie limits.
(can_early_inline_edge_p): Update.
(want_inline_self_recursive_call_p): Update; use FORCE_LIMITS mode.
(check_callers): Update.
(update_caller_keys): Update.
(update_callee_keys): Update.
(recursive_inlining): Update.
(add_new_edges_to_heap): Update.
(speculation_useful_p): Update.
(inline_small_functions): Clear DECL_DISREGARD_INLINE_LIMITS on self recursion.
(flatten_function): Update.
(inline_to_all_callers_1): Update.

libstdc++: Fix typo in std::stacktrace::max_size [PR115063]

libstdc++-v3/ChangeLog:

PR libstdc++/115063
* include/std/stacktrace (basic_stacktrace::max_size): Fix typo
in reference to _M_alloc member.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Check
max_size() compiles.