The `configure` scripts generated with autoconf often test compiler
features by setting the output to `/dev/null`, which then sets the dump
folder to /dev/* and halts the compilation with an error because
GCC cannot create files in /dev/. This is a problem when configure is
testing for compiler features because it cannot tell whether the failure
was due to an unsupported feature or to some other problem, and it
disables the feature even if it works.
As an example, running configure with CFLAGS="-fdump-ipa-clones"
results in several compiler features being disabled because
gcc halts with an error creating files in /dev/*.
This commit fixes the issue by checking whether the output file is
/dev/null or /dev/zero. In that case we use the current working
directory for dump output instead of the directory of the output
file, because we cannot write to /dev/*.
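The check described above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the helper names `is_null_device` and `dump_dir_for_output` are hypothetical, and an empty string is assumed to mean "use the current working directory".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when OUTPUT names a device node that cannot
// serve as the base location for auxiliary dump files.
static bool is_null_device (const std::string &output)
{
  return output == "/dev/null" || output == "/dev/zero";
}

// Hypothetical helper: choose the directory prefix for dump files from the
// -o argument.  An empty result stands for the current working directory.
std::string dump_dir_for_output (const std::string &output)
{
  if (is_null_device (output))
    return "";                          // cannot write to /dev/*, use cwd
  std::string::size_type slash = output.rfind ('/');
  if (slash == std::string::npos)
    return "";                          // plain file name, already in cwd
  return output.substr (0, slash + 1);  // keep the trailing '/'
}
```

With this sketch, `dump_dir_for_output ("/dev/null")` yields the empty prefix, while `dump_dir_for_output ("build/conftest.o")` yields `build/`.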
Iain Buclaw [Fri, 19 Nov 2021 13:43:07 +0000 (14:43 +0100)]
libphobos: Increase size of defaultStackPages on OSX X86_64 targets.
As of macOS 11, libunwind requires more stack space than 16k, so
default to a larger stack size. This is only applied to X86, where
PAGESIZE is still 4k; on AArch64 it is 16k.
libphobos/ChangeLog:
* libdruntime/core/thread/fiber.d (defaultStackPages): Increase size
on OSX X86_64 targets.
Iain Buclaw [Fri, 19 Nov 2021 13:26:07 +0000 (14:26 +0100)]
libphobos: Don't call __gthread_key_delete in the emutls destroy function.
Fixes an EXC_BAD_ACCESS issue seen on Darwin when the libphobos DSO gets
unloaded. libgcc's emutls implementation doesn't call
__gthread_key_delete directly, so neither should libphobos.
libphobos/ChangeLog:
* libdruntime/gcc/emutls.d (emutlsDestroyThread): Don't remove entry
from global array.
(_d_emutls_destroy): Don't call __gthread_key_delete.
Iain Buclaw [Thu, 18 Nov 2021 21:43:40 +0000 (22:43 +0100)]
d: Use HOST_WIDE_INT for type size temporaries.
These variables are later used as the value for the format specifier
`%wd`, whose expected type may not match dinteger_t, causing
unnecessary -Wformat warnings.
gcc/d/ChangeLog:
* decl.cc (d_finish_decl): Use HOST_WIDE_INT for type size
temporaries.
Jan Hubicka [Wed, 17 Nov 2021 21:04:26 +0000 (22:04 +0100)]
Fix modref summary streaming
Fixes a bug in streaming in the modref access tree that now causes a failure
of the gamess benchmark. The bug is quite old (present in the GCC 11 release)
but it needs quite an interesting series of events to manifest. In particular:
1) At LTO time ISRA turns some parameters passed by reference into scalars.
2) At LTO time modref computes summaries for the old parameters and then updates
them, but does so quite stupidly, believing that the loads from parameters
are now unknown loads (rather than optimized out).
This renders the summary not very useful, since it thinks every memory location
aliasing int is now accessed (as opposed to a parameter dereference).
3) At stream-in we notice too early that the summary is useless, set the
every_access flag and drop the list. However, while reading the rest of the
summary we overwrite the flag back to 0, which makes us lose part of the summary.
4) The right selection of partitions is needed to prevent late modref from
recalculating and thus fixing the summary.
This patch fixes the stream-in bug; however, we should also fix the updating of
summaries.
gcc/ChangeLog:
2021-11-17 Jan Hubicka <hubicka@ucw.cz>
PR ipa/103246
* ipa-modref.c (read_modref_records): Fix streaming in of every_access
flag.
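Step 3 above can be illustrated with a much simplified model. This is a sketch under stated assumptions, not the ipa-modref.c code: `ref_node`, `read_node` and the `max_accesses` limit are hypothetical stand-ins. The point is that once a node has been collapsed, reading the streamed flag must only ever set `every_access`, never clear it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for one node of a modref access tree.
struct ref_node
{
  bool every_access = false;
  std::vector<int> accesses;

  // Give up on tracking individual accesses.
  void collapse ()
  {
    every_access = true;
    accesses.clear ();
  }
};

// Hypothetical reader.  STREAMED_FLAG and STREAMED_ACCESSES model what was
// written earlier; MAX_ACCESSES models a limit applied at stream-in time.
void read_node (ref_node &node, bool streamed_flag,
                const std::vector<int> &streamed_accesses,
                std::size_t max_accesses)
{
  // The reader may decide early that the summary is useless.
  if (streamed_accesses.size () > max_accesses)
    node.collapse ();

  // A plain assignment "node.every_access = streamed_flag;" here would
  // reset the flag to 0 after the early collapse.  Only set it instead.
  if (streamed_flag)
    node.collapse ();

  if (!node.every_access)
    node.accesses = streamed_accesses;
}
```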
Mikael Morin [Sun, 7 Nov 2021 13:39:18 +0000 (14:39 +0100)]
fortran: Ignore unused args in scalarization [PR97896]
The KIND argument of the INDEX intrinsic is a compile time constant
that is used at compile time only to resolve to a kind-specific library
function. That argument is otherwise completely ignored at runtime, and there is
no code generated for it as the library procedure has no kind argument.
This confuses the scalarizer which expects to see every argument
of elemental functions used when calling a procedure.
This change removes the argument from the scalarization lists
at the beginning of the scalarization process, so that the argument
is completely ignored.
This also reverts the existing workaround
(commit d09847357b965a2c2cda063827ce362d4c9c86f2 except for its testcase).
PR fortran/97896
gcc/fortran/ChangeLog:
* intrinsic.c (add_sym_4ind): Remove.
(add_functions): Use add_sym_4 instead of add_sym_4ind.
Don’t special case the index intrinsic.
* iresolve.c (gfc_resolve_index_func): Use the individual arguments
directly instead of the full argument list.
* intrinsic.h (gfc_resolve_index_func): Update the declaration
accordingly.
* trans-decl.c (gfc_get_extern_function_decl): Don’t modify the
list of arguments in the case of the index intrinsic.
* trans-array.h (gfc_get_intrinsic_for_expr,
gfc_get_proc_ifc_for_expr): New.
* trans-array.c (gfc_get_intrinsic_for_expr,
arg_evaluated_for_scalarization): New.
(gfc_walk_elemental_function_args): Add intrinsic procedure
as argument. Count arguments. Check arg_evaluated_for_scalarization.
* trans-intrinsic.c (gfc_walk_intrinsic_function): Update call.
* trans-stmt.c (get_intrinsic_for_code): New.
(gfc_trans_call): Update call.
Philipp Tomsich [Thu, 20 May 2021 19:57:48 +0000 (21:57 +0200)]
aarch64: enable Ampere-1 CPU
This adds support and a basic tuning model for the Ampere Computing
"Ampere-1" CPU.
The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
modelled as a 4-wide issue (as with all modern micro-architectures,
the chosen issue rate is a compromise between the maximum dispatch
rate and the maximum rate of uops issued to the scheduler).
This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1
core.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64-cost-tables.h: Add extra costs for
Ampere-1.
* config/aarch64/aarch64.c: Add tuning structures for Ampere-1.
* doc/invoke.texi: Add documentation for Ampere-1 core.
Kewen Lin [Thu, 11 Nov 2021 01:59:18 +0000 (19:59 -0600)]
rs6000/doc: Rename future cpu with power10
Commit 5d9d0c94588 renamed future to power10 and ace60939fd2
updated the documentation for "future" renaming. This patch
is to rename the remaining "future architecture" references in
documentation and polish the words for float128.
gcc/ChangeLog:
* doc/invoke.texi: Change references to "future cpu" to "power10",
"-mcpu=future" to "-mcpu=power10". Adjust words for float128.
Harald Anlauf [Wed, 10 Nov 2021 19:30:27 +0000 (20:30 +0100)]
Fortran: avoid NULL pointer dereferences
CLASS(), PARAMETER is not yet properly implemented in gfortran. Using it
in declarations could lead to subsequent NULL pointer dereferences during
checking or simplification of expressions involving those CLASS variables.
gcc/fortran/ChangeLog:
PR fortran/103137
PR fortran/103138
* check.c (gfc_check_shape): Avoid NULL pointer dereference on
missing ref.
* simplify.c (gfc_simplify_cshift): Avoid NULL pointer dereference
when shape not set.
(gfc_simplify_transpose): Likewise.
Richard Biener [Mon, 18 Oct 2021 07:10:43 +0000 (09:10 +0200)]
tree-optimization/102798 - avoid copying PTA info to old SSA names
The vectorizer duplicates pointer-info to created pointer bases
but it has to avoid changing points-to info on existing SSA names
because there's now flow-sensitive info in there (pt->pt_null as
set from VRP).
2021-10-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/102798
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
Only copy points-to info to newly generated SSA names.
Richard Biener [Thu, 30 Sep 2021 13:05:53 +0000 (15:05 +0200)]
middle-end/102518 - avoid invalid GIMPLE during inlining
When inlining we have to avoid mapping a non-lvalue parameter
value into a context that prevents the parameter from being a register.
Formerly such registers were TREE_ADDRESSABLE, but now they can be
just DECL_NOT_GIMPLE_REG_P.
2021-09-30 Richard Biener <rguenther@suse.de>
PR middle-end/102518
* tree-inline.c (setup_one_parameter): Avoid substituting
an invariant into contexts where a GIMPLE register is not valid.
Bool pattern recog is required for correctness since vectorized
compares otherwise produce -1 for true so any context where bool
is used as value and not as condition or mask needs to be replaced
with CMP ? 1 : 0. When we fail to find a vector type for the
result of such use we may not simply elide such transform since
a new bool result can emerge when for example the cast_forwprop
pattern is applied. So the following avoids failing the
bool pattern recog process and instead does not assign a vector type
for the stmt.
2021-10-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/102788
* tree-vect-patterns.c (vect_init_pattern_stmt): Allow
a NULL vectype.
(vect_pattern_recog_1): Likewise.
(vect_recog_bool_pattern): Continue matching the pattern
even if we do not have a vector type for a conversion
result.
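The reason a bool used as a value needs the CMP ? 1 : 0 form can be seen with the GNU vector extensions, where a lane-wise compare yields -1 for true. The following is an illustrative sketch only; the type `v4si` and both function names are assumptions made for the example, not vectorizer code.

```cpp
#include <cassert>

// GNU vector extension: four 32-bit integer lanes.
typedef int v4si __attribute__ ((vector_size (16)));

// A lane-wise compare yields -1 (all bits set) for true and 0 for false,
// which is fine for a mask or condition but wrong as a bool *value*.
v4si less_mask (v4si a, v4si b)
{
  return a < b;
}

// When the result is used as a value, each lane must become 1 or 0,
// the per-lane equivalent of CMP ? 1 : 0.
v4si less_value (v4si a, v4si b)
{
  v4si one = { 1, 1, 1, 1 };
  return (a < b) & one;
}
```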
Richard Biener [Tue, 12 Oct 2021 11:42:08 +0000 (13:42 +0200)]
tree-optimization/102572 - fix gathers with invariant mask
This fixes the vector def gathering for invariant masks, which
failed to pass in the desired vector type, resulting in a non-mask
type being generated.
2021-10-12 Richard Biener <rguenther@suse.de>
PR tree-optimization/102572
* tree-vect-stmts.c (vect_build_gather_load_calls): When
gathering the vectorized defs for the mask pass in the
desired mask vector type so invariants will be handled
correctly.
Richard Biener [Tue, 31 Aug 2021 08:28:40 +0000 (10:28 +0200)]
tree-optimization/102139 - fix SLP DR base alignment
When doing whole-function SLP we have to make sure the recorded
base alignments we compute as the maximum alignment seen for a
base anywhere in the function is actually valid at the point
we want to make use of it.
To make this work we now record the stmt the alignment was derived
from in addition to the DRs innermost behavior and we use a
dominance check to verify the recorded info is valid when doing
BB vectorization. For this to work for groups inside a BB that are
separated by a call that might not return we now store the DR
analysis group-id permanently and use that for an additional check
when the DRs are in the same BB.
2021-08-31 Richard Biener <rguenther@suse.de>
PR tree-optimization/102139
* tree-vectorizer.h (vec_base_alignments): Adjust hash-map
type to record a std::pair of the stmt-info and the innermost
loop behavior.
(dr_vec_info::group): New member.
* tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
(vect_compute_data_ref_alignment): Verify the recorded
base alignment can be used.
(data_ref_pair): Remove.
(dr_group_sort_cmp): Adjust.
(vect_analyze_data_ref_accesses): Store the group-ID in the
dr_vec_info and operate on a vector of dr_vec_infos.
Richard Biener [Fri, 20 Aug 2021 09:32:00 +0000 (11:32 +0200)]
Refactor BB splitting of DRs for SLP group analysis
This uses the group_id computed to ensure DRs in different BBs do
not get merged into a DR group. To achieve this we seed the
group from the BB index when group_ids are not computed and we
make sure to bump the group_id when advancing to the next BB for
BB SLP analysis.
This paves the way for relaxing the grouping for BB vectorization
by adjusting its group_id computation.
2021-08-20 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (dr_group_sort_cmp): Do not compare
BBs.
(vect_analyze_data_ref_accesses): Likewise. Assign the BB
index as group_id when dataref_groups were not computed.
* tree-vect-slp.c (vect_slp_bbs): Bump current_group when
we advance to the next BB.
Richard Biener [Mon, 11 Oct 2021 14:06:03 +0000 (16:06 +0200)]
middle-end/101480 - overloaded global new/delete
The following fixes the issue of ignoring side-effects on memory
from overloaded global new/delete operators by not marking them
as effectively 'const' apart from other explicitly specified
side-effects.
This will cause
FAIL: g++.dg/warn/Warray-bounds-16.C -std=gnu++1? (test for excess errors)
because we no longer statically see that the initialization loop
never executes, since the call to operator new can now clobber 'a.m'.
This seems to be an issue with the warning code and/or ranger so
I'm leaving this FAIL to be addressed as followup.
2021-10-11 Richard Biener <rguenther@suse.de>
PR middle-end/101480
* gimple.c (gimple_call_fnspec): Do not mark operator new/delete
as const.
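A replaced global operator new with a visible side effect on memory might look like the sketch below. It is an assumed example, not the testcase from the PR: the counter `alloc_count` and the function `count_one_allocation` are hypothetical names. The caller has to re-read the counter after the call, which is exactly what treating the operator as effectively 'const' would not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Global state that the replacement allocator updates on every call.
static std::size_t alloc_count = 0;

// User replacement of the global allocation function.
void *operator new (std::size_t size)
{
  ++alloc_count;                      // side effect on memory
  if (void *p = std::malloc (size ? size : 1))
    return p;
  throw std::bad_alloc ();
}

void operator delete (void *p) noexcept
{
  std::free (p);
}

void operator delete (void *p, std::size_t) noexcept
{
  std::free (p);
}

// Returns how often the replaced operator new ran for one explicit call.
std::size_t count_one_allocation ()
{
  std::size_t before = alloc_count;
  void *p = ::operator new (32);
  std::size_t after = alloc_count;    // must be re-read after the call
  ::operator delete (p);
  return after - before;
}
```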
Kewen Lin [Tue, 26 Oct 2021 02:05:02 +0000 (21:05 -0500)]
vect: Don't update inits for simd_lane_access DRs [PR102789]
As PR102789 shows, when the vectorizer peels for alignment
in prologues, the function vect_update_inits_of_drs updates the
inits of some DRs. But as the failing case shows, we shouldn't update the
DR for a simd_lane_access: it has fixed-length storage intended mainly for
the main loop, so the update can push the access out of bounds and make it
access an unexpected element.
gcc/ChangeLog:
PR tree-optimization/102789
* tree-vect-loop-manip.c (vect_update_inits_of_drs): Do not
update inits of simd_lane_access.
Harald Anlauf [Fri, 15 Oct 2021 19:23:17 +0000 (21:23 +0200)]
Fortran: validate shape of arrays in constructors against declarations
gcc/fortran/ChangeLog:
PR fortran/102685
* decl.c (match_clist_expr): Set rank/shape of clist initializer
to match LHS.
* resolve.c (resolve_structure_cons): In a structure constructor,
compare shapes of array components against declared shape.
This change implements TI mode on PA64. Various new patterns are
added to pa.md. The libgcc build needed modification to build both
DI and TI routines. We also need various softfp routines to
convert to and from TImode.
I added full softfp for the -msoft-float option. At the moment,
this doesn't completely eliminate all use of the floating-point
co-processor. For this, libgcc needs to be built with -msoft-mult.
The floating-point exception support also needs a soft option.
2021-11-05 John David Anglin <danglin@gcc.gnu.org>
PR libgomp/96661
gcc/ChangeLog:
* config/pa/pa-modes.def: Add OImode integer type.
* config/pa/pa.c (pa_scalar_mode_supported_p): Allow TImode
for TARGET_64BIT.
* config/pa/pa.h (MIN_UNITS_PER_WORD): Define to UNITS_PER_WORD
if IN_LIBGCC2.
* config/pa/pa.md (addti3, addvti3, subti3, subvti3, negti2,
negvti2, ashlti3, shrpd_internal): New patterns.
Change some multi instruction types to multi.
Martin Liska [Fri, 13 Aug 2021 15:22:35 +0000 (17:22 +0200)]
Speed up jump table switch detection.
PR tree-optimization/100393
gcc/ChangeLog:
* tree-switch-conversion.c (group_cluster::dump): Use
get_comparison_count.
(jump_table_cluster::find_jump_tables): Pre-compute number of
comparisons and then decrement it. Cache also max_ratio.
(jump_table_cluster::can_be_handled): Change signature.
* tree-switch-conversion.h (get_comparison_count): New.
Hongyu Wang [Wed, 3 Nov 2021 05:58:52 +0000 (13:58 +0800)]
i386: Fix wrong result for AMX-TILE intrinsic when parsing expression.
_tile_loadd, _tile_stored, _tile_streamloadd intrinsics are defined by
macro, so the parameters should be wrapped by parentheses to accept
expressions.
gcc/ChangeLog:
* config/i386/amxtileintrin.h (_tile_loadd_internal): Add
parentheses to base and stride.
(_tile_stream_loadd_internal): Likewise.
(_tile_stored_internal): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/amxtile-3.c: New test.
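The general problem with unparenthesized macro parameters can be shown with a small sketch. The macros below are hypothetical and far simpler than the AMX-TILE intrinsics; they only illustrate why an expression argument needs the parentheses.

```cpp
#include <cassert>

// Without parentheses the argument is substituted as raw tokens, so
// ROW_OFFSET_UNSAFE (2, 8 + 8) expands to (2 * 8 + 8).
#define ROW_OFFSET_UNSAFE(row, stride) (row * stride)

// With parentheses the same call expands to ((2) * (8 + 8)).
#define ROW_OFFSET_SAFE(row, stride) ((row) * (stride))

long unsafe_offset (long row, long a, long b)
{
  return ROW_OFFSET_UNSAFE (row, a + b);
}

long safe_offset (long row, long a, long b)
{
  return ROW_OFFSET_SAFE (row, a + b);
}
```

Here `unsafe_offset (2, 8, 8)` evaluates 2 * 8 + 8, whereas `safe_offset (2, 8, 8)` evaluates 2 * (8 + 8).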
ranger: Fix `-Werror' build error with `ranger_cache::push_poor_value'
Remove a commit 86534c07a390 ("Disable poor value processing in ranger
cache.") regression that caused GCC not to build anymore if `-Werror'
has been enabled:
.../gcc/gimple-range-cache.cc: In member function 'bool ranger_cache::push_poor_value(basic_block, tree)':
.../gcc/gimple-range-cache.cc:850:44: error: unused parameter 'bb' [-Werror=unused-parameter]
  850 | ranger_cache::push_poor_value (basic_block bb, tree name)
      |                                ~~~~~~~~~~~~^~
.../gcc/gimple-range-cache.cc:850:53: error: unused parameter 'name' [-Werror=unused-parameter]
  850 | ranger_cache::push_poor_value (basic_block bb, tree name)
      |                                                ~~~~~^~~~
To keep the change to a minimum, mark the reported parameters as unused.
gcc/
* gimple-range-cache.cc (ranger_cache::push_poor_value): Mark
parameters unused.
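Two common ways to silence this kind of diagnostic are sketched below. The types and function names are hypothetical stand-ins, and the bodies simply return false as an assumption; the commit itself may use a different spelling, for example a project macro wrapping the GNU attribute.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real basic_block and tree types.
struct basic_block_def;
typedef basic_block_def *basic_block;
struct tree_node;
typedef tree_node *tree;

// Option 1: keep the names and mark them with the GNU unused attribute.
bool push_poor_value_marked (basic_block bb __attribute__ ((unused)),
                             tree name __attribute__ ((unused)))
{
  return false;   // assumed body: the processing is disabled
}

// Option 2: leave the parameters unnamed, which is valid C++.
bool push_poor_value_unnamed (basic_block, tree)
{
  return false;
}
```

Option 1 keeps the signature text closest to the original, which fits the stated goal of a minimal change.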
[PR102842] Consider all outputs in generation of matching reloads
Without considering all output insn operands (not only those processed
before), in rare cases LRA can use the same hard register for
different outputs of the insn on different assignment subpasses. The
patch fixes the problem.
gcc/ChangeLog:
PR rtl-optimization/102842
* lra-constraints.c (match_reload): Ignore out in checking values
of outs.
(curr_insn_transform): Collect outputs before doing reloads of operands.
gcc/testsuite/ChangeLog:
PR rtl-optimization/102842
* g++.target/arm/pr102842.C: New test.
Richard Biener [Wed, 13 Oct 2021 07:13:36 +0000 (09:13 +0200)]
ipa/102714 - IPA SRA eliding volatile
The following fixes the volatileness check of IPA SRA which was
looking at the innermost reference when checking TREE_THIS_VOLATILE
but the reference to check is the outermost one.
Jonathan Wakely [Mon, 1 Nov 2021 11:06:51 +0000 (11:06 +0000)]
libstdc++: Fix range access for empty std::valarray [PR103022]
The std::begin and std::end overloads for std::valarray are defined in
terms of std::addressof(v[0]) which is undefined for an empty valarray.
libstdc++-v3/ChangeLog:
PR libstdc++/103022
* include/std/valarray (begin, end): Do not dereference an empty
valarray. Add noexcept and [[nodiscard]].
* testsuite/26_numerics/valarray/range_access.cc: Check empty
valarray. Check iterator properties. Run as well as compiling.
* testsuite/26_numerics/valarray/range_access2.cc: Likewise.
* testsuite/26_numerics/valarray/103022.cc: New test.
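A guarded range access that never evaluates v[0] on an empty valarray could be sketched as follows. `checked_begin` and `checked_end` are hypothetical helper names used for illustration; this is not the libstdc++ implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <valarray>

// Hypothetical helper: pointer to the first element, or null when empty.
template <typename T>
T *checked_begin (std::valarray<T> &v) noexcept
{
  return v.size () ? std::addressof (v[0]) : nullptr;
}

// Hypothetical helper: one-past-the-end pointer, or null when empty, so
// that begin == end holds for an empty valarray.
template <typename T>
T *checked_end (std::valarray<T> &v) noexcept
{
  return v.size () ? std::addressof (v[0]) + v.size () : nullptr;
}
```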
Martin Jambor [Wed, 27 Oct 2021 17:15:33 +0000 (19:15 +0200)]
sra: Fix corner case of total scalarization with virtual inheritance (PR 102505)
PR 102505 is a situation where SRA takes its initial top-level
access size from a get_ref_base_and_extent called on a COMPONENT_REF,
and thus derived from the FIELD_DECL, which however does not include a
virtual base. Total scalarization then goes on traversing the type,
which however has a virtual base past the non-virtual bits, tricking SRA
into creating sub-accesses outside of the supposedly encompassing
accesses, which in turn triggers the verifier within the pass.
The patch below fixes that by failing total scalarization when this
situation is detected.
PR tree-optimization/102505
* tree-sra.c (totally_scalarize_subtree): Check that the
encountered field fits within the access we would like to put it
in.
gcc/testsuite/ChangeLog:
2021-10-20 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/102505
* g++.dg/torture/pr102505.C: New test.
Piotr Kubaj [Sat, 16 Oct 2021 02:09:05 +0000 (04:09 +0200)]
gcc/configure: Check for powerpc64le*-*-freebsd*
Only powerpc64-unknown-freebsd was checked for.
Signed-off-by: Piotr Kubaj <pkubaj@FreeBSD.org>
gcc/
* configure.ac: Treat powerpc64*-*-freebsd* the same as
powerpc64-*-freebsd*.
* configure: Regenerate.
Revise -mdisable-fpregs option and add new -msoft-mult option
The behavior of the -mdisable-fpregs is confusing in that it doesn't
disable the use of the floating-point registers in all situations.
The -msoft-float disables the use of the floating-point registers in
all situations. The Linux kernel only needs to disable use of the
xmpyu instruction to avoid using the floating-point registers.
This change revises the -mdisable-fpregs option to disable the use of
the floating-point registers in all situations. It is now equivalent
to the -msoft-float option. A new -msoft-mult option is added to
disable use of the xmpyu instruction. The libgcc library can be
compiled with the -msoft-mult option to avoid using hardware integer
multiplication.
2021-10-24 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa-d.c (pa_d_handle_target_float_abi): Don't check
TARGET_DISABLE_FPREGS.
* config/pa/pa.c (fix_range): Use MASK_SOFT_FLOAT instead of
MASK_DISABLE_FPREGS.
(hppa_rtx_costs): Don't check TARGET_DISABLE_FPREGS. Adjust
cost of hardware integer multiplication.
(pa_conditional_register_usage): Don't check TARGET_DISABLE_FPREGS.
* config/pa/pa.h (INT14_OK_STRICT): Likewise.
* config/pa/pa.md: Don't check TARGET_DISABLE_FPREGS. Check
TARGET_SOFT_FLOAT in patterns that use xmpyu instruction.
* config/pa/pa.opt (mdisable-fpregs): Change target mask to
SOFT_FLOAT. Revise comment.
(msoft-mult): New option.
Arnaud Charlet [Wed, 20 Oct 2021 08:23:40 +0000 (10:23 +0200)]
Avoid exception propagation during bootstrap
This addresses PR ada/100486, which is the bootstrap failure of GCC 11 for
32-bit Windows in the MSYS setup. The PR shows that we cannot rely on
exception propagation being operational during the bootstrap, at least on
the 11 branch, so fix this by removing the problematic raise statement.
gcc/ada/
PR ada/100486
* sem_prag.adb (Check_Valid_Library_Unit_Pragma): Do not raise an
exception as part of the bootstrap.
Jakub Jelinek [Wed, 20 Oct 2021 06:38:58 +0000 (08:38 +0200)]
c++: Fix up push_local_extern_decl_alias error recovery [PR102642]
My recent push_local_extern_decl_alias change broke error-recovery,
do_pushdecl can return error_mark_node and set_decl_tls_model can't be
called on that. There are other code paths that store error_mark_node
into DECL_LOCAL_DECL_ALIAS, with the intent to differentiate the cases
where we haven't yet tried to push it into the namespace scope (NULL)
and one where we have tried it but it failed (error_mark_node), but looking
around, there are other spots where we call functions or do processing
which doesn't tolerate error_mark_node.
So, the first hunk with the testcase fixes the testcase, the others
fix what I've spotted and the fix was easy to figure out (there are I think
3 other spots mainly for function multiversioning).
2021-10-20 Jakub Jelinek <jakub@redhat.com>
PR c++/102642
* name-lookup.c (push_local_extern_decl_alias): Don't call
set_decl_tls_model on error_mark_node.
* decl.c (make_rtl_for_nonlocal_decl): Don't call
set_user_assembler_name on error_mark_node.
* parser.c (cp_parser_oacc_declare): Ignore DECL_LOCAL_DECL_ALIAS
if it is error_mark_node.
(cp_parser_omp_declare_target): Likewise.
Jonathan Wakely [Tue, 19 Oct 2021 15:00:13 +0000 (16:00 +0100)]
libstdc++: Fix doxygen generation to work with relative paths
In r12-826 I tried to remove some redundant steps from the doxygen
build, but they are needed when configure is run as a relative path. The
use of pwd is to resolve the relative path to an absolute one.
Tobias Burnus [Mon, 18 Oct 2021 07:49:05 +0000 (09:49 +0200)]
Fortran: Fix CLASS conversion check [PR102745]
PR fortran/102745
gcc/fortran/ChangeLog
* intrinsic.c (gfc_convert_type_warn): Fix checks by checking CLASS
and do typecheck in correct order for type extension.
* misc.c (gfc_typename): Print proper not internal CLASS type name.
Kito Cheng [Thu, 7 Oct 2021 08:17:13 +0000 (16:17 +0800)]
[PR/target 100316] Allow constant address for __builtin___clear_cache.
__builtin___clear_cache used to accept a constant address as its
argument, but it no longer does, and it does not even accept a
constant address held in a variable when optimization is
enabled.
Changes v2 -> v3:
- Use gcc_assert rather than error; maybe_emit_call_builtin___clear_cache is
for internal use only, and we already checked the type elsewhere.
Changes v1 -> v2:
- Check for CONST_INT instead of checking the mode; no new testcase, since
a constant value of another type such as CONST_DOUBLE will be caught by the
front end.
e.g.
Code:
```c
void foo(){
  __builtin___clear_cache(1.11, 0);
}
```
Error message:
```
clearcache-double.c: In function 'foo':
clearcache-double.c:2:27: error: incompatible type for argument 1 of '__builtin___clear_cache'
    2 |   __builtin___clear_cache(1.11, 0);
      |                           ^~~~
      |                           |
      |                           double
clearcache-double.c:2:27: note: expected 'void *' but argument is of type 'double'
```
gcc/ChangeLog:
PR target/100316
* builtins.c (maybe_emit_call_builtin___clear_cache): Allow
CONST_INT for BEGIN and END, and use gcc_assert rather than
error.
Jakub Jelinek [Fri, 15 Oct 2021 14:25:25 +0000 (16:25 +0200)]
openmp: Fix up handling of OMP_PLACES=threads(1)
When writing the places-*.c tests, I noticed that we mishandle the threads
abstract name with a specified num-places if num-places isn't a multiple of
the number of hw threads in a core. It then happily ignores the maximum count
and, for the remaining hw threads in a core, overwrites further places that
haven't been allocated.
2021-10-15 Jakub Jelinek <jakub@redhat.com>
* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
after creating count places clean up and return immediately.
* testsuite/libgomp.c/places-6.c: New test.
* testsuite/libgomp.c/places-7.c: New test.
* testsuite/libgomp.c/places-8.c: New test.
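The intended behaviour, stopping as soon as the requested number of places exists even in the middle of a core, can be modelled with the sketch below. It is a simplified illustration and not libgomp's affinity code: `make_thread_places` and the representation of cores as lists of hardware thread ids are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each core lists its hardware thread ids, and each
// place for the `threads' abstract name holds one hardware thread.
std::vector<int>
make_thread_places (const std::vector<std::vector<int>> &cores,
                    std::size_t count)
{
  std::vector<int> places;
  for (const std::vector<int> &core : cores)
    for (int hw_thread : core)
      {
        // Return immediately once COUNT places exist, rather than
        // finishing the remaining hardware threads of the current core.
        if (places.size () == count)
          return places;
        places.push_back (hw_thread);
      }
  return places;
}
```

For two cores with four hardware threads each, a request such as threads(1) yields a single place in this model, and a request for six places stops partway through the second core.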
Andrew Stubbs [Wed, 13 Oct 2021 10:53:42 +0000 (11:53 +0100)]
amdgcn: fix up offload debug linking with LLVM 13
Between LLVM 9 and LLVM 13 the attribute works differently in several ways,
and this needs to be allowed for in GCC and mkoffload independently.
This patch fixes up mkoffload when debug info is enabled, which is made more
complicated because the configure test checks whether the attribute option
is accepted silently, but does not check whether the assembler actually sets
the ELF flags for that attribute, and mkoffload needs to mimic that behaviour
exactly. The patch therefore removes some of the conditionals.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (S_FIJI): Set unconditionally.
(S_900): Likewise.
(S_906): Likewise.
* config/gcn/gcn.c: Hard code SRAM ECC settings for old architectures.
* config/gcn/mkoffload.c (ELFABIVERSION_AMDGPU_HSA): Rename to ...
(ELFABIVERSION_AMDGPU_HSA_V3): ... this.
(ELFABIVERSION_AMDGPU_HSA_V4): New.
(SET_SRAM_ECC_UNSUPPORTED): New.
(copy_early_debug_info): Create elf flags to match the other objects.
(main): Just let the attribute flags pass through.
Andrew Stubbs [Thu, 30 Sep 2021 16:50:33 +0000 (17:50 +0100)]
amdgcn: Fix assembler version incompatibility
This is another case of the global_load instruction format changing in LLVM
(because they fixed a bug). The configure test is already in place to detect
what is needed.
Andrew Stubbs [Tue, 28 Sep 2021 15:26:09 +0000 (16:26 +0100)]
amdgcn: Implement -msram-ecc=any
The option was already there, but just an alias for -msram-ecc=on. Now that
LLVM13 supports HSACOv4 and the new ELF flags I can implement the option
properly.
The "any" option is the default in order to ensure that library files work
whichever way the user wants, which means we won't need multilibs to support
the different SRAM ECC hardware configurations.
Andrew Stubbs [Sat, 16 Oct 2021 16:41:39 +0000 (18:41 +0200)]
amdgcn: Support LLVM 13 assembler syntax
The LLVM devs have changed the assembler architecture attribute names on both
CLI and in the ".amdgcn_target" directive, and changed the attribute syntax
inside the directive, without keeping any backwards compatibility. :-(
This patch improves our configure tests to detect what dialect to use, what
attributes are valid, and adjusts the specs to match.
gcc/ChangeLog:
* config.in: Regenerate.
* config/gcn/gcn-hsa.h (X_FIJI): New macro.
(X_900): New macro.
(X_906): New macro.
(X_908): New macro.
(A_FIJI): Rename to ...
(S_FIJI): ... this.
(A_900): Rename to ...
(S_900): ... this.
(A_906): Rename to ...
(S_906): ... this.
(A_908): Rename to ...
(S_908): ... this.
(SRAMOPT): New macro.
(ASM_SPEC): Adjust xnack option usage.
* config/gcn/gcn.c (output_file_start): Adjust amdgcn_target usage.
* configure: Regenerate.
* configure.ac: Detect LLVM assembler dialect.
Julian Brown [Mon, 28 Jun 2021 13:58:52 +0000 (06:58 -0700)]
amdgcn: Mark s_mulk_i32 as clobbering SCC
The s_mulk_i32 instruction sets the SCC status register according to
whether the multiplication overflows, but that is not currently modelled
in the GCN backend. AFAIK this is a latent bug and hasn't been noticed
"in the wild", but it should be fixed.
2021-06-29 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn.md (mulsi3): Make s_mulk_i32 variant clobber SCC.
Andrew Stubbs [Thu, 8 Jul 2021 14:47:53 +0000 (15:47 +0100)]
amdgcn: Add -mxnack and -msram-ecc [PR 100208]
gcc/ChangeLog:
PR target/100208
* config/gcn/gcn-hsa.h (DRIVER_SELF_SPECS): New.
(ASM_SPEC): Set -mattr for xnack and sram-ecc.
* config/gcn/gcn-opts.h (enum sram_ecc_type): New.
* config/gcn/gcn-valu.md: Add a warning comment.
* config/gcn/gcn.c (gcn_option_override): Add "sorry" for -mxnack.
(output_file_start): Add xnack and sram-ecc state to ".amdgcn_target".
* config/gcn/gcn.md: Add a warning comment.
* config/gcn/gcn.opt: Add -mxnack and -msram-ecc.
* config/gcn/mkoffload.c (EF_AMDGPU_MACH_AMDGCN_GFX908): Remove
SRAM-ECC flag.
(EF_AMDGPU_XNACK): New.
(EF_AMDGPU_SRAM_ECC): New.
(elf_flags): New.
(copy_early_debug_info): Use elf_flags.
(main): Handle -mxnack and -msram-ecc options.
* doc/invoke.texi: Document -mxnack and -msram-ecc.
gcc/testsuite/ChangeLog:
PR target/100208
* gcc.target/gcn/sram-ecc-1.c: New test.
* gcc.target/gcn/sram-ecc-2.c: New test.
* gcc.target/gcn/sram-ecc-3.c: New test.
* gcc.target/gcn/sram-ecc-4.c: New test.
* gcc.target/gcn/sram-ecc-5.c: New test.
* gcc.target/gcn/sram-ecc-6.c: New test.
* gcc.target/gcn/sram-ecc-7.c: New test.
* gcc.target/gcn/sram-ecc-8.c: New test.