Richard Biener [Wed, 6 May 2020 08:23:15 +0000 (10:23 +0200)]
middle-end/94964 - avoid EH loop entry with CP_SIMPLE_PREHEADERS
Loop optimizers expect to be able to insert on the preheader
edge w/o splitting it thus avoid ending up with a preheader
that enters the loop via an EH edge (or an abnormal edge).
Richard Biener [Fri, 15 May 2020 07:38:54 +0000 (09:38 +0200)]
tree-optimization/95133 - avoid abnormal edges in path splitting
When path splitting tries to detect a CFG diamond make sure it
is composed of normal (non-EH, not abnormal) edges. Otherwise
CFG manipulation later may fail.
2020-05-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/95133
* gimple-ssa-split-paths.c
(find_block_to_duplicate_for_splitting_paths): Check for
normal edges.
Richard Biener [Wed, 17 Jun 2020 12:57:59 +0000 (14:57 +0200)]
tree-optimization/95717 - fix SSA update for vectorizer epilogue
This fixes yet another issue with the custom SSA updating in the
vectorizer when we copy from the non-if-converted loop. We must
not mess with current defs before we updated the BB copies.
2020-06-17 Richard Biener <rguenther@suse.de>
PR tree-optimization/95717
* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
Move BB SSA updating before exit/latch PHI current def copying.
This attempts to make is_nothrow_constructible more robust (and
efficient to compile) by not depending on is_constructible. Instead the
__is_constructible intrinsic is used directly. The helper class
__is_nt_constructible_impl which checks whether the construction is
non-throwing now takes a bool template parameter that is substituted by
the result of the instrinsic. This fixes the reported bug by not using
the already-instantiated (and incorrect) value of std::is_constructible.
I don't think it really fixes the problem in general, because
std::is_nothrow_constructible itself could already have been
instantiated in a context where it gives the wrong result. A proper fix
needs to be done in the compiler.
Backported to the gcc-8 and gcc-9 branches to fix PR 96999.
PR libstdc++/94033
* include/std/type_traits (__is_nt_default_constructible_atom): Remove.
(__is_nt_default_constructible_impl): Remove.
(__is_nothrow_default_constructible_impl): Remove.
(__is_nt_constructible_impl): Add bool template parameter. Adjust
partial specializations.
(__is_nothrow_constructible_impl): Replace class template with alias
template.
(is_nothrow_default_constructible): Derive from alias template
__is_nothrow_constructible_impl instead of
__is_nothrow_default_constructible_impl.
* testsuite/20_util/is_nothrow_constructible/94003.cc: New test.
Eric Botcazou [Fri, 11 Sep 2020 08:41:28 +0000 (10:41 +0200)]
Fix crash on array component with nonstandard index type
This is a regression present on mainline, 10 and 9 branches: the compiler
goes into an infinite recursion eventually exhausting the stack for the
declaration of a discriminated record type with an array component having
a discriminant as bound and an index type that is an enumeration type with
a non-standard representation clause.
gcc/ada/ChangeLog:
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Array_Subtype>: Only
create extra subtypes for discriminants if the RM size of the base
type of the index type is lower than that of the index type.
gcc/testsuite/ChangeLog:
* gnat.dg/specs/discr7.ads: New test.
Eric Botcazou [Thu, 10 Sep 2020 15:47:32 +0000 (17:47 +0200)]
Fix uninitialized variable with nested variant record types
This fixes a wrong code issue with nested variant record types: the
compiler generates move instructions that depend on an uninitialized
variable, which was initially a SAVE_EXPR not instantiated early enough.
gcc/ada/ChangeLog:
* gcc-interface/decl.c (build_subst_list): For a definition, make
sure to instantiate the SAVE_EXPRs generated by the elaboration of
the constraints in front of the elaboration of the type itself.
gcc/testsuite/ChangeLog:
* gnat.dg/discr59.adb: New test.
* gnat.dg/discr59_pkg1.ads: New helper.
* gnat.dg/discr59_pkg2.ads: Likewise.
Harald Anlauf [Thu, 3 Sep 2020 18:33:14 +0000 (20:33 +0200)]
PR fortran/96890 - Wrong answer with intrinsic IALL
The IALL intrinsic would always return 0 when the DIM and MASK arguments
were present since the initial value of repeated BIT-AND operations was
set to 0 instead of -1.
libgfortran/ChangeLog:
* m4/iall.m4: Initial value for result should be -1.
* generated/iall_i1.c (miall_i1): Generated.
* generated/iall_i16.c (miall_i16): Likewise.
* generated/iall_i2.c (miall_i2): Likewise.
* generated/iall_i4.c (miall_i4): Likewise.
* generated/iall_i8.c (miall_i8): Likewise.
Jonathan Wakely [Thu, 30 Apr 2020 14:47:52 +0000 (15:47 +0100)]
libstdc++: Avoid errors in allocator's noexcept-specifier (PR 89510)
This fixes a regression due to the conditional noexcept-specifier on
std::allocator::construct and std::allocator::destroy, as well as the
corresponding members of new_allocator, malloc_allocator, and
allocator_traits. Those noexcept-specifiers were using expressions which
might be ill-formed, which caused errors outside the immediate context
when checking for the presence of construct and destroy in SFINAE
contexts.
The fix is to use the is_nothrow_constructible and
is_nothrow_destructible type traits instead, because those traits are
safe to use even when the construction/destruction itself is not valid.
The is_nothrow_constructible trait will be false for a type that is not
also nothrow-destructible, even if the new-expression used in the
construct function body is actually noexcept. That's not the correct
answer, but isn't a problem because providing a noexcept-specifier on
these functions is not required by the standard anyway. If the answer is
false when it should be true, that's suboptimal but OK (unlike giving
errors for valid code, or giving a true answer when it should be false).
PR libstdc++/89510
* include/bits/alloc_traits.h (allocator_traits::_S_construct)
(allocator_traits::_S_destroy)
(allocator_traits<allocator<T>>::construct): Use traits in
noexcept-specifiers.
* include/bits/allocator.h (allocator<void>::construct)
(allocator<void>::destroy): Likewise.
* include/ext/malloc_allocator.h (malloc_allocator::construct)
(malloc_allocator::destroy): Likewise.
* include/ext/new_allocator.h (new_allocator::construct)
(new_allocator::destroy): Likewise.
* testsuite/20_util/allocator/89510.cc: New test.
* testsuite/ext/malloc_allocator/89510.cc: New test.
* testsuite/ext/new_allocator/89510.cc: New test.
Kewen Lin [Wed, 2 Sep 2020 02:12:32 +0000 (21:12 -0500)]
rs6000: Backport fixes for PR92923 and PR93136
This patch is to backport the fix for PR92923 and its sequent fix
for PR93136. We found the builtin functions needlessly using
VIEW_CONVERT_EXPRs on their operands can probably cause remarkable
performance degradation especailly when they are used in the hotspot.
One typical case is
...github.com/antonblanchard/crc32-vpmsum/blob/master/vec_crc32.c
With this patch, the execution time can improve 47.81%.
Apart from the original fixes, this patch also gets two cases below
updated. During the regression testing I found two cases failed due
to icf optimization able to be adopted with this patch, the function
bodies use tail calls, the expected assembly instructions are gone.
Mark Eggleston [Fri, 21 Aug 2020 05:39:30 +0000 (06:39 +0100)]
Fortran : ICE for division by zero in declaration PR95882
A length expression containing a divide by zero in a character
declaration will result in an ICE if the constant is anymore
complicated that a contant divided by a constant.
The cause was that char_len_param_value can return MATCH_YES
even if a divide by zero was seen. Prior to returning check
whether a divide by zero was seen and if so set it to MATCH_ERROR.
2020-08-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/fortran
PR fortran/95882
* decl.c (char_len_param_value): Check gfc_seen_div0 and
if it is set return MATCH_ERROR.
2020-08-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/95882
* gfortran.dg/pr95882_1.f90: New test.
* gfortran.dg/pr95882_2.f90: New test.
* gfortran.dg/pr95882_3.f90: New test.
* gfortran.dg/pr95882_4.f90: New test.
* gfortran.dg/pr95882_5.f90: New test.
Christophe Lyon [Wed, 19 Aug 2020 09:02:21 +0000 (09:02 +0000)]
arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]
armv8-m.base (cortex-m23) has the movt instruction, so we need to
disable the define_split to generate a constant in this case,
otherwise we get incorrect insn constraints as described in PR94538.
We also need to fix the pure-code alternative for thumb1_movsi_insn
because the assembler complains with instructions like
movs r0, #:upper8_15:1234
(Internal error in md_apply_fix)
We now generate movs r0, 4 instead.
Jonathan Wakely [Wed, 26 Aug 2020 13:47:51 +0000 (14:47 +0100)]
libstdc++: Enable assertions in constexpr string_view members [PR 71960]
Since GCC 6.1 there is no reason we can't just use __glibcxx_assert in
constexpr functions in string_view. As long as the condition is true,
there will be no call to std::__replacement_assert that would make the
function ineligible for constant evaluation.
Runtime error occurs when the type of the value argument is
character(0): "Zero-length string passed as value...".
The status argument, intent(out), will contain -1 if the value
of the environment is too large to fit in the value argument, this
is the case if the type is character(0) so there is no reason to
produce a runtime error if the value argument is zero length.
2020-08-24 Mark Eggleston <markeggleston@gcc.gnu.org>
libgfortran/
PR fortran/96486
* intrinsics/env.c: If value_len is > 0 blank the string.
Copy the result only if its length is > 0.
2020-08-24 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/96486
* gfortran.dg/pr96486.f90: New test.
Tamar Christina [Mon, 3 Aug 2020 11:03:17 +0000 (12:03 +0100)]
AArch64: Fix hwasan failure in readline.
My previous fix added an unchecked call to fgets in the new function readline.
fgets can fail when there's an error reading the file in which case it returns
NULL. It also returns NULL when the next character is EOF.
The EOF case is already covered by the existing code but the error case isn't.
This fixes it by returning the empty string on error.
Also I now use strnlen instead of strlen to make sure we never read outside the
buffer.
This was flagged by Matthew Malcomson during his hwasan work.
gcc/ChangeLog:
* config/aarch64/driver-aarch64.c (readline): Check return value fgets.
Tamar Christina [Wed, 8 Jul 2020 13:32:34 +0000 (14:32 +0100)]
AArch64: Add test for -mcpu=native
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpunative/aarch64-cpunative.exp: New file.
* gcc.target/aarch64/cpunative/info_0: New test.
* gcc.target/aarch64/cpunative/info_1: New test.
* gcc.target/aarch64/cpunative/info_10: New test.
* gcc.target/aarch64/cpunative/info_11: New test.
* gcc.target/aarch64/cpunative/info_12: New test.
* gcc.target/aarch64/cpunative/info_13: New test.
* gcc.target/aarch64/cpunative/info_14: New test.
* gcc.target/aarch64/cpunative/info_15: New test.
* gcc.target/aarch64/cpunative/info_2: New test.
* gcc.target/aarch64/cpunative/info_3: New test.
* gcc.target/aarch64/cpunative/info_4: New test.
* gcc.target/aarch64/cpunative/info_5: New test.
* gcc.target/aarch64/cpunative/info_6: New test.
* gcc.target/aarch64/cpunative/info_7: New test.
* gcc.target/aarch64/cpunative/info_8: New test.
* gcc.target/aarch64/cpunative/info_9: New test.
* gcc.target/aarch64/cpunative/native_cpu_0.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_1.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_10.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_13.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_14.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_2.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_3.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_4.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_5.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_6.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_7.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_8.c: New test.
Tamar Christina [Fri, 17 Jul 2020 12:10:28 +0000 (13:10 +0100)]
AArch64: Fix bugs in -mcpu=native detection.
This patch fixes a couple of issues in AArch64's -mcpu=native processing:
The buffer used to read the lines from /proc/cpuinfo is 128 bytes long. While
this was enough in the past with the increase in architecture extensions it is
no longer enough. It results in two bugs:
1) No option string longer than 127 characters is correctly parsed. Features
that are supported are silently ignored.
2) It incorrectly enables features that are not present on the machine:
a) It checks for substring matching instead of full word matching. This makes
it incorrectly detect sb support when ssbs is provided instead.
b) Due to the truncation at the 127 char border it also incorrectly enables
features due to the full feature being cut off and the part that is left
accidentally enables something else.
This breaks -mcpu=native detection on some of our newer system.
The patch fixes these issues by reading full lines up to the \n in a string.
This gives us the full feature line. Secondly it creates a set from this string
to:
1) Reduce matching complexity from O(n*m) to O(n*logm).
2) Perform whole word matching instead of substring matching.
To make this code somewhat cleaner I also changed from using char* to using
std::string and std::set.
Note that I have intentionally avoided the use of ifstream and stringstream
to make it easier to backport. I have also not change the substring matching
for the initial line classification as I cannot find a documented cpuinfo format
which leads me to believe there may be kernels out there that require this which
may be why the original code does this.
I also do not want this to break if the kernel adds a new line that is long and
indents the file by two tabs to keep everything aligned. In short I think an
imprecise match is the right thing here.
Test for this is added as the last thing in this series as it requires some
changes to be made to be able to test this.
Jonathan Wakely [Wed, 19 Aug 2020 11:13:03 +0000 (12:13 +0100)]
libstdc++: Add deprecated attributes to old iostream members
Back in 2017 I removed these prehistoric members (which were deprecated
since C++98) for C++17 mode. But I didn't add deprecated attributes to
most of them, so users didn't get any warning they would be going away.
Apparently some poor souls do actually use some of these names, and so
now that GCC 11 defaults to -std=gnu++17 some code has stopped
compiling.
This adds deprecated attributes to them, so that C++98/03/11/14 code
will get a warning if it uses them. I'll also backport this to the
release branches so that users can find out about the deprecation before
they start using C++17.
libstdc++-v3/ChangeLog:
* include/bits/c++config (_GLIBCXX_DEPRECATED_SUGGEST): New
macro for "use 'foo' instead" message in deprecated warnings.
* include/bits/ios_base.h (io_state, open_mode, seek_dir)
(streampos, streamoff): Use _GLIBCXX_DEPRECATED_SUGGEST.
* include/std/streambuf (stossc): Replace C++11 attribute
with _GLIBCXX_DEPRECATED_SUGGEST.
* testsuite/27_io/types/1.cc: Check for deprecated warnings.
Also check for io_state, open_mode and seek_dir typedefs.
Kewen Lin [Wed, 12 Aug 2020 09:19:16 +0000 (04:19 -0500)]
testsuite: Add -fno-common to pr82374.c [PR94077]
As the PR comments show, the case gcc.dg/gomp/pr82374.c fails on
Power7 since gcc8. But it passes from gcc10. By looking into
the difference, it's due to that gcc10 sets -fno-common as default,
which makes vectorizer force the alignment and be able to use
aligned vector load/store on those targets which doesn't support
unaligned vector load/store (here it's Power7).
As Jakub suggested in the PR, this patch is to append -fno-common
into dg-options.
Verified with gcc8/gcc9 releases on ppc64-redhat-linux (Power7).
Christophe Lyon [Wed, 12 Aug 2020 09:22:38 +0000 (09:22 +0000)]
testsuite: Fix gcc.target/arm/stack-protector-1.c for Cortex-M
The stack-protector-1.c test fails when compiled for Cortex-M:
- for Cortex-M0/M1, str r0, [sp #-8]! is not supported
- for Cortex-M3/M4..., the assembler complains that "use of r13 is
deprecated"
This patch replaces the str instruction with
sub sp, sp, #8
str r0, [sp]
and removes the check for r13, which is unlikely to leak the canary
value.
Jakub Jelinek [Mon, 3 Aug 2020 20:55:28 +0000 (22:55 +0200)]
aarch64: Fix up __aarch64_cas16_acq_rel fallback
As mentioned in the PR, the fallback path when LSE is unavailable writes
incorrect registers to the memory if the previous content compares equal
to x0, x1 - it writes copy of x0, x1 from the start of function, but it
should write x2, x3.
2020-08-03 Jakub Jelinek <jakub@redhat.com>
PR target/96402
* config/aarch64/lse.S (__aarch64_cas16_acq_rel): Use x2, x3 instead
of x(tmp0), x(tmp1) in STXP arguments.
arm: Clear canary value after stack_protect_test [PR96191]
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function. An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.
gcc/
PR target/96191
* config/arm/arm.md (arm_stack_protect_test_insn): Zero out
operand 2 after use.
* config/arm/thumb1.md (thumb1_stack_protect_test_insn): Likewise.
gcc/testsuite/
* gcc.target/arm/stack-protector-1.c: New test.
* gcc.target/arm/stack-protector-2.c: Likewise.
aarch64: Clear canary value after stack_protect_test [PR96191]
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function. An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.
gcc/
PR target/96191
* config/aarch64/aarch64.md (stack_protect_test_<mode>): Set the
CC register directly, instead of a GPR. Replace the original GPR
destination with an extra scratch register. Zero out operand 3
after use.
(stack_protect_test): Update accordingly.
gcc/testsuite/
PR target/96191
* gcc.target/aarch64/stack-protector-1.c: New test.
* gcc.target/aarch64/stack-protector-2.c: Likewise.
This function was unimplemented, simply returning the native format
string instead.
PR libstdc++/93245
* include/experimental/bits/fs_path.h (path::generic_string<C,T,A>()):
Return the generic format path, not the native one.
* testsuite/experimental/filesystem/path/generic/generic_string.cc:
Improve test coverage.
It's not possible to construct a path::string_type from an allocator of
a different type. Create the correct specialization of basic_string, and
adjust path::_S_str_convert to use a basic_string_view so that it is
independent of the allocator type.
PR libstdc++/94242
* include/bits/fs_path.h (path::_S_str_convert): Replace first
parameter with basic_string_view so that strings with different
allocators can be accepted.
(path::generic_string<C,T,A>()): Use basic_string object that uses the
right allocator type.
* testsuite/27_io/filesystem/path/generic/94242.cc: New test.
* testsuite/27_io/filesystem/path/generic/generic_string.cc: Improve
test coverage.
Qian Jianhua [Fri, 7 Aug 2020 09:39:40 +0000 (10:39 +0100)]
aarch64: Add A64FX machine model
This patch add support for Fujitsu A64FX, as the first step of adding
A64FX machine model.
A64FX is used in FUJITSU Supercomputer PRIMEHPC FX1000,
PRIMEHPC FX700, and supercomputer Fugaku.
The official microarchitecture information of A64FX can be read at
https://github.com/fujitsu/A64FX.
2020-08-07 Qian jianhua <qianjh@cn.fujitsu.com>
gcc/
* config/aarch64/aarch64-cores.def (a64fx): New core.
* config/aarch64/aarch64-tune.md: Regenerated.
* config/aarch64/aarch64.c (a64fx_prefetch_tune, a64fx_tunings): New.
* doc/invoke.texi: Add a64fx to the list.
early-remat: Handle sets of multiple candidate regs [PR94605]
early-remat.c:process_block wasn't handling insns that set multiple
candidate registers, which led to an assertion failure at the end
of the main loop.
Instructions that set two pseudos aren't rematerialisation candidates in
themselves, but we still need to track them if another instruction that
sets the same register is a rematerialisation candidate.
gcc/
PR rtl-optimization/94605
* early-remat.c (early_remat::process_block): Handle insns that
set multiple candidate registers.
gcc/testsuite/
PR rtl-optimization/94605
* gcc.target/aarch64/sve/pr94605.c: New test.
ipa-devirt: Fix crash in obj_type_ref_class [PR95114]
The testcase has failed since r9-5035, because obj_type_ref_class
tries to look up an ODR type when no ODR type information is
available. (The information was available earlier in the
compilation, but was freed during pass_ipa_free_lang_data.)
We then crash dereferencing the null get_odr_type result.
The test passes with -O2. However, it fails again if -fdump-tree-all
is used, since obj_type_ref_class is called indirectly from the
dump routines.
Other code creates ODR type entries on the fly by passing “true”
as the insert parameter. But obj_type_ref_class can't do that
unconditionally, since it should have no side-effects when used
from the dumping code.
Following a suggestion from Honza, this patch adds parameters
to say whether the routines are being called from dump routines
and uses those to derive the insert parameter.
gcc/
PR middle-end/95114
* tree.h (virtual_method_call_p): Add a default-false parameter
that indicates whether the function is being called from dump
routines.
(obj_type_ref_class): Likewise.
* tree.c (virtual_method_call_p): Likewise.
* ipa-devirt.c (obj_type_ref_class): Likewise. Lazily add ODR
type information for the type when the parameter is false.
* tree-pretty-print.c (dump_generic_node): Update calls to
virtual_method_call_p and obj_type_ref_class accordingly.
gcc/testsuite/
PR middle-end/95114
* g++.target/aarch64/pr95114.C: New test.
This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.
This mitigation replaces BLR instructions with a BL to a stub which uses
a BR to jump to the original value. These function stubs are then
appended with a speculation barrier to ensure no straight line
speculation happens after these jumps.
When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.
When optimising for size we use one set of stubs for all functions.
This set of stubs can have human readable names, and we are using
`__call_indirect_x<N>` for register x<N>.
When BTI branch protection is enabled the BLR instruction can jump to a
`BTI c` instruction using any register, while the BR instruction can
only jump to a `BTI c` instruction using the x16 or x17 registers.
Hence, in order to ensure this transformation is safe we mov the value
of the original register into x16 and use x16 for the BR.
As an example when optimising for size:
a
BLR x0
instruction would get transformed to something like
BL __call_indirect_x0
where __call_indirect_x0 labels a thunk that contains
__call_indirect_x0:
MOV X16, X0
BR X16
<speculation barrier>
The first version of this patch used local symbols specific to a
compilation unit to try and avoid relocations.
This was mistaken since functions coming from the same compilation unit
can still be in different sections, and the assembler will insert
relocations at jumps between sections.
On any relocation the linker is permitted to emit a veneer to handle
jumps between symbols that are very far apart. The registers x16 and
x17 may be clobbered by these veneers.
Hence the function stubs cannot rely on the values of x16 and x17 being
the same as just before the function stub is called.
Similar can be said for the hot/cold partitioning of single functions,
so function-local stubs have the same restriction.
This updated version of the patch never emits function stubs for x16 and
x17, and instead forces other registers to be used.
Given the above, there is now no benefit to local symbols (since they
are not enough to avoid dealing with linker intricacies). This patch
now uses global symbols with hidden visibility each stored in their own
COMDAT section. This means stubs can be shared between compilation
units while still avoiding the PLT indirection.
This patch also removes the `__call_indirect_x30` stub (and
function-local equivalent) which would simply jump back to the original
location.
The function-local stubs are emitted to the assembly output file in one
chunk, which means we need not add the speculation barrier directly
after each one.
This is because we know for certain that the instructions directly after
the BR in all but the last function stub will be from another one of
these stubs and hence will not contain a speculation gadget.
Instead we add a speculation barrier at the end of the sequence of
stubs.
The global stubs are emitted in COMDAT/.linkonce sections by
themselves so that the linker can remove duplicates from multiple object
files. This means they are not emitted in one chunk, and each one must
include the speculation barrier.
Another difference is that since the global stubs are shared across
compilation units we do not know that all functions will be targeting an
architecture supporting the SB instruction.
Rather than provide multiple stubs for each architecture, we provide a
stub that will work for all architectures -- using the DSB+ISB barrier.
This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
instruction. This code sequence is part of the binary interface between
compiler and linker. If this BLR instruction needs to be mitigated, it'd
probably be best to do so in the linker. It seems that the code sequence
for thread-local variable access is unlikely to lead to a Spectre Revalation
Gadget.
- PLT stubs are produced by the linker and each contain a BLR instruction.
It seems that at most only after the last PLT stub a Spectre Revalation
Gadget might appear.
Testing:
Bootstrap and regtest on AArch64
(with BOOT_CFLAGS="-mharden-sls=retbr,blr")
Used a temporary hack(1) in gcc-dg.exp to use these options on every
test in the testsuite, a slight modification to emit the speculation
barrier after every function stub, and a script to check that the
output never emitted a BLR, or unmitigated BR or RET instruction.
Similar on an aarch64-none-elf cross-compiler.
1) Temporary hack emitted a speculation barrier at the end of every stub
function, and used a script to ensure that:
a) Every RET or BR is immediately followed by a speculation barrier.
b) No BLR instruction is emitted by compiler.
* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
New declaration.
* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
stub registers class.
(aarch64_class_max_nregs): Likewise.
(aarch64_register_move_cost): Likewise.
(aarch64_sls_shared_thunks): Global array to store stub labels.
(aarch64_sls_emit_function_stub): New.
(aarch64_create_blr_label): New.
(aarch64_sls_emit_blr_function_thunks): New.
(aarch64_sls_emit_shared_blr_thunks): New.
(aarch64_asm_file_end): New.
(aarch64_indirect_call_asm): New.
(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
(TARGET_ASM_FUNCTION_EPILOGUE): Use
aarch64_sls_emit_blr_function_thunks.
* config/aarch64/aarch64.h (STB_REGNUM_P): New.
(enum reg_class): Add STUB_REGS class.
(machine_function): Introduce `call_via` array for
function-local stub labels.
* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
aarch64_indirect_call_asm to emit code when hardening BLR
instructions.
* config/aarch64/constraints.md (Ucr): New constraint
representing registers for indirect calls. Is GENERAL_REGS
usually, and STUB_REGS when hardening BLR instruction against
SLS.
* config/aarch64/predicates.md (aarch64_general_reg): STUB_REGS class
is also a general register.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.
aarch64: Introduce SLS mitigation for RET and BR instructions
Instructions following RET or BR are not necessarily executed. In order
to avoid speculation past RET and BR we can simply append a speculation
barrier.
Since these speculation barriers will not be architecturally executed,
they are not expected to add a high performance penalty.
The speculation barrier is to be SB when targeting architectures which
have this enabled, and DSB SY + ISB otherwise.
We add tests for each of the cases where such an instruction was seen.
This is implemented by modifying each machine description pattern that
emits either a RET or a BR instruction. We choose not to use something
like `TARGET_ASM_FUNCTION_EPILOGUE` since it does not affect the
`indirect_jump`, `jump`, `sibcall_insn` and `sibcall_value_insn`
patterns and we find it preferable to implement the functionality in the
same way for every pattern.
There is one particular case which is slightly tricky. The
implementation of TARGET_ASM_TRAMPOLINE_TEMPLATE uses a BR which needs
to be mitigated against. The trampoline template is used *once* per
compilation unit, and the TRAMPOLINE_SIZE is exposed to the user via the
builtin macro __LIBGCC_TRAMPOLINE_SIZE__.
In the future we may implement function specific attributes to turn on
and off hardening on a per-function basis.
The fixed nature of the trampoline described above implies it will be
safer to ensure this speculation barrier is always used.
Testing:
Bootstrap and regtest done on aarch64-none-linux
Used a temporary hack(1) to use these options on every test in the
testsuite and a script to check that the output never emitted an
unmitigated RET or BR.
1) Temporary hack was a change to the testsuite to always use
`-save-temps` and run a script on the assembly output of those
compilations which produced one to ensure every RET or BR is immediately
followed by a speculation barrier.
* config/aarch64/aarch64-protos.h (aarch64_sls_barrier): New.
* config/aarch64/aarch64.c (aarch64_output_casesi): Emit
speculation barrier after BR instruction if needs be.
(aarch64_trampoline_init): Handle ptr_mode value & adjust size
of code copied.
(aarch64_sls_barrier): New.
(aarch64_asm_trampoline_template): Add needed barriers.
* config/aarch64/aarch64.h (AARCH64_ISA_SB): New.
(TARGET_SB): New.
(TRAMPOLINE_SIZE): Account for barrier.
* config/aarch64/aarch64.md (indirect_jump, *casesi_dispatch,
simple_return, *do_return, *sibcall_insn, *sibcall_value_insn):
Emit barrier if needs be, also account for possible barrier using
"sls_length" attribute.
(sls_length): New attribute.
(length): Determine default using any non-default sls_length
value.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c:
New test.
* gcc.target/aarch64/sls-mitigation/sls-mitigation.exp: New file.
* lib/target-supports.exp (check_effective_target_aarch64_asm_sb_ok):
New proc.
aarch64: New Straight Line Speculation (SLS) mitigation flags
Here we introduce the flags that will be used for straight line speculation.
The new flag introduced is `-mharden-sls=`.
This flag can take arguments of `none`, `all`, or a comma seperated list
of one or more of `retbr` or `blr`.
`none` indicates no special mitigation of the straight line speculation
vulnerability.
`all` requests all mitigations currently implemented.
`retbr` requests that the RET and BR instructions have a speculation
barrier inserted after them.
`blr` requests that BLR instructions are replaced by a BL to a function
stub using a BR with a speculation barrier after it.
Setting this on a per-function basis using attributes or the like is not
enabled, but may be in the future.
Both intrinsics did not handle the case where the va_list object comes
from a ref parameter.
gcc/d/ChangeLog:
PR d/96140
* intrinsics.cc (expand_intrinsic_vaarg): Handle ref parameters as
arguments to va_arg().
(expand_intrinsic_vastart): Handle ref parameters as arguments to
va_start().