[PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr

Thomas Preudhomme thomas.preudhomme@foss.arm.com
Wed Nov 30 10:41:00 GMT 2016



On 30/11/16 10:04, Richard Earnshaw (lists) wrote:
> On 30/11/16 09:50, Thomas Preudhomme wrote:
>> Hi,
>>
>> Is this ok to backport to gcc-5-branch and gcc-6-branch? Patch applies
>> cleanly (patches attached for reference).
>>
>>
>> 2016-11-17  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>
>>     Backport from mainline
>>     2016-11-17  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>
>>     gcc/
>>     PR target/77933
>>     * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
>>     being live in the function and lr needing to be saved.  Distinguish
>>     between already saved pushable registers and registers to push.
>>     Check for LR being an available pushable register.
>>
>>     gcc/testsuite/
>>     PR target/77933
>>     * gcc.target/arm/pr77933-1.c: New test.
>>     * gcc.target/arm/pr77933-2.c: Likewise.
>>
>
> Your attached patch doesn't appear to match your ChangeLog.  (rmprofile
> patch?).

There were 3 attached patches; I did indeed attach one by mistake (2 double-clicks
instead of 1?). Let's try again with only the 2 correct patches.

Best regards,

Thomas
>
> R.
>
>>
>> Best regards,
>>
>> Thomas
>>
>>
>> On 17/11/16 20:15, Thomas Preudhomme wrote:
>>> Hi Kyrill,
>>>
>>> I've committed the following updated patch, in which the test is
>>> restricted to Thumb execution mode and skipped when that mode is not
>>> available, since -mtpcs-leaf-frame is only available in Thumb mode.
>>> I've considered the change obvious.
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2016-11-08  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>
>>>         PR target/77933
>>>         * config/arm/arm.c (thumb1_expand_prologue): Distinguish between
>>>         lr being live in the function and lr needing to be saved.
>>>         Distinguish between already saved pushable registers and
>>>         registers to push.  Check for LR being an available pushable
>>>         register.
>>>
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2016-11-08  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>
>>>         PR target/77933
>>>         * gcc.target/arm/pr77933-1.c: New test.
>>>         * gcc.target/arm/pr77933-2.c: Likewise.
>>>
>>> Best regards,
>>>
>>> Thomas
>>>
>>> On 17/11/16 10:04, Kyrill Tkachov wrote:
>>>>
>>>> On 09/11/16 16:41, Thomas Preudhomme wrote:
>>>>> I've reworked the patch following comments from Wilco [1] (sorry, I
>>>>> could not find it in my MUA for some reason).
>>>>>
>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00317.html
>>>>>
>>>>>
>>>>> == Context ==
>>>>>
>>>>> When saving registers, function thumb1_expand_prologue () aims at
>>>>> minimizing the number of push instructions. One of the optimizations
>>>>> it performs is to push LR alongside high register(s) (after having
>>>>> moved them to low register(s)) when there is no low register to save.
>>>>> This is implemented by adding LR to the pushable_regs mask, if it is
>>>>> live, just before pushing the registers in that mask. The mask of live
>>>>> pushable registers, which is used to detect whether LR needs to be
>>>>> saved, is then cleared to ensure LR is only saved once.
>>>>>
>>>>>
>>>>> == Problem ==
>>>>>
>>>>> However, beyond deciding which registers to push, pushable_regs is
>>>>> also used to track which pushable registers a high register can be
>>>>> moved to before being pushed, hence the name. That mask is cleared
>>>>> when all high registers have been assigned a low register, but the
>>>>> clearing assumes the high registers were assigned to the
>>>>> highest-numbered registers in that mask. This is not the case because
>>>>> LR is not considered when looking for a register in that mask.
>>>>> Furthermore, LR might have been saved in the TARGET_BACKTRACE path
>>>>> above, yet the mask of live pushable registers is not cleared in that
>>>>> case.
>>>>>
>>>>>
>>>>> == Solution ==
>>>>>
>>>>> This patch changes the loop to iterate over registers LR to r0 so as
>>>>> to both fix the stack corruption reported in PR77933 and reuse LR to
>>>>> push some high register when possible. This patch also introduces a
>>>>> new variable lr_needs_saving to record whether LR (still) needs to be
>>>>> saved at a given point in the code and sets the variable accordingly
>>>>> throughout the code, thus fixing the second issue. Finally, this patch
>>>>> creates a new push_mask variable to distinguish between the mask of
>>>>> registers to push and the mask of live pushable registers.
>>>>>
>>>>>
>>>>> == Note ==
>>>>>
>>>>> Other bits could have been improved but have been left out to allow
>>>>> the patch to be backported to stable branches:
>>>>>
>>>>> (1) using argument registers that are not holding an argument
>>>>> (2) using push_mask consistently instead of l_mask (in
>>>>> TARGET_BACKTRACE), mask (low register push) and push_mask
>>>>> (3) improving the !l_mask case in TARGET_BACKTRACE since offset == 0
>>>>> (4) renaming l_mask to a more appropriate name (live_pushable_regs_mask?)
>>>>>
>>>>> ChangeLog entries are as follows:
>>>>>
>>>>> *** gcc/ChangeLog ***
>>>>>
>>>>> 2016-11-08  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>>>
>>>>>         PR target/77933
>>>>>         * config/arm/arm.c (thumb1_expand_prologue): Distinguish
>>>>>         between lr being live in the function and lr needing to be
>>>>>         saved.  Distinguish between already saved pushable registers
>>>>>         and registers to push.  Check for LR being an available
>>>>>         pushable register.
>>>>>
>>>>>
>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>
>>>>> 2016-11-08  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>>>
>>>>>         PR target/77933
>>>>>         * gcc.target/arm/pr77933-1.c: New test.
>>>>>         * gcc.target/arm/pr77933-2.c: Likewise.
>>>>>
>>>>>
>>>>> Testing: no regression on arm-none-eabi GCC cross-compiler targeting
>>>>> Cortex-M0
>>>>>
>>>>> Is this ok for trunk?
>>>>>
>>>>
>>>> Ok.
>>>> Thanks,
>>>> Kyrill
>>>>
>>>>> Best regards,
>>>>>
>>>>> Thomas
>>>>>
>>>>> On 02/11/16 17:08, Thomas Preudhomme wrote:
>>>>>> Hi,
>>>>>>
>>>>>> When saving registers, function thumb1_expand_prologue () aims at
>>>>>> minimizing the number of push instructions. One of the optimizations
>>>>>> it performs is to push lr alongside high register(s) (after having
>>>>>> moved them to low register(s)) when there is no low register to save.
>>>>>> This is implemented by adding lr to the list of registers that can be
>>>>>> pushed just before the push happens. This then pushes lr and allows
>>>>>> it to be used for further pushes if there were not enough registers
>>>>>> to push all the high registers that need pushing.
>>>>>>
>>>>>> However, the logic that decides which register to move high
>>>>>> registers to before they are pushed only looks at low registers (see
>>>>>> the for loop initialization). This means not only that lr is not used
>>>>>> for pushing high registers but also that lr is not removed from the
>>>>>> list of registers to be pushed when it is not used. This extra lr
>>>>>> push is not popped in the epilogue, leading to stack corruption.
>>>>>>
>>>>>> This patch changes the loop to iterate over registers r0 to lr so as
>>>>>> to both fix the stack corruption and reuse lr to push some high
>>>>>> register when possible.
>>>>>>
>>>>>> ChangeLog entries are as follows:
>>>>>>
>>>>>> *** gcc/ChangeLog ***
>>>>>>
>>>>>> 2016-11-01  Thomas Preud'homme <thomas.preudhomme@arm.com>
>>>>>>
>>>>>>         PR target/77933
>>>>>>         * config/arm/arm.c (thumb1_expand_prologue): Also check for
>>>>>>         lr being a pushable register.
>>>>>>
>>>>>>
>>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>>
>>>>>> 2016-11-01  Thomas Preud'homme <thomas.preudhomme@arm.com>
>>>>>>
>>>>>>         PR target/77933
>>>>>>         * gcc.target/arm/pr77933.c: New test.
>>>>>>
>>>>>>
>>>>>> Testing: no regression on arm-none-eabi GCC cross-compiler
>>>>>> targeting Cortex-M0
>>>>>>
>>>>>> Is this ok for trunk?
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Thomas
>>>>
>>
>> 1_rmprofile_multilib.patch
>>
>>
>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>> index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -3739,6 +3739,16 @@ case "${target}" in
>>  				# pragmatic.
>>  				tmake_profile_file="arm/t-aprofile"
>>  				;;
>> +			rmprofile)
>> +				# Note that arm/t-rmprofile is a
>> +				# stand-alone make file fragment to be
>> +				# used only with itself.  We do not
>> +				# specifically use the
>> +				# TM_MULTILIB_OPTION framework because
>> +				# this shorthand is more
>> +				# pragmatic.
>> +				tmake_profile_file="arm/t-rmprofile"
>> +				;;
>>  			default)
>>  				;;
>>  			*)
>> @@ -3748,9 +3758,10 @@ case "${target}" in
>>  			esac
>>
>>  			if test "x${tmake_profile_file}" != x ; then
>> -				# arm/t-aprofile is only designed to work
>> -				# without any with-cpu, with-arch, with-mode,
>> -				# with-fpu or with-float options.
>> +				# arm/t-aprofile and arm/t-rmprofile are only
>> +				# designed to work without any with-cpu,
>> +				# with-arch, with-mode, with-fpu or with-float
>> +				# options.
>>  				if test "x$with_arch" != x \
>>  				    || test "x$with_cpu" != x \
>>  				    || test "x$with_float" != x \
>> diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..c8b5c9cbd03694eea69855e20372afa3e97d6b4c
>> --- /dev/null
>> +++ b/gcc/config/arm/t-rmprofile
>> @@ -0,0 +1,174 @@
>> +# Copyright (C) 2016 Free Software Foundation, Inc.
>> +#
>> +# This file is part of GCC.
>> +#
>> +# GCC is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3, or (at your option)
>> +# any later version.
>> +#
>> +# GCC is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with GCC; see the file COPYING3.  If not see
>> +# <http://www.gnu.org/licenses/>.
>> +
>> +# This is a target makefile fragment that attempts to get
>> +# multilibs built for the range of CPU's, FPU's and ABI's that
>> +# are relevant for the ARM architecture.  It should not be used in
>> +# conjunction with another make file fragment and assumes --with-arch,
>> +# --with-cpu, --with-fpu, --with-float, --with-mode have their default
>> +# values during the configure step.  We enforce this during the
>> +# top-level configury.
>> +
>> +MULTILIB_OPTIONS     =
>> +MULTILIB_DIRNAMES    =
>> +MULTILIB_EXCEPTIONS  =
>> +MULTILIB_MATCHES     =
>> +MULTILIB_REUSE       =
>> +
>> +# We have the following hierachy:
>> +#   ISA: A32 (.) or T16/T32 (thumb).
>> +#   Architecture: ARMv6S-M (v6-m), ARMv7-M (v7-m), ARMv7E-M (v7e-m),
>> +#                 ARMv8-M Baseline (v8-m.base) or ARMv8-M Mainline (v8-m.main).
>> +#   FPU: VFPv3-D16 (fpv3), FPV4-SP-D16 (fpv4-sp), FPV5-SP-D16 (fpv5-sp),
>> +#        VFPv5-D16 (fpv5), or None (.).
>> +#   Float-abi: Soft (.), softfp (softfp), or hard (hardfp).
>> +
>> +# Options to build libraries with
>> +
>> +MULTILIB_OPTIONS       += mthumb
>> +MULTILIB_DIRNAMES      += thumb
>> +
>> +MULTILIB_OPTIONS       += march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7/march=armv8-m.base/march=armv8-m.main
>> +MULTILIB_DIRNAMES      += v6-m v7-m v7e-m v7-ar v8-m.base v8-m.main
>> +
>> +MULTILIB_OPTIONS       += mfpu=vfpv3-d16/mfpu=fpv4-sp-d16/mfpu=fpv5-sp-d16/mfpu=fpv5-d16
>> +MULTILIB_DIRNAMES      += fpv3 fpv4-sp fpv5-sp fpv5
>> +
>> +MULTILIB_OPTIONS       += mfloat-abi=softfp/mfloat-abi=hard
>> +MULTILIB_DIRNAMES      += softfp hard
>> +
>> +
>> +# Option combinations to build library with
>> +
>> +# Default CPU/Arch
>> +MULTILIB_REQUIRED      += mthumb
>> +MULTILIB_REQUIRED      += mfloat-abi=hard
>> +
>> +# ARMv6-M
>> +MULTILIB_REQUIRED      += mthumb/march=armv6s-m
>> +
>> +# ARMv8-M Baseline
>> +MULTILIB_REQUIRED      += mthumb/march=armv8-m.base
>> +
>> +# ARMv7-M
>> +MULTILIB_REQUIRED      += mthumb/march=armv7-m
>> +
>> +# ARMv7E-M
>> +MULTILIB_REQUIRED      += mthumb/march=armv7e-m
>> +MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv4-sp-d16/mfloat-abi=softfp
>> +MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv4-sp-d16/mfloat-abi=hard
>> +MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv5-d16/mfloat-abi=softfp
>> +MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv5-d16/mfloat-abi=hard
>> +MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv5-sp-d16/mfloat-abi=softfp
>> +MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv5-sp-d16/mfloat-abi=hard
>> +
>> +# ARMv8-M Mainline
>> +MULTILIB_REQUIRED      += mthumb/march=armv8-m.main
>> +MULTILIB_REQUIRED      += mthumb/march=armv8-m.main/mfpu=fpv5-d16/mfloat-abi=softfp
>> +MULTILIB_REQUIRED      += mthumb/march=armv8-m.main/mfpu=fpv5-d16/mfloat-abi=hard
>> +MULTILIB_REQUIRED      += mthumb/march=armv8-m.main/mfpu=fpv5-sp-d16/mfloat-abi=softfp
>> +MULTILIB_REQUIRED      += mthumb/march=armv8-m.main/mfpu=fpv5-sp-d16/mfloat-abi=hard
>> +
>> +# ARMv7-R as well as ARMv7-A and ARMv8-A if aprofile was not specified
>> +MULTILIB_REQUIRED      += mthumb/march=armv7
>> +MULTILIB_REQUIRED      += mthumb/march=armv7/mfpu=vfpv3-d16/mfloat-abi=softfp
>> +MULTILIB_REQUIRED      += mthumb/march=armv7/mfpu=vfpv3-d16/mfloat-abi=hard
>> +
>> +
>> +# Matches
>> +
>> +# CPU Matches
>> +MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m0
>> +MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m0.small-multiply
>> +MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m0plus
>> +MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m0plus.small-multiply
>> +MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m1
>> +MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m1.small-multiply
>> +MULTILIB_MATCHES       += march?armv7-m=mcpu?cortex-m3
>> +MULTILIB_MATCHES       += march?armv7e-m=mcpu?cortex-m4
>> +MULTILIB_MATCHES       += march?armv7e-m=mcpu?cortex-m7
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r4
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r4f
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r5
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r7
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r8
>> +MULTILIB_MATCHES       += march?armv7=mcpu?marvell-pj4
>> +MULTILIB_MATCHES       += march?armv7=mcpu?generic-armv7-a
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a8
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a9
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a5
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a7
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a15
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a12
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a17
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a15.cortex-a7
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a17.cortex-a7
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a32
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a35
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a53
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a57
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a57.cortex-a53
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a72
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a72.cortex-a53
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a73
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a73.cortex-a35
>> +MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a73.cortex-a53
>> +MULTILIB_MATCHES       += march?armv7=mcpu?exynos-m1
>> +MULTILIB_MATCHES       += march?armv7=mcpu?qdf24xx
>> +MULTILIB_MATCHES       += march?armv7=mcpu?xgene1
>> +
>> +# Arch Matches
>> +MULTILIB_MATCHES       += march?armv6s-m=march?armv6-m
>> +MULTILIB_MATCHES       += march?armv8-m.main=march?armv8-m.main+dsp
>> +MULTILIB_MATCHES       += march?armv7=march?armv7-r
>> +ifeq (,$(HAS_APROFILE))
>> +MULTILIB_MATCHES       += march?armv7=march?armv7-a
>> +MULTILIB_MATCHES       += march?armv7=march?armv7ve
>> +MULTILIB_MATCHES       += march?armv7=march?armv8-a
>> +MULTILIB_MATCHES       += march?armv7=march?armv8-a+crc
>> +MULTILIB_MATCHES       += march?armv7=march?armv8.1-a
>> +MULTILIB_MATCHES       += march?armv7=march?armv8.1-a+crc
>> +MULTILIB_MATCHES       += march?armv7=march?armv8.2-a
>> +MULTILIB_MATCHES       += march?armv7=march?armv8.2-a+fp16
>> +endif
>> +
>> +# FPU matches
>> +ifeq (,$(HAS_APROFILE))
>> +MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv3
>> +MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv3-fp16
>> +MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv3-d16-fp16
>> +MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?neon
>> +MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?neon-fp16
>> +MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv4
>> +MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv4-d16
>> +MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?neon-vfpv4
>> +MULTILIB_MATCHES       += mfpu?fpv5-d16=mfpu?fp-armv8
>> +MULTILIB_MATCHES       += mfpu?fpv5-d16=mfpu?neon-fp-armv8
>> +MULTILIB_MATCHES       += mfpu?fpv5-d16=mfpu?crypto-neon-fp-armv8
>> +endif
>> +
>> +
>> +# We map all requests for ARMv7-R or ARMv7-A in ARM mode to Thumb mode and
>> +# any FPU to VFPv3-d16 if possible.
>> +MULTILIB_REUSE         += mthumb/march.armv7=march.armv7
>> +MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.softfp=march.armv7/mfpu.vfpv3-d16/mfloat-abi.softfp
>> +MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.hard=march.armv7/mfpu.vfpv3-d16/mfloat-abi.hard
>> +MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.softfp=march.armv7/mfpu.fpv5-d16/mfloat-abi.softfp
>> +MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.hard=march.armv7/mfpu.fpv5-d16/mfloat-abi.hard
>> +MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.softfp=mthumb/march.armv7/mfpu.fpv5-d16/mfloat-abi.softfp
>> +MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.hard=mthumb/march.armv7/mfpu.fpv5-d16/mfloat-abi.hard
>> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
>> index e4c686e60c7f479ca3ea71e94c4bb6ad52373085..0b94bc1931a226e58d06a7ed5a726454142c006a 100644
>> --- a/gcc/doc/install.texi
>> +++ b/gcc/doc/install.texi
>> @@ -1107,19 +1107,59 @@ sysv, aix.
>>
>>  @item --with-multilib-list=@var{list}
>>  @itemx --without-multilib-list
>> -Specify what multilibs to build.
>> -Currently only implemented for arm*-*-*, sh*-*-* and x86-64-*-linux*.
>> +Specify what multilibs to build.  @var{list} is a comma separated list of
>> +values, possibly consisting of a single value.  Currently only implemented
>> +for arm*-*-*, sh*-*-* and x86-64-*-linux*.  The accepted values and meaning
>> +for each target is given below.
>>
>>  @table @code
>>  @item arm*-*-*
>> -@var{list} is either @code{default} or @code{aprofile}.  Specifying
>> -@code{default} is equivalent to omitting this option while specifying
>> -@code{aprofile} builds multilibs for each combination of ISA (@code{-marm} or
>> -@code{-mthumb}), architecture (@code{-march=armv7-a}, @code{-march=armv7ve},
>> -or @code{-march=armv8-a}), FPU available (none, @code{-mfpu=vfpv3-d16},
>> -@code{-mfpu=neon}, @code{-mfpu=vfpv4-d16}, @code{-mfpu=neon-vfpv4} or
>> -@code{-mfpu=neon-fp-armv8} depending on architecture) and floating-point ABI
>> -(@code{-mfloat-abi=softfp} or @code{-mfloat-abi=hard}).
>> +@var{list} is one of@code{default}, @code{aprofile} or @code{rmprofile}.
>> +Specifying @code{default} is equivalent to omitting this option, ie. only the
>> +default runtime library will be enabled.  Specifying @code{aprofile} or
>> +@code{rmprofile} builds multilibs for a combination of ISA, architecture,
>> +FPU available and floating-point ABI.
>> +
>> +The table below gives the combination of ISAs, architectures, FPUs and
>> +floating-point ABIs for which multilibs are built for each accepted value.
>> +
>> +@multitable @columnfractions .15 .28 .30
>> +@item Option @tab aprofile @tab rmprofile
>> +@item ISAs
>> +@tab @code{-marm} and @code{-mthumb}
>> +@tab @code{-mthumb}
>> +@item Architectures@*@*@*@*@*@*
>> +@tab default architecture@*
>> +@code{-march=armv7-a}@*
>> +@code{-march=armv7ve}@*
>> +@code{-march=armv8-a}@*@*@*
>> +@tab default architecture@*
>> +@code{-march=armv6s-m}@*
>> +@code{-march=armv7-m}@*
>> +@code{-march=armv7e-m}@*
>> +@code{-march=armv8-m.base}@*
>> +@code{-march=armv8-m.main}@*
>> +@code{-march=armv7}
>> +@item FPUs@*@*@*@*@*
>> +@tab none@*
>> +@code{-mfpu=vfpv3-d16}@*
>> +@code{-mfpu=neon}@*
>> +@code{-mfpu=vfpv4-d16}@*
>> +@code{-mfpu=neon-vfpv4}@*
>> +@code{-mfpu=neon-fp-armv8}
>> +@tab none@*
>> +@code{-mfpu=vfpv3-d16}@*
>> +@code{-mfpu=fpv4-sp-d16}@*
>> +@code{-mfpu=fpv5-sp-d16}@*
>> +@code{-mfpu=fpv5-d16}@*
>> +@item floating-point@/ ABIs@*@*
>> +@tab @code{-mfloat-abi=soft}@*
>> +@code{-mfloat-abi=softfp}@*
>> +@code{-mfloat-abi=hard}
>> +@tab @code{-mfloat-abi=soft}@*
>> +@code{-mfloat-abi=softfp}@*
>> +@code{-mfloat-abi=hard}
>> +@end multitable
>>
>>  @item sh*-*-*
>>  @var{list} is a comma separated list of CPU names.  These must be of the
>>
>>
>> fix_pr77933_gcc5.patch
>>
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index c01a3c878968f6e6f07358b0686e4a59e34f56b7..5c975625bfa25d2c71c27db348cd3e70fe44a951 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -24457,6 +24457,7 @@ thumb1_expand_prologue (void)
>>    unsigned long live_regs_mask;
>>    unsigned long l_mask;
>>    unsigned high_regs_pushed = 0;
>> +  bool lr_needs_saving;
>>
>>    func_type = arm_current_func_type ();
>>
>> @@ -24479,6 +24480,7 @@ thumb1_expand_prologue (void)
>>
>>    offsets = arm_get_frame_offsets ();
>>    live_regs_mask = offsets->saved_regs_mask;
>> +  lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
>>
>>    /* Extract a mask of the ones we can give to the Thumb's push instruction.  */
>>    l_mask = live_regs_mask & 0x40ff;
>> @@ -24545,6 +24547,7 @@ thumb1_expand_prologue (void)
>>  	{
>>  	  insn = thumb1_emit_multi_reg_push (l_mask, l_mask);
>>  	  RTX_FRAME_RELATED_P (insn) = 1;
>> +	  lr_needs_saving = false;
>>
>>  	  offset = bit_count (l_mask) * UNITS_PER_WORD;
>>  	}
>> @@ -24609,12 +24612,13 @@ thumb1_expand_prologue (void)
>>       be a push of LR and we can combine it with the push of the first high
>>       register.  */
>>    else if ((l_mask & 0xff) != 0
>> -	   || (high_regs_pushed == 0 && l_mask))
>> +	   || (high_regs_pushed == 0 && lr_needs_saving))
>>      {
>>        unsigned long mask = l_mask;
>>        mask |= (1 << thumb1_extra_regs_pushed (offsets, true)) - 1;
>>        insn = thumb1_emit_multi_reg_push (mask, mask);
>>        RTX_FRAME_RELATED_P (insn) = 1;
>> +      lr_needs_saving = false;
>>      }
>>
>>    if (high_regs_pushed)
>> @@ -24632,7 +24636,9 @@ thumb1_expand_prologue (void)
>>        /* Here we need to mask out registers used for passing arguments
>>  	 even if they can be pushed.  This is to avoid using them to stash the high
>>  	 registers.  Such kind of stash may clobber the use of arguments.  */
>> -      pushable_regs = l_mask & (~arg_regs_mask) & 0xff;
>> +      pushable_regs = l_mask & (~arg_regs_mask);
>> +      if (lr_needs_saving)
>> +	pushable_regs &= ~(1 << LR_REGNUM);
>>
>>        if (pushable_regs == 0)
>>  	pushable_regs = 1 << thumb_find_work_register (live_regs_mask);
>> @@ -24640,8 +24646,9 @@ thumb1_expand_prologue (void)
>>        while (high_regs_pushed > 0)
>>  	{
>>  	  unsigned long real_regs_mask = 0;
>> +	  unsigned long push_mask = 0;
>>
>> -	  for (regno = LAST_LO_REGNUM; regno >= 0; regno --)
>> +	  for (regno = LR_REGNUM; regno >= 0; regno --)
>>  	    {
>>  	      if (pushable_regs & (1 << regno))
>>  		{
>> @@ -24650,6 +24657,7 @@ thumb1_expand_prologue (void)
>>
>>  		  high_regs_pushed --;
>>  		  real_regs_mask |= (1 << next_hi_reg);
>> +		  push_mask |= (1 << regno);
>>
>>  		  if (high_regs_pushed)
>>  		    {
>> @@ -24659,23 +24667,20 @@ thumb1_expand_prologue (void)
>>  			  break;
>>  		    }
>>  		  else
>> -		    {
>> -		      pushable_regs &= ~((1 << regno) - 1);
>> -		      break;
>> -		    }
>> +		    break;
>>  		}
>>  	    }
>>
>>  	  /* If we had to find a work register and we have not yet
>>  	     saved the LR then add it to the list of regs to push.  */
>> -	  if (l_mask == (1 << LR_REGNUM))
>> +	  if (lr_needs_saving)
>>  	    {
>> -	      pushable_regs |= l_mask;
>> -	      real_regs_mask |= l_mask;
>> -	      l_mask = 0;
>> +	      push_mask |= 1 << LR_REGNUM;
>> +	      real_regs_mask |= 1 << LR_REGNUM;
>> +	      lr_needs_saving = false;
>>  	    }
>>
>> -	  insn = thumb1_emit_multi_reg_push (pushable_regs, real_regs_mask);
>> +	  insn = thumb1_emit_multi_reg_push (push_mask, real_regs_mask);
>>  	  RTX_FRAME_RELATED_P (insn) = 1;
>>  	}
>>      }
>> diff --git a/gcc/testsuite/gcc.target/arm/pr77933-1.c b/gcc/testsuite/gcc.target/arm/pr77933-1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..95cf68ea7531bcc453371f493a05bd40caa5541b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr77933-1.c
>> @@ -0,0 +1,46 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +__attribute__ ((noinline, noclone)) void
>> +clobber_lr_and_highregs (void)
>> +{
>> +  __asm__ volatile ("" : : : "r8", "r9", "lr");
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int ret;
>> +
>> +  __asm volatile ("mov\tr4, #0xf4\n\t"
>> +		  "mov\tr5, #0xf5\n\t"
>> +		  "mov\tr6, #0xf6\n\t"
>> +		  "mov\tr7, #0xf7\n\t"
>> +		  "mov\tr0, #0xf8\n\t"
>> +		  "mov\tr8, r0\n\t"
>> +		  "mov\tr0, #0xfa\n\t"
>> +		  "mov\tr10, r0"
>> +		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
>> +
>> +  clobber_lr_and_highregs ();
>> +
>> +  __asm volatile ("cmp\tr4, #0xf4\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr5, #0xf5\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr6, #0xf6\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr7, #0xf7\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\tr0, r8\n\t"
>> +		  "cmp\tr0, #0xf8\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\tr0, r10\n\t"
>> +		  "cmp\tr0, #0xfa\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\t%0, #1\n"
>> +		  "fail:\n\t"
>> +		  "sub\tr0, #1"
>> +		  : "=r" (ret) : :);
>> +  return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/arm/pr77933-2.c b/gcc/testsuite/gcc.target/arm/pr77933-2.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..9028c4fcab4229591fa057f15c641d2b5597cd1d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr77933-2.c
>> @@ -0,0 +1,47 @@
>> +/* { dg-do run } */
>> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>> +/* { dg-options "-mthumb -O2 -mtpcs-leaf-frame" } */
>> +
>> +__attribute__ ((noinline, noclone)) void
>> +clobber_lr_and_highregs (void)
>> +{
>> +  __asm__ volatile ("" : : : "r8", "r9", "lr");
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int ret;
>> +
>> +  __asm volatile ("mov\tr4, #0xf4\n\t"
>> +		  "mov\tr5, #0xf5\n\t"
>> +		  "mov\tr6, #0xf6\n\t"
>> +		  "mov\tr7, #0xf7\n\t"
>> +		  "mov\tr0, #0xf8\n\t"
>> +		  "mov\tr8, r0\n\t"
>> +		  "mov\tr0, #0xfa\n\t"
>> +		  "mov\tr10, r0"
>> +		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
>> +
>> +  clobber_lr_and_highregs ();
>> +
>> +  __asm volatile ("cmp\tr4, #0xf4\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr5, #0xf5\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr6, #0xf6\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr7, #0xf7\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\tr0, r8\n\t"
>> +		  "cmp\tr0, #0xf8\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\tr0, r10\n\t"
>> +		  "cmp\tr0, #0xfa\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\t%0, #1\n"
>> +		  "fail:\n\t"
>> +		  "sub\tr0, #1"
>> +		  : "=r" (ret) : :);
>> +  return ret;
>> +}
>>
>>
>> fix_pr77933_gcc6.patch
>>
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 83cb13d1195beb19d6301f5c83a7eb544a91d877..1dba035c62c97a5f723d02208636c92108427379 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -24710,6 +24710,7 @@ thumb1_expand_prologue (void)
>>    unsigned long live_regs_mask;
>>    unsigned long l_mask;
>>    unsigned high_regs_pushed = 0;
>> +  bool lr_needs_saving;
>>
>>    func_type = arm_current_func_type ();
>>
>> @@ -24732,6 +24733,7 @@ thumb1_expand_prologue (void)
>>
>>    offsets = arm_get_frame_offsets ();
>>    live_regs_mask = offsets->saved_regs_mask;
>> +  lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
>>
>>    /* Extract a mask of the ones we can give to the Thumb's push instruction.  */
>>    l_mask = live_regs_mask & 0x40ff;
>> @@ -24798,6 +24800,7 @@ thumb1_expand_prologue (void)
>>  	{
>>  	  insn = thumb1_emit_multi_reg_push (l_mask, l_mask);
>>  	  RTX_FRAME_RELATED_P (insn) = 1;
>> +	  lr_needs_saving = false;
>>
>>  	  offset = bit_count (l_mask) * UNITS_PER_WORD;
>>  	}
>> @@ -24862,12 +24865,13 @@ thumb1_expand_prologue (void)
>>       be a push of LR and we can combine it with the push of the first high
>>       register.  */
>>    else if ((l_mask & 0xff) != 0
>> -	   || (high_regs_pushed == 0 && l_mask))
>> +	   || (high_regs_pushed == 0 && lr_needs_saving))
>>      {
>>        unsigned long mask = l_mask;
>>        mask |= (1 << thumb1_extra_regs_pushed (offsets, true)) - 1;
>>        insn = thumb1_emit_multi_reg_push (mask, mask);
>>        RTX_FRAME_RELATED_P (insn) = 1;
>> +      lr_needs_saving = false;
>>      }
>>
>>    if (high_regs_pushed)
>> @@ -24885,7 +24889,9 @@ thumb1_expand_prologue (void)
>>        /* Here we need to mask out registers used for passing arguments
>>  	 even if they can be pushed.  This is to avoid using them to stash the high
>>  	 registers.  Such kind of stash may clobber the use of arguments.  */
>> -      pushable_regs = l_mask & (~arg_regs_mask) & 0xff;
>> +      pushable_regs = l_mask & (~arg_regs_mask);
>> +      if (lr_needs_saving)
>> +	pushable_regs &= ~(1 << LR_REGNUM);
>>
>>        if (pushable_regs == 0)
>>  	pushable_regs = 1 << thumb_find_work_register (live_regs_mask);
>> @@ -24893,8 +24899,9 @@ thumb1_expand_prologue (void)
>>        while (high_regs_pushed > 0)
>>  	{
>>  	  unsigned long real_regs_mask = 0;
>> +	  unsigned long push_mask = 0;
>>
>> -	  for (regno = LAST_LO_REGNUM; regno >= 0; regno --)
>> +	  for (regno = LR_REGNUM; regno >= 0; regno --)
>>  	    {
>>  	      if (pushable_regs & (1 << regno))
>>  		{
>> @@ -24903,6 +24910,7 @@ thumb1_expand_prologue (void)
>>
>>  		  high_regs_pushed --;
>>  		  real_regs_mask |= (1 << next_hi_reg);
>> +		  push_mask |= (1 << regno);
>>
>>  		  if (high_regs_pushed)
>>  		    {
>> @@ -24912,23 +24920,20 @@ thumb1_expand_prologue (void)
>>  			  break;
>>  		    }
>>  		  else
>> -		    {
>> -		      pushable_regs &= ~((1 << regno) - 1);
>> -		      break;
>> -		    }
>> +		    break;
>>  		}
>>  	    }
>>
>>  	  /* If we had to find a work register and we have not yet
>>  	     saved the LR then add it to the list of regs to push.  */
>> -	  if (l_mask == (1 << LR_REGNUM))
>> +	  if (lr_needs_saving)
>>  	    {
>> -	      pushable_regs |= l_mask;
>> -	      real_regs_mask |= l_mask;
>> -	      l_mask = 0;
>> +	      push_mask |= 1 << LR_REGNUM;
>> +	      real_regs_mask |= 1 << LR_REGNUM;
>> +	      lr_needs_saving = false;
>>  	    }
>>
>> -	  insn = thumb1_emit_multi_reg_push (pushable_regs, real_regs_mask);
>> +	  insn = thumb1_emit_multi_reg_push (push_mask, real_regs_mask);
>>  	  RTX_FRAME_RELATED_P (insn) = 1;
>>  	}
>>      }
>> diff --git a/gcc/testsuite/gcc.target/arm/pr77933-1.c b/gcc/testsuite/gcc.target/arm/pr77933-1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..95cf68ea7531bcc453371f493a05bd40caa5541b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr77933-1.c
>> @@ -0,0 +1,46 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +__attribute__ ((noinline, noclone)) void
>> +clobber_lr_and_highregs (void)
>> +{
>> +  __asm__ volatile ("" : : : "r8", "r9", "lr");
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int ret;
>> +
>> +  __asm volatile ("mov\tr4, #0xf4\n\t"
>> +		  "mov\tr5, #0xf5\n\t"
>> +		  "mov\tr6, #0xf6\n\t"
>> +		  "mov\tr7, #0xf7\n\t"
>> +		  "mov\tr0, #0xf8\n\t"
>> +		  "mov\tr8, r0\n\t"
>> +		  "mov\tr0, #0xfa\n\t"
>> +		  "mov\tr10, r0"
>> +		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
>> +
>> +  clobber_lr_and_highregs ();
>> +
>> +  __asm volatile ("cmp\tr4, #0xf4\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr5, #0xf5\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr6, #0xf6\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr7, #0xf7\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\tr0, r8\n\t"
>> +		  "cmp\tr0, #0xf8\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\tr0, r10\n\t"
>> +		  "cmp\tr0, #0xfa\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\t%0, #1\n"
>> +		  "fail:\n\t"
>> +		  "sub\tr0, #1"
>> +		  : "=r" (ret) : :);
>> +  return ret;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/arm/pr77933-2.c b/gcc/testsuite/gcc.target/arm/pr77933-2.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..9028c4fcab4229591fa057f15c641d2b5597cd1d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr77933-2.c
>> @@ -0,0 +1,47 @@
>> +/* { dg-do run } */
>> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>> +/* { dg-options "-mthumb -O2 -mtpcs-leaf-frame" } */
>> +
>> +__attribute__ ((noinline, noclone)) void
>> +clobber_lr_and_highregs (void)
>> +{
>> +  __asm__ volatile ("" : : : "r8", "r9", "lr");
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int ret;
>> +
>> +  __asm volatile ("mov\tr4, #0xf4\n\t"
>> +		  "mov\tr5, #0xf5\n\t"
>> +		  "mov\tr6, #0xf6\n\t"
>> +		  "mov\tr7, #0xf7\n\t"
>> +		  "mov\tr0, #0xf8\n\t"
>> +		  "mov\tr8, r0\n\t"
>> +		  "mov\tr0, #0xfa\n\t"
>> +		  "mov\tr10, r0"
>> +		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
>> +
>> +  clobber_lr_and_highregs ();
>> +
>> +  __asm volatile ("cmp\tr4, #0xf4\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr5, #0xf5\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr6, #0xf6\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "cmp\tr7, #0xf7\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\tr0, r8\n\t"
>> +		  "cmp\tr0, #0xf8\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\tr0, r10\n\t"
>> +		  "cmp\tr0, #0xfa\n\t"
>> +		  "bne\tfail\n\t"
>> +		  "mov\t%0, #1\n"
>> +		  "fail:\n\t"
>> +		  "sub\tr0, #1"
>> +		  : "=r" (ret) : :);
>> +  return ret;
>> +}
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_pr77933_gcc5.patch
Type: text/x-patch
Size: 5970 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20161130/105beb76/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_pr77933_gcc6.patch
Type: text/x-patch
Size: 5970 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20161130/105beb76/attachment-0001.bin>

