This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Using particular register class (like floating point registers) as spill register class
- From: Andrew Haley <aph at redhat dot com>
- To: ramrad01 at arm dot com
- Cc: Ian Bolton <ian dot bolton at arm dot com>, Kugan <kugan dot vivekanandarajah at linaro dot org>, Andrew Pinski <pinskia at gmail dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Vladimir Makarov <vmakarov at redhat dot com>
- Date: Mon, 19 May 2014 14:10:44 +0100
- Subject: Re: Using particular register class (like floating point registers) as spill register class
- References: <5375E730 dot 20309 at linaro dot org> <CE1A054F-0FBD-4368-82E3-9BCE35509078 at gmail dot com> <5375F0E8 dot 70109 at linaro dot org> <53760B2E dot 1000802 at redhat dot com> <5379F2DC dot 600 at redhat dot com> <CAJA7tRYcAsCC_S2b0HJ_H5_ebzXxWYNCD-O3PyTyB_FW+3Fk0g at mail dot gmail dot com>
On 05/19/2014 01:19 PM, Ramana Radhakrishnan wrote:
> On Mon, May 19, 2014 at 1:02 PM, Andrew Haley <aph@redhat.com> wrote:
>> On 05/16/2014 05:20 PM, Ian Bolton wrote:
>>>> On 05/16/2014 12:05 PM, Kugan wrote:
>>>>>
>>>>>
>>>>> On 16/05/14 20:40, pinskia@gmail.com wrote:
>>>>>>
>>>>>>
>>>>>>> On May 16, 2014, at 3:23 AM, Kugan <kugan.vivekanandarajah@linaro.org> wrote:
>>>>>>>
>>>>>>> I would like to know if there is any way we can use registers from a
>>>>>>> particular register class just as spill registers (in places where the
>>>>>>> register allocator would normally spill to the stack, and nothing more),
>>>>>>> when it can be useful.
>>>>>>>
>>>>>>> On AArch64, in some cases, compiling with -mgeneral-regs-only produces
>>>>>>> better performance than compiling without it. The difference is that
>>>>>>> when -mgeneral-regs-only is not used, floating-point registers are also
>>>>>>> used in register allocation. IRA/LRA then has to move the values back
>>>>>>> to core registers before performing operations, as shown below.
>>>>>>
>>>>>> Can you show the code with FP registers disabled? Does it use the stack
>>>>>> to spill? Normally this is due to the register-to-register class costs
>>>>>> compared to the register-to-memory move cost. Also, I think it depends on
>>>>>> the processor rather than the target. For Thunder, using the FP registers
>>>>>> might actually be better than using the stack, depending on whether the
>>>>>> stack was in L1.
>>>>> Not all LDR/STR combinations map to fmovs. In the testcase I have,
>>>>>
>>>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
>>>>> grep -c "ldr" sha_dgst.s
>>>>> 50
>>>>> grep -c "str" sha_dgst.s
>>>>> 42
>>>>> grep -c "fmov" sha_dgst.s
>>>>> 0
>>>>>
>>>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
>>>>> grep -c "ldr" sha_dgst.s
>>>>> 42
>>>>> grep -c "str" sha_dgst.s
>>>>> 31
>>>>> grep -c "fmov" sha_dgst.s
>>>>> 105
>>>>>
>>>>> I am not saying that we shouldn't use floating-point registers here. But
>>>>> from the above, it seems like the register allocator is using them more
>>>>> like core registers (even though the cost model gives them a higher cost)
>>>>> and then moving the values to core registers before operations. If that
>>>>> is the case, my question is: how do we make this just a spill register
>>>>> class, so that ldr/str are replaced with an equal number of fmovs when
>>>>> possible?
>>>>
>>>> I'm also seeing stuff like this:
>>>>
>>>> => 0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>: add x21, x4, x21, lsl #3
>>>> => 0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>: fmov w2, s8
>>>> => 0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>: str w2, [x21,#88]
>>>>
>>>> I guess GCC doesn't know how to store an SImode value in an FP register
>>>> into memory? This is 4.8.1.
>>>>
>>>
>>> Please can you try that on trunk and report back.
>>
>> OK, this is trunk, and I'm no longer seeing that happen.
>>
>> However, I am seeing:
>>
>> 0x0000007fb76dc82c <+160>: adrp x25, 0x7fb7c80000
>> 0x0000007fb76dc830 <+164>: add x25, x25, #0x480
>> 0x0000007fb76dc834 <+168>: fmov d8, x0
>> 0x0000007fb76dc838 <+172>: add x0, x29, #0x160
>> 0x0000007fb76dc83c <+176>: fmov d9, x0
>> 0x0000007fb76dc840 <+180>: add x0, x29, #0xd8
>> 0x0000007fb76dc844 <+184>: fmov d10, x0
>> 0x0000007fb76dc848 <+188>: add x0, x29, #0xf8
>> 0x0000007fb76dc84c <+192>: fmov d11, x0
>>
>> followed later by:
>>
>> 0x0000007fb76dd224 <+2712>: fmov x0, d9
>> 0x0000007fb76dd228 <+2716>: add x6, x29, #0x118
>> 0x0000007fb76dd22c <+2720>: str x20, [x0,w27,sxtw #3]
>> 0x0000007fb76dd230 <+2724>: fmov x0, d10
>> 0x0000007fb76dd234 <+2728>: str w28, [x0,w27,sxtw #2]
>> 0x0000007fb76dd238 <+2732>: fmov x0, d11
>> 0x0000007fb76dd23c <+2736>: str w19, [x0,w27,sxtw #2]
>>
>> which seems a bit suboptimal, given that these double registers now have
>> to be saved in the prologue.
>
> That looks a bit suspicious. Is there a preprocessed file you can put
> on Bugzilla, with the command-line options et al., for someone to take
> a look at?
I'll try, but I'm using precompiled headers, so it's a bit tricky. I'll let
you know.
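A minimal sketch of how such a preprocessed file might still be produced when precompiled headers are in use. It assumes GCC's documented behaviour: the manual describes -fpch-preprocess as the option that allows a precompiled header to be used together with -E, notes that -save-temps switches it on, and says the output then contains a #pragma GCC pch_preprocess "filename" line, i.e. it still depends on the .gch file. Running plain -E with the original compile options should therefore expand the original headers instead. The helper name preprocess_for_bugzilla and the CXX variable are hypothetical conveniences, not part of any GCC tooling.

```shell
# Hypothetical helper: preprocess_for_bugzilla SRC OUT [COMPILER-OPTIONS...]
# Writes a self-contained preprocessed copy of SRC to OUT, suitable for
# attaching to a bug report together with the same COMPILER-OPTIONS.
# Assumption: plain -E (without -save-temps, which turns on -fpch-preprocess)
# expands the original headers instead of referring to a .gch file.
preprocess_for_bugzilla() {
  src=$1
  out=$2
  shift 2
  # CXX defaults to g++; set it to the (cross) compiler being investigated.
  "${CXX:-g++}" "$@" -E "$src" -o "$out" || return 1
  # A pch_preprocess pragma means the output still needs the .gch file.
  if grep -q 'pragma GCC pch_preprocess' "$out"; then
    echo "warning: $out still refers to a precompiled header" >&2
    return 1
  fi
}
```

For example, "preprocess_for_bugzilla foo.cpp foo.ii -O2 <other options>", where foo.cpp is only a placeholder for the real source file; the same options would then go into the bug report alongside foo.ii.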
Andrew.
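The ldr/str/fmov figures quoted earlier in the thread come from grep -c, which counts lines containing a substring: "ldr" also matches ldrb, ldrh and ldrsw, "str" also matches strb, strh, .string directives and calls such as "bl strlen", and neither pattern counts ldp/stp pairs. A minimal sketch of a stricter count, assuming GCC-style assembly output with one instruction per line and the mnemonic as the first field; the function name count_mnemonics is hypothetical.

```shell
# Hypothetical helper: count_mnemonics FILE MNEMONIC...
# Prints "MNEMONIC COUNT" for each MNEMONIC, counting only lines whose first
# whitespace-separated field is exactly that mnemonic. Labels (".L3:"),
# directives (".string") and operands such as "bl strlen" are not counted.
count_mnemonics() {
  file=$1
  shift
  for m in "$@"; do
    n=$(awk -v m="$m" '$1 == m { c++ } END { print c + 0 }' "$file")
    printf '%s %s\n' "$m" "$n"
  done
}
```

For example, "count_mnemonics sha_dgst.s ldr str fmov ldp stp" could be run on the output of each of the two compilations above to compare exact instruction counts rather than matching lines.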