This is the mail archive of the mailing list for the GCC project.

Re: Using particular register class (like floating point registers) as spill register class

On 05/16/2014 05:20 PM, Ian Bolton wrote:
>> On 05/16/2014 12:05 PM, Kugan wrote:
>>> On 16/05/14 20:40, wrote:
>>>>> On May 16, 2014, at 3:23 AM, Kugan <> wrote:
>>>>> I would like to know if there is any way we can use registers from a
>>>>> particular register class just as spill registers (in places where
>>>>> the register allocator would normally spill to the stack, and nothing
>>>>> more), when it can be useful.
>>>>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
>>>>> better performance compared to not using it. The difference here is
>>>>> that when -mgeneral-regs-only is not used, floating-point registers are
>>>>> also used in register allocation. Then IRA/LRA has to move them to core
>>>>> registers before performing operations, as shown below.
>>>> Can you show the code with fp registers disabled?  Does it use the
>>>> stack to spill?  Normally this is due to register-to-register class
>>>> costs compared to register-to-memory move cost.  Also I think it
>>>> depends on the processor rather than the target.  For thunder, using
>>>> the fp registers might actually be better than using the stack,
>>>> depending on whether the stack was in L1.
>>> Not all the LDR/STR combinations map to fmov. In the testcase I have:
>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S  -mgeneral-regs-only
>>> grep -c "ldr" sha_dgst.s
>>> 50
>>> grep -c "str" sha_dgst.s
>>> 42
>>> grep -c "fmov" sha_dgst.s
>>> 0
>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S
>>> grep -c "ldr" sha_dgst.s
>>> 42
>>> grep -c "str" sha_dgst.s
>>> 31
>>> grep -c "fmov" sha_dgst.s
>>> 105
>>> I am not saying that we shouldn't use floating-point registers here.
>>> But from the above, it seems the register allocator is using them more
>>> like core registers (even though the cost model gives them a higher
>>> cost) and then moving the values to core registers before operations.
>>> If that is the case, my question is: how do we make this just a spill
>>> register class, so that ldr/str are replaced with an equal number of
>>> fmov instructions when possible?
>> I'm also seeing stuff like this:
>> => 0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:
>>     add	x21, x4, x21, lsl #3
>> => 0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:
>>     fmov	w2, s8
>> => 0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:
>>     str	w2, [x21,#88]
>> I guess GCC doesn't know how to store an SImode value held in an FP
>> register directly into memory?  This is 4.8.1.
> Please can you try that on trunk and report back.

OK, this is trunk, and I'm no longer seeing that happen.

However, I am seeing:

   0x0000007fb76dc82c <+160>:	adrp	x25, 0x7fb7c80000
   0x0000007fb76dc830 <+164>:	add	x25, x25, #0x480
   0x0000007fb76dc834 <+168>:	fmov	d8, x0
   0x0000007fb76dc838 <+172>:	add	x0, x29, #0x160
   0x0000007fb76dc83c <+176>:	fmov	d9, x0
   0x0000007fb76dc840 <+180>:	add	x0, x29, #0xd8
   0x0000007fb76dc844 <+184>:	fmov	d10, x0
   0x0000007fb76dc848 <+188>:	add	x0, x29, #0xf8
   0x0000007fb76dc84c <+192>:	fmov	d11, x0

followed later by:

   0x0000007fb76dd224 <+2712>:	fmov	x0, d9
   0x0000007fb76dd228 <+2716>:	add	x6, x29, #0x118
   0x0000007fb76dd22c <+2720>:	str	x20, [x0,w27,sxtw #3]
   0x0000007fb76dd230 <+2724>:	fmov	x0, d10
   0x0000007fb76dd234 <+2728>:	str	w28, [x0,w27,sxtw #2]
   0x0000007fb76dd238 <+2732>:	fmov	x0, d11
   0x0000007fb76dd23c <+2736>:	str	w19, [x0,w27,sxtw #2]

which seems a bit suboptimal: d8-d15 are callee-saved under the AArch64
procedure call standard (AAPCS64), so these double registers now have to
be saved in the prologue and restored in the epilogue.
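The `grep -c` commands quoted earlier in the thread count matching *lines*, so the pattern `str` also matches `strb`, `.string` directives or a call such as `bl strlen`, and `ldr` also matches `ldrb` or `ldrsw`. Spill and save code may also use pair instructions (`stp`/`ldp`), which neither pattern counts. When comparing the two builds it may help to count exact mnemonics instead. This is a minimal POSIX shell sketch, assuming GCC's usual `.s` layout with one instruction per line and the mnemonic as the first whitespace-separated field; the helper name `count_insn` is hypothetical, not part of any GCC tooling:

```shell
#!/bin/sh
# count_insn FILE MNEMONIC...   (hypothetical helper, illustration only)
# For each MNEMONIC, print how many lines of the assembly FILE use it as
# the instruction, i.e. as the first whitespace-separated field. Labels
# (".LC0:"), directives (".string") and operands such as "strlen" are
# not counted, and "strb" is counted separately from "str".
count_insn() {
    file=$1
    shift
    for m in "$@"; do
        # awk splits on blanks/tabs, so leading indentation is ignored.
        n=$(awk -v m="$m" '$1 == m { c++ } END { print c + 0 }' "$file")
        printf '%s %s\n' "$m" "$n"
    done
}

# Example (file name taken from the thread; run once per build):
# count_insn sha_dgst.s ldr str ldp stp fmov
```

Running it on the output of each build (with and without `-mgeneral-regs-only`) gives per-mnemonic counts that are not inflated by unrelated lines; no figures are shown here because they depend on the compiler revision and the test case.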