[PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

Jeff Law law@redhat.com
Thu Sep 14 18:03:00 GMT 2017


On 09/14/2017 10:33 AM, Steve Ellcey wrote:
> On Thu, 2017-09-14 at 09:03 -0600, Jeff Law wrote:
>> On 09/13/2017 03:46 PM, Steve Ellcey wrote:
>>>  
>>> In arm32 rtl expansion, when reading the QI memory location, I see
>>> these instructions get generated:
>>>
>>> (insn 10 3 11 2 (set (reg:SI 119)
>>>         (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0
>>> *string_9(D)+0 S1 A8]))) "pr77729.c":4 -1
>>>      (nil))
>>> (insn 11 10 12 2 (set (reg:QI 118)
>>>         (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
>>>      (nil))
>>>
>>> And in aarch64 rtl expansion I see:
>>>
>>> (insn 10 9 11 (set (reg:QI 81)
>>>         (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1
>>> A8])) "pr77729.c":3 -1
>>>      (nil))
>>>
>>> Both of these sequences expand to ldrb but in the arm32 case I know
>>> that I set all 32 bits of the register (even though I only want the
>>> bottom 8 bits), but for aarch64 I only know that I set the bottom 8
>>> bits and I don't know anything about the higher bits, meaning I have to
>>> keep the AND instruction to mask out the upper bits on aarch64.
> 
>> It's one of the reasons I discourage subregs -- the number of cases
>> where we can optimize based on the "don't care" semantics are relatively
>> small in my experience and I consistently see cases where the "don't
>> care" property of the subreg turns into "don't know" and suppresses
>> downstream optimizations.
>>
>> It's always a judgment call, but more and more often I find myself
>> pushing towards defining those bits using a zero/sign extension, bit
>> operation or whatever rather than using subregs.
> 
> So if I were loading a QImode to a register (zeroing out the upper
> bits) would you generate something like:
> 
> (truncate:QI (zero_extend:SI (reg:QI)))
> 
>  instead of:
> 
> (subreg:QI (reg:SI))
> 
>>> I think we should change the movqi/movhi expansions on aarch64 to
>>> recognize that the ldrb/ldrh instructions zero out the upper bits in
>>> the register by generating rtl like arm32 does.
> 
>> Is LOAD_EXTEND_OP defined for aarch64?
> 
> Yes, aarch64 defines LOAD_EXTEND_OP to be ZERO_EXTEND.
> 
>> It may also be worth looking at ree.c -- my recollection is that it
>> didn't handle subregs, but it could and probably should.
> 
> I only see a couple of references to subregs in ree.c.  I think they
> both involve searching for all uses of a register.
Right.  But the subreg expressions are also forms of extension -- we
just don't know (or care) if it's zero or sign extension.

We'd start by recognizing the paradoxical subreg in
add_removable_extension as a form of an extension similar to zero_extend
and sign_extend.

When we go to combine the "extension" into the defining insn, we would
test 3 forms

(set (target) (zero_extend (exp)))

(set (target) (sign_extend (exp)))

(set (target) (subreg (exp)))

If any form matches an insn on the target, then we're done.


This may require adding some new patterns to aarch64 -- I believe we've
got patterns on x86 to match some of these forms to aid redundant
extension eliminmation.

It might also be helpful to teach ree about LOAD_EXTEND_OP which would
allow combining one of the extension forms with a memory reference.

Jeff



More information about the Gcc-patches mailing list