This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension


On 09/14/2017 10:33 AM, Steve Ellcey wrote:
> On Thu, 2017-09-14 at 09:03 -0600, Jeff Law wrote:
>> On 09/13/2017 03:46 PM, Steve Ellcey wrote:
>>>  
>>> In arm32 rtl expansion, when reading the QI memory location, I see
>>> these instructions get generated:
>>>
>>> (insn 10 3 11 2 (set (reg:SI 119)
>>>         (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0
>>> *string_9(D)+0 S1 A8]))) "pr77729.c":4 -1
>>>      (nil))
>>> (insn 11 10 12 2 (set (reg:QI 118)
>>>         (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
>>>      (nil))
>>>
>>> And in aarch64 rtl expansion I see:
>>>
>>> (insn 10 9 11 (set (reg:QI 81)
>>>         (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1
>>> A8])) "pr77729.c":3 -1
>>>      (nil))
>>>
>>> Both of these sequences expand to ldrb but in the arm32 case I know
>>> that I set all 32 bits of the register (even though I only want the
>>> bottom 8 bits), but for aarch64 I only know that I set the bottom 8
>>> bits and I don't know anything about the higher bits, meaning I have to
>>> keep the AND instruction to mask out the upper bits on aarch64.
> 
>> It's one of the reasons I discourage subregs -- the number of cases
>> where we can optimize based on the "don't care" semantics are relatively
>> small in my experience and I consistently see cases where the "don't
>> care" property of the subreg turns into "don't know" and suppresses
>> downstream optimizations.
>>
>> It's always a judgment call, but more and more often I find myself
>> pushing towards defining those bits using a zero/sign extension, bit
>> operation or whatever rather than using subregs.
> 
> So if I were loading a QImode to a register (zeroing out the upper
> bits) would you generate something like:
> 
> (truncate:QI (zero_extend:SI (reg:QI)))
On a LOAD_EXTEND_OP target which zero extends and
WORD_REGISTER_OPERATIONS as 1 I'd load memory with just

(set (reg:QI) (mem:QI (whatever))

If that object is later used elsewhere it probably will be explicitly
sign/zero extended.  But combine ought to be able to eliminate that
explicit extension.

And I think that's starting to zero in on the problem --
WORD_REGISTER_OPERATIONS is zero on aarch64 as you don't get extension
to word_mode for W form registers.

I wonder if what needs to happen is somehow look to extend that code
somehow so that combine and friends know that the value is zero extended
to 32 bits, even if it's not extended to word_mode.

Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]