This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension


On Wed, 2017-09-13 at 14:46 -0500, Segher Boessenkool wrote:
> On Wed, Sep 13, 2017 at 06:13:50PM +0100, Kyrill Tkachov wrote:
> > 
> > We are usually hesitant to add explicit subreg matching in the MD pattern
> > (though I don't remember if there's a hard rule against it).
> > In this case this looks like a missing simplification from combine
> > (simplify-rtx) so I think adding it there would be better.

> Yes, it probably belongs as a generic simplification in simplify-rtx.c;
> if there is a reason not to do that, it can be done in combine.c
> instead.

Actually, now that I look at it some more and compare it to the arm32
version (where we do not have this problem), I think the problem starts
well before combine.

In arm32 rtl expansion, when reading the QI memory location, I see
these instructions get generated:

(insn 10 3 11 2 (set (reg:SI 119)
        (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0 *string_9(D)+0 S1 A8]))) "pr77729.c":4 -1
     (nil))
(insn 11 10 12 2 (set (reg:QI 118)
        (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
     (nil))

And in aarch64 rtl expansion I see:

(insn 10 9 11 (set (reg:QI 81)
        (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) "pr77729.c":3 -1
     (nil))

Both of these sequences expand to ldrb. In the arm32 case the RTL
records that all 32 bits of the register are set (even though I only
want the bottom 8 bits). In the aarch64 case the RTL only records that
the bottom 8 bits are set and says nothing about the upper bits, so I
have to keep the AND instruction to mask out the upper bits on aarch64.
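
To illustrate why the AND is redundant once the zero extension is known,
here is a minimal C sketch. It is only a model of the semantics, not the
pr77729.c test case and not GCC code; the function names
load_byte_zero_extended, mask_low_byte and check_mask_is_redundant are
hypothetical and chosen just for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a byte load that fills a whole 32-bit register:
   the low 8 bits come from memory and the upper 24 bits are zero.  This
   mirrors the (zero_extend:SI (mem:QI ...)) form in the arm32 dump.  */
uint32_t load_byte_zero_extended(const unsigned char *p)
{
    return (uint32_t)*p;
}

/* The explicit mask (the AND) that has to stay when nothing is known
   about the upper bits of the register.  */
uint32_t mask_low_byte(uint32_t reg)
{
    return reg & 0xffu;
}

/* After a zero-extending load the mask never changes the value, which
   is the fact the compiler needs in order to drop the AND.  */
void check_mask_is_redundant(void)
{
    const unsigned char bytes[] = { 0x00, 0x7f, 0x80, 0xff };

    for (size_t i = 0; i < sizeof bytes; i++) {
        uint32_t reg = load_byte_zero_extended(&bytes[i]);
        assert(mask_low_byte(reg) == reg);
    }
}
```

Calling check_mask_is_redundant() passes for every byte value listed. If
the register could hold arbitrary upper bits, as the plain QImode set in
the aarch64 dump allows, mask_low_byte would no longer be a no-op.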

I think we should change the movqi/movhi expansions on aarch64 to
generate rtl like arm32 does, so that it is explicit that the ldrb/ldrh
instructions zero out the upper bits in the register.

Steve Ellcey
sellcey@cavium.com

