This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension
- From: Steve Ellcey <sellcey at cavium dot com>
- To: Segher Boessenkool <segher at kernel dot crashing dot org>, Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 13 Sep 2017 14:46:43 -0700
- Subject: Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension
- References: <1505321438.2286.7.camel@cavium.com> <59B9674E.2000601@foss.arm.com> <20170913194617.GO8421@gate.crashing.org>
- Reply-to: sellcey at cavium dot com
On Wed, 2017-09-13 at 14:46 -0500, Segher Boessenkool wrote:
> On Wed, Sep 13, 2017 at 06:13:50PM +0100, Kyrill Tkachov wrote:
> >
> > We are usually hesitant to add explicit subreg matching in the MD pattern
> > (though I don't remember if there's a hard rule against it).
> > In this case this looks like a missing simplification from combine
> > (simplify-rtx) so
> > I think adding it there would be better.
> Yes, it probably belongs as a generic simplification in simplify-rtx.c;
> if there is a reason not to do that, it can be done in combine.c
> instead.
Actually, now that I look at it some more and compare it to the arm32
version (where we do not have this problem), I think the problem starts
well before combine.
In arm32 rtl expansion, when reading the QI memory location, I see
these instructions get generated:
(insn 10 3 11 2 (set (reg:SI 119)
        (zero_extend:SI (mem:QI (reg/v/f:SI 117 [ string ]) [0 *string_9(D)+0 S1 A8]))) "pr77729.c":4 -1
     (nil))
(insn 11 10 12 2 (set (reg:QI 118)
        (subreg:QI (reg:SI 119) 0)) "pr77729.c":4 -1
     (nil))
And in aarch64 rtl expansion I see:
(insn 10 9 11 (set (reg:QI 81)
        (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) "pr77729.c":3 -1
     (nil))
Both of these sequences expand to ldrb. In the arm32 case the rtl says
that all 32 bits of the register are set (even though I only want the
bottom 8 bits), but for aarch64 the rtl only says that the bottom 8
bits are set and says nothing about the higher bits, so the AND
instruction that masks out the upper bits has to be kept on aarch64.
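To illustrate the difference in plain C terms, here is a rough sketch.
It is not the pr77729.c test case, and the function names are made up
for illustration. It models a byte load that is known to zero-extend
into a 32-bit register, next to the mask that becomes redundant once
the upper bits are known to be zero:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the arm32-style expansion: the byte load is
   described as a zero_extend into a full 32-bit register, so the upper
   24 bits of the result are known to be zero. */
static uint32_t load_byte_zero_extended(const uint8_t *p)
{
    return (uint32_t)*p;
}

/* Hypothetical model of the AND that stays in the aarch64 code: it
   clears everything above the bottom 8 bits. */
static uint32_t mask_low_byte(uint32_t reg)
{
    return reg & 0xffu;
}

/* Checks, for every possible byte value, that masking a zero-extended
   load changes nothing.  Returns 1 if the mask is always a no-op. */
static int mask_is_redundant_after_zero_extend(void)
{
    for (unsigned v = 0; v <= 0xffu; v++) {
        uint8_t byte = (uint8_t)v;
        uint32_t reg = load_byte_zero_extended(&byte);
        if (mask_low_byte(reg) != reg)
            return 0;
    }
    return 1;
}
```

When the register is only known to hold the right value in its bottom
8 bits (the aarch64 rtl above), the upper bits could be anything from
the compiler's point of view, and then the mask is needed, e.g.
mask_low_byte(0xDEADBEC3u) differs from its input.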
I think we should change the movqi/movhi expansions on aarch64 to
generate rtl like arm32 does, so that the rtl records that the
ldrb/ldrh instructions zero out the upper bits in the register.
Steve Ellcey
sellcey@cavium.com