This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension


Hi Steve,

On 13/09/17 17:50, Steve Ellcey wrote:
This is a patch for PR target/77729 on aarch64.  The code is doing an
unneeded zero extend ('uxtb' in the original report, 'and' in the ToT sources).

The patch looks a bit odd, it is a specialized define_insn for the combine
pass. At some point in combine (I never did find out where), the zero_extend
is converted to an AND so my instruction looks for an OR of a constant
and an AND expression where one operand of the AND is a subreg and the other
is a constant.  If the two constants add up to 255 that means that the AND
is being used to mask out the upper bits of the register while not messing
up the constant we are using in the OR expression.

I also had to recognize this in the aarch64 cost function or combine would
not use the new expression even when it recognized it as it thought it cost
more than the original uncombined expressions.

Tested on aarch64 with a bootstrap and testsuite run that had no regressions.

OK to checkin?


2017-09-13  Steve Ellcey  <sellcey@cavium.com>

        PR target/77729
        * config/aarch64/aarch64.c (aarch64_rtx_costs):
        Handle cost of *iorqi3_uxtw instruction.
        * config/aarch64/aarch64.md (*iorqi3_uxtw): New
        instruction for combine phase.


2017-09-13  Steve Ellcey  <sellcey@cavium.com>

        * gcc.target/aarch64/pr77729.c: New test.

+;; Specialized OR instruction for combiner.  The AND is masking out bits
+;; not needed in the OR (doing a zero_extend).  The zero_extend is not
+;; needed because we know from the subreg that the upper part of the reg
+;; is zero.
+(define_insn "*iorqi3_uxtw"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (ior:SI (and:SI
+		  (subreg:SI (match_operand:QI 1 "register_operand" "r") 0)
+		  (match_operand:SI 2 "const_int_operand" "n"))
+		(match_operand:SI 3 "aarch64_logical_operand" "K")))]
+  "INTVAL (operands[2]) + INTVAL (operands[3]) == 255"
+  "orr\\t%w0, %w1, %3"
+  [(set_attr "type" "logic_imm")]
+)

We are usually hesitant to add explicit subreg matching in the MD pattern
(though I don't remember if there's a hard rule against it).
In this case this looks like a missing simplification from combine (simplify-rtx) so
I think adding it there would be better.

Also, in this pattern operands[3] has the aarch64_logical_operand predicate which allows registers
but in the final instruction check below you unconditionally take its INTVAL which will throw an error
if the compiler is configured with RTL checking enabled.

Thanks,
Kyrill



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]