Re: [PATCH] [Aarch64] Variable shift count truncation issues
- From: Richard Sandiford <richard.sandiford@linaro.org>
- To: Michael Collison <Michael.Collison@arm.com>
- Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>
- Date: Fri, 19 May 2017 08:21:02 +0100
- Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues
- References: <HE1PR0802MB2377C2EC041136EBC17948BA95E40@HE1PR0802MB2377.eurprd08.prod.outlook.com>
Thanks for doing this. Just a couple of comments about the .md stuff:
Michael Collison <Michael.Collison@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5adc5ed..c6ae670 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3999,6 +3999,92 @@
> }
> )
>
> +;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
> +;; they truncate the shift/rotate amount by the size of the registers they
> +;; operate on: 32 for W-regs, 63 for X-regs. This allows us to optimise away
> +;; such redundant masking instructions. GCC can do that automatically when
> +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
> +;; because some of the SISD shift alternatives don't perform this truncations.
> +;; because some of the SISD shift alternatives don't perform this truncation.
> +;; So this pattern exists to catch such cases.
> +
> +(define_insn "*aarch64_<optab>_reg_<mode>3_mask1"
> + [(set (match_operand:GPI 0 "register_operand" "=r")
> + (SHIFT:GPI
> + (match_operand:GPI 1 "register_operand" "r")
> + (subreg:QI (and:GPI
> + (match_operand:GPI 2 "register_operand" "r")
> + (match_operand 3 "const_int_operand" "n")) 0)))]
> + "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"
> + "<shift>\t%<w>0, %<w>1, %<w>2"
> + [(set_attr "type" "shift_reg")]
> +)
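For reference, the sort of redundant masking this is intended to catch
would be something like the following (my example, not from the patch):

    /* DImode shift whose count is masked with 63: the AND should fold
       into the LSL's implicit truncation, leaving a single
       "lsl x0, x0, x1" with no separate AND.  */
    unsigned long long
    f (unsigned long long x, unsigned long long n)
    {
      return x << (n & 63);
    }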
(subreg:QI (...) 0) is only correct for little endian. For big endian
it needs to be 3 for SI or 7 for DI. You could probably handle that
using something like:
(match_operator:QI 2 "lowpart_subreg"
[(and:GPI ...)])
and defining a lowpart_subreg predicate that checks the SUBREG_BYTE.
Or just leave the subreg as-is and add !BYTES_BIG_ENDIAN to the insn
condition.
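A minimal sketch of such a predicate, using subreg_lowpart_p so that
the SUBREG_BYTE check comes out right for both endiannesses (the name
and exact form are just a suggestion, not existing code):

    (define_special_predicate "lowpart_subreg"
      (and (match_code "subreg")
           (match_test "subreg_lowpart_p (op)")))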
> +(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
> + [(set (match_operand:GPI 0 "register_operand" "=r")
> + (SHIFT:GPI
> + (match_operand:GPI 1 "register_operand" "r")
> + (subreg:QI (neg:SI (and:SI
> + (match_operand:SI 2 "register_operand" "r")
> + (match_operand 3 "const_int_operand" "n"))) 0)))]
> + "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)
> + && can_create_pseudo_p ()"
> + "#"
> + "&& true"
> + [(const_int 0)]
> + {
> + rtx tmp = gen_reg_rtx (SImode);
> +
> + emit_insn (gen_negsi2 (tmp, operands[2]));
> + rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
> + emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
> + DONE;
> + }
> +)
Insn patterns shouldn't check can_create_pseudo_p, because there's no
guarantee that the associated split happens before RA. In this case it
should be safe to reuse operand 0 after RA if you change it to:
[(set (match_operand:GPI 0 "register_operand" "=&r")
...)]
and then:
rtx tmp = (can_create_pseudo_p ()
? gen_reg_rtx (SImode)
: gen_lowpart (SImode, operands[0]));
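so that the split body becomes something like (untested):

    {
      rtx tmp = (can_create_pseudo_p ()
                 ? gen_reg_rtx (SImode)
                 : gen_lowpart (SImode, operands[0]));

      emit_insn (gen_negsi2 (tmp, operands[2]));
      rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
      emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
      DONE;
    }

The earlyclobber is what makes the post-RA case safe: the negation
writes operand 0's register before the shift reads operand 1, so the
two must not share a register.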
Thanks,
Richard