This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
- From: Richard Sandiford <richard.sandiford@linaro.org>
- To: Michael Collison <Michael.Collison@arm.com>
- Cc: Richard Biener <richard.guenther@gmail.com>, Richard Kenner <kenner@vlsi1.ultra.nyu.edu>, GCC Patches <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>, Andrew Pinski <pinskia@gmail.com>
- Date: Wed, 06 Sep 2017 18:21:41 +0100
- Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard Sandiford <richard.sandiford@linaro.org> writes:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> Michael Collison <Michael.Collison@arm.com> writes:
>>> Richard,
>>>
>>> The problem with this approach for AArch64 is that
>>> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED, which is
>>> normally 0 as it is based on the TARGET_SIMD flag.
>>
>> Maybe I'm wrong, but that seems like a missed optimisation in itself.
Sorry to follow up on myself yet again, but I'd forgotten this was
because we allow the SIMD unit to do scalar shifts. So I guess
we have no choice, even though it seems unfortunate.
> +(define_insn_and_split "*aarch64_reg_<optab>_minus<mode>3"
> + [(set (match_operand:GPI 0 "register_operand" "=&r")
> + (ASHIFT:GPI
> + (match_operand:GPI 1 "register_operand" "r")
> + (minus:QI (match_operand 2 "const_int_operand" "n")
> + (match_operand:QI 3 "register_operand" "r"))))]
> + "INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode)"
> + "#"
> + "&& true"
> + [(const_int 0)]
> + {
> + /* Handle cases where operand 3 is a plain QI register, or
> + a subreg with either a SImode or DImode register. */
> +
> + rtx subreg_tmp = (REG_P (operands[3])
> + ? gen_lowpart_SUBREG (SImode, operands[3])
> + : SUBREG_REG (operands[3]));
> +
> + if (REG_P (subreg_tmp) && GET_MODE (subreg_tmp) == DImode)
> + subreg_tmp = gen_lowpart_SUBREG (SImode, subreg_tmp);
I think this all simplifies to:
rtx subreg_tmp = gen_lowpart (SImode, operands[3]);
(or it would be worth having a comment that explains why not).
As well as being shorter, it will properly simplify hard REGs
to new hard REGs.
> + rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
> + : operands[0]);
> +
> + if (<MODE>mode == DImode && !can_create_pseudo_p ())
> + tmp = gen_lowpart_SUBREG (SImode, operands[0]);
I think this too would be simpler with gen_lowpart:
rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (SImode)
: gen_lowpart (SImode, operands[0]));
> +
> + emit_insn (gen_negsi2 (tmp, subreg_tmp));
> +
> + rtx and_op = gen_rtx_AND (SImode, tmp,
> + GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1));
> +
> + rtx subreg_tmp2 = gen_lowpart_SUBREG (QImode, and_op);
> +
> + emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp2));
> + DONE;
> + }
> +)
The pattern should probably set the "length" attribute to 8.
Looks good to me with those changes FWIW.
Thanks,
Richard