This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] [Aarch64] Optimize subtract in shift counts
Richard Biener <email@example.com> writes:
> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford
> <firstname.lastname@example.org> wrote:
>>Richard Biener <email@example.com> writes:
>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner
>>> <firstname.lastname@example.org> wrote:
>>>>> Correct. It is truncated for integer shift, but not simd shift
>>>>> instructions. We generate a pattern in the split that only
>>>>> the integer shift instructions.
>>>> That's unfortunate, because it would be nice to do this in
>>>> since it's machine-independent, but that has to be conditioned on
>>>> SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.
>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in
>>> the patterns, like for example with
>>> (define_insn ashlSI3
>>> [(set (match_operand 0 "")
>>> (ashl:SI (match_operand ... )
>>> (subreg:QI (match_operand:SI ...)))]
>>> or an explicit and:SI and combine / simplify_rtx should apply the
>>> optimization we expect.
>>The problem with the explicit AND is that you'd end up with either
>>an AND of two constants for constant shifts, or with two separate
>>one for constant shifts and one for variable shifts. (And the problem
>>theory with two patterns is that it reduces the RA's freedom, although
>>practice I guess we'd always want a constant shift where possible for
>>cost reasons, and so the RA would never need to replace pseudos with
>>I think all useful instances of this optimisation will be exposed by
>>the gimple optimisers, so maybe expand could to do it based on
>>TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than
>>the rtx code and it does take the mode into account.
> Sure, that could work as well and also take into account range info. But
> we'd then need named expanders and the result would still have the
> explicit and or need to be an unspec or a different RTL operation.
Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have
target-dependent rather than undefined behaviour, so it's OK
for a target to use shift codes with out-of-range values. And
TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about
how the normal shift optabs behave, so I don't think we'd need new
optabs or new unspecs.
E.g. it already works this way when expanding double-word shifts,
which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There it's
possible to use a shorter sequence if you know that the shift optab
truncates the count, so we can do that even if SHIFT_COUNT_TRUNCATED