[Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Jeff Law
law@redhat.com
Fri Aug 14 20:30:00 GMT 2015
On 08/14/2015 11:40 AM, Jiong Wang wrote:
>
> * Figuring out whether the shift source is coming from sign extension
> by checking SSA_NAME_DEF_STMT instead of deducing from tree range
> info. I fell checking the gimple statement is more reliable and
> straigtforward.
I suspect it'll also apply more often if you're looking for the
nop-conversion rather than looking at range information.
I keep thinking there ought to be a generalization here so that we're
not so restrictive on the modes, but it's probably not worth doing.
In a perfect world we'd also integrate the other strategies for
double-word shifts that exist in the various ports as special cases in
expansion and remove the double-word shift patterns for ports that don't
actually have them in hardware. But that's probably a bit much to ask
-- however, it probably couldn't hurt to see if there are any that are
easily handled.
>
> * For the pseudo register overlaping issue, given:
>
> RTX target = TREE source << TREE amount
>
> I was trying to make sure those two SSA_NAME won't get the same
> pseudo register by comparing their assigned partition using
> var_to_partition, but I can't get the associated tree representation
> for "target" which is RTX.
>
> Then I just expand "source" and compare the register number with
> "target".
Right. Which is what I would have suggested. Given two pseudos, you
can just compare them for equality. If they're unequal, then its safe
to assume they do not overlap.
>
> But I found the simplest way maybe just reorder
>
> low_part (target) = low_part (source) << amount
> high_part (target) = low_part (source) >> amount1
>
> to:
>
> high_part (target) = low_part (source) >> amount1
> low_part (target) = low_part (source) << amount
>
> then, even target and source share the same pseudo register, we can
> still get what we want, as we are operating on their subreg.
Yes. This would work too and avoids the need to find source's pseudo.
>
> * Added checking on the result value of expand_variable_shift, if it's
> not the same as "target" then call emit_move_insn to do that.
Excellent.
>
> How is the re-worked patch looks to you?
>
> x86 bootstrap OK, regression OK. aarch64 bootstrap OK.
>
> 2015-08-14 Jiong.Wang<jiong.wang@arm.com>
>
> gcc/
> * expr.c (expand_expr_real_2): Check tree format to optimize
> LSHIFT_EXPR expand.
>
> gcc/testsuite
> * gcc.dg/wide_shift_64_1.c: New testcase.
> * gcc.dg/wide_shift_128_1.c: Likewise.
> * gcc.target/aarch64/ashlti3_1.c: Likewise.
> * gcc.target/arm/ashldisi_1.c: Likewise.
> + /* Left shfit optimization:
s/shfit/shift/
> +
> + If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
> + native instruction to support this wide mode left shift. Given below
> + example:
> +
> + Type A = (Type) B << C
> +
> + |< T >|
> + | high | low |
> +
> + |<- size ->|
> +
> + By checking tree shape, if operand B is coming from signed extension,
> + then the left shift operand can be simplified into:
> +
> + 1. high = low >> (size - C);
> + 2. low = low << C;
You'll want to reorder those to match the code you generate.
Doesn't this require that C be less than the bitsize of a word?
If C is larger than the bitsize of a word, then you need some
adjustments, something like:
1. high = low << (C - size)
2. low = 0
Does this transformation require that the wider mode be exactly 2X the
narrower mode? If so, that needs to be verified.
> + if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
So we're assured we have a widening conversion.
> + && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
> + >= GET_MODE_BITSIZE (word_mode)))
This test seems wrong. I'm not sure what you're really trying to test
here. It seems you want to look at the shift count relative to the
bitsize of word_mode. If it's less, then generate the code you
mentioned above in the comment. If it's more, then generate the
sequence I referenced? Right?
I do think you need to be verifying that rmode == wordmode here. If I
understand everything correctly, the final value is "mode" which must be
2X the size size of rmode/wordmode here, right?
The other question is are we endianness-safe in these transformations?
> + {
> + rtx low = simplify_gen_subreg (word_mode, op0, mode, 0);
> + rtx tlow = simplify_gen_subreg (word_mode, target, mode, 0);
> + rtx thigh = simplify_gen_subreg (word_mode, target, mode,
> + UNITS_PER_WORD);
> + HOST_WIDE_INT ramount = (BITS_PER_WORD
> + - TREE_INT_CST_LOW (treeop1));
> + tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
> +
> + temp = expand_variable_shift (code, word_mode, low, treeop1,
> + tlow, unsignedp);
Why use "code" here right than just using LSHIFT_EXPR? As noted earlier,
Jeff
More information about the Gcc-patches
mailing list