[Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding

Fri Aug 14 20:30:00 GMT 2015

On 08/14/2015 11:40 AM, Jiong Wang wrote:
>
>    * Figuring out whether the shift source is coming from sign extension
>      by checking SSA_NAME_DEF_STMT instead of deducing from tree range
>      info. I fell checking the gimple statement is more reliable and
>      straigtforward.
I suspect it'll also apply more often if you're looking for the 
nop-conversion rather than looking at range information.

I keep thinking there ought to be a generalization here so that we're 
not so restrictive on the modes, but it's probably not worth doing.

In a perfect world we'd also integrate the other strategies for 
double-word shifts that exist in the various ports as special cases in 
expansion and remove the double-word shift patterns for ports that don't 
actually have them in hardware.  But that's probably a bit much to ask 
-- however, it probably couldn't hurt to see if there are any that are 
easily handled.

>
>    * For the pseudo register overlaping issue, given:
>
>        RTX target = TREE source << TREE amount
>
>      I was trying to make sure those two SSA_NAME won't get the same
>      pseudo register by comparing their assigned partition using
>      var_to_partition, but I can't get the associated tree representation
>      for "target" which is RTX.
>
>      Then I just expand "source" and compare the register number with
>      "target".
Right.  Which is what I would have suggested.  Given two pseudos, you 
can just compare them for equality.  If they're unequal, then its safe 
to assume they do not overlap.

>
>      But I found the simplest way maybe just reorder
>
>        low_part (target) = low_part (source) << amount
>        high_part (target) = low_part (source) >> amount1
>
>      to:
>
>        high_part (target) = low_part (source) >> amount1
>        low_part (target) = low_part (source) << amount
>
>      then, even target and source share the same pseudo register, we can
>      still get what we want, as we are operating on their subreg.
Yes.  This would work too and avoids the need to find source's pseudo.

>
>    * Added checking on the result value of expand_variable_shift, if it's
>      not the same as "target" then call emit_move_insn to do that.
Excellent.

>
>    How is the re-worked patch looks to you?
>
>    x86 bootstrap OK, regression OK. aarch64 bootstrap OK.
>
> 2015-08-14  Jiong.Wang<jiong.wang@arm.com>
>
> gcc/
>    * expr.c (expand_expr_real_2): Check tree format to optimize
>    LSHIFT_EXPR expand.
>
> gcc/testsuite
>    * gcc.dg/wide_shift_64_1.c: New testcase.
>    * gcc.dg/wide_shift_128_1.c: Likewise.
>    * gcc.target/aarch64/ashlti3_1.c: Likewise.
>    * gcc.target/arm/ashldisi_1.c: Likewise.

> +	/* Left shfit optimization:
s/shfit/shift/

> +
> +	   If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
> +	   native instruction to support this wide mode left shift.  Given below
> +	   example:
> +
> +	    Type A = (Type) B  << C
> +
> +            |<           T	    >|
> +            |   high     |   low     |
> +
> +                         |<- size  ->|
> +
> +	   By checking tree shape, if operand B is coming from signed extension,
> +	   then the left shift operand can be simplified into:
> +
> +	      1. high = low >> (size - C);
> +	      2. low = low << C;
You'll want to reorder those to match the code you generate.

Doesn't this require that C be less than the bitsize of a word?

If C is larger than the bitsize of a word, then you need some 
adjustments, something like:

	      1. high = low << (C - size)
               2. low = 0

Does this transformation require that the wider mode be exactly 2X the 
narrower mode?  If so, that needs to be verified.

> +		if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
So we're assured we have a widening conversion.

> +		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
> +			>= GET_MODE_BITSIZE (word_mode)))
This test seems wrong.  I'm not sure what you're really trying to test 
here.  It seems you want to look at the shift count relative to the 
bitsize of word_mode.  If it's less, then generate the code you 
mentioned above in the comment.  If it's more, then generate the 
sequence I referenced?  Right?

I do think you need to be verifying that rmode == wordmode here.  If I 
understand everything correctly, the final value is "mode" which must be 
2X the size size of rmode/wordmode here, right?

The other question is are we endianness-safe in these transformations?

> +		  {
> +		    rtx low = simplify_gen_subreg (word_mode, op0, mode, 0);
> +		    rtx tlow = simplify_gen_subreg (word_mode, target, mode, 0);
> +		    rtx thigh = simplify_gen_subreg (word_mode, target, mode,
> +						     UNITS_PER_WORD);
> +		    HOST_WIDE_INT ramount = (BITS_PER_WORD
> +					     - TREE_INT_CST_LOW (treeop1));
> +		    tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
> +
> +		    temp = expand_variable_shift (code, word_mode, low, treeop1,
> +						  tlow, unsignedp);
Why use "code" here right than just using LSHIFT_EXPR?  As noted earlier,

Jeff