[PATCH][combine] PR rtl-optimization/68381: Only restrict pure simplification in mult-extend subst case, allow other substitutions
Kyrill Tkachov
kyrylo.tkachov@arm.com
Thu Nov 26 09:55:00 GMT 2015
On 24/11/15 00:15, Segher Boessenkool wrote:
> On Thu, Nov 19, 2015 at 03:20:22PM +0000, Kyrill Tkachov wrote:
>> Hmmm, so the answer to that is a bit further down the validate_replacement:
>> path.
>> It's the code after the big comment:
>> /* See if this is a PARALLEL of two SETs where one SET's destination is
>> a register that is unused and this isn't marked as an instruction that
>> might trap in an EH region. In that case, we just need the other SET.
>> We prefer this over the PARALLEL.
>>
>> This can occur when simplifying a divmod insn. We *must* test for this
>> case here because the code below that splits two independent SETs
>> doesn't
>> handle this case correctly when it updates the register status.
>>
>> It's pointless doing this if we originally had two sets, one from
>> i3, and one from i2. Combining then splitting the parallel results
>> in the original i2 again plus an invalid insn (which we delete).
>> The net effect is only to move instructions around, which makes
>> debug info less accurate. */
>>
>> The code extracts all the valid sets inside the PARALLEL and calls
>> recog_for_combine on them
>> individually, ignoring the clobber.
> Before I made this use is_parallel_of_n_reg_sets the code used to test
> if it is a parallel of two sets, and no clobbers allowed. So it would
> never allow a clobber of zero. But now it does. I'll fix this in
> is_parallel_of_n_reg_sets.
>
> Thanks for finding the problem!
Thanks for fixing the wrong-code issue.
As I mentioned on IRC, this patch improves codegen on aarch64 as well.
I've re-checked SPEC2006 and it seems to improve codegen around multiply-extend-accumulate
instructions. For example the sequence:
mov w4, 64
mov x1, 24
smaddl x1, w9, w4, x1 // multiply-sign-extend-accumulate
add x1, x3, x1
becomes something like this:
mov w3, 64
smaddl x1, w9, w3, x0
add x1, x1, 24 // constant 24 propagated into the add
Another was transforming the muliply-extend into something cheaper:
mov x0, 40
mov w22, 32
umaddl x22, w21, w22, x0 // multiply-zero-extend-accumulate
changed becomes:
ubfiz x22, x21, 5, 32 // ASHIFT+extend
add x22, x22, 40
which should be always beneficial.
From what I can see we don't lose any of the multiply-extend-accumulate
opportunities that we gained from the original combine patch.
So can we take this patch in as well?
Thanks,
Kyrill
> Segher
>
More information about the Gcc-patches
mailing list