This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][combine] PR rtl-optimization/68381: Only restrict pure simplification in mult-extend subst case, allow other substitutions
- From: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- To: Segher Boessenkool <segher at kernel dot crashing dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 26 Nov 2015 09:50:50 +0000
- Subject: Re: [PATCH][combine] PR rtl-optimization/68381: Only restrict pure simplification in mult-extend subst case, allow other substitutions
- Authentication-results: sourceware.org; auth=none
- References: <564DA3D1 dot 10905 at arm dot com> <20151119105739 dot GA32343 at gate dot crashing dot org> <564DD0ED dot 9010604 at arm dot com> <20151119144102 dot GB32343 at gate dot crashing dot org> <564DE415 dot 1090600 at arm dot com> <564DE8B6 dot 9040902 at arm dot com> <20151124001553 dot GA9049 at gate dot crashing dot org>
On 24/11/15 00:15, Segher Boessenkool wrote:
On Thu, Nov 19, 2015 at 03:20:22PM +0000, Kyrill Tkachov wrote:
Hmmm, so the answer to that is a bit further down the validate_replacement:
path.
It's the code after the big comment:
/* See if this is a PARALLEL of two SETs where one SET's destination is
a register that is unused and this isn't marked as an instruction that
might trap in an EH region. In that case, we just need the other SET.
We prefer this over the PARALLEL.
This can occur when simplifying a divmod insn. We *must* test for this
case here because the code below that splits two independent SETs
doesn't
handle this case correctly when it updates the register status.
It's pointless doing this if we originally had two sets, one from
i3, and one from i2. Combining then splitting the parallel results
in the original i2 again plus an invalid insn (which we delete).
The net effect is only to move instructions around, which makes
debug info less accurate. */
The code extracts all the valid sets inside the PARALLEL and calls
recog_for_combine on them
individually, ignoring the clobber.
Before I made this use is_parallel_of_n_reg_sets the code used to test
if it is a parallel of two sets, and no clobbers allowed. So it would
never allow a clobber of zero. But now it does. I'll fix this in
is_parallel_of_n_reg_sets.
Thanks for finding the problem!
Thanks for fixing the wrong-code issue.
As I mentioned on IRC, this patch improves codegen on aarch64 as well.
I've re-checked SPEC2006 and it seems to improve codegen around multiply-extend-accumulate
instructions. For example the sequence:
mov w4, 64
mov x1, 24
smaddl x1, w9, w4, x1 // multiply-sign-extend-accumulate
add x1, x3, x1
becomes something like this:
mov w3, 64
smaddl x1, w9, w3, x0
add x1, x1, 24 // constant 24 propagated into the add
Another was transforming the muliply-extend into something cheaper:
mov x0, 40
mov w22, 32
umaddl x22, w21, w22, x0 // multiply-zero-extend-accumulate
changed becomes:
ubfiz x22, x21, 5, 32 // ASHIFT+extend
add x22, x22, 40
which should be always beneficial.
From what I can see we don't lose any of the multiply-extend-accumulate
opportunities that we gained from the original combine patch.
So can we take this patch in as well?
Thanks,
Kyrill
Segher