[PATCH] Canonicalize (vec_duplicate (not A)) to (not (vec_duplicate A)).

Liu, Hongtao hongtao.liu@intel.com
Thu Jun 3 11:03:43 GMT 2021

>-----Original Message-----
>From: Segher Boessenkool <segher@kernel.crashing.org>
>Sent: Thursday, June 3, 2021 4:46 AM
>To: Richard Biener <richard.guenther@gmail.com>
>Cc: Liu, Hongtao <hongtao.liu@intel.com>; GCC Patches <gcc-
>Subject: Re: [PATCH] Canonicalize (vec_duplicate (not A)) to (not
>(vec_duplicate A)).
>On Wed, Jun 02, 2021 at 09:07:35AM +0200, Richard Biener wrote:
>> On Wed, Jun 2, 2021 at 7:41 AM liuhongt via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>> > For i386, it will enable below opt
>> >
>> > from
>> >         notl    %edi
>> >         vpbroadcastd    %edi, %xmm0
>> >         vpand   %xmm1, %xmm0, %xmm0
>> > to
>> >         vpbroadcastd    %edi, %xmm0
>> >         vpandn   %xmm1, %xmm0, %xmm0
>> There will be cases where (vec_duplicate (not A)) is better than (not
>> (vec_duplicate A)), so I'm not sure it is a good idea to forcefully
>> canonicalize unary operations.
>It is two unaries in sequence, where the order does not matter either.
>As in all such cases you either have to handle both cases everywhere, or have
>a canonical order.
>> I suppose the
>> simplification happens inside combine
>combine uses simplify-rtx for most cases (it is part of combine, but used in
>quite a few other places these days).
>> - doesn't combine
>> already have code to try variants of an expression and isn't this a
>> good candidate that can be added there, avoiding the canonicalization?
>As I mentioned, this is done in simplify-rtx in cases that do not have a
>canonical representation.  This is critical because it prevents loops.
>A very typical example is how UMIN is optimised:
>   case UMIN:
>      if (trueop1 == CONST0_RTX (mode) && ! side_effects_p (op0))
>	return op1;
>      if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0))
>	return op0;
>      tem = simplify_associative_operation (code, mode, op0, op1);
>      if (tem)
>	return tem;
>      break;
>(the stuff using "tem").
>Hongtao, can we do something similar here?  Does that work well?  Please try
>it out :-)

In simplify-rtx, no simplification occurs; the only difference is between
(vec_duplicate (not REG)) and (not (vec_duplicate REG)), so tem here would always be 0.
We only know it is a simplification once combine has successfully split the 3->2
combination (not + broadcast + and into broadcast + andnot), and doing this in combine
is pretty awkward.
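Because NOT operates lane by lane, negating the scalar before the broadcast yields the same vector as negating after it, and the outer NOT can then be absorbed by an andnot. The portable C sketch below checks this equivalence on plain arrays. The four 32-bit lanes and the helper names (`broadcast`, `not_broadcast_and`, `broadcast_andnot`, `sequences_agree`) are illustrative assumptions, not GCC or i386 backend code.

```c
#include <stdint.h>

#define LANES 4 /* illustrative: four 32-bit lanes, as in an %xmm register */

/* Models (vec_duplicate A): copy a scalar into every lane. */
static void broadcast(uint32_t a, uint32_t out[LANES]) {
    for (int i = 0; i < LANES; i++)
        out[i] = a;
}

/* notl + vpbroadcastd + vpand: (and (vec_duplicate (not A)) B). */
static void not_broadcast_and(uint32_t a, const uint32_t b[LANES],
                              uint32_t out[LANES]) {
    uint32_t dup[LANES];
    broadcast(~a, dup);
    for (int i = 0; i < LANES; i++)
        out[i] = dup[i] & b[i];
}

/* vpbroadcastd + vpandn: (and (not (vec_duplicate A)) B). */
static void broadcast_andnot(uint32_t a, const uint32_t b[LANES],
                             uint32_t out[LANES]) {
    uint32_t dup[LANES];
    broadcast(a, dup);
    for (int i = 0; i < LANES; i++)
        out[i] = ~dup[i] & b[i]; /* andnot: complement one input, then AND */
}

/* Returns 1 if both sequences agree in every lane for this A and B. */
int sequences_agree(uint32_t a, const uint32_t b[LANES]) {
    uint32_t x[LANES], y[LANES];
    not_broadcast_and(a, b, x);
    broadcast_andnot(a, b, y);
    for (int i = 0; i < LANES; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

For example, `sequences_agree(0xdeadbeefu, b)` returns 1 for any `b`, since the complement of a broadcast scalar equals the broadcast of its complement in every lane.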

Considering that andnot exists on many backends, I think a canonicalization is needed here.
Maybe we could instead add an insn canonicalization that transforms (and (vec_duplicate (not A)) B) into
(and (not (vec_duplicate A)) B), rather than canonicalizing (vec_duplicate (not A)) into (not (vec_duplicate A)) in general?
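A minimal sketch of that narrower rule on a hypothetical toy expression tree. The `enum code`, `struct expr`, `mk`, `mk_reg`, `sink_not` and `canonicalize_and` names are invented for illustration; this is not GCC's rtx API, nor how simplify-rtx or combine would actually implement it. The NOT is moved outside the broadcast only when the broadcast is an operand of an AND, producing the shape an andnot pattern could match.

```c
#include <stdlib.h>

/* Hypothetical toy IR, loosely modelled on RTL codes; not GCC's rtx type. */
enum code { REG, NOT, VEC_DUPLICATE, AND };

struct expr {
    enum code code;
    int regno;              /* used only by REG */
    struct expr *op0, *op1; /* operands; unused ones are NULL */
};

static struct expr *mk(enum code code, struct expr *op0, struct expr *op1) {
    struct expr *e = calloc(1, sizeof *e);
    if (!e)
        abort();
    e->code = code;
    e->op0 = op0;
    e->op1 = op1;
    return e;
}

static struct expr *mk_reg(int regno) {
    struct expr *e = mk(REG, NULL, NULL);
    e->regno = regno;
    return e;
}

/* (vec_duplicate (not A)) -> (not (vec_duplicate A)); other shapes unchanged. */
static struct expr *sink_not(struct expr *op) {
    if (op->code == VEC_DUPLICATE && op->op0->code == NOT) {
        struct expr *a = op->op0->op0;
        return mk(NOT, mk(VEC_DUPLICATE, a, NULL), NULL);
    }
    return op;
}

/* Rewrite only the operands of an AND, so that
   (and (vec_duplicate (not A)) B) becomes (and (not (vec_duplicate A)) B).
   A bare (vec_duplicate (not A)) outside an AND is left alone. */
struct expr *canonicalize_and(struct expr *e) {
    if (e->code == AND) {
        e->op0 = sink_not(e->op0);
        e->op1 = sink_not(e->op1);
    }
    return e;
}
```

Limiting the rewrite to AND operands leaves (vec_duplicate (not A)) untouched in other contexts, which speaks to the concern that that order can sometimes be the better one.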


More information about the Gcc-patches mailing list