This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
- From: Cong Hou <congh at google dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Biener <rguenther at suse dot de>
- Date: Thu, 24 Oct 2013 17:09:35 -0700
- Subject: Re: [PATCH] Vectorizing abs(char/short/int) on x86.
- Authentication-results: sourceware.org; auth=none
- References: <CAK=A3=3J2J_ZTT3qxO6z32kSNpA7dVFobYg=SQNi0udMY8E5VA at mail dot gmail dot com> <Pine dot LNX dot 4 dot 64 dot 1310231548200 dot 28882 at digraph dot polyomino dot org dot uk> <CAK=A3=3-GYLuGa5Di7fD7Cr+Kj0P_nhWLBh2wetE7dB0w4X3uQ at mail dot gmail dot com> <20131024061857 dot GX30970 at tucnak dot zalov dot cz>
On Wed, Oct 23, 2013 at 11:18 PM, Jakub Jelinek <email@example.com> wrote:
> On Wed, Oct 23, 2013 at 09:40:21PM -0700, Cong Hou wrote:
>> On Wed, Oct 23, 2013 at 8:52 AM, Joseph S. Myers
>> <firstname.lastname@example.org> wrote:
>> > On Tue, 22 Oct 2013, Cong Hou wrote:
>> >> For abs(char/short), type conversions are needed as the current abs()
>> >> function/operation does not accept argument of char/short type.
>> >> Therefore when we want to get the absolute value of a char_val using
>> >> abs (char_val), it will be converted into abs ((int) char_val). It
>> >> then can be vectorized, but the generated code is not efficient as
>> >> lots of packings and unpackings are envolved. But if we convert
>> >> (char) abs ((int) char_val) to abs (char_val), the vectorizer will be
>> >> able to generate better code. Same for short.
>> > ABS_EXPR has undefined overflow behavior. Thus, abs ((int) -128) is
>> > defined (and we also define the subsequent conversion of +128 to signed
>> > char, which ISO C makes implementation-defined not undefined), and
>> > converting to an ABS_EXPR on char would wrongly make it undefined. For
>> > such a transformation to be valid (in the absence of VRP saying that -128
>> > isn't a possible value) you'd need a GIMPLE representation for
>> > ABS_EXPR<overflow:wrap>, as distinct from ABS_EXPR<overflow:undefined>.
>> > You don't have the option there is for some arithmetic operations of
>> > converting to a corresponding operation on unsigned types.
>> Yes, you are right. The method I use can guarantee wrapping on
>> overflow (either shift-xor-sub or max(x, -x)). Can I just add the
>> condition if (flag_wrapv) before the conversion I made to prevent the
>> undefined behavior on overflow?
> What HW insns you expand to is one thing, but if some GCC pass assumes that
> ABS_EXPR always returns non-negative value (many do, look e.g. at
> tree_unary_nonnegative_warnv_p, extract_range_from_unary_expr_1,
> simplify_const_relational_operation, etc., you'd need to grep for all
> ABS_EXPR/ABS occurrences) and optimizes code based on that fact, you get
> wrong code because (char) abs((char) -128) is well defined.
> If we change ABS_EXPR/ABS definition that it is well defined on the most
> negative value of the typ (resp. mode), then we loose all those
> optimizations, if we do that only for the char/short types, it would be
> quite weird, though we could keep the benefits, but at the RTL level we'd
> need to treat that way all the modes equal to short's mode and smaller (so,
> for sizeof(short) == sizeof(int) target even int's mode).
I checked those functions and they all consider the possibility of
overflow. For example, tree_unary_nonnegative_warnv_p only returns
true for ABS_EXPR on integers if overflow is undefined. If the
consequence of overflow is wrapping, I think converting (char)
abs((int)-128) to abs(-128) (-128 has char type) is safe. Can we do it
by checking flag_wrapv?
I could also first remove the abs conversion content from this patch
but only keep the content of expanding abs() for i386. I will submit
> The other possibility is not to create the ABS_EXPRs of char/short anywhere,
> solve the vectorization issues either through tree-vect-patterns.c or
> as part of the vectorization type demotion/promotions, see the recent
> discussions for that, you'd represent the short/char abs for the vectorized
> loop say using the shift-xor-sub or builtin etc. and if you want to do the
> same thing for scalar code, you'd just have combiner try to match some
Yes, I could do it through tree-vect-patterns.c, if the abs conversion
is prohibited. Currently the only reason I need the abs conversion is
Vectorization type demotion/promotions is interesting, but I am afraid
we will face the same problem there.
Thank you for your comment!