This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [14/n] PR85694: Rework overwidening detection
- From: Richard Sandiford <richard dot sandiford at arm dot com>
- To: Christophe Lyon <christophe dot lyon at linaro dot org>
- Cc: gcc Patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 02 Jul 2018 14:37:22 +0100
- Subject: Re: [14/n] PR85694: Rework overwidening detection
- References: <878t79eob8.fsf@arm.com> <87a7rdbzam.fsf@arm.com> <CAKdteOZtgC5JaNFeScDTQ0AtrKCJ+VPQvGCDGPLJ4ciPZ3r63Q@mail.gmail.com>
Christophe Lyon <christophe.lyon@linaro.org> writes:
> On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > This patch is the main part of PR85694. The aim is to recognise at least:
>> >
>> > signed char *a, *b, *c;
>> > ...
>> > for (int i = 0; i < 2048; i++)
>> > c[i] = (a[i] + b[i]) >> 1;
>> >
>> > as an over-widening pattern, since the addition and shift can be done
>> > on shorts rather than ints. However, it ended up being a lot more
>> > general than that.
>> >
>> > The current over-widening pattern detection is limited to a few simple
>> > cases: logical ops with immediate second operands, and shifts by a
>> > constant. These cases are enough for common pixel-format conversion
>> > and can be detected in a peephole way.
>> >
>> > The loop above requires two generalisations of the current code: support
>> > for addition as well as logical ops, and support for non-constant second
>> > operands. These are harder to detect in the same peephole way, so the
>> > patch tries to take a more global approach.
>> >
>> > The idea is to get information about the minimum operation width
>> > in two ways:
>> >
>> > (1) by using the range information attached to the SSA_NAMEs
>> > (effectively a forward walk, since the range info is
>> > context-independent).
>> >
>> > (2) by back-propagating the number of output bits required by
>> > users of the result.
>> >
>> > As explained in the comments, there's a balance to be struck between
>> > narrowing an individual operation and fitting in with the surrounding
>> > code. The approach is pretty conservative: if we could narrow an
>> > operation to N bits without changing its semantics, it's OK to do that if:
>> >
>> > - no operations later in the chain require more than N bits; or
>> >
>> > - all internally-defined inputs are extended from N bits or fewer,
>> > and at least one of them is single-use.
>> >
>> > See the comments for the rationale.
>> >
>> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
>> > since the code seemed more readable without.
>> >
>> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>>
>> Here's a version rebased on top of current trunk. Changes from last time:
>>
>> - reintroduce dump_generic_expr_loc, with the obvious change to the
>> prototype
>>
>> - fix a typo in a comment
>>
>> - use vect_element_precision from the new version of 12/n.
>>
>> Tested as before. OK to install?
>>
>
> Hi Richard,
>
> This patch introduces regressions on arm-none-linux-gnueabihf:
> gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times
> vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times
> vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
> gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 8
> gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
Sorry about that, was caused by a stupid typo. I've applied the
below as obvious.
(For the record, it was actually 12/n that caused this. 14/n hasn't
been applied yet.)
Thanks,
Richard
2018-07-02 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_recog_widen_shift_pattern): Fix typo
in dump string.
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c 2018-07-02 14:30:57.000000000 +0100
+++ gcc/tree-vect-patterns.c 2018-07-02 14:30:57.383750450 +0100
@@ -1739,7 +1739,7 @@ vect_recog_widen_shift_pattern (vec<gimp
{
return vect_recog_widen_op_pattern (stmts, type_out, LSHIFT_EXPR,
WIDEN_LSHIFT_EXPR, true,
- "vect_widen_shift_pattern");
+ "vect_recog_widen_shift_pattern");
}
/* Detect a rotate pattern wouldn't be otherwise vectorized: