Bug 85694 - Generation of vectorized AVG (Average) instruction
Summary: Generation of vectorized AVG (Average) instruction
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: unknown
Importance: P3 normal
Target Milestone: 9.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
 
Reported: 2018-05-08 14:04 UTC by Uroš Bizjak
Modified: 2018-07-03 17:39 UTC (History)
CC List: 2 users

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-05-08 00:00:00


Attachments
x86 target patch (822 bytes, patch)
2018-07-03 16:05 UTC, Uroš Bizjak

Description Uroš Bizjak 2018-05-08 14:04:46 UTC
As mentioned in [1], page 12, the following code:

--cut here--
#define N 1024

unsigned char src1[N], src2[N], dst[N];

void foo (void)
{
  int i;

  for (i = 0; i < N; i++)
    dst[i] = (src1[i] + src2[i] + 1) >> 1;
}
--cut here--

should vectorize with the pavgb instruction.

[1] http://llvm.org/devmtg/2018-04/slides/Das-An%20Introduction%20to%20AMD%20Optimizing%20Compiler.pdf
Comment 1 Richard Biener 2018-05-08 14:27:05 UTC
Confirmed.  Note that in C you are writing 'int' arithmetic, which we do not
shorten; that confuses us enough to vectorize this with 'int' operations.

The main part, the missed pavgb vectorization, is confirmed as well.  With
more intelligent vectorization using v16qi vectors, combine could also come
to the rescue here, though that is not very likely: it would be a 4-insn
combine and would also need unconditional shift / division support for v16qi.
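
To illustrate the point about 'int' arithmetic, here is a minimal sketch (not taken from the report; the variable and function names are hypothetical) showing that C's integer promotions make the sum of two unsigned char operands an int, so the source expression really is a 32-bit computation on typical targets:

```c
#include <stddef.h>

/* Hypothetical operands, both at the maximum unsigned char value. */
unsigned char x = 255, y = 255;

/* The operands of '+' are promoted to int before the addition,
   so the expression has the size of int, not of unsigned char. */
size_t promoted_sum_size(void)
{
    return sizeof(x + y);
}

/* Because the arithmetic is done in int, the carry is not lost. */
int promoted_sum(void)
{
    return x + y + 1;
}
```

This is why the vectorizer sees widened operations that it has to narrow again before a byte-wide instruction can be used.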
Comment 2 Richard Biener 2018-05-08 14:29:51 UTC
Hmm, but if you have 255 + 255 + 1 then you need to use pavgw at least,
otherwise the vectorization isn't semantically equivalent?  Or do the
instructions compute the intermediate results in greater precision than
8 bits?  The specification doesn't seem to tell.

Can you clarify?
Comment 3 Uroš Bizjak 2018-05-08 16:18:33 UTC
(In reply to Richard Biener from comment #2)
> Hmm, but if you have 255 + 255 + 1 then you need to use pavgw at least,
> otherwise the vectorization isn't semantically equivalent?  Or do the
> instructions compute the intermediate results in greater precision than
> 8 bits?  The specification doesn't seem to tell.
> 
> Can you clarify?

According to [1], the intermediate result has 9-bit precision for pavgb and 17-bit precision for pavgw.

[1] http://www.felixcloutier.com/x86/PAVGB:PAVGW.html
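
The difference the extra bit makes can be sketched with a scalar model of a single byte lane. This is only an illustration under the 9-bit-intermediate description above; the function names are hypothetical and do not come from the instruction reference:

```c
#include <stdint.h>

/* Scalar model of one pavgb lane: the sum is formed in a wider type
   (9 bits are enough), rounded up by adding 1, then halved. */
uint8_t pavgb_lane(uint8_t a, uint8_t b)
{
    unsigned int sum = (unsigned int)a + b + 1u;   /* at most 511 */
    return (uint8_t)(sum >> 1);
}

/* What a strictly 8-bit computation would give instead:
   the carry out of bit 7 is dropped before the shift. */
uint8_t avg_truncated(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b + 1);            /* wraps modulo 256 */
    return (uint8_t)(sum >> 1);
}
```

For the 255 + 255 + 1 case raised in comment #2, the widened model yields 255 while the truncated one yields 127, so the instruction matches the C source only because of the wider intermediate.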
Comment 4 Richard Biener 2018-05-09 07:48:38 UTC
OK, so adding another pattern plus an internal function (IFN) would be the canonical way of vectorizing this.
Comment 5 Richard Sandiford 2018-05-09 08:59:09 UTC
We'd identified a similar problem with the corresponding AArch64 instruction.  Hope to get round to this next week.
Comment 6 Richard Sandiford 2018-07-03 10:04:16 UTC
Author: rsandifo
Date: Tue Jul  3 10:03:44 2018
New Revision: 262335

URL: https://gcc.gnu.org/viewcvs?rev=262335&root=gcc&view=rev
Log:
[16/n] PR85694: Add detection of averaging operations

This patch adds detection of average instructions:

       a = (((wide) b + (wide) c) >> 1);
   --> a = (wide) .AVG_FLOOR (b, c);

       a = (((wide) b + (wide) c + 1) >> 1);
   --> a = (wide) .AVG_CEIL (b, c);

in cases where users of "a" need only the low half of the result,
making the cast to (wide) redundant.  The heavy lifting was done by
earlier patches.

This showed up another problem in vectorizable_call: if the call is a
pattern definition statement rather than the main pattern statement,
the type of the vectorised call might be different from the type of the
original statement.

2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/85694
	* doc/md.texi (avgM3_floor, uavgM3_floor, avgM3_ceil)
	(uavgM3_ceil): Document new optabs.
	* doc/sourcebuild.texi (vect_avg_qi): Document new target selector.
	* internal-fn.def (IFN_AVG_FLOOR, IFN_AVG_CEIL): New internal
	functions.
	* optabs.def (savg_floor_optab, uavg_floor_optab, savg_ceil_optab)
	(uavg_ceil_optab): New optabs.
	* tree-vect-patterns.c (vect_recog_average_pattern): New function.
	(vect_vect_recog_func_ptrs): Add it.
	* tree-vect-stmts.c (vectorizable_call): Get the type of the zero
	constant directly from the associated lhs.

gcc/testsuite/
	PR tree-optimization/85694
	* lib/target-supports.exp (check_effective_target_vect_avg_qi): New
	proc.
	* gcc.dg/vect/vect-avg-1.c: New test.
	* gcc.dg/vect/vect-avg-2.c: Likewise.
	* gcc.dg/vect/vect-avg-3.c: Likewise.
	* gcc.dg/vect/vect-avg-4.c: Likewise.
	* gcc.dg/vect/vect-avg-5.c: Likewise.
	* gcc.dg/vect/vect-avg-6.c: Likewise.
	* gcc.dg/vect/vect-avg-7.c: Likewise.
	* gcc.dg/vect/vect-avg-8.c: Likewise.
	* gcc.dg/vect/vect-avg-9.c: Likewise.
	* gcc.dg/vect/vect-avg-10.c: Likewise.
	* gcc.dg/vect/vect-avg-11.c: Likewise.
	* gcc.dg/vect/vect-avg-12.c: Likewise.
	* gcc.dg/vect/vect-avg-13.c: Likewise.
	* gcc.dg/vect/vect-avg-14.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-10.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-12.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-13.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-14.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-2.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-3.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-4.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-6.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-7.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-8.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-9.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/md.texi
    trunk/gcc/doc/sourcebuild.texi
    trunk/gcc/internal-fn.def
    trunk/gcc/optabs.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp
    trunk/gcc/tree-vect-patterns.c
    trunk/gcc/tree-vect-stmts.c
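
As a rough scalar reading of the two patterns recognised by r262335, the sketch below writes AVG_FLOOR and AVG_CEIL for unsigned 8-bit operands with an explicit widening step, and compares them against well-known non-widening bit identities. The function names are hypothetical and the identities are added here for illustration only; this is not how the GCC pattern matcher is implemented:

```c
#include <stdint.h>

/* (((wide) b + (wide) c) >> 1), truncated back to the narrow type. */
uint8_t avg_floor_u8(uint8_t b, uint8_t c)
{
    return (uint8_t)(((unsigned int)b + (unsigned int)c) >> 1);
}

/* (((wide) b + (wide) c + 1) >> 1), truncated back to the narrow type. */
uint8_t avg_ceil_u8(uint8_t b, uint8_t c)
{
    return (uint8_t)(((unsigned int)b + (unsigned int)c + 1u) >> 1);
}

/* Same results using only values that fit in 8 bits:
   b + c == 2 * (b & c) + (b ^ c)  and  b + c == 2 * (b | c) - (b ^ c). */
uint8_t avg_floor_u8_narrow(uint8_t b, uint8_t c)
{
    return (uint8_t)((b & c) + ((b ^ c) >> 1));
}

uint8_t avg_ceil_u8_narrow(uint8_t b, uint8_t c)
{
    return (uint8_t)((b | c) - ((b ^ c) >> 1));
}

/* Exhaustively compare the widening and non-widening forms. */
int count_mismatches(void)
{
    int bad = 0;
    for (int b = 0; b < 256; b++)
        for (int c = 0; c < 256; c++) {
            if (avg_floor_u8((uint8_t)b, (uint8_t)c)
                != avg_floor_u8_narrow((uint8_t)b, (uint8_t)c))
                bad++;
            if (avg_ceil_u8((uint8_t)b, (uint8_t)c)
                != avg_ceil_u8_narrow((uint8_t)b, (uint8_t)c))
                bad++;
        }
    return bad;
}
```

The exhaustive check is consistent with the log's remark that the result always fits in the low half, which is what makes the cast to the wide type redundant.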
Comment 7 Richard Sandiford 2018-07-03 14:28:00 UTC
Author: rsandifo
Date: Tue Jul  3 14:27:28 2018
New Revision: 262347

URL: https://gcc.gnu.org/viewcvs?rev=262347&root=gcc&view=rev
Log:
[17/n] PR85694: AArch64 support for AVG_FLOOR/CEIL

This patch adds AArch64 patterns for the new AVG_FLOOR/CEIL operations.
AVG_FLOOR is [SU]HADD and AVG_CEIL is [SU]RHADD.

2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/85694
	* config/aarch64/iterators.md (HADD, RHADD): New int iterators.
	(u): Handle UNSPEC_SHADD, UNSPEC_UHADD, UNSPEC_SRHADD and
	UNSPEC_URHADD.
	* config/aarch64/aarch64-simd.md (<u>avg<mode>3_floor)
	(<u>avg<mode>3_ceil): New patterns.

gcc/testsuite/
	PR tree-optimization/85694
	* lib/target-supports.exp (check_effective_target_vect_avg_qi):
	Return true for AArch64 without SVE.
	* gcc.target/aarch64/vect_hadd_1.h: New file.
	* gcc.target/aarch64/vect_shadd_1.c: New test.
	* gcc.target/aarch64/vect_srhadd_1.c: Likewise.
	* gcc.target/aarch64/vect_uhadd_1.c: Likewise.
	* gcc.target/aarch64/vect_urhadd_1.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/aarch64/vect_hadd_1.h
    trunk/gcc/testsuite/gcc.target/aarch64/vect_shadd_1.c
    trunk/gcc/testsuite/gcc.target/aarch64/vect_srhadd_1.c
    trunk/gcc/testsuite/gcc.target/aarch64/vect_uhadd_1.c
    trunk/gcc/testsuite/gcc.target/aarch64/vect_urhadd_1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64-simd.md
    trunk/gcc/config/aarch64/iterators.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp
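
For the signed variants mentioned in r262347, a per-lane scalar sketch might look as follows. It assumes that SHADD rounds the halved sum towards negative infinity and SRHADD rounds it towards positive infinity; the function names are hypothetical, and the code avoids right-shifting negative values so that it does not rely on implementation-defined behaviour:

```c
#include <stdint.h>

/* Model of one signed AVG_FLOOR lane: floor((a + b) / 2).
   C division truncates towards zero, so odd sums are adjusted
   downwards by one before dividing. */
int8_t avg_floor_s8(int8_t a, int8_t b)
{
    int sum = a + b;                      /* fits easily in int */
    return (int8_t)((sum - (sum % 2 != 0)) / 2);
}

/* Model of one signed AVG_CEIL lane: ceil((a + b) / 2).
   Odd sums are adjusted upwards by one before dividing. */
int8_t avg_ceil_s8(int8_t a, int8_t b)
{
    int sum = a + b;
    return (int8_t)((sum + (sum % 2 != 0)) / 2);
}
```

For example, averaging -3 and 0 gives -2 with the floor form and -1 with the ceil form.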
Comment 8 Richard Sandiford 2018-07-03 16:01:57 UTC
Fixed on trunk for AArch64.  Please could someone define the appropriate avgM3_floor, uavgM3_floor, avgM3_ceil and uavgM3_ceil patterns for x86?
Comment 9 Uroš Bizjak 2018-07-03 16:05:08 UTC
Created attachment 44348
x86 target patch

I'm testing the attached patch for x86 targets.
Comment 10 uros 2018-07-03 17:34:00 UTC
Author: uros
Date: Tue Jul  3 17:33:28 2018
New Revision: 262354

URL: https://gcc.gnu.org/viewcvs?rev=262354&root=gcc&view=rev
Log:
	PR target/85694
	* config/i386/sse.md (uavg<mode>3_ceil): New expander.
	(<sse2_avx2>_uavg<mode>3<mask_name>): Simplify expander.

testsuite/ChangeLog:

	PR target/85694
	* gcc.target/i386/pr85694.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr85694.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 11 Uroš Bizjak 2018-07-03 17:39:42 UTC
Also fixed for x86 targets.
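
For comparison with what the vectorizer is now expected to produce for the loop in the description, here is a hand-written sketch using the SSE2 intrinsic _mm_avg_epu8, which maps to pavgb. The function name avg_ceil_u8_array is hypothetical, and the scalar tail loop doubles as a fallback when SSE2 is not available; this is an illustration, not the code GCC emits:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* dst[i] = (a[i] + b[i] + 1) >> 1 for i in [0, n). */
void avg_ceil_u8_array(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Process 16 bytes per iteration with unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    /* Scalar tail, also the whole loop when SSE2 is unavailable. */
    for (; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```

Only the unsigned rounding-up form appears here because that is the operation pavgb provides, which matches the single uavg<mode>3_ceil expander added in r262354.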