As mentioned in [1], page 12, the following code:

--cut here--
#define N 1024

unsigned char src1[N], src2[N], dst[N];

void foo (void)
{
  int i;

  for (i = 0; i < N; i++)
    dst[i] = (src1[i] + src2[i] + 1) >> 1;
}
--cut here--

should vectorize with the pavgb instruction.

[1] http://llvm.org/devmtg/2018-04/slides/Das-An%20Introduction%20to%20AMD%20Optimizing%20Compiler.pdf
Confirmed. Note that in C you are writing 'int' arithmetic, which we do not shorten; that confuses us enough that we vectorize this with 'int' operations.

The main part, the missed pavgb vectorization, is confirmed as well, though with more intelligent vectorization using v16qi vectors, combine could come to the rescue here too (it would be a 4-insn combine, so not very likely; it also needs unconditional shift / division support for v16qi).
Hmm, but if you have 255 + 255 + 1 then you need to use pavgw at least, otherwise the vectorization isn't semantically equivalent? Or do the instructions compute the intermediate results in greater precision than 8 bits? The specification doesn't seem to tell. Can you clarify?
(In reply to Richard Biener from comment #2)
> Hmm, but if you have 255 + 255 + 1 then you need to use pavgw at least,
> otherwise the vectorization isn't semantically equivalent? Or do the
> instructions compute
> the intermediate results in greater precision than 8 bits? The specification
> doesn't seem to tell.
>
> Can you clarify?

According to [1], the intermediate result has 9-bit precision for pavgb and 17-bit precision for pavgw.

[1] http://www.felixcloutier.com/x86/PAVGB:PAVGW.html
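To make the precision point concrete, here is a minimal scalar sketch in C. It assumes that one pavgb lane behaves as "add with a 9-bit intermediate, then shift right by one", which is how the reference above describes it. The function names are hypothetical and exist only for this illustration. It compares that model against the 'int' expression from the report for every input pair, and also shows what a strictly 8-bit intermediate would do.

```c
#include <stdint.h>

/* Reference: the expression from the report.  After integer promotion
   the sum is computed in int, so 255 + 255 + 1 = 511 does not wrap.  */
static uint8_t avg_ceil_ref(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Assumed model of one pavgb lane: keep 9 bits of the sum, then shift.  */
static uint8_t avg_ceil_9bit(uint8_t a, uint8_t b)
{
    unsigned sum = ((unsigned)a + b + 1u) & 0x1FFu;
    return (uint8_t)(sum >> 1);
}

/* What would happen if the intermediate were truncated to 8 bits.  */
static uint8_t avg_ceil_8bit(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b + 1);
    return (uint8_t)(sum >> 1);
}

/* Count the input pairs on which the 9-bit model and the reference differ.  */
int avg_ceil_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (avg_ceil_ref((uint8_t)a, (uint8_t)b)
                != avg_ceil_9bit((uint8_t)a, (uint8_t)b))
                bad++;
    return bad;
}
```

Since the largest possible sum, 511, fits in 9 bits, the 9-bit model agrees with the 'int' expression on all 65536 input pairs, whereas the 8-bit variant already goes wrong for 255 and 255.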
OK, so adding another pattern plus IFN would be the canonical way of vectorizing this.
We'd identified a similar problem with the corresponding AArch64 instruction. Hope to get round to this next week.
Author: rsandifo
Date: Tue Jul 3 10:03:44 2018
New Revision: 262335

URL: https://gcc.gnu.org/viewcvs?rev=262335&root=gcc&view=rev
Log:
[16/n] PR85694: Add detection of averaging operations

This patch adds detection of average instructions:

       a = (((wide) b + (wide) c) >> 1);
   --> a = (wide) .AVG_FLOOR (b, c);

       a = (((wide) b + (wide) c + 1) >> 1);
   --> a = (wide) .AVG_CEIL (b, c);

in cases where users of "a" need only the low half of the result,
making the cast to (wide) redundant.  The heavy lifting was done by
earlier patches.

This showed up another problem in vectorizable_call: if the call is a
pattern definition statement rather than the main pattern statement,
the type of vectorised call might be different from the type of the
original statement.

2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
    PR tree-optimization/85694
    * doc/md.texi (avgM3_floor, uavgM3_floor, avgM3_ceil)
    (uavgM3_ceil): Document new optabs.
    * doc/sourcebuild.texi (vect_avg_qi): Document new target selector.
    * internal-fn.def (IFN_AVG_FLOOR, IFN_AVG_CEIL): New internal
    functions.
    * optabs.def (savg_floor_optab, uavg_floor_optab, savg_ceil_optab)
    (savg_ceil_optab): New optabs.
    * tree-vect-patterns.c (vect_recog_average_pattern): New function.
    (vect_vect_recog_func_ptrs): Add it.
    * tree-vect-stmts.c (vectorizable_call): Get the type of the zero
    constant directly from the associated lhs.

gcc/testsuite/
    PR tree-optimization/85694
    * lib/target-supports.exp (check_effective_target_vect_avg_qi): New
    proc.
    * gcc.dg/vect/vect-avg-1.c: New test.
    * gcc.dg/vect/vect-avg-2.c: Likewise.
    * gcc.dg/vect/vect-avg-3.c: Likewise.
    * gcc.dg/vect/vect-avg-4.c: Likewise.
    * gcc.dg/vect/vect-avg-5.c: Likewise.
    * gcc.dg/vect/vect-avg-6.c: Likewise.
    * gcc.dg/vect/vect-avg-7.c: Likewise.
    * gcc.dg/vect/vect-avg-8.c: Likewise.
    * gcc.dg/vect/vect-avg-9.c: Likewise.
    * gcc.dg/vect/vect-avg-10.c: Likewise.
    * gcc.dg/vect/vect-avg-11.c: Likewise.
    * gcc.dg/vect/vect-avg-12.c: Likewise.
    * gcc.dg/vect/vect-avg-13.c: Likewise.
    * gcc.dg/vect/vect-avg-14.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-10.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-12.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-13.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-14.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-2.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-3.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-4.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-6.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-7.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-8.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-avg-9.c

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/md.texi
    trunk/gcc/doc/sourcebuild.texi
    trunk/gcc/internal-fn.def
    trunk/gcc/optabs.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp
    trunk/gcc/tree-vect-patterns.c
    trunk/gcc/tree-vect-stmts.c
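As an illustration of why the cast to (wide) becomes redundant, here is a minimal scalar sketch in C. It assumes uint8_t as the narrow type and uint16_t as "wide"; the function names are hypothetical, and the bitwise identities are a well-known overflow-free way to compute the two averages, not a description of what GCC emits. The sketch compares the widened forms from the commit message against narrow-only computations and counts any disagreements.

```c
#include <stdint.h>

/* Widened forms from the commit message; only the low half is kept.  */
static uint8_t wide_floor(uint8_t b, uint8_t c)
{
    uint16_t a = (uint16_t)(((uint16_t)b + (uint16_t)c) >> 1);
    return (uint8_t)a;
}

static uint8_t wide_ceil(uint8_t b, uint8_t c)
{
    uint16_t a = (uint16_t)(((uint16_t)b + (uint16_t)c + 1) >> 1);
    return (uint8_t)a;
}

/* Hypothetical scalar stand-ins for .AVG_FLOOR and .AVG_CEIL whose
   intermediate values never exceed 8 bits.  */
static uint8_t avg_floor_u8(uint8_t b, uint8_t c)
{
    return (uint8_t)((b & c) + ((b ^ c) >> 1));
}

static uint8_t avg_ceil_u8(uint8_t b, uint8_t c)
{
    return (uint8_t)((b | c) - ((b ^ c) >> 1));
}

/* Count the input pairs on which a narrow form differs from its wide form.  */
int avg_pattern_mismatches(void)
{
    int bad = 0;
    for (int b = 0; b < 256; b++)
        for (int c = 0; c < 256; c++) {
            uint8_t x = (uint8_t)b, y = (uint8_t)c;
            if (wide_floor(x, y) != avg_floor_u8(x, y))
                bad++;
            if (wide_ceil(x, y) != avg_ceil_u8(x, y))
                bad++;
        }
    return bad;
}
```

Both averages always fit in the narrow type, so a target instruction that returns the narrow result directly can stand in for the widen, add, shift and truncate sequence.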
Author: rsandifo
Date: Tue Jul 3 14:27:28 2018
New Revision: 262347

URL: https://gcc.gnu.org/viewcvs?rev=262347&root=gcc&view=rev
Log:
[17/n] PR85694: AArch64 support for AVG_FLOOR/CEIL

This patch adds AArch64 patterns for the new AVG_FLOOR/CEIL operations.
AVG_FLOOR is [SU]HADD and AVG_CEIL is [SU]RHADD.

2018-07-03  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
    PR tree-optimization/85694
    * config/aarch64/iterators.md (HADD, RHADD): New int iterators.
    (u): Handle UNSPEC_SHADD, UNSPEC_UHADD, UNSPEC_SRHADD and
    UNSPEC_URHADD.
    * config/aarch64/aarch64-simd.md (<u>avg<mode>3_floor)
    (<u>avg<mode>3_ceil): New patterns.

gcc/testsuite/
    PR tree-optimization/85694
    * lib/target-supports.exp (check_effective_target_vect_avg_qi):
    Return true for AArch64 without SVE.
    * gcc.target/aarch64/vect_hadd_1.h: New file.
    * gcc.target/aarch64/vect_shadd_1.c: New test.
    * gcc.target/aarch64/vect_srhadd_1.c: Likewise.
    * gcc.target/aarch64/vect_uhadd_1.c: Likewise.
    * gcc.target/aarch64/vect_urhadd_1.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/aarch64/vect_hadd_1.h
    trunk/gcc/testsuite/gcc.target/aarch64/vect_shadd_1.c
    trunk/gcc/testsuite/gcc.target/aarch64/vect_srhadd_1.c
    trunk/gcc/testsuite/gcc.target/aarch64/vect_uhadd_1.c
    trunk/gcc/testsuite/gcc.target/aarch64/vect_urhadd_1.c

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64-simd.md
    trunk/gcc/config/aarch64/iterators.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp
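For readers less familiar with the AArch64 mnemonics, the following is a rough per-lane scalar model in C of the four operations for 8-bit elements. It assumes the usual reading of the instructions: halving add is floor((a + b) / 2) and rounding halving add is floor((a + b + 1) / 2), with the sum computed without overflow. The function names are hypothetical, and this is only a sketch of the semantics, not of the patterns added to aarch64-simd.md.

```c
#include <stdint.h>

/* floor(x / 2), written so that it does not rely on right-shifting a
   negative value (implementation-defined in C).  */
static int floor_half(int x)
{
    return x >= 0 ? x / 2 : -((-x + 1) / 2);
}

/* AVG_FLOOR on signed elements: assumed model of one SHADD lane.  */
int8_t shadd_model(int8_t a, int8_t b)
{
    return (int8_t)floor_half(a + b);
}

/* AVG_CEIL on signed elements: assumed model of one SRHADD lane.  */
int8_t srhadd_model(int8_t a, int8_t b)
{
    return (int8_t)floor_half(a + b + 1);
}

/* AVG_FLOOR on unsigned elements: assumed model of one UHADD lane.  */
uint8_t uhadd_model(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* AVG_CEIL on unsigned elements: assumed model of one URHADD lane.  */
uint8_t urhadd_model(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}
```

In this model, urhadd_model has the same per-lane behaviour as the unsigned rounding average discussed for pavgb earlier in the thread.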
Fixed on trunk for AArch64. Please could someone define the appropriate avgM3_floor, uavgM3_floor, avgM3_ceil and uavgM3_ceil patterns for x86?
Created attachment 44348
x86 target patch

I'm testing the attached patch for x86 targets.
Author: uros
Date: Tue Jul 3 17:33:28 2018
New Revision: 262354

URL: https://gcc.gnu.org/viewcvs?rev=262354&root=gcc&view=rev
Log:
    PR target/85694
    * config/i386/sse.md (uavg<mode>3_ceil): New expander.
    (<sse2_avx2>_uavg<mode>3<mask_name>): Simplify expander.

testsuite/ChangeLog:

    PR target/85694
    * gcc.target/i386/pr85694.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr85694.c

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Fixed also for x86 targets.