This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hi, Given below test case, int foo (unsigned short a[], unsigned int x) { unsigned int i; for (i = 0; i < 1000; i++) { x = a[i]; a[i] = (unsigned short)(x >= 32768 ? x - 32768 : 0); } return x; } it now can be vectorized on AArch64, but generated assembly is way from optimal: .L4: ldr q4, [x3, x1] add w2, w2, 1 cmp w2, w0 ushll v1.4s, v4.4h, 0 ushll2 v0.4s, v4.8h, 0 add v3.4s, v1.4s, v6.4s add v2.4s, v0.4s, v6.4s cmhi v1.4s, v1.4s, v5.4s cmhi v0.4s, v0.4s, v5.4s and v1.16b, v3.16b, v1.16b and v0.16b, v2.16b, v0.16b xtn v2.4h, v1.4s xtn2 v2.8h, v0.4s str q2, [x3, x1] add x1, x1, 16 bcc .L4 The vectorized loop has 15 instructions, which can be greatly simplified by turning cond_expr into max_expr, as below: .L4: ldr q1, [x3, x1] add w2, w2, 1 cmp w2, w0 umax v0.8h, v1.8h, v2.8h add v0.8h, v0.8h, v2.8h str q0, [x3, x1] add x1, x1, 16 bcc .L4 This patch addresses the issue by adding new vectorization pattern. Bootstrap and test on x86_64 and AArch64. Is it OK? Thanks, bin 2016-10-11 Bin Cheng <bin.cheng@arm.com> * tree-vect-patterns.c (vect_recog_min_max_modify_pattern): New. (vect_vect_recog_func_ptrs): New element for above pattern. * tree-vectorizer.h (NUM_PATTERNS): Increase by 1. gcc/testsuite/ChangeLog 2016-10-11 Bin Cheng <bin.cheng@arm.com> * gcc.dg/vect/vect-umax-modify-pattern.c: New test. * gcc.dg/vect/vect-umin-modify-pattern.c: New test.
Attachment:
umin-max-modify-pattern-20160924.txt
Description: umin-max-modify-pattern-20160924.txt
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |