Test normalizing 3D coordinates failed to vectorize

#include <math.h>

struct XYZ
{
  float x;
  float y;
  float z;
};

void
norm (struct XYZ *in, struct XYZ *out, int size)
{
  int i;
  for (i = 0; i < size; ++i)
    {
      float n = sqrt(in[i].x * in[i].x + in[i].y * in[i].y + in[i].z * in[i].z);
      out[i].x = in[i].x / n;
      out[i].y = in[i].y / n;
      out[i].z = in[i].z / n;
    }
}

gcc norm.c -Ofast -S -mssse3 -fdump-tree-vect-details

norm.c:14:3: note: type of def: 3.
norm.c:14:3: note: no array mode for V4SF[3]
norm.c:14:3: note: the size of the group of accesses is not a power of 2
norm.c:14:3: note: not vectorized: relevant stmt not supported: _19->x = _20;

Vectorization should give ~40% gain on x86
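The dump reports a group of accesses of size 3, which is not a power of 2. The scalar sketch below shows what that implies for a 4-lane vectorizer. It is not GCC's implementation. The names deinterleave3, self_check, LANES and GROUP are made up for this example, and the 4-lane width is assumed from the V4SF mode in the dump. Four consecutive XYZ structs occupy 12 floats, i.e. three 4-float vector loads, and each per-field result vector takes elements from all three of them, so a permutation step is needed after the loads.

```c
#include <assert.h>

/* Assumed for illustration: 4 float lanes (V4SF) and 3 fields per struct.  */
enum { LANES = 4, GROUP = 3 };

/* Hypothetical helper, not a GCC function.  IN holds LANES interleaved
   structs laid out as x0 y0 z0 x1 y1 z1 ...  OUT[field] receives the
   LANES values of that field (0 = x, 1 = y, 2 = z).  */
static void
deinterleave3 (const float in[GROUP * LANES], float out[GROUP][LANES])
{
  for (int field = 0; field < GROUP; ++field)
    for (int k = 0; k < LANES; ++k)
      {
        /* Element k of this field sits at stride GROUP in memory.
           flat / LANES is the vector load it comes from and
           flat % LANES is the lane inside that load.  */
        int flat = GROUP * k + field;
        out[field][k] = in[flat];
      }
}

/* Small self-check: fill the input with its own indices and verify
   that the x lanes come from flat positions 0, 3, 6 and 9.  */
static void
self_check (void)
{
  float in[GROUP * LANES];
  float out[GROUP][LANES];

  for (int i = 0; i < GROUP * LANES; ++i)
    in[i] = (float) i;

  deinterleave3 (in, out);

  assert (out[0][0] == 0.0f && out[0][1] == 3.0f);
  assert (out[0][2] == 6.0f && out[0][3] == 9.0f);
}
```

Calling self_check () from a main function exercises the mapping. The x values sit at flat positions 0, 3, 6 and 9, which fall in the first, first, second and third vector load respectively.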
Author: kyukhin
Date: Wed Jun 18 07:46:18 2014
New Revision: 211769

URL: https://gcc.gnu.org/viewcvs?rev=211769&root=gcc&view=rev
Log:
gcc/
	* config/i386/i386.c (ix86_reassociation_width): Add alternative for
	vector case.
	* config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
	* config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
	* tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
	Introduces alternative way of loads group permutations.
	(vect_transform_grouped_load): Try alternative way of permutations.

gcc/testsuite/
	PR tree-optimization/52252
	* gcc.target/i386/pr52252-atom.c: Test on loads group of size 3.
	* gcc.target/i386/pr52252-core.c: Ditto.

	PR tree-optimization/61403
	* gcc.target/i386/pr61403.c: Test on loads and stores group of size 3.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr52252-atom.c
    trunk/gcc/testsuite/gcc.target/i386/pr52252-core.c
    trunk/gcc/testsuite/gcc.target/i386/pr61403.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/x86-tune.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
Author: jakub
Date: Fri Oct 3 18:16:09 2014
New Revision: 215866

URL: https://gcc.gnu.org/viewcvs?rev=215866&root=gcc&view=rev
Log:
	PR tree-optimization/61403
	* config/i386/i386.c (expand_vec_perm_palignr): Fix a spelling
	error in comment.  Also optimize 256-bit vectors for AVX2
	or AVX (floating vectors only), provided the first permutation
	can be performed in one insn.

	* gcc.dg/torture/vshuf-32.inc: Add a new test 29.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/torture/vshuf-32.inc
So is this fixed now?
(In reply to Jakub Jelinek from comment #3)
> So is this fixed now?

Yes. It is fixed.