Bug 61403 - An opportunity for x86 gcc vectorizer (~40% gain)
Summary: An opportunity for x86 gcc vectorizer (~40% gain)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 5.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: vectorizer
 
Reported: 2014-06-03 13:15 UTC by Stupachenko Evgeny
Modified: 2015-01-22 15:04 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:



Description Stupachenko Evgeny 2014-06-03 13:15:56 UTC
A test that normalizes 3D coordinates fails to vectorize:

#include <math.h> 
 
struct XYZ 
{ 
  float x; 
  float y; 
  float z; 
}; 
 
void 
norm (struct XYZ *in, struct XYZ *out, int size) 
{ 
  int i; 
  for (i = 0; i < size; ++i) 
    { 
      float n = sqrt(in[i].x * in[i].x + in[i].y * in[i].y + in[i].z * in[i].z); 
      out[i].x = in[i].x / n;
      out[i].y = in[i].y / n;
      out[i].z = in[i].z / n;
    } 
} 

gcc norm.c -Ofast -S -mssse3 -fdump-tree-vect-details

norm.c:14:3: note: type of def: 3. 
norm.c:14:3: note: no array mode for V4SF[3] 
norm.c:14:3: note: the size of the group of accesses is not a power of 2
norm.c:14:3: note: not vectorized: relevant stmt not supported: _19->x = _20;

Vectorizing this loop should give a ~40% gain on x86.
Comment 1 Kirill Yukhin 2014-06-18 07:46:50 UTC
Author: kyukhin
Date: Wed Jun 18 07:46:18 2014
New Revision: 211769

URL: https://gcc.gnu.org/viewcvs?rev=211769&root=gcc&view=rev
Log:
gcc/
	* config/i386/i386.c (ix86_reassociation_width): Add alternative for
	vector case.
	* config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
	* config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
	* tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
	Introduces an alternative way of permuting load groups.
	(vect_transform_grouped_load): Try alternative way of permutations.

gcc/testsuite/
	PR tree-optimization/52252
	* gcc.target/i386/pr52252-atom.c: Test on loads group of size 3.
	* gcc.target/i386/pr52252-core.c: Ditto.

	PR tree-optimization/61403
	* gcc.target/i386/pr61403.c: Test on loads and stores group of size 3.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr52252-atom.c
    trunk/gcc/testsuite/gcc.target/i386/pr52252-core.c
    trunk/gcc/testsuite/gcc.target/i386/pr61403.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/x86-tune.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
Comment 2 Jakub Jelinek 2014-10-03 18:16:41 UTC
Author: jakub
Date: Fri Oct  3 18:16:09 2014
New Revision: 215866

URL: https://gcc.gnu.org/viewcvs?rev=215866&root=gcc&view=rev
Log:
	PR tree-optimization/61403
	* config/i386/i386.c (expand_vec_perm_palignr): Fix a spelling
	error in comment.  Also optimize 256-bit vectors for AVX2
	or AVX (floating vectors only), provided the first permutation
	can be performed in one insn.

	* gcc.dg/torture/vshuf-32.inc: Add a new test 29.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/torture/vshuf-32.inc
Comment 3 Jakub Jelinek 2015-01-22 14:13:34 UTC
So is this fixed now?
Comment 4 Stupachenko Evgeny 2015-01-22 15:04:55 UTC
(In reply to Jakub Jelinek from comment #3)
> So is this fixed now?

Yes. It is fixed.