This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/55645] New: skipping unlike branch in vectorized loops using movmsk or equivalent
- From: "vincenzo.innocente at cern dot ch" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 11 Dec 2012 06:42:40 +0000
- Subject: [Bug tree-optimization/55645] New: skipping unlike branch in vectorized loops using movmsk or equivalent
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55645
Bug #: 55645
Summary: skipping unlike branch in vectorized loops using movmsk or equivalent
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: vincenzo.innocente@cern.ch
I'm wondering whether the vectorization engine could accommodate some mechanism to
skip unlikely branches using a global test based on movmsk or similar.
Below is a trivial example, including a possible SLP implementation that happens to
compile with 4.8 as:
c++ -std=c++11 -Ofast -mavx2 -S divergent.cc; less divergent.s
float a[1024];
float b[1024];
float c[1024];
#define likely(x) (__builtin_expect(x, true))
// possible syntax
void compute() {
  for (int i=0;i!=1024;++i) {
    if likely(a[i]<b[i]) // very often
      c[i]=a[i]+b[i];
    else // rare
      c[i]=a[i]-b[i];
  }
}
// hand-made implementation that compiles with 4.8 today
#include <x86intrin.h>
typedef float __attribute__( ( vector_size( 32 ) ) ) float32x8_t;
typedef int __attribute__( ( vector_size( 32 ) ) ) int32x8_t;
float32x8_t va[1024];
float32x8_t vb[1024];
float32x8_t vc[1024];
void computeV() {
  for (int i=0;i!=1024;++i) {
    float32x8_t mask = va[i]<vb[i];
    if likely(_mm256_movemask_ps(mask) == 255) {
      vc[i]=va[i]+vb[i];
    } else {
      vc[i]= va[i]<vb[i] ? va[i]+vb[i] : va[i]-vb[i];
    }
  }
}
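
For anyone without AVX hardware at hand, the same control flow can be sketched in portable C++. This is only an illustration of the idea, not a proposal for what the vectorizer should emit: compare_mask is a hypothetical scalar stand-in for a movmsk-style instruction, kLanes, compute_blocks and compute_scalar are names made up for this sketch, and the array length is assumed to be a multiple of kLanes. __builtin_expect is the GCC builtin already used in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane count; 8 matches the eight floats of one 256-bit register.
constexpr std::size_t kLanes = 8;

// Scalar stand-in for a movmsk-style instruction:
// bit k of the result is set when a[k] < b[k].
unsigned compare_mask(const float* a, const float* b) {
  unsigned mask = 0;
  for (std::size_t k = 0; k != kLanes; ++k)
    if (a[k] < b[k]) mask |= 1u << k;
  return mask;
}

// Plain per-element loop, used as the reference result.
void compute_scalar(const float* a, const float* b, float* c, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    c[i] = (a[i] < b[i]) ? a[i] + b[i] : a[i] - b[i];
}

// Block-wise loop with one global test per block of kLanes elements.
// Returns the number of blocks that took the fast path.
// Assumption: n is a multiple of kLanes.
std::size_t compute_blocks(const float* a, const float* b, float* c,
                           std::size_t n) {
  assert(n % kLanes == 0);
  const unsigned all_set = (1u << kLanes) - 1;  // 255 for 8 lanes
  std::size_t fast = 0;
  for (std::size_t i = 0; i != n; i += kLanes) {
    const unsigned mask = compare_mask(a + i, b + i);
    if (__builtin_expect(mask == all_set, 1)) {
      // Likely case: every lane satisfies a < b, so no select is needed.
      for (std::size_t k = 0; k != kLanes; ++k)
        c[i + k] = a[i + k] + b[i + k];
      ++fast;
    } else {
      // Rare case: at least one lane diverges, so select per lane.
      for (std::size_t k = 0; k != kLanes; ++k)
        c[i + k] = ((mask >> k) & 1u) ? a[i + k] + b[i + k]
                                      : a[i + k] - b[i + k];
    }
  }
  return fast;
}
```

Both paths give the same result as compute_scalar for every input. The global test only decides whether the per-lane select can be skipped for a whole block, which is the saving being asked for when the rare branch is expensive.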