[Bug c/63351] New: Optimization: contract broadcast intrinsics when AVX512 is enabled
agner at agner dot org
gcc-bugzilla@gcc.gnu.org
Wed Sep 24 05:39:00 GMT 2014
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351
Bug ID: 63351
Summary: Optimization: contract broadcast intrinsics when AVX512 is enabled
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: agner at agner dot org
The AVX512 instruction set allows instructions with an embedded broadcast
operand, but there are no corresponding intrinsic functions. The programmer has
to write a broadcast intrinsic followed by some other intrinsic and rely on the
compiler to contract the pair into a single instruction.
I would expect the optimizer to contract a broadcast intrinsic with any
subsequent intrinsic into a single instruction. For example:
// gcc -Ofast -mavx512f
#include "x86intrin.h"

void dummyz(__m512i a, __m512i b);

void broadcastz(__m512i a, int b) {
    // expect reduction to instruction with broadcast,
    // something like: vpaddd b, %zmm0, %zmm3 {1to16}
    __m512i bb = _mm512_set1_epi32(b);
    __m512i ab = _mm512_add_epi32(a, bb);
    __m512i cc = _mm512_set1_epi32(5);
    __m512i ac = _mm512_add_epi32(a, cc);
    dummyz(ab, ac);
}
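
To make the expected semantics concrete, here is a minimal portable sketch in plain C (no intrinsics) of what a {1to16} embedded broadcast computes. The function names add_set1_then_add_ref, add_embedded_broadcast_ref and broadcast_forms_agree are hypothetical, and the 16-element arrays merely stand in for zmm registers. It models only the arithmetic, not what GCC actually generates.

```c
#include <stdint.h>
#include <string.h>

enum { LANES = 16 };  /* 16 x 32-bit lanes model one 512-bit register */

/* Two-step form: materialize the broadcast vector, then add lane by lane.
   Models _mm512_set1_epi32 followed by _mm512_add_epi32. */
void add_set1_then_add_ref(const int32_t a[LANES], int32_t b,
                           int32_t out[LANES]) {
    int32_t bb[LANES];
    for (int i = 0; i < LANES; i++)
        bb[i] = b;                       /* the explicit broadcast */
    for (int i = 0; i < LANES; i++)      /* wrap-around add, as vpaddd does */
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)bb[i]);
}

/* Fused form: one scalar is read from memory and reused in every lane.
   Models a single add whose memory operand carries {1to16}. */
void add_embedded_broadcast_ref(const int32_t a[LANES], const int32_t *b,
                                int32_t out[LANES]) {
    const int32_t scalar = *b;           /* single 32-bit load */
    for (int i = 0; i < LANES; i++)
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)scalar);
}

/* Returns 1 when both forms produce identical lanes, 0 otherwise. */
int broadcast_forms_agree(const int32_t a[LANES], int32_t b) {
    int32_t r1[LANES], r2[LANES];
    add_set1_then_add_ref(a, b, r1);
    add_embedded_broadcast_ref(a, &b, r2);
    return memcmp(r1, r2, sizeof r1) == 0;
}
```

Because the two forms always agree, folding the broadcast into the add does not change the result; it only saves the separate broadcast instruction and the register that would hold the broadcast vector.
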
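The same contraction should be possible for smaller vector sizes as well when
AVX512 is enabled (for 128-bit and 256-bit vectors the embedded-broadcast
encodings belong to the AVX512VL extension):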
void dummyx(__m128 a, __m128 b);

void broadcastx(__m128 a, float b) {
    // broadcasting should even be possible with smaller vectors
    __m128 bb = _mm_set1_ps(b);
    __m128 ab = _mm_add_ps(a, bb);
    __m128 cc = _mm_set1_ps(5.0);
    __m128 ac = _mm_add_ps(a, cc);
    dummyx(ab, ac);
}
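
Whether the 128-bit case can use an embedded broadcast depends on which AVX512 subsets the build enables. As a rough sketch, the predefined GCC macros __AVX512F__ and __AVX512VL__ can be used to check this at compile time. The helper name embedded_broadcast_widths is hypothetical, and the strings it returns are only illustrative labels.

```c
/* Reports which vector widths could use an embedded broadcast operand
   for the instruction-set options this translation unit was built with.
   Purely a compile-time check; it does not query the CPU at run time. */
const char *embedded_broadcast_widths(void) {
#if defined(__AVX512F__) && defined(__AVX512VL__)
    return "128, 256 and 512 bits";  /* e.g. -mavx512f -mavx512vl */
#elif defined(__AVX512F__)
    return "512 bits only";          /* e.g. -mavx512f alone */
#else
    return "none";                   /* no AVX512 enabled */
#endif
}
```

With only -mavx512f, as in the command line above, this would report 512 bits only, so contracting the __m128 example would additionally need -mavx512vl.
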