This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



_mm{,256}_i{32,64}gather_{ps,pd,epi32,epi64} intrinsics semantics


Hi!

As the vgather* insns are designed to support both
unconditional and conditional gather loads, the current
patterns consume the previous content of the destination
register, so we end up with code like:
        vmovaps .LC0(%rip), %ymm0
        vmovdqa .LC1(%rip), %ymm5
        vmovdqa .LC2(%rip), %ymm4
        .p2align 4,,10
        .p2align 3
.L6:
        vmovdqa k(%rax,%rax), %ymm1
        vmovaps %ymm0, %ymm6
        vmovaps %ymm0, %ymm2
        vmovdqa k+32(%rax,%rax), %ymm3
        vgatherdps      %ymm6, vf1(,%ymm1,4), %ymm2
        vmovaps %ymm0, %ymm1
        vmovaps %ymm0, %ymm6
        vcvttps2dq      %ymm2, %ymm2
        vpshufb %ymm5, %ymm2, %ymm2
        vgatherdps      %ymm6, vf1(,%ymm3,4), %ymm1
...
Note that each vgather* is usually preceded by two vmovaps: one
copies the mask of all ones (usually computed/loaded before the
loop), and the other initializes the destination register.
But with a mask of all ones the whole destination register is
overwritten unless there is a segfault, so IMNSHO at least for
autovectorization it would be nice to just leave the content
of the destination register undefined in case of a segfault.
The only way users can see a difference is if a segfault happens
and in a segfault handler they inspect the destination register
or transfer control to the next insn from the segfault handler.
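
To make the above concrete, here is a rough scalar model of what a
vgatherdps with 32-bit indices does per lane, going by the insn
description in the AVX2 manual.  The function name, the LANES macro and
the implied scale of 4 are my own illustration, not anything in GCC or
in avx2intrin.h, and fault behaviour is not modelled at all.

```c
#include <stdint.h>
#include <string.h>

#define LANES 8  /* eight 32-bit lanes in a 256-bit ymm register */

/* Hypothetical scalar model of vgatherdps (32-bit indices, scale 4).
   dest and mask are both in/out, as in the hardware insn.
   Faults are not modelled: every index is assumed to be valid.  */
void model_vgatherdps(float dest[LANES], uint32_t mask[LANES],
                      const float *base, const int32_t index[LANES])
{
    for (int i = 0; i < LANES; i++) {
        /* Only the most significant bit of each mask lane is tested.  */
        if (mask[i] & 0x80000000u)
            dest[i] = base[index[i]];  /* float indexing gives the scale of 4 */
        /* Lanes with a clear mask bit keep the previous dest content.  */
        mask[i] = 0;                   /* the insn clears the mask as lanes complete */
    }
}
```

In this model, two calls that start with different dest contents but the
same all-ones mask end with identical dest contents, so the initializing
vmovaps only matters when a lane is masked off or when the insn faults
part way through.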

My question is about the avx2intrin.h intrinsics, in the AVX2
manual the insns are well documented, but there are no details
about the intrinsics.  There are two kinds of gather intrinsics:
one without mask/src operands and one with them.

So, my question is, for the intrinsics without mask/src
operands, is it supposed to be well defined what the dest register
will contain after a segfault?  Currently we load zeros into src,
but would it be a valid optimization to just leave that register
undefined in case of a segfault?  And, what about the other
intrinsics if the mask is known to be all ones?  Can the compiler
optimize this and assume the destination is just overwritten rather
than being an in/out operand?

What could be done is during expansion check whether the mask has the
high bit set in every element and if so, just use different insn
patterns that wouldn't consume the register with "0" constraint.
Or have a second set of compiler builtins that wouldn't have
src/mask arguments.

On large testcases (like Toon's weather forecast routine which has
over 260 vgather* insns) this would allow us to get rid of one
extra insn per vgather* insn.

	Jakub

