This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
AVX512 woes
- From: "Jan Beulich" <JBeulich at suse dot com>
- To: "Kirill Yukhin" <kirill dot yukhin at gmail dot com>
- Cc: <gcc at gcc dot gnu dot org>
- Date: Thu, 20 Sep 2018 01:44:33 -0600
- Subject: AVX512 woes
Kirill, others,
in the course of putting together test harness extensions for AVX512
additions to the Xen hypervisor's built-in instruction emulator I've
come across a number of issues. Since I may well simply be missing the
full background, I thought I'd inquire first rather than adding bugzilla
entries for all of them:
1) An initial idea of mine was to use -ffixed-* to force the use of the
high 16 {x,y,z}mm registers (effectively by disallowing the use of
the lower ones), such that I'd easily get EVEX encoded insns for
whatever can be EVEX-encoded with the given -mavx512*
option(s). This doesn't even come close to working - all sorts of
internal compiler errors result for anything but the most trivial
examples, most notably with AVX512VL support enabled. I can't
observe similar bad effects from using -ffixed-* for other register
sub-groups. I realize the interactions between the various insns
the *.md files provide may be difficult to sort out, and perhaps
the root cause is the same as that of bug 87354, but is this really
something that's not supposed to work?
2) There appears to be a fairly widespread mix-up of Yk and k constraints on
insns. Most instructions having mask register outputs can very
well use %k0, yet they're commonly using "=Yk". Exceptions are
scatter/gather insns only, afaict. And of course insns using
destination field masking have to use "Yk" inputs. Is there
anything I'm overlooking here that prevents "=k" from being used as
outlined?
2b) Both k and Yk are marked @internal in constraints.md, suggesting
(to me) that I'm not supposed to use these constraints in inline
asm() constructs. If that implication of mine is correct, how would
I express respective constraints?
3) Certain AVX512_VBMI2, AVX512_BITALG, and GFNI+AVX512F inline
functions are unavailable without AVX512BW also enabled (contrary to what
the SDM, XED, and binutils/gas imply, and unlike the case for AVX512_VBMI).
I can see why VBMI implies BW even though the SDM does not suggest so, but
if this is done, other ISA extensions imo should also enable BW if need
be, rather than hiding part of their inline/builtin helpers. Or the
opposite position should be taken and no such implications should be
made at all - aiui they're there solely for mask register size
considerations, yet the respective insns could be used without
masking, in which case no direct dependency on BW exists.
4) Even in very obvious situations there does not appear to be any
use of embedded broadcasting. Is this something that's planned,
or something I can only possibly make use of using inline assembly?
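
To illustrate the kind of "very obvious situation" meant in 4), here is a
minimal sketch using GCC's generic vector extensions. The type and function
names are made up for this example. With -mavx512f one could imagine the
addition being emitted as a single VADDPS with a {1to16} memory operand for
*scalar, rather than as a separate VBROADCASTSS followed by VADDPS. What a
given compiler version actually emits is not asserted here.

```c
/* 16 x float, i.e. the width of one zmm register (hypothetical typedef). */
typedef float v16sf __attribute__((vector_size(64)));

/* Hypothetical test function: adds the same scalar to every element.
   GCC's vector extensions widen the scalar operand to a vector, which
   is exactly the pattern an embedded broadcast could serve.  Pointers
   are used so that no vector is passed or returned by value. */
void add_bcast(v16sf *dst, const v16sf *src, const float *scalar)
{
    *dst = *src + *scalar;
}
```

The code is plain C and also builds without any -mavx512* option, in which
case the compiler simply lowers the vector operation to whatever the
target provides.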
5) The VPTERNLOG* instructions look to be heavily underutilized. For one,
I observe strange VPTERNLOG*/VMOVDQA* (and similar) pairs, where the
latter uses zeroing-masking just to produce a mix of all-zeroes and
all-ones vector elements, when the same effect could have been
achieved by using zeroing-masking on the VPTERNLOG* right away. For
another, afaict the instructions can be used for any up to 3-way
logical (bit-wise boolean) operation for which no specific insn
exists (with a suitably calculated immediate), yet even a simple ~
gets carried out by VPXOR-ing with a vector of all ones.
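
For reference, the "suitably calculated immediate" in 5) can be sketched in
plain C as follows. The immediate is the 8-entry truth table of the desired
function, indexed by (a << 2) | (b << 1) | c, where a is the bit from the
destination/first source, b from the second source, and c from the third.
Evaluating the boolean expression on the constants 0xF0, 0xCC and 0xAA
yields that table directly. All names below (TL_A, TL_B, TL_C, ternlog64)
are hypothetical, and ternlog64() is only a software model of the per-bit
behaviour, not something taken from GCC.

```c
#include <stdint.h>

/* Bit i of each constant is the value of that operand at table index i. */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* Software model: apply the truth table imm to each bit position. */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t r = 0;

    for (unsigned i = 0; i < 64; ++i) {
        unsigned idx = (unsigned)((((a >> i) & 1) << 2) |
                                  (((b >> i) & 1) << 1) |
                                  ((c >> i) & 1));

        r |= (uint64_t)((imm >> idx) & 1) << i;
    }
    return r;
}

/* Example immediates, computed by evaluating the expression itself. */
static const uint8_t imm_not_a = (uint8_t)~TL_A;          /* ~a        */
static const uint8_t imm_xor3  = TL_A ^ TL_B ^ TL_C;      /* a ^ b ^ c */
static const uint8_t imm_sel   = (TL_A & TL_B) | (~TL_A & TL_C);
                                                          /* a ? b : c */
```

With these, a bit-wise NOT is just the table 0x0F applied to a single
source, which is the case where VPXOR with all ones is currently used.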
Thanks, Jan