This is the mail archive of the
mailing list for the GCC project.
Re: RISC-V vector extension cauldron discussion
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: roger dot espasa at esperanto dot ai, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Richard Biener <rguenther at suse dot de>, Jim Wilson <jimw at sifive dot com>, Palmer Dabbelt <palmer at sifive dot com>
- Date: Sat, 8 Sep 2018 23:08:16 +0200
- Subject: Re: RISC-V vector extension cauldron discussion
- References: <firstname.lastname@example.org>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Sat, Sep 08, 2018 at 03:04:38AM -0700, Richard Henderson wrote:
> Something similar must be defined for RISC-V. Such an abi must
> consider how vconfig is to be managed across function boundaries
> and with separate compilation. In my opinion this should be done
> before finalizing the ISA, as detailed below.
> (II-a) The callee must know how many registers are enabled by vconfig.
> The simplest solution is simply to require all 32 registers to be enabled.
If masked and non-masked vector operations have approximately the same cost,
and, due to the properties of the V extension, only a single (variable)
simdlen is meaningful, then it is possible to avoid emitting duplicate clones
for #pragma omp declare simd/__attribute__((simd)) functions that have no
explicit notinbranch or inbranch clause: just use the masked variants in the
ABI and pass an all-true mask vector in v1.
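
To illustrate the point above, here is a minimal scalar C sketch, not actual
RISC-V vector code. The names scale_add, scale_add_masked, scale_add_all and
CHUNK are made up for this example; the bool array only stands in for the
mask register, and CHUNK only stands in for the hardware vector length. It
shows a single masked clone serving both conditional callers and
unconditional ones, the latter by passing an all-true mask.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chunk size standing in for the (variable) vector length. */
enum { CHUNK = 4 };

/* User-written scalar function.  With neither inbranch nor notinbranch,
   a compiler would normally emit both masked and unmasked vector clones. */
#pragma omp declare simd
int scale_add(int x, int y)
{
    return 2 * x + y;
}

/* Hypothetical stand-in for the single masked clone: lanes whose mask
   bit is false are left untouched. */
void scale_add_masked(size_t vl, const bool *mask,
                      const int *x, const int *y, int *out)
{
    for (size_t i = 0; i < vl; i++)
        if (mask[i])
            out[i] = scale_add(x[i], y[i]);
}

/* Unconditional call site: no separate unmasked clone is needed, the
   caller simply supplies an all-true mask. */
void scale_add_all(size_t n, const int *x, const int *y, int *out)
{
    bool all_true[CHUNK];
    for (size_t i = 0; i < CHUNK; i++)
        all_true[i] = true;

    for (size_t i = 0; i < n; i += CHUNK) {
        size_t vl = (n - i < CHUNK) ? n - i : CHUNK;
        scale_add_masked(vl, all_true, x + i, y + i, out + i);
    }
}
```

Whether this pays off depends on the premise stated above, i.e. that a masked
operation costs about the same as an unmasked one.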
If we reduce the number of copies that way, perhaps there is a way to offer,
next to the scalar copy, 2 or 3 vector variants that differ by the number of
vector registers. Say the ABI document states that each function should be
emitted in 3 vector variants, one for 32 registers, another for 16 and
another for 8 (of course, if the function isn't externally visible, the
compiler can choose to emit just the ones that are needed, or use any other
ABI it chooses), and always just requires that MAXEL is 8 in the simd ABI.
The reason is that, if the implementation needs a MAXEL higher than the
caller provided, having to spill all the arguments, save the current vconfig
setting, reconfigure, do the work with multiple loop iterations, spill all
the results, restore the previous vconfig setting and fill in the result
value might be too costly.
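
A rough C sketch of how a caller could choose among such variants. Everything
here is hypothetical: the enum, the function name select_variant, and in
particular the assumptions that the variant must match the caller's enabled
register count exactly and that any other count falls back to the scalar
copy. None of this is specified anywhere in the thread.

```c
/* Hypothetical identifiers for the copies of one function. */
enum simd_variant {
    VARIANT_SCALAR,  /* plain scalar copy, always available */
    VARIANT_V8,      /* vector clone built for 8 enabled registers */
    VARIANT_V16,     /* vector clone built for 16 enabled registers */
    VARIANT_V32      /* vector clone built for 32 enabled registers */
};

/* Pick the clone matching the number of vector registers the caller's
   current vconfig enables.  Assumption: only an exact match is usable,
   so that neither side has to reconfigure vconfig around the call. */
enum simd_variant select_variant(unsigned enabled_regs)
{
    switch (enabled_regs) {
    case 8:
        return VARIANT_V8;
    case 16:
        return VARIANT_V16;
    case 32:
        return VARIANT_V32;
    default:
        /* Assumed fallback: call the scalar copy in a loop. */
        return VARIANT_SCALAR;
    }
}
```

In a real compiler this choice would presumably be made statically at the
call site, since the compiler knows which vconfig it has set up there.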