[PATCH] RISC-V: Fix RVV mask mode size

Sat Dec 17 01:44:30 GMT 2022

Yes, the mask mode for VNx4DF only has 4 bits when it comes to loads and stores.
For example, vlm or vsm will load/store 8 bits? (I am not sure whether the hardware can load/store 4 bits, but I am sure it definitely does not load/store the whole register size.)
So ideally it should be modeled more accurately. However, since GCC assumes that 1 BOOL is 1 byte, the only thing I can do is model the mask mode as small as possible.
Maybe in the future I can support 1 BOOL as 1 bit? I am not sure, since it would require changing the GCC framework.
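
To make the "as small as possible" point concrete, here is a minimal sketch, assuming a fixed number of mask bits and byte-granular storage. The name mask_bytes is a hypothetical helper for illustration only; it is not the riscv_v_adjust_bytesize function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): the smallest number of
   whole bytes that can hold n_mask_bits mask bits.  */
size_t mask_bytes(size_t n_mask_bits)
{
    assert(n_mask_bits > 0);
    /* Round up to a whole byte, since storage is byte-granular.  */
    return (n_mask_bits + 7) / 8;
}
```

Under these assumptions a 4-bit mask still occupies one byte, which is far less than a whole vector register.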

juzhe.zhong@rivai.ai

From: Jeff Law
Date: 2022-12-17 04:22
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Fix RVV mask mode size

On 12/13/22 23:48, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> This patch fixes the RVV mask mode sizes. Currently each mask mode size is adjusted
> to a whole RVV register (LMUL = 1), which not only ties each mask type, for
> example vbool32_t, to vint8m1_t but also increases memory consumption.
> 
> I noticed this issue during development of the VSETVL pass. Since it is not part of
> the VSETVL support, I have separated it into a standalone fix patch.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/riscv-modes.def (ADJUST_BYTESIZE): Reduce RVV mask mode size.
>          * config/riscv/riscv.cc (riscv_v_adjust_bytesize): New function.
>          (riscv_modes_tieable_p): Don't tie mask modes which will create issue.
>          * config/riscv/riscv.h (riscv_v_adjust_bytesize): New function.
So I haven't really studied the masking model for RVV (yet).  But 
there are two models that I'm generally aware of.

One model has a bit per element in the vector we're operating on.  So a 
V4DF will have 4 bits in the mask.  I generally call this the dense or 
packed model.

The other model has a bit for every element for the maximal number of 
elements that can ever appear in a vector.  So if we support an element 
length of 8 bits and a 1kbit vector, then the sparse model would have 128 
bits regardless of the size of the object being operated on.  So we'd 
still have 128 bits for V4DF, but the vast majority would be don't cares.
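
The two models above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the element count is treated as a fixed number rather than a scalable one, and nothing here is taken from the RISC-V backend.

```c
#include <assert.h>
#include <stddef.h>

/* Dense/packed model (hypothetical helper): one mask bit per element
   of the vector actually being operated on.  */
size_t dense_mask_bits(size_t n_elements)
{
    return n_elements;
}

/* Sparse model (hypothetical helper): one mask bit per element for the
   maximal element count, i.e. the full vector width divided by the
   smallest supported element width.  */
size_t sparse_mask_bits(size_t vector_bits, size_t min_element_bits)
{
    assert(min_element_bits > 0);
    assert(vector_bits % min_element_bits == 0);
    return vector_bits / min_element_bits;
}
```

With a 1024-bit vector and 8-bit minimum elements, sparse_mask_bits(1024, 8) gives 128, while dense_mask_bits(4) gives 4 for V4DF, so 124 of the sparse bits would be don't cares.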

ISTM that you're trying to set the mode size to the smallest possible 
which would seem to argue that you want the dense/packed mask model. 
Does that actually match what the hardware does?  If not, then don't we 
need to convert back and forth?

Or maybe I'm missing something here?!?

Jeff