[PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]

钟居哲 juzhe.zhong@rivai.ai
Thu Sep 14 22:28:43 GMT 2023


>> Now, whether that's efficient (and desirable) is a separate issue and
>> should probably be defined by register_move_costs as well as instruction
>> costs.  I wasn't actually aware of this call/argument optimization that
>> uses vec_duplicate and I haven't checked what costing (if at all) it
>> uses.

This patch is not a performance improvement; it is a bug fix.
I am not optimizing the codegen. That's why I put it into the move pattern, to handle the case statically.



juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-09-15 05:06
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V4] RISC-V: Expand VLS mode to scalar mode move[PR111391]
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
> 
> I guess we need some inputs from Jeff.
 
Sorry for the late response.  I have also been thinking about this and
it feels a bit like a bandaid to me.  Usually register-class moves like
this are performed by reload (which consults register_move_costs among
other things) and we are working around it.
 
The situation is that we move a vec_duplicate of QImodes into a vector
register.  Then we want to use this as scalar call argument so we need
to transfer it back to a DImode register.
 
One maybe more typical solution would be to allow small VLS vector modes
like V8QI in GPRs (via hard_regno_mode_ok) until reload so we could have
a (set (reg:V8QI a0) (vec_duplicate:V8QI ...)).
 
The next step would be to have a mov<mode> expander with target "r"
constraint (and source "vr") that performs the actual move.  This is
where Juzhe's mov code could fit in (without the subreg handling).
If I'm not mistaken vmv.x.s without slidedown should be sufficient for
our case as we'd only want to use the whole thing when the full vector
fits into a GPR. 
 
All that's missing is a (reinterpreting) vtype change to Pmode-sized
elements before. I quickly hacked something together (without the proper
mode change) and the resulting code looks like:
 
vsetvli zero, 8, e8, ...
vmv.v.x v1,a5
        # missing vsetivli zero, 1, e64, ... or something 
vmv.x.s a0,v1
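
To illustrate why the vtype change matters here, the following is a minimal C model of the sequence above. It is only a sketch: the helper names (broadcast_v8qi, read_elem0_e64, read_elem0_e8, self_check) are hypothetical and are not GCC internals. It assumes RV64 (XLEN = 64) and relies on the vmv.x.s rule that element 0 is sign-extended to XLEN when SEW < XLEN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models vmv.v.x at SEW=8, VL=8: broadcast one byte into 8 lanes. */
static void broadcast_v8qi(uint8_t lanes[8], uint8_t x)
{
    for (int i = 0; i < 8; i++)
        lanes[i] = x;
}

/* Models vmv.x.s after switching to SEW=64: element 0 is the whole
   64-bit group, so all 8 bytes land in the scalar register. */
static uint64_t read_elem0_e64(const uint8_t lanes[8])
{
    uint64_t r;
    memcpy(&r, lanes, sizeof r);
    return r;
}

/* Models vmv.x.s while still at SEW=8: only byte 0 is read and it is
   sign-extended to 64 bits, so the other 7 lanes are lost. */
static uint64_t read_elem0_e8(const uint8_t lanes[8])
{
    uint64_t r = lanes[0];
    if (lanes[0] & 0x80)
        r |= ~(uint64_t)0xFF;
    return r;
}

/* Small self-check contrasting the two element widths. */
static void self_check(void)
{
    uint8_t v[8];
    broadcast_v8qi(v, 0xAB);
    assert(read_elem0_e64(v) == UINT64_C(0xABABABABABABABAB));
    assert(read_elem0_e8(v) == UINT64_C(0xFFFFFFFFFFFFFFAB));
}
```

In this model, only the e64 read reproduces the full V8QI contents in the scalar register, which is what the missing vsetivli would provide.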
 
Now, whether that's efficient (and desirable) is a separate issue and
should probably be defined by register_move_costs as well as instruction
costs.  I wasn't actually aware of this call/argument optimization that
uses vec_duplicate and I haven't checked what costing (if at all) it
uses.
 
Regards
Robin
 

