This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC PATCH] Enable V32HI/V64QI const permutations


On 06 Oct 09:08, Jakub Jelinek wrote:
> On Fri, Oct 03, 2014 at 04:39:08PM +0200, Jakub Jelinek wrote:
> > Just to stress the new testcases some more, I've enabled the
> > vec_perm_const{32hi,64qi} patterns.
> > Got several ICEs in expand_vec_perm_broadcast_1,
> > on the final gcc_unreachable () in the function.  That function
> > is only called if it couldn't be broadcasted in a single insn,
> > which I believe for TARGET_AVX512BW must be always possible.
> > Shall I look at this, or do you plan to address this in the near future?
> 
> Speaking of -mavx512{bw,vl,f}, there apparently is a full 2 operand shuffle
> for V32HI, V16S[IF], V8D[IF], so the only one instruction full
> 2 operand shuffle we are missing is V64QI, right?
> 
> What would be best worst case sequence for that?
> 
> I'd think 2x vpermi2w, 2x vpshufb and one vpor could achieve that,
> (first vpermi2w would put the even bytes into the right word positions
> (i.e. at the right position or one above it), second vpermi2w would put
> the odd bytes into the right word positions (i.e. at the right position
> or one below it),
I think we will also need to spend insns converting the byte-sized mask
into a word-sized mask.
> each vpshufb would swap the byte pairs where necessary
> and zero out the other (odd or even) byte,
This will probably also require vpshufb mask preparation (setting the
high bit of each control byte that should be zeroed).
> and vpor merge the results), do you have something better?
Currently (in the branch) it's implemented as 2x vpermi2w + 4x shift +
blend: 3 shifts to prepare the masks for vpermi2w, 2 vpermi2w to put the
odd/even bytes into the low part of the right position, a shift to move
the low part into the high part, and finally a blend with a 101010...
mask to get the result.
> What about arbitrary one operand V64QI const permutation?
>
Currently it loads the const vector into a register and uses the same
codepath as the non-const version (this can probably be improved).
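
For reference, the 2x vpermi2w + 2x vpshufb + vpor idea from the quoted
text can be modeled in scalar C roughly as below. This is only a sketch
of the data flow, not the code in the branch: the model_* helpers and
perm2_v64qi_* names are made up for illustration, the selector is assumed
to hold values 0..127 (0..63 pick from the first operand, 64..127 from
the second), and the insns needed to build the word indices and vpshufb
controls in the non-const case are not counted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of vpermi2w on 512-bit vectors held as 64 bytes:
   output word k is word widx[k] of the 64-word concatenation a:b.  */
static void model_vpermi2w(const uint8_t a[64], const uint8_t b[64],
                           const uint8_t widx[32], uint8_t out[64])
{
  for (int k = 0; k < 32; k++) {
    const uint8_t *src = (widx[k] & 32) ? b : a;
    int w = widx[k] & 31;
    out[2 * k] = src[2 * w];
    out[2 * k + 1] = src[2 * w + 1];
  }
}

/* Scalar model of vpshufb: selection stays within each 16-byte lane,
   and a set high bit in the control byte zeroes the result byte.  */
static void model_vpshufb(const uint8_t src[64], const uint8_t ctl[64],
                          uint8_t out[64])
{
  for (int i = 0; i < 64; i++) {
    int lane = i & ~15;
    out[i] = (ctl[i] & 0x80) ? 0 : src[lane + (ctl[i] & 15)];
  }
}

/* Hypothetical model of the 5-insn sequence for a two-operand V64QI
   permutation with selector sel[] (each element in 0..127).  */
static void perm2_v64qi_model(const uint8_t a[64], const uint8_t b[64],
                              const uint8_t sel[64], uint8_t out[64])
{
  uint8_t widx_even[32], widx_odd[32], ctl_even[64], ctl_odd[64];
  uint8_t t_even[64], t_odd[64], s_even[64], s_odd[64];

  for (int k = 0; k < 32; k++) {
    assert(sel[2 * k] < 128 && sel[2 * k + 1] < 128);
    /* Byte selector -> word selector: drop the low bit.  */
    widx_even[k] = sel[2 * k] >> 1;
    widx_odd[k] = sel[2 * k + 1] >> 1;
    /* In-lane index of the low byte of word k.  */
    int base = (2 * k) & 15;
    /* Even pass keeps the even byte and zeroes the odd one.  */
    ctl_even[2 * k] = (uint8_t)(base + (sel[2 * k] & 1));
    ctl_even[2 * k + 1] = 0x80;
    /* Odd pass keeps the odd byte and zeroes the even one.  */
    ctl_odd[2 * k] = 0x80;
    ctl_odd[2 * k + 1] = (uint8_t)(base + (sel[2 * k + 1] & 1));
  }

  model_vpermi2w(a, b, widx_even, t_even);   /* 1st vpermi2w */
  model_vpermi2w(a, b, widx_odd, t_odd);     /* 2nd vpermi2w */
  model_vpshufb(t_even, ctl_even, s_even);   /* 1st vpshufb */
  model_vpshufb(t_odd, ctl_odd, s_odd);      /* 2nd vpshufb */
  for (int i = 0; i < 64; i++)               /* vpor */
    out[i] = s_even[i] | s_odd[i];
}

/* Straightforward per-byte definition of the same permutation.  */
static void perm2_v64qi_reference(const uint8_t a[64], const uint8_t b[64],
                                  const uint8_t sel[64], uint8_t out[64])
{
  for (int i = 0; i < 64; i++)
    out[i] = sel[i] < 64 ? a[sel[i]] : b[sel[i] - 64];
}

/* Returns 1 if the modeled sequence agrees with the reference for sel.  */
int perm2_v64qi_model_matches(const uint8_t sel[64])
{
  uint8_t a[64], b[64], got[64], want[64];
  for (int i = 0; i < 64; i++) {
    a[i] = (uint8_t)(3 * i + 1);
    b[i] = (uint8_t)(200 - i);
  }
  perm2_v64qi_model(a, b, sel, got);
  perm2_v64qi_reference(a, b, sel, want);
  return memcmp(got, want, 64) == 0;
}
```

Note that the vpshufb step only ever picks one of the two bytes of the
word already sitting at that position, so the in-lane restriction of
vpshufb does not get in the way here.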

