[RFC] Masking vectorized loops with bound not aligned to VF.
Mon Sep 14 20:21:00 GMT 2015
I'd like to initiate a discussion on vectorization of loops whose bounds are not
aligned to the vectorization factor (VF). The main target for this optimization right
now is x86's AVX-512, which features per-element embedded masking for all instructions.
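To make the problem concrete, here is a minimal scalar sketch (plain Python, not GCC code) of how a trip count that is not a multiple of VF splits into full vector iterations and leftover scalar iterations, and how many vector iterations a masked loop would need instead. The function name `split_iterations` and the sample values are illustrative assumptions.

```python
def split_iterations(n, vf):
    """Split a trip count n into full vector iterations and leftover scalars.

    n  -- number of scalar iterations in the original loop
    vf -- vectorization factor (lanes processed per vector iteration)
    """
    full_vector_iters = n // vf      # iterations that fill every lane
    remainder = n % vf               # scalar iterations left over
    # With masking, the leftover is handled by one extra vector iteration
    # in which only the first `remainder` lanes are active.
    masked_vector_iters = full_vector_iters + (1 if remainder else 0)
    return full_vector_iters, remainder, masked_vector_iters


if __name__ == "__main__":
    # 10 iterations with VF = 4: two full vector iterations plus 2 scalars,
    # or three vector iterations if the last one is masked.
    print(split_iterations(10, 4))
```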
The main goal of this mail is to agree on the overall design of the feature.
This approach was presented at GNU Cauldron 2015 by Ilya Enkovich.
Here's a sketch of the algorithm:
1. Add check on basic stmts for masking: possibility to introduce index vector and
2. When checking whether statements are vectorizable, we additionally check whether
they need to be and can be masked, and compute the masking cost. The result is stored
in `stmt_vinfo`. We are going to mask only memory accesses and reductions, and modify
the mask for already masked stmts (masked load, masked store and vector condition).
3. Make a decision about masking based on the computed costs and the estimated iteration count
4. Modify prologue/epilogue generation according to the decision made at analysis. Three options:
a. Use scalar remainder
b. Use masked remainder. Won't be supported in the first version
c. Mask main loop
5. Support vectorized loop masking:
- Create stmts for mask generation
- Support generation of masked vector code (create generic vector code, then
patch it with masks)
- Mask loads/stores/vconds/reductions only
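The masked main loop from steps 4.c and 5 can be sketched with a scalar emulation. This is only an illustration of the idea under stated assumptions, not the proposed GCC implementation: `VF`, `make_mask`, `masked_load`, `masked_store` and `masked_scale_and_sum` are hypothetical names, and Python lists stand in for vector registers. The mask is generated by comparing an index vector against the loop bound, and only the load, the store and the reduction are patched with it.

```python
VF = 4  # hypothetical vectorization factor (lanes per vector)


def make_mask(base, n, vf=VF):
    """Build the loop mask: lane k is active iff index base + k is below n."""
    index_vector = [base + k for k in range(vf)]   # the "index vector"
    return [i < n for i in index_vector]


def masked_load(src, base, mask, fill=0):
    """Load only active lanes; inactive lanes get `fill` and touch no memory."""
    return [src[base + k] if m else fill for k, m in enumerate(mask)]


def masked_store(dst, base, values, mask):
    """Store only active lanes; inactive lanes leave memory untouched."""
    for k, m in enumerate(mask):
        if m:
            dst[base + k] = values[k]


def masked_scale_and_sum(src, dst, factor):
    """Set dst[i] = src[i] * factor and return the sum, using one masked loop."""
    n = len(src)
    acc = [0] * VF                        # vector accumulator for the reduction
    for base in range(0, n, VF):          # no scalar epilogue needed
        mask = make_mask(base, n)
        v = masked_load(src, base, mask)
        v = [x * factor for x in v]       # generic vector code, left unmasked
        masked_store(dst, base, v, mask)
        # Masked reduction: inactive lanes keep their old accumulator value.
        acc = [a + x if m else a for a, x, m in zip(acc, v, mask)]
    return sum(acc)                       # final horizontal reduction
```

Note that the multiplication itself is not masked: inactive lanes compute garbage that is never stored or accumulated, which is why only memory accesses and reductions need the mask.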
In the first version (targeted at v6) we're not going to support 4.b and loop mask pack/unpack.
No `pack/unpack` means that masking will be supported only for types with the same
size as the index variable.
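As an illustration of why element size matters here, the sketch below shows one plausible reading of mask pack/unpack, assuming the loop mask has one lane per index-sized element. The helper names `unpack_mask` and `pack_masks` are hypothetical, and Python lists again stand in for mask registers.

```python
def unpack_mask(mask):
    """Split one mask into low and high halves.

    Needed when an element is twice as wide as the index type: each vector
    then holds half as many lanes, so one index-sized mask feeds two vectors.
    """
    half = len(mask) // 2
    return mask[:half], mask[half:]


def pack_masks(lo, hi):
    """Join two masks into one.

    Needed when an element is half as wide as the index type: each vector
    then holds twice as many lanes, so two index-sized masks feed one vector.
    """
    return lo + hi
```

When the data type and the index variable have the same size, the mask can be used as generated, which is the only case the first version would handle.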
Slides from the GNU Cauldron 2015 talk:
 - https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf
What do you think?