[RFC] Masking vectorized loops with bound not aligned to VF.

Kirill Yukhin kirill.yukhin@gmail.com
Mon Sep 14 20:21:00 GMT 2015

I'd like to start a discussion on vectorizing loops whose bounds are not
aligned to VF. The main target for this optimization right now is x86's AVX-512, which
features per-element embedded masking for all instructions.
The main goal of this mail is to agree on the overall design of the feature.

This approach was presented at GNU Cauldron 2015 by Ilya Enkovich [1].
Here's a sketch of the algorithm:
  1. Add a check on basic stmts for masking: whether an index vector and the
     corresponding mask can be introduced.
  2. When checking whether statements are vectorizable, additionally check whether
     stmts need to be and can be masked, and compute the masking cost. The result is
     stored in `stmt_vinfo`. We are going to mask only memory accesses and reductions,
     and modify the mask for already masked stmts (mask load, mask store and
     vector condition).
  3. Make a decision about masking, taking the computed costs and the estimated
     iteration count into consideration.
  4. Modify prologue/epilogue generation according to the decision made at analysis.
     Three options are available:
    a. Use scalar remainder
    b. Use masked remainder. Won't be supported in the first version
    c. Mask main loop
  5. Support vectorized loop masking:
    - Create stmts for mask generation
    - Support generation of masked vector code (create generic vector code, then
      patch it with masks)
      - Mask loads/stores/vconds/reductions only
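
To make option 4.c concrete, here is a rough scalar emulation in C of what a masked
main loop computes for `a[i] = b[i] + c[i]`. It is only a sketch of the idea, not the
code the vectorizer would emit: the function name `masked_add`, the value `VF = 8` and
the bitmask representation are assumptions made for illustration. Each iteration
builds the index vector `i + {0..VF-1}`, compares it against the bound to get the
mask, and lets only active lanes touch memory, so no scalar remainder is needed.

```c
#include <stddef.h>
#include <stdint.h>

#define VF 8  /* hypothetical vectorization factor, chosen for illustration */

/* Scalar emulation of a masked main loop for a[i] = b[i] + c[i].
   The inner loops over l stand in for single vector instructions. */
void masked_add(int32_t *a, const int32_t *b, const int32_t *c, size_t n)
{
    for (size_t i = 0; i < n; i += VF) {
        /* Mask generation: lane l is active iff index i + l is below the bound. */
        uint8_t mask = 0;
        for (size_t l = 0; l < VF; l++)
            if (i + l < n)
                mask |= (uint8_t)(1u << l);

        /* Masked load, add and masked store: inactive lanes access no memory. */
        for (size_t l = 0; l < VF; l++)
            if (mask & (1u << l))
                a[i + l] = b[i + l] + c[i + l];
    }
}
```

With `n = 11` the first iteration runs with all 8 lanes active and the second with
only the low 3 lanes active, so elements past the bound are never read or written.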
In the first version (targeted at v6) we're not going to support 4.b or loop mask
pack/unpack. No `pack/unpack` means that masking will be supported only for types
with the same size as the index variable.
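
One way to picture why pack/unpack matters: a 512-bit vector holds 16 elements of
32 bits but only 8 elements of 64 bits. If the loop mask is computed from a 32-bit
index, statements working on 64-bit data need that mask split across two vectors.
The sketch below models this in plain C under the assumption that a mask is a
bitmask with one bit per lane; `mask_pair`, `unpack_mask16` and `pack_mask8` are
hypothetical names for illustration, not part of the proposal.

```c
#include <stdint.h>

/* Two 8-lane masks covering the low and high halves of a 16-lane mask. */
typedef struct {
    uint8_t lo;  /* lanes 0..7  */
    uint8_t hi;  /* lanes 8..15 */
} mask_pair;

/* Unpack: one mask for 16 narrow lanes becomes two masks for 8 wide lanes. */
mask_pair unpack_mask16(uint16_t m)
{
    mask_pair p;
    p.lo = (uint8_t)(m & 0xFF);
    p.hi = (uint8_t)(m >> 8);
    return p;
}

/* Pack: the inverse, combining two 8-lane masks into one 16-lane mask. */
uint16_t pack_mask8(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | ((uint16_t)hi << 8));
}
```

Without these operations a single mask can only be reused as-is, which is why
the element size has to match the index size.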
[1] - https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf

What do you think?

Thanks, K
