[RFC] Combine vectorized loops with its scalar remainder.
Tue Feb 9 16:10:00 GMT 2016
2015-12-15 19:41 GMT+03:00 Yuri Rumyantsev <firstname.lastname@example.org>:
> Hi Richard,
> I re-designed the patch to determine ability of loop masking on fly of
> vectorization analysis and invoke it after loop transformation.
> Test-case is also provided.
> what is your opinion?
I'm going to start work on extending this patch to handle mixed mask sizes,
support vectorization of peeled loop tail and fix profitability
estimation to choose
proper loop tail processing. Here is shortly a planned changes list:
1. Don't put any restriction on mask type when check if statement can be masked.
Instead just store all required masks in LOOP_VINFO_REQUIRED_MASKS. After
all statements are checked we additionally check all required masks
can be produced
(we have proper comparison, widening and narrowing support).
2. In vect_estimate_min_profitable_iters compute overhead for masks creation,
decide what we should do with a loop tail (nothing, vectorize, combine
with loop body),
additionally return a number of tail iterations required for chosen
3. In vect_transform_loop depending on chosen strategy either mask whole loop or
produce vectorized tail. For now it's not fully clear to me what is
the best way to get
The first option is to just peel one iteration after loop is
vectorized. But in our masking
functions we use LOOP_VINFO and STMT_VINFO structures we loose during peeling.
Another option is to peel scalar loop and then just run vectorizer one more time
to vectorize and mask it.
Also we may peel vectorized loop and use original version (with all
available) as a tail and peeled version as a main loop.
Currently I think the best option is to peel scalar loop and run
vectorizer one more time
for it. This option is simpler and can also be used to vectorize loop
tail with a smaller vector
size when target doesn't support masking or masking is not profitable.
More information about the Gcc-patches