[PATCH GCC 6/9] Simplify control flow graph for vectorized loop
Jeff Law
law@redhat.com
Wed Sep 14 16:52:00 GMT 2016
On 09/14/2016 07:21 AM, Richard Biener wrote:
> On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> This is the main patch improving the control flow graph for vectorized loops. It largely rewrites the loop peeling code in the vectorizer. As described in the patch, a typical loop to be vectorized looks like:
>>
>> preheader:
>> LOOP:
>> header_bb:
>> loop_body
>> if (exit_loop_cond) goto exit_bb
>> else goto header_bb
>> exit_bb:
>>
>> This patch peels a prolog and an epilog from the loop and adds guards that skip PROLOG and EPILOG under various conditions. As a result, the changed CFG looks like:
>>
>> guard_bb_1:
>> if (prefer_scalar_loop) goto merge_bb_1
>> else goto guard_bb_2
>>
>> guard_bb_2:
>> if (skip_prolog) goto merge_bb_2
>> else goto prolog_preheader
>>
>> prolog_preheader:
>> PROLOG:
>> prolog_header_bb:
>> prolog_body
>> if (exit_prolog_cond) goto prolog_exit_bb
>> else goto prolog_header_bb
>> prolog_exit_bb:
>>
>> merge_bb_2:
>>
>> vector_preheader:
>> VECTOR LOOP:
>> vector_header_bb:
>> vector_body
>> if (exit_vector_cond) goto vector_exit_bb
>> else goto vector_header_bb
>> vector_exit_bb:
>>
>> guard_bb_3:
>> if (skip_epilog) goto merge_bb_3
>> else goto epilog_preheader
>>
>> merge_bb_1:
>>
>> epilog_preheader:
>> EPILOG:
>> epilog_header_bb:
>> epilog_body
>> if (exit_epilog_cond) goto merge_bb_3
>> else goto epilog_header_bb
>>
>> merge_bb_3:
>>
>>
>> Note that this patch peels the prolog and epilog only when necessary, and adds different guard conditions/branches accordingly. The first guard/branch could also be further improved by merging it with loop versioning.
>>
>> Before this patch, up to 4 branch instructions had to be executed before the vectorized loop was reached in the worst case; with this patch the number is reduced to 2. The patch also does better compile-time analysis to avoid unnecessary peeling/branching.
>> From an implementation point of view, the vectorizer needs to update induction variables and iteration bounds along with control flow changes. Unfortunately, this also becomes much harder to follow because the slpeel_* functions update SSA themselves rather than using the update_ssa interface. This patch tries to factor out SSA/IV/Niter_bound changes from CFG changes. This should make the implementation easier to read, and I think it may be a step toward replacing the slpeel_* functions with generic GIMPLE loop copy interfaces, as Richard suggested.
>
> I've skimmed over the patch and it looks reasonable to me.
Thanks. I was maybe 15% of the way through the main patch. Nothing
that gave me cause for concern, but I wasn't ready to ACK it myself yet.
jeff
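
For readers following the thread, the guarded CFG quoted above can be sketched in plain C. This is only an illustration, not code from the patch: the function name add_one, the constant VF, the alignment-based prolog count, and the 2*VF-1 threshold used for prefer_scalar_loop are all assumptions made for the example, and the vector body is simulated with a fixed-count inner loop rather than real vector instructions.

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };  /* hypothetical vectorization factor */

/* Adds 1 to each of the n elements of a.  The layout mirrors the
   guard/prolog/vector/epilog blocks of the quoted CFG.  All loops are
   bottom-tested, as in the CFG, so each guard must ensure at least one
   iteration of the loop it protects.  */
void add_one(int *a, size_t n)
{
    size_t i = 0;

    if (n == 0)
        return;  /* the original loop is assumed to run at least once */

    /* guard_bb_1: if (prefer_scalar_loop) goto merge_bb_1.
       Assumed threshold: after at most VF-1 prolog iterations there
       must still be VF iterations left for the vector loop.  */
    if (n >= 2 * VF - 1) {
        /* Assumed prolog count: iterations needed to reach a
           VF-element boundary (always in the range 0 .. VF-1).  */
        size_t prolog_niters = (VF - ((uintptr_t)a / sizeof *a) % VF) % VF;

        /* guard_bb_2: if (skip_prolog) goto merge_bb_2.  */
        if (prolog_niters != 0) {
            do {                      /* PROLOG */
                a[i] += 1;
                i++;
            } while (i < prolog_niters);
        }

        /* merge_bb_2, then VECTOR LOOP: VF elements per iteration.  */
        do {
            for (size_t k = 0; k < VF; k++)
                a[i + k] += 1;
            i += VF;
        } while (i + VF <= n);

        /* guard_bb_3: if (skip_epilog) goto merge_bb_3.  */
        if (i == n)
            return;
    }

    /* merge_bb_1, then EPILOG: the remaining scalar iterations, or all
       of them when the scalar loop was preferred.  */
    do {
        a[i] += 1;
        i++;
    } while (i < n);
}
```

In this sketch, two guards (guard_bb_1 and guard_bb_2) are evaluated before the vector loop is entered, which matches the worst-case count of 2 mentioned in the quoted description. Calling add_one(buf, n) on any int buffer of at least n elements takes the scalar-only path for small n and the prolog/vector/epilog path otherwise.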