This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: dom1 prevents vectorization via partial loop peeling?
- From: Jeff Law <law at redhat dot com>
- To: Alan Lawrence <alan dot lawrence at arm dot com>, Ajit Kumar Agarwal <ajit dot kumar dot agarwal at xilinx dot com>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Wed, 29 Apr 2015 13:04:16 -0600
- Subject: Re: dom1 prevents vectorization via partial loop peeling?
- References: <553E5FEF dot 9070905 at arm dot com> <553E6C8A dot 4070202 at redhat dot com> <CAFiYyc2VTuRtqDuQVvwfrYdkefVtZJdmwNCK038xugfZKvndNQ at mail dot gmail dot com> <5443dde8-a1fa-4645-8ea4-8d3fe3e6d128 at BL2FFO11FD020 dot protection dot gbl> <553F9AD9 dot 3090002 at arm dot com>
On 04/28/2015 08:36 AM, Alan Lawrence wrote:
> Ah, yes, I'd not realized this was connected to the jump-threading
> issue, but I see that now. As you say, the best heuristics are unclear,
> and I'm not keen on trying *too hard* to predict what later phases
> will/won't do or do/don't want...maybe if there are simple heuristics
> that work, but I would aim more at making later phases work with
> what(ever) they might get???
Yea, in various places we do try and "predict" what form will be best
for later passes, but it's rarely the best way to do things.
> One (horrible) possibility that I will just throw out (and then duck),
> is to do something akin to tree-if-conversion's
> "gimple_build_call_internal (IFN_LOOP_VECTORIZED, " ...
It's not as terrible as you might think. The ability to present two
forms to the vectorizer has a variety of uses.
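To make the "two forms" idea concrete, here is a rough source-level analogy in C. It is only a sketch: in GCC the versioning happens on GIMPLE, and the names `sum_peeled`, `sum_rolled`, `sum_versioned` and the `loop_vectorized` flag are hypothetical stand-ins, not GCC APIs. The flag plays the role of the guard that the vectorizer would later resolve to one form or the other.

```c
#include <stddef.h>

/* Form A: the first iteration has been peeled out of the loop,
   loosely resembling what jump threading can leave behind. */
long sum_peeled(const int *a, size_t n)
{
    long s = 0;
    if (n == 0)
        return s;
    s += a[0];                      /* peeled copy of the loop body */
    for (size_t i = 1; i < n; i++)
        s += a[i];
    return s;
}

/* Form B: the original rolled loop, which is the easier shape
   for a vectorizer to analyze. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical guard: both forms are kept, and a later decision
   (modeled here as a plain flag) selects one and discards the other. */
long sum_versioned(const int *a, size_t n, int loop_vectorized)
{
    return loop_vectorized ? sum_rolled(a, n) : sum_peeled(a, n);
}
```

Both forms compute the same value, so whichever one the later pass keeps, behavior is unchanged; the cost is carrying a duplicate loop until that decision is made.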
>> The one thought we've never explored was re-rolling that first
>> iteration back into the loop in the vectorizer.
>
> Yeah, there is that ;).
>
> So besides trying to partially-peel the next N iterations, the other
> approach - that strikes me as sanest - is to finish (fully-)peeling off
> the first iteration, and then to vectorize from then on.
Which is better may depend on a variety of factors: complexity of the
loop, the iteration space, etc. I suspect it's not terrible to unpeel.
There are two parts: first, recognizing particular patterns in the CFG
that come from the partial peeling; then, detecting that two blocks are
still essentially the same, except for the control flow bits at the end
of the block.
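The second part (deciding that two blocks match apart from their final control flow statement) can be sketched on a toy model. This is an assumption-laden illustration in C, not GCC code: `struct toy_block` and `same_except_control_tail` are hypothetical, statements are plain strings, and the last statement of each block is taken to be its control flow statement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a basic block: an ordered list of statement strings,
   where the last entry is the block's control flow statement.
   This is NOT GCC's basic_block representation. */
struct toy_block {
    const char *const *stmts;
    size_t n_stmts;
};

/* Return true if both blocks have the same statements in the same
   order, ignoring only the final (control flow) statement. */
bool same_except_control_tail(const struct toy_block *a,
                              const struct toy_block *b)
{
    if (a->n_stmts == 0 || a->n_stmts != b->n_stmts)
        return false;
    for (size_t i = 0; i + 1 < a->n_stmts; i++) {
        if (strcmp(a->stmts[i], b->stmts[i]) != 0)
            return false;
    }
    return true;
}
```

A real implementation would presumably have to compare statements structurally and account for the peeled copy using different SSA names, which plain string comparison does not handle.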
> whereas with rerolling ;)...is there perhaps some reasonable way to keep
> markers around to make the rerolling approach more feasible???
I suspect (but certainly haven't really investigated) that the CFG will
have tell-tale signs and that we then look at the key blocks.
Jeff