This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/79460] gcc fails to optimise out a trivial additive loop for seemingly arbitrary numbers of iterations
- From: "jakub at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 13 Feb 2017 11:15:46 +0000
- Subject: [Bug tree-optimization/79460] gcc fails to optimise out a trivial additive loop for seemingly arbitrary numbers of iterations
- Auto-submitted: auto-generated
- References: <bug-79460-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79460
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> In this case it is complete unrolling that can estimate the non-vector code
> to constant fold but not the vectorized code. OTOH it's quite excessive
> work done by the unroller when doing this for large N...
>
> And yes, SCEV final value replacement doesn't know how to handle float
> reductions
> (we have a different PR for that).
It handles neither float reductions nor vector (integer or floating-point)
reductions. Even the vector ones would be useful: if, e.g., every iteration
adds a VECTOR_CST or similar to a vector, the final value could still be
nicely computed.
For the 202 case, it seems we are generating a scalar loop epilogue (not needed
for 200), and somehow something in the vectorizer is actually able to
figure out the floating-point final value, because we get:
# p_2 = PHI <2.01e+2(5), p_12(7)>
# i_3 = PHI <200(5), i_13(7)>
on the scalar loop epilogue. So if something in the vectorizer is able to
figure it out, why can't it just use that even in the case where no epilogue
loop is needed?