This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Question about vectorization limit
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Toon Moene <toon at moene dot org>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, Dehao Chen <dehao at google dot com>, GCC Development <gcc at gcc dot gnu dot org>
- Date: Fri, 31 May 2013 15:54:14 +0200
- Subject: Re: Question about vectorization limit
- References: <CAO2gOZX7_-08m_+AEybF0RwG=8Y_qPG_+wjmgsq6ymVWTr3=Vw at mail dot gmail dot com> <CAFiYyc3ehiZeyrXUb+wgj_tBi7WqmeHNQdFK9vDinnMWYHYswA at mail dot gmail dot com> <51A8A3EF dot 3010508 at moene dot org> <20130531134131 dot GT1493 at tucnak dot redhat dot com> <51A8AA4B dot 6080001 at moene dot org>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Fri, May 31, 2013 at 03:48:59PM +0200, Toon Moene wrote:
> >If you rewrite the above into:
> >SUBROUTINE XYZ(A, B, N)
> >DIMENSION A(N), B(N)
> >DO I = 1, N
> > C = B(I)
> > IF (A(I)> 0.0) THEN
> > A(I) = C / A(I)
> > ELSE
> > A(I) = C
> > ENDIF
> >ENDDO
> >END
> >
> >then it is vectorized just fine.
>
> But this "inner loop" has at least 3 basic blocks - so what does the
> "loop->num_nodes != 2" test exactly codify ?
With the above testcase the loop has just 2 basic blocks.
Before the ifcvt pass it still has 4:
<bb 4>:
# i_1 = PHI <1(3), i_18(7)>
_8 = (integer(kind=8)) i_1;
_9 = _8 + -1;
c_11 = *b_10(D)[_9];
_13 = *a_12(D)[_9];
if (_13 > 0.0)
goto <bb 5>;
else
goto <bb 6>;
<bb 5>:
_14 = c_11 / _13;
<bb 6>:
# cstore_17 = PHI <_14(5), c_11(4)>
*a_12(D)[_9] = cstore_17;
i_18 = i_1 + 1;
if (i_1 == _7)
goto <bb 8>;
else
goto <bb 7>;
<bb 7>:
goto <bb 4>;
but ifcvt transforms that into:
<bb 4>:
# i_1 = PHI <1(3), i_18(5)>
_8 = (integer(kind=8)) i_1;
_9 = _8 + -1;
c_11 = *b_10(D)[_9];
_13 = *a_12(D)[_9];
_14 = c_11 / _13;
cstore_17 = _13 > 0.0 ? _14 : c_11;
*a_12(D)[_9] = cstore_17;
i_18 = i_1 + 1;
if (i_1 == _7)
goto <bb 6>;
else
goto <bb 5>;
<bb 5>:
goto <bb 4>;
which is already generally vectorizable. I guess ifcvt could certainly be
taught, when it finds a possibly trapping statement, to check whether the same
statement is present in all possible branches, though the question is whether
this is best done in ifcvt or in some other pass.
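For readers who find the dumps hard to follow, here is a rough C rendering of
the two loop shapes above. It is only a sketch: the function names xyz_branchy
and xyz_ifcvt are made up for illustration, the arrays are float and 0-based
rather than Fortran REAL and 1-based, and it assumes the default floating-point
environment, where a division by zero yields an infinity instead of trapping.

```c
#include <stddef.h>

/* Branchy form, as before ifcvt: the division lives in its own
   basic block and is only executed when a[i] > 0. */
void xyz_branchy(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float c = b[i];
        if (a[i] > 0.0f)
            a[i] = c / a[i];
        else
            a[i] = c;
    }
}

/* If-converted form, as after ifcvt: the loop body is straight-line
   code.  The division is executed unconditionally and a select
   (the ?: below) picks which value gets stored. */
void xyz_ifcvt(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float c = b[i];
        float x = a[i];
        float q = c / x;            /* computed even when x <= 0 */
        a[i] = (x > 0.0f) ? q : c;  /* unused quotient is discarded */
    }
}
```

Both functions store the same values. The second form is only a legal
transformation because every statement hoisted out of the branch can be
executed unconditionally, which is the "possibly trapping statement" concern.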
Jakub