This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Fix PR48052: loop not vectorized if index is "unsigned int"
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Abderrazek Zaafrani <az dot zaafrani at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Sebastian Pop <sebpop at gmail dot com>
- Date: Tue, 19 May 2015 19:02:25 +0800
- Subject: Re: Fix PR48052: loop not vectorized if index is "unsigned int"
- References: <CAGrkkCATyT28OgKzXpSbAY=5=NTZKpp1p60wA8BBdYohgjCY-w at mail dot gmail dot com> <CAFiYyc1Z1Xi4xmEkfKm0bTFP7aq1jFiMgROEUtX28P8YT0+MHQ at mail dot gmail dot com>
On Wed, May 6, 2015 at 7:02 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, May 4, 2015 at 9:47 PM, Abderrazek Zaafrani
> <az.zaafrani@gmail.com> wrote:
>> This is an old thread, but we are still running into similar issues:
>> code is not vectorized on 64-bit targets because scev cannot
>> precisely analyze the overflow condition.
>>
>> While the original test case shown here seems to work now, it does not
>> work if the start value is not a constant and the loop index variable
>> is of unsigned type. For example:
>>
>> void loop2(double const * __restrict__ x_in, double * __restrict__ x_out,
>>            double const * __restrict__ c, unsigned int N, unsigned int start)
>> {
>>   for (unsigned int i = start; i != N; ++i)
>>     x_out[i] = c[i] * x_in[i];
>> }
>>
>> Here is our unit test:
>>
>> int foo(int *A, int *C, unsigned start, unsigned B)
>> {
>>   int s = 0;
>>   for (unsigned k = start; k < start + B; k++)
>>     s += A[k] * C[k];
>>   return s;
>> }
>>
>> Our unit test case is extracted from a matrix multiply of a
>> two-dimensional array and all loops are blocked by hand by a factor of
>> B. Although slightly modified, the loop above corresponds to the innermost
>> loop of the blocked matrix multiply.
>>
>> We worked on a patch to solve the problem (see attachment).
>> The attached patch passed bootstrap and make check on x86_64-linux.
>> Ok for trunk?
>
> Apart from coding style / API issues the case you handle is very special
> (IVs with step 1 only?!) I believe it is also wrong - the assumption that
> if there is a symbolic or constant expression for the number of iterations
> a BIV will not wrap is not true. niter analysis can very well compute
> the number of iterations for a loop with wrapping IVs. For your unit test
> this only works because of the special-casing of step 1 IVs.
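To illustrate the point above that an iteration count can be well defined
even when the IV wraps, here is a minimal sketch in plain C. It is not GCC
code; the helper names (count_iterations, closed_form_iterations, iv_wraps)
are made up for this example. It only relies on unsigned arithmetic in C
being modulo 2^N.

```c
#include <limits.h>

/* Hypothetical helper: count the iterations of
   "for (i = start; i != n; ++i)" by actually running the loop.  */
unsigned long long count_iterations(unsigned int start, unsigned int n)
{
    unsigned long long count = 0;
    for (unsigned int i = start; i != n; ++i)
        ++count;
    return count;
}

/* Closed form for the same loop.  Unsigned subtraction is modulo 2^N,
   so n - start is the iteration count even when start > n and the
   index has to pass through UINT_MAX -> 0 to reach n.  */
unsigned int closed_form_iterations(unsigned int start, unsigned int n)
{
    return n - start;
}

/* Hypothetical helper: returns 1 when the index wraps before the loop
   exits, which for an "i != n" exit test happens exactly when
   start > n.  */
int iv_wraps(unsigned int start, unsigned int n)
{
    return start > n;
}
```

For start = UINT_MAX - 1 and n = 2 the index takes the values UINT_MAX - 1,
UINT_MAX, 0, 1, so both functions report 4 iterations while iv_wraps
returns 1. A known iteration count therefore does not by itself prove that
the IV does not wrap.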
I happen to be looking into a similar issue right now.
scev_probably_wraps_p, and thus chrec_convert_1, should be improved
using niter information. Actually, all the information (including the
wrap behavior) has already been computed in tree-ssa-loop-niter.c. We
just need to find a way to use it.
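As a rough illustration of why the wrap question matters when a 32-bit
unsigned index is converted to a wider type, consider the sketch below. It
is plain C, not GCC internals, and the names offset_of and step_is_affine
are hypothetical. It checks whether advancing the narrow index by one
advances the widened, scaled offset by exactly one element, which is what
an affine evolution of the offset would require.

```c
#include <stdint.h>

/* Offset as a 64-bit target would compute it: widen the 32-bit index
   first, then scale by the element size.  */
uint64_t offset_of(uint32_t i)
{
    return (uint64_t)i * sizeof(double);
}

/* Hypothetical check: returns 1 when incrementing the 32-bit index
   moves the 64-bit offset forward by exactly sizeof(double).  */
int step_is_affine(uint32_t i)
{
    uint32_t next = i + 1u;   /* wraps to 0 when i == UINT32_MAX */
    return offset_of(next) == offset_of(i) + sizeof(double);
}
```

step_is_affine holds for every index except UINT32_MAX, where the narrow
increment wraps to 0 but the widened offset would have to keep growing.
Moving the conversion inside the evolution is only valid once that single
wrapping step has been ruled out for the loop.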
>
> Technically it might be more interesting to compute wrapping of IVs
> during niter analysis in some more generic way (we have iv->no_overflow
> computed by simple_iv, but that is rather not useful here).
As for iv->no_overflow, it is computed in simple_iv as below:
  tmp = analyze_scalar_evolution (use_loop, ev);
  ev = resolve_mixers (use_loop, tmp);
  if (folded_casts && tmp != ev)
    *folded_casts = true;
It's inaccurate because calling resolve_mixers doesn't mean the resulting
scev will wrap. resolve_mixers could have just done exactly the same
transformation as instantiate_parameters. Also,
chrec_convert_aggressive is incomplete and needs to be revised too.
Thanks,
bin
>
> Richard.
>
>> Thanks,
>> Abderrazek Zaafrani