This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug fortran/83064] DO CONCURRENT inconsistent results


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83064

--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 23 Nov 2017, dominiq at lps dot ens.fr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83064
> 
> --- Comment #7 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> > I looked at the IL from the Fortran FE and it clearly uses a single memory
> > area for tmp for each outer loop iteration. That is, the memory is allocated
> > by the caller. 
> 
> I confirm that using
> 
>         pik = compute( low(i), high(i) )
>         pi(i) = sum(pik)
> 
> gives the right result.
> 
> Does it mean that the 'sum' in 'sum(compute( low(i), high(i) ))' is not part
> of the parallelization?

No idea. I can't try the above because pik is not declared.

> 
> > > Do you understand why the code is not parallelized with
> > > -ftree-parallelize-loops=4?
> 
> > Because the outer loop has four iterations and we statically require
> > at least two per thread for outer loops. 
> 
> Why is it so? and is it documented?

It is documented:

@item parloops-min-per-thread
The minimum number of iterations per thread of an innermost parallelized
loop for which the parallelized variant is preferred over the single
threaded one.  The default is 100.  Note that for a parallelized loop nest
the minimum number of iterations of the outermost loop per thread is two.
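
As a rough illustration of the outer-loop rule quoted above, here is a minimal
sketch in C. It is not GCC's actual code: the helper name and the constant are
hypothetical, and it models only the "two iterations per thread" requirement
for the outermost loop, not the rest of the cost model.

```c
#include <stdbool.h>

/* Hypothetical constant: the documented minimum number of outermost-loop
   iterations per thread for a parallelized loop nest. */
enum { MIN_OUTER_ITERS_PER_THREAD = 2 };

/* Hypothetical helper, not part of GCC: returns true when an outer loop
   with `niters` iterations provides at least two iterations for each of
   `nthreads` threads. */
bool outer_loop_has_enough_iterations(long niters, long nthreads)
{
    return niters >= nthreads * MIN_OUTER_ITERS_PER_THREAD;
}
```

Under this model, four iterations with -ftree-parallelize-loops=4 fall short
(4 < 4 * 2), while the same loop would pass with two threads.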


Note that autopar isn't very well maintained, and the cost modeling
certainly needs some work.

So for the issue in this bug, the .original dump from the Fortran FE
looks OK:

      while (1)
        {
          if (ANNOTATE_EXPR <count.9 <= 0, parallel>) goto L.10;
          {
            real(kind=4) val.5;
            integer(kind=8) * D.3618;
            integer(kind=8) * D.3619;
            struct array1_real(kind=4) atmp.6;
            real(kind=4) A.7[4];

            val.5 = 0.0;
            D.3618 = &low[NON_LVALUE_EXPR <i.4> + -1];
            D.3619 = &high[NON_LVALUE_EXPR <i.4> + -1];
                        typedef real(kind=4) [4];
            atmp.6.dtype = 281;
            atmp.6.dim[0].stride = 1;
            atmp.6.dim[0].lbound = 0;
            atmp.6.dim[0].ubound = 3;
            atmp.6.data = (void * restrict) &A.7;
            atmp.6.offset = 0;
            compute (&atmp.6, D.3618, D.3619);

So A.7 is scoped to the concurrent loop body, and gimplification
adds a CLOBBER at the end of that scope.

I believe there's no logic in autopar that would use this to
force local allocation of that variable.  It might also be
fragile, since we can't really rely on those CLOBBERs persisting(?)
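
To make the scoping concrete, here is a small sequential sketch in C of the
loop shape in the dump. It is only a model under stated assumptions: the body
of compute is a made-up stand-in (the real Fortran function is not shown in
this message), and pi_all and TMP_LEN are hypothetical names. The temporary
tmp plays the role of A.7 and is declared inside the loop body, so each
iteration logically gets its own copy.

```c
#include <stddef.h>

enum { TMP_LEN = 4 };  /* matches real(kind=4) A.7[4] in the dump */

/* Hypothetical stand-in for the Fortran function `compute`: it fills a
   caller-provided buffer, mirroring compute (&atmp.6, D.3618, D.3619). */
void compute(float out[TMP_LEN], const long *low, const long *high)
{
    for (int k = 0; k < TMP_LEN; k++)
        out[k] = (float)(*low + k * (*high - *low));
}

/* Hypothetical driver: one iteration per element, like the outer loop. */
void pi_all(float *pi, const long *low, const long *high, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float tmp[TMP_LEN];  /* like A.7: its lifetime ends with the iteration */
        float val = 0.0f;

        compute(tmp, &low[i], &high[i]);
        for (int k = 0; k < TMP_LEN; k++)
            val += tmp[k];   /* the sum() over the temporary */
        pi[i] = val;
    }
}
```

Run sequentially, the result is the same whether tmp lives inside or outside
the loop. A parallelized version would need one tmp per thread. If all threads
shared a single buffer, concurrent calls to compute could overwrite each
other's data before the sum is taken.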

This means DO CONCURRENT alone isn't enough to skip the validity
check in autopar.  In fact, DO CONCURRENT doesn't tell us anything,
except perhaps that we could skip cost modeling during autopar?
