This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Openacc] Adjust automatic loop partitioning


Hi Nathan!

On Fri, 29 Apr 2016 10:00:43 -0400, Nathan Sidwell <nathan@acm.org> wrote:
> currently automatic loop partitioning assigns from the innermost loop outwards 
> -- that was the simplest thing to implement.  A better algorithm is to assign 
> the outermost loop to the outermost available axis, and then assign from the 
> innermost loop outwards.   That way we (generally) get gang partitioning on the 
> outermost loop.  Just inside that we'll get non-partitioned loops if the nest is 
> too deep, and the two innermost nested loops will get worker and vector 
> partitioning.

> 	gcc/
> 	* omp-low.c (struct oacc_loop): Add 'inner' field.
> 	(new_oacc_loop_raw): Initialize it to zero.
> 	(oacc_loop_fixed_partitions): Initialize it.
> 	(oacc_loop_auto_partitions): Partition outermost loop to outermost
> 	available partitioning.

I'm now observing the sporadic failures (that you had mentioned before)
of libgomp.oacc-c-c++-common/atomic_capture-1.c and
libgomp.oacc-fortran/atomic_capture-1.f90.  I suppose the problem is that
constructs such as libgomp.oacc-c-c++-common/atomic_capture-1.c:

      fgot = 1.0;
      fexp = 0.0;
    
    #pragma acc data copy (fgot, fdata[0:N])
      {
    #pragma acc parallel loop
        for (i = 0; i < N; i++)
          {
            float expr = 32.0;
    
    #pragma acc atomic capture
            fdata[i] = fgot = expr - fgot;
          }
      }
    
      for (i = 0; i < N; i++)
        if (i % 2 == 0)
          {
            if (fdata[i] != 31.0)
              abort ();
          }
        else
          {
            if (fdata[i] != 1.0)
              abort ();
          }

... are no longer executed in stable/ascending order, so the exact
"i % 2 == 0" classifier no longer holds.  Should we now instead verify
that the 31.0 and 1.0 cases each appear with probability 0.5?  Are you
looking into resolving that, or should somebody else have a look?
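
As a minimal sketch of what an order-independent check could look like
(not a tested patch for the testsuite): if the atomic capture serializes
the updates, fgot alternates 31.0, 1.0, 31.0, ... in whatever order the
iterations actually run, so for N iterations we'd expect ceil(N/2)
elements to be 31.0 and floor(N/2) to be 1.0, regardless of which index
got which.  The helper name check_alternating is hypothetical, and the
counting argument assumes fgot starts at 1.0 and expr is 32.0, as in the
test above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the testsuite.  Returns 1 if FDATA
   holds only the values 31.0 and 1.0, with ceil(N/2) occurrences of
   31.0 and floor(N/2) occurrences of 1.0; returns 0 otherwise.  The
   position of each value is deliberately not checked.  */
static int
check_alternating (const float *fdata, size_t n)
{
  size_t n31 = 0, n1 = 0;

  assert (fdata != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      if (fdata[i] == 31.0f)
        n31++;
      else if (fdata[i] == 1.0f)
        n1++;
      else
        return 0;  /* Any other value means an update was lost or torn.  */
    }

  /* Assumption: the first update executed yields 31.0, so 31.0 gets the
     extra element when N is odd.  */
  return n31 == (n + 1) / 2 && n1 == n / 2;
}
```

In the test, the "i % 2 == 0" loop would then become something like
"if (!check_alternating (fdata, N)) abort ();".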


I'm also seeing the following regression for C and C++,
libgomp.oacc-c-c++-common/loop-auto-1.c with -O2:

    source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: In function 'vector_1._omp_fn.0':
    source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c:104:9: internal compiler error: Segmentation fault
     #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
             ^

    #4  0x0000000000f73d46 in internal_error (gmsgid=gmsgid@entry=0x105be63 "%s")
        at [...]/source-gcc/gcc/diagnostic.c:1270
    #5  0x00000000009fccb0 in crash_signal (signo=<optimized out>)
        at [...]/source-gcc/gcc/toplev.c:333
    #6  <signal handler called>
    #7  0x0000000000beaf2e in same_succ_flush_bb (bb=<optimized out>, bb=<optimized out>)
        at [...]/source-gcc/gcc/hash-table.h:919
    #8  0x0000000000bec499 in same_succ_flush_bbs (bbs=<optimized out>)
        at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:823
    #9  update_worklist () at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:870
    #10 tail_merge_optimize (todo=todo@entry=32)
        at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:1716
    #11 0x0000000000b99057 in (anonymous namespace)::pass_pre::execute (this=<optimized out>, fun=<optimized out>)
        at [...]/source-gcc/gcc/tree-ssa-pre.c:4818
    #12 0x0000000000937e9d in execute_one_pass (pass=pass@entry=0x1530970)
        at [...]/source-gcc/gcc/passes.c:2348
    #13 0x00000000009384b8 in execute_pass_list_1 (pass=0x1530970)
        at [...]/source-gcc/gcc/passes.c:2432
    #14 0x00000000009384ca in execute_pass_list_1 (pass=0x152fa10)
        at [...]/source-gcc/gcc/passes.c:2433
    #15 0x0000000000938515 in execute_pass_list (fn=0x7ffff69a5930, pass=<optimized out>)
        at [...]/source-gcc/gcc/passes.c:2443
    #16 0x00000000005fdded in cgraph_node::expand (this=this@entry=0x7ffff6990170)
        at [...]/source-gcc/gcc/cgraphunit.c:1982
    #17 0x00000000005ff8c4 in expand_all_functions ()
        at [...]/source-gcc/gcc/cgraphunit.c:2118
    #18 symbol_table::compile (this=0x7ffff68d2000) at [...]/source-gcc/gcc/cgraphunit.c:2474
    #19 0x0000000000561db8 in lto_main () at [...]/source-gcc/gcc/lto/lto.c:3328
    #20 0x00000000009fccef in compile_file () at [...]/source-gcc/gcc/toplev.c:463
    #21 0x000000000052e5ba in do_compile () at [...]/source-gcc/gcc/toplev.c:1987
    #22 toplev::main (this=this@entry=0x7fffffffcc80, argc=argc@entry=18, argv=0x150aec0, argv@entry=0x7fffffffcd88)
        at [...]/source-gcc/gcc/toplev.c:2095
    #23 0x0000000000530247 in main (argc=18, argv=0x7fffffffcd88)
        at [...]/source-gcc/gcc/main.c:39

Are you seeing that, too?  I can't remember seeing that on
gomp-4_0-branch, so it may be due to a recent trunk change, independent
of your omp-low change.  Are you going to have a look, or want me to?


Grüße
 Thomas
