The testcase of PR36181 should be parallelized after being vectorized.

/* { dg-do compile } */
/* { dg-options "-O3 -ftree-parallelize-loops=2" } */

int foo ()
{
  int i, sum = 0, data[1024];

  for(i = 0; i<1024; i++)
    sum += data[i];

  return sum;
}

The fix for PR36181 was to disable the parallelization of a loop when one of the phi nodes had a vector type. This testcase should also be parallelized. See also the comments from the fix for PR36181:
http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01217.html
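To see where the vector-typed phi node comes from, here is a hand-written, source-level model of roughly what the vectorizer does with the reduction in this testcase. The scalar accumulator becomes a four-lane vector accumulator, and the lanes are summed once after the loop. This is a sketch under stated assumptions: it uses GCC's vector extension, assumes 16-byte vectors of four ints and an iteration count that is a multiple of 4, and the names v4si, sum_scalar and sum_vectorized_model are hypothetical. The GIMPLE the vectorizer actually emits will differ in detail.

```c
#include <string.h>

/* GCC vector extension: four ints in one 16-byte vector
   (assumes a GCC- or Clang-compatible compiler). */
typedef int v4si __attribute__ ((vector_size (16)));

/* Plain scalar reduction, as in the testcase. */
int sum_scalar (const int *data, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += data[i];
  return sum;
}

/* Hypothetical model of the vectorized loop: the loop-carried
   accumulator is a vector, so its phi node has vector type.
   Assumes n is a multiple of 4. */
int sum_vectorized_model (const int *data, int n)
{
  v4si acc = {0, 0, 0, 0};
  for (int i = 0; i < n; i += 4)
    {
      v4si chunk;
      /* memcpy avoids alignment and aliasing problems on the load. */
      memcpy (&chunk, &data[i], sizeof chunk);
      acc += chunk;               /* four partial sums at once */
    }
  /* Epilogue: reduce the four lanes to a single scalar. */
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Both functions return the same value for in-range data, since the vector version only reorders the additions; the difference that matters to the parallelizer is that the value carried around the loop is now a vector.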
Even worse:

#define vector __attribute__((vector_size(16)))

vector int foo ()
{
  int i;
  vector int sum = {0}, data[1024];

  for(i = 0; i<1024; i++)
    sum += data[i];

  return sum;
}

With -O2 -ftree-parallelize-loops=2, this does not get parallelized at all, even though the vectorizer was not run.
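Parallelizing a reduction like this one two ways roughly means giving each thread a private accumulator for part of the iteration space and combining the partial results at the end; for a vector-typed accumulator, the combine step is just an element-wise vector add. The sketch below models that split sequentially with GCC vector extensions. The helper names partial_vsum and vsum_split2 are hypothetical, the two halves run one after the other rather than on separate threads, and this is not a description of how GCC's loop parallelization pass is implemented.

```c
/* Same shape as the testcase's "vector int": four ints in 16 bytes. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Reduce data[lo..hi): the work one thread would do. */
static v4si partial_vsum (const v4si *data, int lo, int hi)
{
  v4si acc = {0, 0, 0, 0};
  for (int i = lo; i < hi; i++)
    acc += data[i];
  return acc;
}

/* Sequential model of a 2-way split: each half gets its own
   vector accumulator, and the two partial vectors are combined
   with a single element-wise add. */
v4si vsum_split2 (const v4si *data, int n)
{
  int mid = n / 2;
  v4si first = partial_vsum (data, 0, mid);
  v4si second = partial_vsum (data, mid, n);
  return first + second;
}
```

Nothing in the combine step depends on the accumulator being scalar, which is why a vector-typed phi does not, by itself, seem to rule out this transformation.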
Confirmed.
> ... this does not get parallelized at all ...

Also see PR 34501.

Perhaps we could make some use of Pluto. It is a fully automatic (C to OpenMP C) parallelizer that makes code amenable to auto-vectorization:
http://pluto-compiler.sourceforge.net/

Also see these parallelizers:
http://cri.ensmp.fr/pips/ or http://pips4u.org/

There was something else I found a few days ago, starting from this page, that I can no longer locate:
http://en.wikipedia.org/wiki/Automatic_parallelization

It would be great to take that inner loop (if it were much larger) and "kernelize" it for co-processing on our graphics card. We could extend GCC's parallelization options (such as -ftree-parallelize-loops) and threading options to automatically find the sweet spots to offload for co-processing (on a GPU, using OpenCL).

Barra - NVIDIA G80 GPU Functional Simulator:
http://gpgpu.univ-perp.fr/index.php/Barra

If we were "allowed" to call a post-processor (as LTO used to do), we could call ATI's GPU SDK, which supports OpenCL and generates code for both x86 and its own GPUs.

Commercial projects:
Auto-parallelizer and SIMDinator by Dalsoft: http://www.dalsoft.com/documentation_simdinator.html
NVIDIA's PTX: http://en.wikipedia.org/wiki/Parallel_Thread_Execution
Cray's work with LLVM: http://llvm.org/devmtg/2009-10/Greene_180k_Cores.pdf
Larrabee: http://www.drdobbs.com/architecture-and-design/216402188?pgno=5

Rob
(In reply to Sebastian Pop from comment #0)
> The testcase of PR36181 should be parallelized after being vectorized.
>
> /* { dg-do compile } */
> /* { dg-options "-O3 -ftree-parallelize-loops=2" } */
>
> int foo ()
> {
>   int i, sum = 0, data[1024];
>
>   for(i = 0; i<1024; i++)
>     sum += data[i];
>
>   return sum;
> }
>
> The fix for PR36181 was to disable the parallelization of a loop when
> one of the phi nodes had a vector type. This testcase should also be
> parallelized. See also the comments from the fix for PR36181:
> http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01217.html

Are you still working on this?
(In reply to Eric Gallager from comment #4)
> (In reply to Sebastian Pop from comment #0)
> > The testcase of PR36181 should be parallelized after being vectorized.
> >
> > /* { dg-do compile } */
> > /* { dg-options "-O3 -ftree-parallelize-loops=2" } */
> >
> > int foo ()
> > {
> >   int i, sum = 0, data[1024];
> >
> >   for(i = 0; i<1024; i++)
> >     sum += data[i];
> >
> >   return sum;
> > }
> >
> > The fix for PR36181 was to disable the parallelization of a loop when
> > one of the phi nodes had a vector type. This testcase should also be
> > parallelized. See also the comments from the fix for PR36181:
> > http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01217.html
>
> Are you still working on this?

Guess not.