Bug 36281 - vector code is not parallelized
Summary: vector code is not parallelized
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.4.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2008-05-20 19:25 UTC by Sebastian Pop
Modified: 2021-11-29 00:06 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-12-28 03:23:49

Description Sebastian Pop 2008-05-20 19:25:52 UTC
The testcase of PR36181 should be parallelized after being vectorized.

/* { dg-do compile } */
/* { dg-options "-O3 -ftree-parallelize-loops=2" } */

int foo ()
{
  int i, sum = 0, data[1024];

  for(i = 0; i<1024; i++)
    sum += data[i];

  return sum;
}

The fix for PR36181 was to disable the parallelization of a loop when
one of the phi nodes had a vector type.  This testcase should also be
parallelized.  See also the comments from the fix for PR36181:
http://gcc.gnu.org/ml/gcc-patches/2008-05/msg01217.html
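
To illustrate what "parallelized after being vectorized" could mean for this reduction, here is a hand-written sketch. It is not the code GCC generates and not the approach of any patch; it only shows the shape of the transformation. It assumes two chunks (mirroring -ftree-parallelize-loops=2) and GCC's vector_size extension. The names v4si, chunk_sum, and parallel_vector_sum are hypothetical. The OpenMP pragma takes effect only when compiling with -fopenmp; without it the chunks run one after the other and give the same result.

```c
#include <string.h>

#define NUM_THREADS 2   /* assumption: mirrors -ftree-parallelize-loops=2 */

/* Four 32-bit ints per vector (GCC vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Sum data[begin..end) with a 4-lane vector accumulator,
   which is roughly what the vectorizer does to the scalar loop. */
static int chunk_sum(const int *data, int begin, int end)
{
  v4si acc = {0, 0, 0, 0};
  int i = begin;

  for (; i + 4 <= end; i += 4) {
    v4si v;
    memcpy(&v, &data[i], sizeof v);   /* load without assuming alignment */
    acc += v;                         /* lane-wise add */
  }

  /* Reduce the four lanes, then handle leftover elements. */
  int sum = acc[0] + acc[1] + acc[2] + acc[3];
  for (; i < end; i++)
    sum += data[i];
  return sum;
}

/* Split the iteration space into NUM_THREADS chunks, sum each chunk
   with the vector loop above, then combine the partial sums. */
int parallel_vector_sum(const int *data, int n)
{
  int partial[NUM_THREADS];
  int chunk = (n + NUM_THREADS - 1) / NUM_THREADS;

#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int t = 0; t < NUM_THREADS; t++) {
    int begin = t * chunk;
    int end = begin + chunk < n ? begin + chunk : n;
    partial[t] = begin < end ? chunk_sum(data, begin, end) : 0;
  }

  int sum = 0;
  for (int t = 0; t < NUM_THREADS; t++)
    sum += partial[t];
  return sum;
}
```

Each chunk keeps its own vector accumulator, so the only cross-thread step is the final scalar add of the partial sums. That per-chunk accumulator corresponds to the vector-typed phi node that the PR36181 fix currently refuses to parallelize.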
Comment 1 Andrew Pinski 2008-05-20 19:42:00 UTC
Even worse:
#define vector __attribute__((vector_size(16) ))
vector int foo ()
{
  int i;
  vector int sum = {0}, data[1024];

  for(i = 0; i<1024; i++)
    sum += data[i];

  return sum;
}

With -O2 -ftree-parallelize-loops=2, this does not get parallelized at all even though we did not run the vectorizer.
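
For reference, a hand-written sketch of what parallelizing such a vector-typed reduction would amount to. It is an illustration under assumptions, not GCC output: two chunks (mirroring -ftree-parallelize-loops=2), and the hypothetical names vec4i, NUM_CHUNKS, and split_vector_sum. The OpenMP pragma is active only with -fopenmp; otherwise the chunks run sequentially.

```c
#define NUM_CHUNKS 2   /* assumption: mirrors -ftree-parallelize-loops=2 */

/* Same 16-byte vector of ints as in the testcase above. */
typedef int vec4i __attribute__((vector_size(16)));

/* Sum n vectors lane-wise by giving each chunk its own vector
   accumulator and adding the accumulators at the end. */
vec4i split_vector_sum(const vec4i *data, int n)
{
  vec4i partial[NUM_CHUNKS];
  int chunk = (n + NUM_CHUNKS - 1) / NUM_CHUNKS;

#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int t = 0; t < NUM_CHUNKS; t++) {
    vec4i acc = {0, 0, 0, 0};
    int begin = t * chunk;
    int end = begin + chunk < n ? begin + chunk : n;

    for (int i = begin; i < end; i++)
      acc += data[i];           /* lane-wise add */
    partial[t] = acc;
  }

  /* Combine the per-chunk accumulators with vector adds. */
  vec4i sum = {0, 0, 0, 0};
  for (int t = 0; t < NUM_CHUNKS; t++)
    sum += partial[t];
  return sum;
}
```

The reduction variable is a vector from the start, so the final combine step is a vector add rather than a scalar one; nothing else differs from the scalar case.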
Comment 2 Andrew Pinski 2008-12-28 03:23:49 UTC
Confirmed.
Comment 3 Rob 2010-07-19 08:25:01 UTC
> ... this does not get parallelized at all ...
Also see 34501

Perhaps we could make some use of Pluto. It is a fully automatic (C to OpenMP C) parallelizer that makes code amenable to auto-vectorization.

http://pluto-compiler.sourceforge.net/


Also see these Parallelizers:
http://cri.ensmp.fr/pips/ or http://pips4u.org/
A few days ago I found something via this page that I can no longer locate:
http://en.wikipedia.org/wiki/Automatic_parallelization

It would be great to take that inner loop (if it were much larger) and 'kernelize' it for co-processing on the graphics card. We could expand GCC's 'x-parallelize-x' and threading options to automatically find the sweet spots to offload for co-processing (on a GPU, using OpenCL).

Barra - NVIDIA G80 GPU Functional Simulator
http://gpgpu.univ-perp.fr/index.php/Barra

If we were 'allowed' to call a post-processor (like LTO used to do), we could call ATI's GPU SDK, which supports OpenCL and outputs code both for x86 and for its own GPUs.


Commercial Projects:
Auto-parallelizer and SIMDinator by Dalsoft http://www.dalsoft.com/documentation_simdinator.html

NVidia's PTX
http://en.wikipedia.org/wiki/Parallel_Thread_Execution

Cray's work with LLVM
http://llvm.org/devmtg/2009-10/Greene_180k_Cores.pdf

Larrabee
http://www.drdobbs.com/architecture-and-design/216402188?pgno=5


Rob
Comment 4 Eric Gallager 2018-06-29 03:50:26 UTC
(In reply to Sebastian Pop from comment #0)

Are you still working on this?
Comment 5 Eric Gallager 2018-09-30 01:54:30 UTC
(In reply to Eric Gallager from comment #4)
> Are you still working on this?

Guess not.