This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization


On 03/02/2018 05:55 PM, Cesar Philippidis wrote:
> The attached patch generalizes the worker state propagation and
> synchronization code to handle large vectors. When the vector_length is
> larger than a CUDA warp, the nvptx BE will now use shared-memory to
> spill-and-fill vector state when transitioning from vector-single mode
> to vector partitioned.

I've compiled this test-case:
...
int
main (void)
{
  int a[10];
#pragma acc parallel loop worker
  for (int i = 0; i < 10; i++)
    a[i] = i;

  return 0;
}
...

without and with the patch series, and observed the following difference in generated ptx:
...
-.shared .align 8 .u8 __oacc_bcast[8];
+.shared .align 8 .u8 __oacc_bcast[264];
...

Why does the example use 33 times as much shared memory (264 bytes instead of 8) with the patch series applied?

Thanks,
- Tom

