This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFC] Offloading Support in libgomp

From: Jakub Jelinek <jakub at redhat dot com>
To: "Michael V. Zolotukhin" <michael dot v dot zolotukhin at gmail dot com>
Cc: Kirill Yukhin <kirill dot yukhin at gmail dot com>, Richard Henderson <rth at redhat dot com>, gcc at gcc dot gnu dot org, triegel at redhat dot com
Date: Mon, 26 Aug 2013 14:51:16 +0200
Subject: Re: [RFC] Offloading Support in libgomp
Authentication-results: sourceware.org; auth=none
References: <20130822140810 dot GA27868 at msticlxl57 dot ims dot intel dot com> <20130822142814 dot GB1814 at tucnak dot redhat dot com> <20130823092810 dot GA36483 at msticlxl57 dot ims dot intel dot com> <20130823095250 dot GJ1814 at tucnak dot redhat dot com> <20130823153052 dot GA2974 at msticlxl57 dot ims dot intel dot com> <20130823161631 dot GO1814 at tucnak dot redhat dot com> <20130826115911 dot GA40923 at msticlxl57 dot ims dot intel dot com>
Reply-to: Jakub Jelinek <jakub at redhat dot com>

On Mon, Aug 26, 2013 at 03:59:11PM +0400, Michael V. Zolotukhin wrote:
> As I currently see it, the given code would be expanded to something like
> this:
> 
>   // Create two versions of V: for host and for target
>   int v;
>   int v_target __attribute(target);
> 
>   // The same for TGT function
>   int tgt ()
>   {
>     .. update v ..
>   }
>   int tgt_target () __attribute(target)
>   {
>     .. update v_target ..
>   }

Actually, there aren't two versions of those during compilation; you have
just one v and one tgt, both with __attribute__(("omp declare target"))
on them (note that you can't specify that attribute manually).
Only when streaming into the .gnu.target_lto_* sections do you stream
everything that has those attributes, plus the types it uses, and nothing else.
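For orientation, the quoted pseudocode appears to lower source code along the following lines, reconstructed here from its `// #pragma` comments. This is a sketch under assumptions: baz is a hypothetical helper (the thread does not show it), and tgt is given a trivial body that bumps v. Only v and tgt sit inside the declare target block, so they are the declarations that would carry the internal "omp declare target" attribute. The `s = 0` inside the target region is added because map(from: s) does not copy s in; it mirrors the explicit `*.omp_data_in->s = 0` in the outlined function further down. Without -fopenmp the pragmas are ignored and everything runs on the host; with -fopenmp and no offload device, the target regions should fall back to host execution.

```c
/* v and tgt are the only "declare target" entities: a single copy of each,
   marked so that it also gets streamed for the offload target. */
#pragma omp declare target
int v;
int tgt (void) { return ++v; }  /* hypothetical body: counts calls in v */
#pragma omp end declare target

/* Hypothetical helper: fills b with x and c with 1.0f. */
static void baz (float *b, float *c, int x)
{
  for (int i = 0; i < 1024; i++)
    {
      b[i] = (float) x;
      c[i] = 1.0f;
    }
}

float bar (int x, int y, int z)
{
  float b[1024], c[1024], s = 0;
  (void) y;  /* unused in this reconstruction */
  (void) z;
  baz (b, c, x);
#pragma omp target data map(to: b)
  {
#pragma omp target map(tofrom: c) map(from: s)
    {
      s = 0;  /* map(from:) does not copy s in, so initialize it here */
      for (int i = 0; i < 1024; i++)
        tgt (), s += b[i] * c[i];
    }
#pragma omp target update from(b, v)
  }
  return s;
}
```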
> 
>   float
>   bar (int x, int y, int z)
>   {
>     float b[1024], c[1024], s = 0;
>     int i, j;
>     baz (b, c, x);
>     // #pragma omp target data map(to: b)
>     vec<data_descriptor> data_desc;
>     data_desc.push ({&b, 1024*sizeof(float), TO});
>     GOMP_target_data (&data_desc);

Nope.  It would be:
  struct data_descriptor data_desc1[1] = { { &b, 1024*sizeof(float), TO } };
  GOMP_target_data (-1, data_desc1, 1);
or so.  The compiler always knows how many vector elements it needs, so there
is no point in making the vector dynamic; besides, vec<> is a compiler data
structure, while you want to emit runtime code.  The -1 in there stands
for a missing device(device-id) clause; otherwise it would be the provided
device-id expression.  For the if clause, the question is whether we want to
pass it down to the runtime library too (as a bool, defaulting to true if
missing), or do something else.
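A minimal sketch of the fixed-size descriptor approach, assuming a descriptor that holds the host address, the size in bytes and the map kind, as in the message. sketch_GOMP_target_data is a hypothetical stub standing in for the proposed GOMP_target_data(device, descs, n); it is not libgomp's actual entry point, and it only records what it was asked to map instead of allocating and copying.

```c
#include <stddef.h>

/* Map kinds, spelled as in the thread; the numeric values are arbitrary. */
enum map_kind { TO, FROM, TOFROM };

/* One entry per mapped object: host address, size in bytes, direction. */
struct data_descriptor
{
  void *host_address;
  size_t size;
  enum map_kind kind;
};

/* Hypothetical stand-in for the proposed runtime call.  A real runtime would
   allocate device memory and copy TO/TOFROM objects; this only records.  */
static int last_device;
static size_t last_count;
static size_t last_bytes;

void sketch_GOMP_target_data (int device, const struct data_descriptor *descs,
                              size_t n)
{
  last_device = device;
  last_count = n;
  last_bytes = 0;
  for (size_t i = 0; i < n; i++)
    last_bytes += descs[i].size;
}

/* What the compiler would emit for "#pragma omp target data map(to: b)"
   with no device() clause: a fixed-size array (the number of map clauses
   is known at compile time) and -1 as the device id.  b is a pointer
   parameter here, so its value (not &b) is the host address.  */
void lower_target_data (float b[1024])
{
  struct data_descriptor data_desc1[1] = { { b, 1024 * sizeof (float), TO } };
  sketch_GOMP_target_data (-1, data_desc1, 1);
}
```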

>     {
>       // #pragma omp target map(tofrom: c) map(from:s)
>       data_desc.push ({&c, 1024*sizeof(float), TOFROM});
>       data_desc.push ({&s, sizeof(float), FROM});
>       GOMP_target_data (&data_desc); // Add mapping for S and C variables,
> 				     // mapping for B shouldn't change

Nope, there is only one target data pragma, so here you would use just:

>       GOMP_target (foo1, "foo1", &data_desc); // Call either FOO1 or offloaded
> 					      // FOO1_TARGET with arguments
> 					      // from vector DATA_DESC

  struct data_descriptor data_desc2[2] = { ... };
  GOMP_target (-1, bar.omp_fn.1, "bar.omp_fn.1", data_desc2, 2);
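One possible reading of this GOMP_target call, as a sketch: the runtime receives the outlined function, its name (which a real runtime could use to locate the offloaded copy in the target image), and the descriptor array with its length. sketch_GOMP_target and sum_region are hypothetical names, and only the host fallback is modeled: the host addresses are collected in descriptor order and passed straight to the outlined function.

```c
#include <stddef.h>
#include <stdlib.h>

enum map_kind { TO, FROM, TOFROM };
struct data_descriptor { void *host_address; size_t size; enum map_kind kind; };

/* Hypothetical stand-in for GOMP_target (device, fn, fn_name, descs, n).
   Host fallback only: gather host addresses in descriptor order and call
   FN directly; no copying is needed since host memory is used as is.  */
void sketch_GOMP_target (int device, void (*fn) (void *), const char *fn_name,
                         const struct data_descriptor *descs, size_t n)
{
  void **hostaddrs = malloc ((n ? n : 1) * sizeof *hostaddrs);
  if (hostaddrs == NULL)
    abort ();
  (void) device;   /* -1 means no device() clause */
  (void) fn_name;  /* would select the offloaded copy; unused here */
  for (size_t i = 0; i < n; i++)
    hostaddrs[i] = descs[i].host_address;
  fn (hostaddrs);
  free (hostaddrs);
}

/* Hypothetical outlined region: sums a 4-element array C into S, reading
   its arguments in the same order the descriptors were built.  */
void sum_region (void *data)
{
  void **addrs = data;
  float *c = addrs[0];
  float *s = addrs[1];
  *s = 0;
  for (int i = 0; i < 4; i++)
    *s += c[i];
}
```

A device path would additionally copy each object according to its kind before and after invoking the offloaded function; the fallback can skip that because host and "device" memory coincide.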

> 
>       // #pragma omp target update from(b, v)
>       vec<data_descriptor> data_desc_update; // target update pragma require a
> 					     // separate vector
>       data_desc_update.push ({&b, 1024*sizeof(float), FROM});
>       data_desc_update.push ({&v, sizeof(int), FROM});
>       GOMP_target_data (&data_desc_update);

Similarly here.

>     }
>     return s;
>   }
>   void
>   foo1 (vec<data_descriptor> data_desc)
>   {
>     float b = *data_desc[0].host_address;
>     float c = *data_desc[1].host_address;
>     float s = 0;
>     int i;
>     for (i = 0; i < 1024; i++)
>       tgt (), s += b[i] * c[i];
>     *data_desc[2].host_address = s;

No, I didn't mean you'd do this.  omp-lower.c would simply create
a type here with the same layout as what the runtime library would
pass to it.
So it would be:

void
bar.omp_fn.1 (struct omp_target_data *.omp_data_in)
{
  int i;
  *.omp_data_in->s = 0;
  for (i = 0; i < 1024; i++)
    tgt (), *.omp_data_in->s += .omp_data_in->b[i] * .omp_data_in->c[i];
}

Just look at what the omplower pass does for normal OpenMP code, say
#pragma omp parallel, task, etc.
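The outlined bar.omp_fn.1 can be written as plain C roughly as follows. GCC presumably gives internal symbols names like bar.omp_fn.1 and .omp_data_in because a dot cannot appear in a user identifier, so they cannot clash; this sketch substitutes underscores. The struct layout (one pointer per variable used in the region) and the stand-in tgt are assumptions for illustration, not the layout omp-lower.c actually produces.

```c
/* Stand-in for the declare-target tgt: counts its calls in v. */
int v;
int tgt (void) { return ++v; }

/* Hypothetical argument block: one pointer per variable the target region
   uses, laid out to match what the runtime would pass in.  */
struct omp_target_data
{
  float *b;
  float *c;
  float *s;
};

/* Valid-C spelling of bar.omp_fn.1: every access to b, c and s goes
   through the argument block instead of through bar's locals.  */
void bar_omp_fn_1 (struct omp_target_data *omp_data_in)
{
  int i;
  *omp_data_in->s = 0;
  for (i = 0; i < 1024; i++)
    tgt (), *omp_data_in->s += omp_data_in->b[i] * omp_data_in->c[i];
}
```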

	Jakub

Follow-Ups:
- Re: [RFC] Offloading Support in libgomp
  - From: Michael V. Zolotukhin

References:
- [RFC] Offloading Support in libgomp
  - From: Michael V. Zolotukhin
- Re: [RFC] Offloading Support in libgomp
  - From: Jakub Jelinek
- Re: [RFC] Offloading Support in libgomp
  - From: Michael V. Zolotukhin
- Re: [RFC] Offloading Support in libgomp
  - From: Jakub Jelinek
- Re: [RFC] Offloading Support in libgomp
  - From: Michael V. Zolotukhin
- Re: [RFC] Offloading Support in libgomp
  - From: Jakub Jelinek
- Re: [RFC] Offloading Support in libgomp
  - From: Michael V. Zolotukhin
