Support streaming of poly_int for offloading when its degree <= accel's NUM_POLY_INT_COEFFS
Prathamesh Kulkarni
prathameshk@nvidia.com
Wed Jul 31 14:58:34 GMT 2024
> -----Original Message-----
> From: Prathamesh Kulkarni <prathameshk@nvidia.com>
> Sent: Tuesday, July 30, 2024 4:44 PM
> To: Jakub Jelinek <jakub@redhat.com>; Richard Biener <rguenther@suse.de>
> Cc: Richard Sandiford <richard.sandiford@arm.com>; gcc-patches@gcc.gnu.org
> Subject: RE: Support streaming of poly_int for offloading when its
> degree <= accel's NUM_POLY_INT_COEFFS
>
>
> > -----Original Message-----
> > From: Jakub Jelinek <jakub@redhat.com>
> > Sent: Tuesday, July 30, 2024 3:16 PM
> > To: Richard Biener <rguenther@suse.de>
> > Cc: Richard Sandiford <richard.sandiford@arm.com>; Prathamesh Kulkarni
> > <prathameshk@nvidia.com>; gcc-patches@gcc.gnu.org
> > Subject: Re: Support streaming of poly_int for offloading when its
> > degree <= accel's NUM_POLY_INT_COEFFS
> >
> >
> > On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> > > Only "relevant" stuff should be streamed - the offload code and
> all
> > > trees refered to.
> >
> > Yeah.
> >
> > > > > I think all current issues are because of poly-* leaking in for
> > > > > cases where a non-poly would have worked fine, but I have not had
> > > > > a look myself.
> > > >
> > > > One of the cases that Prathamesh mentions is streaming the mode sizes.
> > > > Are those modes "offload target modes" or "host modes"?  It seems
> > > > like it shouldn't be an error for the host to have VLA modes per se.
> > > > It's just that those modes can't be used in the host/offload interface.
> > >
> > > There's a requirement that a mode mapping exists from the host to
> > > target enum machine_mode. I don't remember exactly how we compute
> > > that mapping and whether streaming of some data (and thus poly-int)
> > > is part of this.
> >
> > During streaming out, the code records what machine modes are being
> > streamed (in streamer_mode_table).
> > For those modes (and their inner modes), lto_write_mode_table then
> > streams a table with mode details like class, bits, size, inner
> > mode, nunits, real mode format if any, etc.
> > That table is then streamed in by the offloading compiler, which
> > attempts to find corresponding modes (and emits fatal_error if there
> > is no such mode; consider say x86_64 long double with XFmode being
> > used in offloading code which doesn't have XFmode support).
> > Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int
> > rather than int, this has been changed to use bp_pack_poly_value; but
> > that relies on the same number of coefficients for poly_int, which is
> > not the case when e.g. offloading aarch64 to gcn or nvptx.
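> (For context, the packing side in data-streamer.h is essentially a
> fixed-length per-coefficient loop; this is a simplified sketch, not the
> exact GCC code:
>
>   /* Pack the N coefficients of X into bitpack BP, NBITS bits each.
>      Writer and reader must agree on N for the stream to stay aligned.  */
>   template<unsigned int N, typename C>
>   void
>   bp_pack_poly_value (struct bitpack_d *bp, const poly_int<N, C> &x,
>                       unsigned nbits)
>   {
>     for (unsigned int i = 0; i < N; ++i)
>       bp_pack_value (bp, x.coeffs[i], nbits);
>   }
>
> so a writer and a reader with different NUM_POLY_INT_COEFFS fall out of
> step.)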
> Indeed, for the minimal test:
>
> int main()
> {
>   int x;
>   #pragma omp target map (to: x)
>   {
>     x = 0;
>   }
>   return x;
> }
>
> Streaming out mode_table from AArch64 shows:
> mode = SI, mclass = 2, size = 4, prec = 32
> mode = DI, mclass = 2, size = 8, prec = 64
>
> While streaming-in for nvptx shows:
> mclass = 2, size = 4, prec = 0
>
> The discrepancy happens because of the differing value of
> NUM_POLY_INT_COEFFS between AArch64 (2) and nvptx (1).
> AArch64 streams out size and prec as <4, 0> and <32, 0>
> respectively, where the 0 comes from coeffs[1].
> While streaming in on nvptx, since NUM_POLY_INT_COEFFS is 1, it
> reads size as 4 but then consumes size's coeffs[1] (0) as prec,
> so every later field in the bitpack is shifted by one slot.
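> To make the misalignment concrete, the bitpack slots for SImode look
> like this (values as above; layout illustrative):
>
>   AArch64 writes: [size.c0=4] [size.c1=0] [prec.c0=32] [prec.c1=0] ...
>   nvptx reads:    [size=4]    [prec=0]    [next field=32] ...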
> >
> > From what I can see, this mode table handling accounts for the only
> > uses of bp_pack_poly_value.  So the options are either to stream the
> > NUM_POLY_INT_COEFFS value at the start of the mode table, pass what
> > we've read to bp_unpack_poly_value, and fill in any remaining coeffs
> > with zeros, or to stream the number of coefficients in each
> > bp_pack_poly_value, stream that back in, and fill in the remaining
> > ones (and diagnose if it would try to read a non-zero coefficient
> > which can't be stored).
> This is the approach taken in the proposed patch (stream out the degree
> of the poly_int followed by its coeffs).
>
> > I think streaming NUM_POLY_INT_COEFFS once would be more compact (at
> > least for non-aarch64/riscv targets).
> I will try implementing this, thanks.
Hi,
The attached patch streams out NUM_POLY_INT_COEFFS only once, at the beginning
of mode_table, which should make the LTO bytecode more compact for non-VLA
hosts, and changes the streaming-in of poly_int as follows:
  if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
    {
      for (i = 0; i < host_num_poly_int_coeffs; i++)
        poly_int.coeffs[i] = stream_in coeff;
      /* Set remaining coeffs to zero (like zero-extension).  */
      for (; i < NUM_POLY_INT_COEFFS; i++)
        poly_int.coeffs[i] = 0;
    }
  else
    {
      for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
        poly_int.coeffs[i] = stream_in coeff;
      /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
      for (; i < host_num_poly_int_coeffs; i++)
        {
          val = stream_in coeff;
          if (val != 0)
            error ();
        }
    }
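In bitpack terms, the streaming-in above could look roughly like the sketch
below; the signature and names (host_num_coeffs in particular) are
illustrative, not necessarily what the attached patch uses:

  /* Unpack a poly_int whose coefficients were each packed with NBITS
     bits by a host that has HOST_NUM_COEFFS coefficients (the value
     streamed once at the start of mode_table).  */
  template<typename C>
  poly_int<NUM_POLY_INT_COEFFS, C>
  bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits,
                        unsigned host_num_coeffs)
  {
    poly_int<NUM_POLY_INT_COEFFS, C> x;
    unsigned n = MIN (host_num_coeffs, (unsigned) NUM_POLY_INT_COEFFS);
    unsigned i;
    for (i = 0; i < n; i++)
      x.coeffs[i] = (C) bp_unpack_value (bp, nbits);
    /* Zero-extend if the accel has more coefficients than the host.  */
    for (; i < NUM_POLY_INT_COEFFS; i++)
      x.coeffs[i] = 0;
    /* Coefficients the accel can't represent must all be zero.  */
    for (i = n; i < host_num_coeffs; i++)
      if (bp_unpack_value (bp, nbits) != 0)
        fatal_error (input_location,
                     "degree of %<poly_int%> exceeds NUM_POLY_INT_COEFFS");
    return x;
  }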
There are a couple of issues in the patch:
(1) The patch streams out NUM_POLY_INT_COEFFS at the beginning of mode_table,
which should work for bp_unpack_poly_value (since AFAIK it's only called by
lto_input_mode_table). However, I am not sure if we will always call
lto_input_mode_table before streaming in poly_int64 / poly_uint64? Or should
we stream out the host NUM_POLY_INT_COEFFS at a different place in the LTO
bytecode?
(2) The patch defines a POLY_INT_READ_COMMON macro to factor out the common
code for reading a poly_int; however, I am not sure how to define a callback
for the different streaming functions like streamer_read_[u]hwi and
bp_unpack_value, since they have different signatures. The patch uses an
(ugly) kludge, streamer_read_coeff, which is essentially a call to the
streaming-in function (a possible alternative shape is sketched below).
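For (2), one alternative to the macro could be a template helper that takes
the per-coefficient reader as a callable, e.g. a lambda closing over the
input block ib; read_poly_common below is hypothetical, just to illustrate
the shape:

  /* Hypothetical replacement for POLY_INT_READ_COMMON: the differing
     signatures of the streaming functions are hidden behind a callable
     that reads a single coefficient.  */
  template<unsigned int N, typename C, typename F>
  poly_int<N, C>
  read_poly_common (unsigned host_num_coeffs, F read_coeff)
  {
    poly_int<N, C> x;
    unsigned i;
    for (i = 0; i < MIN (host_num_coeffs, N); i++)
      x.coeffs[i] = read_coeff ();
    /* Zero-fill coefficients the host didn't stream.  */
    for (; i < N; i++)
      x.coeffs[i] = 0;
    /* Extra host coefficients must be zero.  */
    for (i = N; i < host_num_coeffs; i++)
      if (read_coeff () != 0)
        fatal_error (input_location,
                     "degree of %<poly_int%> exceeds NUM_POLY_INT_COEFFS");
    return x;
  }

  /* E.g. for streamer_read_hwi:  */
  poly_int64 p
    = read_poly_common<NUM_POLY_INT_COEFFS, HOST_WIDE_INT>
        (host_num_coeffs, [ib] { return streamer_read_hwi (ib); });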
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Jakub
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p-163-6.txt
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20240731/15ee6cbc/attachment-0001.txt>