This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: [ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling

From: JeanHeyd Meneide <phdofthehouse at gmail dot com>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: GCC Development <gcc at gcc dot gnu dot org>
Date: Sun, 8 Dec 2019 22:41:41 -0500
Subject: Re: [ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling
References: <CANHA4Oh5v5w0ufrH_9t4j5-VPpWzsYbUS9V2LAyk_D8fVp6d0A@mail.gmail.com> <CAFiYyc0zwKz0rDUN8dzPH3DBheqJ1gB3VUPYs-T24bnQPk2daA@mail.gmail.com>

Dear Richard Biener,

On Wed, Dec 4, 2019 at 5:48 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Sun, Dec 1, 2019 at 7:47 PM JeanHeyd Meneide <phdofthehouse@gmail.com> wrote:
> >
> > ...
> >      It worked, but this approach required removing some type checks
> > in digest_init just to be able to fake-up a proper initialization from
> > a string literal. It also could not initialize data beyond `unsigned
> > char`, as that is what I had pinned the array representation to upon
> > creation of the STRING_CST.
>
> Using a STRING_CST is an interesting idea and probably works well
> for most data.
>
> ...
>
> Note we also have "special" CONSTRUCTOR fields like
> RANGE_EXPR for repetitive data.
>
> Since the large initializers are usually in static initializers
> tied to variables another option is to replace the DECL_INITIAL
> CONSTRUCTOR tree node with a new BINARY_BLOB
> tree node containing a pointer to target encoded (compressed)
> data.

Thank you so much for your feedback! Your ideas really helped me out
here. I'm using a RANGE_EXPR with an INDEX of 2 operands that are the
min and max of the array, and a VALUE that is the binary data to pull
from. I added special handling to digest_init for the C front end;
I'll likely have to add some additional magic for the C++
initialization rules too. Some preliminary testing with large binary
files went like so:

- 50 MB binary file, huge.bin
- xxd generated include file, huge.bin.h (N.B. took 302 MB)
- compile a file with no library dependencies, using the #embed
directive or just relying on the xxd file

     It takes 11 seconds for the #embed compilation to chew through
the file and encode it in a special way so that it can survive external
tools applied between the preprocessor and the real compilation of the
file (e.g., a distcc or icecc workflow).

     It takes 621 seconds for the #include-based, xxd-like compilation.

I could get it even faster if I didn't have to do the encode/decode
step that #embed uses to carry data from the point where it exits the
preprocessor to the point where it enters the actual C/C++ front ends.
I know of an implementation that avoids it, but because #embed is not
standard, I have to respect that other tools won't know how to behave
in the presence of such a special secondary implementation. My encoded
implementation is the one that will have to stand for now.

Thank you so much,
JeanHeyd

References:
- [ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling
  - From: JeanHeyd Meneide
- Re: [ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling
  - From: Richard Biener
