[ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling

JeanHeyd Meneide phdofthehouse@gmail.com
Mon Dec 9 03:42:00 GMT 2019

Dear Richard Biener,

On Wed, Dec 4, 2019 at 5:48 AM Richard Biener
<richard.guenther@gmail.com> wrote:
> On Sun, Dec 1, 2019 at 7:47 PM JeanHeyd Meneide <phdofthehouse@gmail.com> wrote:
> >
> > ...
> >      It worked, but this approach required removing some type checks
> > in digest_init just to be able to fake-up a proper initialization from
> > a string literal. It also could not initialize data beyond `unsigned
> > char`, as that is what I had pinned the array representation to upon
> > creation of the STRING_CST.
> Using a STRING_CST is an interesting idea and probably works well
> for most data.
> ...
> Note we also have "special" CONSTRUCTOR fields like
> RANGE_EXPR for repetitive data.
> Since the large initializers are usually in static initializers
> tied to variables another option is to replace the DECL_INITIAL
> CONSTRUCTOR tree node with a new BINARY_BLOB
> tree node containing a pointer to target encoded (compressed)
> data.

Thank you so much for your feedback! Your ideas really helped me out
here. I'm using a RANGE_EXPR whose INDEX has 2 operands, the min and
max of the array, and whose VALUE is the binary data to pull from. I
coded special handling in digest_init for the C front end; I'll likely
have to add some additional magic for the C++ initialization rules
too. Some preliminary testing with large binary files went like so:

- 50 MB binary file, huge.bin
- xxd generated include file, huge.bin.h (N.B. took 302 MB)
- compile a file with no library dependencies, using the #embed
directive or just relying on the xxd file
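
To make the size of the xxd-generated include concrete, here is a
minimal sketch of a formatter that mimics the layout of `xxd -i`
output (12 bytes per line, each written as 0xNN). The names
format_xxd_style and append are made up for this illustration and are
not part of xxd or GCC; the layout is reproduced from memory of what
xxd emits, so treat it as an approximation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends formatted text at buf + used and returns
   the new length. If buf is NULL or already full, the text is only
   counted, not stored. */
static size_t append(char *buf, size_t cap, size_t used, const char *fmt, ...)
{
    va_list ap;
    int r;

    va_start(ap, fmt);
    if (buf != NULL && used < cap)
        r = vsnprintf(buf + used, cap - used, fmt, ap);
    else
        r = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    return used + (r > 0 ? (size_t)r : 0);
}

/* Hypothetical helper: formats n bytes as an xxd -i style array
   definition, 12 bytes per line. Returns the number of characters the
   full text needs, excluding the terminating NUL. Pass buf == NULL to
   measure the size without storing anything. */
size_t format_xxd_style(char *buf, size_t cap, const char *name,
                        const unsigned char *data, size_t n)
{
    size_t used = 0;
    size_t i;

    assert(name != NULL && strlen(name) > 0);
    assert(data != NULL || n == 0);

    used = append(buf, cap, used, "unsigned char %s[] = {\n", name);
    for (i = 0; i < n; ++i) {
        /* Two-space indent at the start of a line, one space otherwise. */
        const char *lead = (i % 12 == 0) ? "  " : " ";
        /* No comma after the last byte; line break after every 12th. */
        const char *tail = (i + 1 == n) ? "\n" : (i % 12 == 11) ? ",\n" : ",";
        used = append(buf, cap, used, "%s0x%02x%s", lead,
                      (unsigned)data[i], tail);
    }
    used = append(buf, cap, used, "};\nunsigned int %s_len = %zu;\n", name, n);
    return used;
}
```

In this layout every full line of 12 input bytes costs 74 characters
of source text, a bit over six characters per byte, which is in the
same range as the growth from 50 MB to 302 MB noted above. The front
end then has to lex and parse each of those tokens, whereas the #embed
path hands the data over as one block.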

     It takes 11 seconds for the #embed compilation to chew through the
file. That includes encoding it in a special way so it can survive
external tools applied between the preprocessor and the real
compilation of the file (e.g., a distcc or icecc workflow).

     It takes 621 seconds for the #include-based, xxd-like compilation,
roughly 56 times as long.

I could get it even faster if I didn't have to do the encode/decode
step that #embed applies to the data between leaving the preprocessor
and entering the actual C/C++ front ends. I know of an implementation
that avoids it, but because #embed is not standard, I have to accept
that other tools won't know how to behave in the presence of such a
special secondary implementation. So my encoded implementation is the
one that will have to stand for now.
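
To illustrate what an encode/decode step of this kind involves, here
is a minimal sketch of a text-safe round trip. This message does not
say which encoding the patch uses; base64 is assumed here purely as an
example of a representation that survives tools which only understand
preprocessed text. The names b64_encode, b64_decode and b64_value are
hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes n bytes as base64 text. out must hold 4 * ((n + 2) / 3) + 1
   characters. Returns the text length, excluding the terminating NUL. */
size_t b64_encode(const unsigned char *in, size_t n, char *out)
{
    size_t i, o = 0;

    assert(out != NULL && (in != NULL || n == 0));
    for (i = 0; i < n; i += 3) {
        /* Pack up to three bytes into a 24-bit group. */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < n) v |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < n) v |= (unsigned long)in[i + 2];
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? B64[(v >> 6) & 63] : '=';
        out[o++] = (i + 2 < n) ? B64[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}

/* Maps one base64 character to its 6-bit value, or -1 if it is invalid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes len base64 characters into out, which must hold 3 * (len / 4)
   bytes. Returns the byte count, or (size_t)-1 on malformed input. */
size_t b64_decode(const char *in, size_t len, unsigned char *out)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);
    if (len % 4 != 0) return (size_t)-1;
    for (i = 0; i < len; i += 4) {
        unsigned long v = 0;
        int pad = 0, k;
        for (k = 0; k < 4; ++k) {
            char c = in[i + k];
            int d;
            if (c == '=' && i + 4 == len && k >= 2) {
                d = 0;              /* padding, only at the very end */
                ++pad;
            } else {
                d = b64_value(c);
                if (d < 0 || pad > 0) return (size_t)-1;
            }
            v = (v << 6) | (unsigned long)d;
        }
        out[o++] = (unsigned char)((v >> 16) & 0xff);
        if (pad < 2) out[o++] = (unsigned char)((v >> 8) & 0xff);
        if (pad < 1) out[o++] = (unsigned char)(v & 0xff);
    }
    return o;
}
```

Both directions touch every byte of the payload, so on a 50 MB input a
step like this is extra work that a direct hand-off from the
preprocessor to the front end would not need.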

Thank you so much,
