Bug 66305 - -ffat-lto-objects create unreproducible objects
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: lto
Version: 5.1.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: lto
Depends on:
Blocks:
 
Reported: 2015-05-27 11:03 UTC by lunar
Modified: 2020-02-01 00:47 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description lunar 2015-05-27 11:03:42 UTC
Hi!

We are working in Debian (and I know other free software projects
care about this too) on providing our users with a way to reproduce
bit-for-bit identical binary packages from the source and build environment.
See <https://wiki.debian.org/ReproducibleBuilds> for more information.

Python packages in Debian are built with `-flto -ffat-lto-objects`
on architectures that support it. Sadly, this makes the resulting
binary unreproducible.

For a trivial test case:

$ cat a.c
int a(void) {
        return 42;
}
$ gcc -flto -ffat-lto-objects -c a.c
$ strings a.o | grep lto_
.gnu.lto_.inline.4980dca9e6ae4d45
.gnu.lto_a.4980dca9e6ae4d45
.gnu.lto_.symbol_nodes.4980dca9e6ae4d45
.gnu.lto_.refs.4980dca9e6ae4d45
.gnu.lto_.decls.4980dca9e6ae4d45
.gnu.lto_.symtab.4980dca9e6ae4d45
.gnu.lto_.opts
__gnu_lto_v1
$ gcc -flto -ffat-lto-objects -c a.c
$ strings a.o | grep lto_
.gnu.lto_.inline.24c30dabb443e726
.gnu.lto_a.24c30dabb443e726
.gnu.lto_.symbol_nodes.24c30dabb443e726
.gnu.lto_.refs.24c30dabb443e726
.gnu.lto_.decls.24c30dabb443e726
.gnu.lto_.symtab.24c30dabb443e726
.gnu.lto_.opts
__gnu_lto_v1

Would it be possible to make the section names deterministic?
Comment 1 Richard Biener 2015-05-27 12:21:04 UTC
I think they become deterministic with -frandom-seed=0 for example.  They are not deterministic to support partial linking of LTO objects as far as I know.

Should have the same issue with -fno-fat-lto-objects.

Why are python packages shipping with LTO bytecode?
Comment 2 andi 2015-05-27 14:17:44 UTC
On Wed, May 27, 2015 at 12:21:04PM +0000, rguenth at gcc dot gnu.org wrote:
> --- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
> I think they become deterministic with -frandom-seed=0 for example.  They are
> not deterministic to support partial linking of LTO objects as far as I know.

Yes that's right. It prevents the linker from merging sections.

In theory it would be possible to use the hash of the full output path name or
similar, which would be a bit more deterministic, but there are still
ways this could break (e.g. if someone copies object files around)

How about your "deterministic build" tools just learn to ignore that
suffix?
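
A minimal sketch of what "ignoring that suffix" could look like in a comparison tool. `strip_lto_suffix` is a hypothetical helper name, and the 16-hex-digit suffix width is an assumption taken from the `strings` output in the description. This only normalizes section names for comparison; it does not make the object files themselves byte-identical.

```shell
# Hypothetical helper: read section names on stdin and drop the trailing
# 16-hex-digit suffix from .gnu.lto_* names. Lines that do not match
# (e.g. .gnu.lto_.opts or __gnu_lto_v1) pass through unchanged.
strip_lto_suffix() {
    sed -E 's/^(\.gnu\.lto_.*)\.[0-9a-f]{16}$/\1/'
}

# Possible usage, following the commands in the description (not run here):
#   strings a.o | grep lto_ | strip_lto_suffix
```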

-Andi
Comment 3 lunar 2015-05-27 18:24:22 UTC
Richard Biener:
> I think they become deterministic with -frandom-seed=0 for example.
> They are not deterministic to support partial linking of LTO objects as far
> as I know.

They are indeed reproducible with `-frandom-seed=0`. But I guess there's a
downside to that, right?

Andi:
> In theory it would be possible to use the hash of the full output path name or
> similar, which would be a bit more deterministic, but there are still
> ways this could break (e.g. if someone copies object files around)

Would using a hash over the section content work? In any case, in the context
of Debian (this applies for FreeBSD as well), we have a canonical build path
so it would probably be fine to use it as the source of the hash.

I guess one could already do this without further help by giving the
build path to `-frandom-seed=`. This only would need some Makefile trickery.

> How about your "deterministic build" tools just learn to ignore that
> suffix?

I am not sure I understand what you are talking about. We want to get
byte-for-byte identical packages on each build (provided the same software
versions are used), so we need a given version of GCC to always produce the
same output.
Comment 4 andi 2015-05-27 19:16:23 UTC
> --- Comment #3 from lunar at debian dot org ---
> Richard Biener:
> > I think they become deterministic with -frandom-seed=0 for example.
> > They are not deterministic to support partial linking of LTO objects as far
> > as I know.
> 
> They are indeed reproducible with `-frandom-seed=0`. But I guess there's a
> downside to that, right?

The downside is that incremental linking (ld -r) does not work.
But the random seed is used for other things in gcc too, so you may
have other problems too.

> > similar, which would be a bit more deterministic, but there are still
> > ways this could break (e.g. if someone copies object files around)
> 
> Would using a hash over the section content work? In any case, in the context
> of Debian (this applies for FreeBSD as well), we have a canonical build path
> so it would probably be fine to use it as the source of the hash.
> 
> I guess one could already do this without further help by giving the
> build path to `-frandom-seed=`. This only would need some Makefile trickery.

Yes. It would probably be easier in gcc, e.g. with a new option.
Comment 5 Daniel Kahn Gillmor 2015-12-10 21:57:41 UTC
(In reply to lunar from comment #3)

> Would using a hash over the section content work? In any case, in the context
> of Debian (this applies for FreeBSD as well), we have a canonical build path
> so it would probably be fine to use it as the source of the hash.

Please don't embed the full canonical build path into the solution for this.  It seems possible to vary the build path successfully and produce the same output (see bug 68848 for a minor fix needed for debugging symbols), so it would be a shame to re-embed it here.

> I guess one could already do this without further help by giving the
> build path to `-frandom-seed=`. This only would need some Makefile trickery.

Using a cryptographic digest of the file contents would require roughly the same Makefile trickery without forcing a canonicalized build path.
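
A minimal sketch of that idea, assuming GNU coreutils `sha256sum` is available. `content_seed` is a hypothetical helper name, not something GCC or Debian provides. It relies only on `-frandom-seed=` accepting an arbitrary string, so the digest serves as a stable input that depends on the file contents rather than on the build path.

```shell
# Hypothetical helper: print the SHA-256 hex digest of a file's contents.
# The file is read via stdin so that its name never appears in the output.
content_seed() {
    sha256sum < "$1" | cut -d' ' -f1
}

# Possible usage (not run here; a.c is the test case from the description):
#   gcc -flto -ffat-lto-objects -frandom-seed="$(content_seed a.c)" -c a.c
```

The same file contents yield the same seed wherever the build happens, which is the property wanted here.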
Comment 6 Peter Wu 2020-02-01 00:47:22 UTC
-frandom-seed=0 did not work for me. It appears that GCC 8 introduced a bug where a zero seed value results in a new random value anyway:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=edabe3d8b479e47a1bb3ed495f2a1d94b0ecbb2d

Any of these would trigger that problematic case in get_random_seed:
-frandom-seed=0
-frandom-seed=0x0
-frandom-seed=jNCWWJ  (non-hexadecimal strings are hashed, see set_random_seed)

-frandom-seed=1 does seem to result in a reproducible build with GCC 9.2.0, but only as long as the main source file passed to the compiler is a relative path.

The main source path is hashed, so if it was absolute, then the source directory would still become part of the build environment:
https://github.com/gcc-mirror/gcc/blob/e98ebda074bf8fc5f630a93085af81f52437d851/gcc/tree.c#L9609-L9626

        file = LOCATION_FILE (input_location);

      len = strlen (file);
      q = (char *) alloca (9 + 19 + len + 1);
      memcpy (q, file, len + 1);

      snprintf (q + len, 9 + 19 + 1, "_%08X_" HOST_WIDE_INT_PRINT_HEX,
                crc32_string (0, name), get_random_seed (false));
                ^^^^^^^^^^^^^^^^^^^^^^

(LOCATION_FILE is unaffected by -ffile-prefix-map, unlike fold_builtin_FILE in gcc/builtins.c for example.)
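
Since the main source path is part of what gets hashed, one workaround is to hand the compiler a path relative to the build directory. A minimal sketch, assuming GNU coreutils `realpath` with `--relative-to`; `relative_source` is a hypothetical helper name and the path in the usage comment is made up for illustration.

```shell
# Hypothetical helper: print the given source path relative to the current
# working directory, so no absolute directory prefix reaches the compiler.
relative_source() {
    realpath --relative-to="$PWD" "$1"
}

# Possible usage (not run here; /build/pkg/src/a.c is a made-up path):
#   cd /build/pkg
#   gcc -flto -ffat-lto-objects -frandom-seed=1 -c "$(relative_source /build/pkg/src/a.c)"
```

I have only seen the relative-path behaviour with GCC 9.2.0, so this may not carry over to other versions.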