This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[PATCH v2] Generate reproducible output independently of the build-path
- From: Ximin Luo <infinity0 at pwned dot gg>
- To: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: Ximin Luo <infinity0 at pwned dot gg>
- Date: Tue, 11 Apr 2017 13:34:43 +0200
- Subject: [PATCH v2] Generate reproducible output independently of the build-path
(Please keep me on CC; I am not subscribed.)
Background
==========
Previous background is here: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html
Upon further discussion, we decided to add support for multiple mappings and to
rename the environment variable to BUILD_PATH_PREFIX_MAP. We have also prepared
a document that describes how this works in detail, so that projects can be
confident that they are interoperable:
https://reproducible-builds.org/specs/build-path-prefix-map/
The specification is currently in DRAFT status, awaiting some final feedback,
including what the GCC maintainers think about it.
If one is interested in reading about this topic in the wider context of
reproducible builds, there's some more background here:
https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables
Proposal
========
This patch series adds a new environment variable, BUILD_PATH_PREFIX_MAP. When
it is set, GCC treats its value as extra implicit "-fdebug-prefix-map=$value"
command-line arguments that precede any explicit ones. This makes the final
binary output reproducible, and also keeps the unreproducible value (the source
path prefixes) out of CFLAGS et al., which many build tools (understandably)
embed as-is into their build output.
This environment variable also acts on the __FILE__ macro, mapping it in the
same way that -fdebug-prefix-map maps paths in debug symbols. We have seen that
__FILE__ is also a very large source of unreproducibility, and it accounts for
a large share of the 3k+ figure given in the earlier message linked above.
Finally, we tweak the mapping algorithm so that it matches prefixes only on
whole path components. This algorithm has fewer corner cases and is more
predictable, so it is easier for users to work out how to set the mapping, and
it is better suited as a standardised algorithm that other build tools might
adopt. (The original idea came from discussions with some rustc developers
about this same topic.) This does technically break backwards compatibility,
but my impression is that this option is not considered critical enough for
that to be a serious concern. I am happy to justify the change in more detail
on request.
Nevertheless, for this reason our draft specification currently offers
implementers two algorithms; I would reduce this to one if the GCC maintainers
agree to accept this third patch.
Testing
=======
I've tested these patches on a Debian unstable x86_64-linux-gnu schroot running
inside a Debian jessie system, on a full-bootstrap build. The output of
contrib/compare_tests is as follows:
~~~~
gcc-7-20170409$ contrib/compare_tests ../gcc-build-0 ../gcc-build-1
# Comparing directories
## Dir1=../gcc-build-0: 8 sum files
## Dir2=../gcc-build-1: 8 sum files
# Comparing 8 common sum files
## /bin/sh contrib/compare_tests /tmp/gxx-sum1.24154 /tmp/gxx-sum2.24154
New tests that PASS:
gcc.dg/cpp/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-1.c execution test
gcc.dg/cpp/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-2.c execution test
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c scan-assembler DW_AT_comp_dir: "DWARF2TEST/gcc
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/
# No differences found in 8 common sum files
~~~~
I can also provide the full logs on request.
--
I've also fuzzed the prefix-map code using AFL with ASAN enabled. Because of how
AFL works, I did not fuzz this patch directly, but rather a smaller program
containing just the parser and remapper, available here:
https://anonscm.debian.org/cgit/reproducible/build-path-prefix-map-spec.git/tree/consume
Over roughly 4k cycles, no crashes were found.
To reproduce, you could run something like:
~~~~
$ echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
$ make CC=afl-gcc clean reset-fuzz-pecsplit.c fuzz-pecsplit.c
~~~~
--
I will soon test this patch, backported to Debian's GCC 6, on
tests.reproducible-builds.org, and expect to have results within a few days or
weeks. Some earlier preliminary tests gave good results (about 40 more packages
became reproducible over ~2 days), but we had to abort them due to a
scheduling problem.
Copyright disclaimer
====================
I dedicate these patches to the public domain by waiving all of my rights to
the work worldwide under copyright law, including all related and neighboring
rights, to the extent allowed by law.
See https://creativecommons.org/publicdomain/zero/1.0/legalcode for the full text.
Please let me know if the above is insufficient and I will be happy to sign any
relevant forms.
However, I would prefer that prefix-map.{h,c} remain public domain, since that
code is also duplicated in our "example code" repository (URL above), which is
meant for other projects to copy and paste.