This is the mail archive of the mailing list for the GCC project.
RE: Fwd: Building gcc-4.9 on OpenBSD
- From: Joe Buck <Joe dot Buck at synopsys dot com>
- To: Ian Grant <ian dot a dot n dot grant at googlemail dot com>, Jonathan Wakely <jwakely dot gcc at gmail dot com>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>, Tobias Ulmer <tobiasu at tmux dot org>, "marc dot glisse at inria dot fr" <marc dot glisse at inria dot fr>
- Date: Fri, 19 Sep 2014 01:37:02 +0000
- Subject: RE: Fwd: Building gcc-4.9 on OpenBSD
Ian Grant writes:
> In case it isn't obvious, what I am interested in is how easily we can know the problem of infeasibly large binaries isn't an instance of this one:
Ah, this is commonly called the Thompson hack, since Ken Thompson actually produced a successful demo:
The only way the Thompson hack can survive a three-stage bootstrap is if the compiler used for the stage 1 build contains the bad code. The comparison between stages 2 and 3 requires an exact match, and any imperfection in the object-code injection would reveal itself.
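GCC's bootstrap already does this stage 2 / stage 3 comparison itself. To show what "exact match" means in practice, here is a minimal Python sketch that compares every object file in two build trees byte for byte. It only illustrates the idea. It is not GCC's own compare step, and the function name and directory layout are assumptions.

```python
import filecmp
from pathlib import Path


def compare_stage_objects(stage2: Path, stage3: Path) -> list[str]:
    """Return object files (relative paths) missing from one stage or differing."""
    objs2 = {p.relative_to(stage2) for p in stage2.rglob("*.o")}
    objs3 = {p.relative_to(stage3) for p in stage3.rglob("*.o")}
    # An object file that exists in only one stage is itself a mismatch.
    mismatches = sorted(p.as_posix() for p in objs2 ^ objs3)
    for rel in sorted(objs2 & objs3):
        # shallow=False compares the file contents, not just os.stat() signatures.
        if not filecmp.cmp(stage2 / rel, stage3 / rel, shallow=False):
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty result means the two stages agree. If an injection reproduced itself imperfectly, even by a single byte, at least one object file would show up in the list.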
So, you can build GCC with LLVM or Intel's compiler or Microsoft's or IBM's or Sun's, doing cross-compilation where necessary. The basic idea is:
1: Build GCC with a 3-stage bootstrap, starting with a compiler that you suspect might be infected. Call the result A.
2: Do it again, starting with a different compiler that you believe is independent of the one you used in step 1. Call the result B.
3: Compare A to B. If they differ, you've found something that should be investigated. If they don't, then either A and B are both clean, or both contain identical inserted object code. Maybe the two starting compilers have a common ancestor?
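Step 3 can be sketched in Python as a comparison of SHA-256 digests over the two install trees. Because the digests are plain strings, A and B can be built on different machines and compared without copying the binaries around. The sketch assumes both builds used the same configure options and the same --prefix, since installed paths can end up embedded in the binaries. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file under root (relative POSIX path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        # Symlinks and directories are skipped; only regular file bytes count.
        if path.is_file() and not path.is_symlink():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_builds(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Files present in only one build, or present in both with different contents."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from diff_builds(tree_digests(path_a), tree_digests(path_b)) corresponds to the "they don't differ" case above. It does not by itself tell you which of the two explanations holds.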
Note that if you build GCC with a cross-compiler, the object code will be different. You have to use the cross-built compiler to build one more time to "normalize" the result: GCC 4.9.0 built with GCC 4.9.0 on operating system X should always be the same.
As far as I know, no one has been paranoid enough to put in the time to do this experiment on a large scale, and it's harder because you can't build a modern GCC (or LLVM, for that matter) with an ancient compiler. But you can create a chain: grab an ancient GCC version off a 15-year-old CD, and build successively newer versions with it until you get up to the present. The result should be byte-for-byte identical to what you get when building the current compiler with a recent version. If it is, then either the infection is at least 15 years old or it does not exist. Try it again by building cross-compilers from a Microsoft system. Don't trust Apple: they used to use GCC, so maybe all their LLVM binaries caught the bug.
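The chain can be written down as a build plan. The Python sketch below only generates the commands and does not run them. Each link is configured with CC/CXX pointing at the compilers installed by the previous link and is then bootstrapped and installed. The source layout, per-version prefixes and the seed compiler paths are assumptions. A real chain will also need extra configure options and prerequisites (newer releases need GMP, MPFR and MPC) that are omitted here.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Step:
    cwd: Path                                          # directory to run in
    argv: list[str]                                    # command and arguments
    env: dict[str, str] = field(default_factory=dict)  # extra environment variables


def chain_plan(seed_cc: str, seed_cxx: str, versions: list[str],
               src_root: Path, prefix_root: Path) -> list[Step]:
    """Plan building each GCC version with the compilers installed by the previous link."""
    plan = []
    cc, cxx = seed_cc, seed_cxx
    for v in versions:
        src = src_root / f"gcc-{v}"        # assumed: unpacked release source tree
        objdir = src_root / f"build-{v}"   # build outside the source tree
        prefix = prefix_root / f"gcc-{v}"  # one install prefix per link
        plan.append(Step(src_root, ["mkdir", "-p", str(objdir)]))
        plan.append(Step(objdir, [str(src / "configure"), f"--prefix={prefix}"],
                         {"CC": cc, "CXX": cxx}))
        plan.append(Step(objdir, ["make", "bootstrap"]))  # 3-stage build with compare
        plan.append(Step(objdir, ["make", "install"]))
        # The next link is compiled by the compilers this link just installed.
        cc, cxx = str(prefix / "bin" / "gcc"), str(prefix / "bin" / "g++")
    return plan
```

Each step could be executed with subprocess.run(step.argv, cwd=step.cwd, env={**os.environ, **step.env}, check=True). The final link should then be compared byte for byte with the same version built by a recent compiler, installed under the same prefix.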
BTW, if "size" is reporting a much smaller size than the executable file itself and that motivates this concern, most of the difference is likely to be debug info, which has grown since GCC switched to C++. You might want to try "strip".
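"size" reports only the text, data and bss figures, so debug sections count toward the file size but not toward its output. The Python sketch below checks how much of an ELF binary is debug info by reading the section header table and adding up the sizes of the .debug* and .zdebug* sections. It assumes an ELF file (32- or 64-bit, either byte order) and does not handle the extended section numbering used by files with 0xff00 or more sections.

```python
import struct

SHT_NOBITS = 8  # section type that occupies no space in the file (e.g. .bss)


def debug_bytes(data: bytes) -> int:
    """Total on-disk size of .debug* / .zdebug* sections in an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                 # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    if is64:
        (shoff,) = struct.unpack_from(end + "Q", data, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", data, 0x3A)
        shdr = end + "IIQQQQ"  # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
    else:
        (shoff,) = struct.unpack_from(end + "I", data, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", data, 0x2E)
        shdr = end + "IIIIII"
    if shoff == 0:
        return 0  # no section header table at all

    def section(i: int) -> tuple[int, int, int, int]:
        name, stype, _flags, _addr, offset, size = struct.unpack_from(
            shdr, data, shoff + i * shentsize)
        return name, stype, offset, size

    # Section names are offsets into the string table selected by e_shstrndx.
    _, _, strtab, _ = section(shstrndx)
    total = 0
    for i in range(shnum):
        name_off, stype, _, size = section(i)
        start = strtab + name_off
        name = data[start:data.index(b"\0", start)].decode("ascii", "replace")
        if name.startswith((".debug", ".zdebug")) and stype != SHT_NOBITS:
            total += size
    return total
```

For example, debug_bytes(Path(p).read_bytes()) / Path(p).stat().st_size gives the fraction of a binary taken up by debug sections. If that fraction is large, "strip" should close most of the gap between "size" and the file size.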