This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Tin foil hat GCC (Was: Re: Of Bounties and Mercenaries)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Apr 07, 2004 at 10:14:13PM -0700, Tom Lord wrote:
>     > From: Joe Buck <Joe.Buck@synopsys.COM>
> 
>     > On Wed, Apr 07, 2004 at 01:40:53PM -0700, Tom Lord wrote:
>     > >     > You misunderstand.  "same bits" means "same bits".  gcc's three-stage
>     > >     > bootstrap should produce identical bits regardless of the bootstrap
>     > >     > compiler.  It is designed to eliminate effects caused by different
>     > >     > starting compilers.  The compiler compiles itself with itself.
> 
>     > > I don't misunderstand.   That's what I meant by saying that the "fixed
>     > > point" part is easy but the "secure" part is not.
> 
>     > > By injecting other compilers in the bootstrapping phase, which
>     > > incidentally most customers won't currently bother to do, you're just
>     > > raising the bar by a very small amount from a 1-stage thompson virus
>     > > to an n-stage thompson virus.
> 
>     > Which is why I said that you could prove either that no compiler in the
>     > set has a Thompson bug, or they all do.
> 
> Right.  We agree about that (which is trivial --- it's a pretty basic
> factual thing).  I'm just saying that from published existing
> practice, "they all" is a disturbingly small set, for practical
> purposes.

(Really OT sidenote)  Dunno about the rest of you, but I find this
subthread tremendously interesting.  I didn't like the computing history
modules at school level, but somehow it's grown on me.

Just what is the bootstrap path that GCC took?  Can we trace it all the
way back to programmers sticking banana plugs into vast panels of
sockets?

I assume that GCC has been self-hosting for more than a decade now, but
at the same time people bootstrap it right now with other compilers.
Off the top of my head, some possibilities:
 - HP-UX's "C" compiler - the one that's really only guaranteed to
   compile their kernel
 - Intel's ICC (haven't heard this, but surely it's doable. anyone?)
 - Sun's cc
 - SCO UnixWare's cc
 - tinycc (www.tinycc.org) - last I checked it didn't work
 - Microsoft's C/C++ compiler

What is the heritage of each - and at which point do their ancestries
converge, if at all?  There are two things going on here - bloodlines
and non-kin associations: compilers evolve new features and bugs in the
same codebase, but they also use each other for bootstrapping.  I
imagine the infection path is harder in the latter - the "compiler
detection" module has to recognise compilers in general, or risk
detection and consequent elimination.

> Let's get more paranoid, shall we?   Better compare those binaries a
> few different ways to be sure the tools you use to compare them aren't
> also hosed.  Is that already done with GCC?  Where's the web page
> with results?

What do you suggest we do that takes less than a lifetime?  Punch the
binaries onto paper tape, do a vdiff, and then feed the result back to a
paper tape reader?  That would divorce the hypothetical virus's
knowledge of this-is-a-compiler from the ex nihilo binary imported from
the external world, a binary that happens to be the very same compiler.
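
Short of paper tape, here is a rough sketch of what "compare those
binaries a few different ways" could look like.  It is Python, not
anything GCC ships, and the function names are made up for this post.
All four checks still run inside one interpreter on one machine, so this
only guards against a single hosed comparison routine, not a hosed
interpreter or kernel.

```python
import hashlib
from pathlib import Path


def compare_independently(a, b):
    """Compare two byte strings several separate ways; report each verdict."""
    return {
        # Direct equality as implemented by the runtime.
        "bytes": a == b,
        # Two unrelated hash functions, so one subverted routine is not enough.
        "sha256": hashlib.sha256(a).digest() == hashlib.sha256(b).digest(),
        "blake2b": hashlib.blake2b(a).digest() == hashlib.blake2b(b).digest(),
        # A hand-rolled loop that shares no code with the checks above.
        "manual": len(a) == len(b) and all(x == y for x, y in zip(a, b)),
    }


def binaries_agree(path_a, path_b):
    """True only when every check says the two files hold the same bits."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    return all(compare_independently(a, b).values())
```

If the verdicts ever disagree with each other, that disagreement is the
interesting result, more so than any single True or False.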

>     > > You say: "or else [...] all the free and proprietary compilers you
>     > > tried have the same hack" and I'm saying --- that's not currently
>     > > far-fetched enough to make me comfortable.  There aren't that many
>     > > other compilers I can throw in the mix there and many of them are
>     > > centrally controlled.
> 
>     > You're off in tinfoil hat land now, I'm afraid.  

http://tinfoilhat.shmoo.com/

If we're going to get hit by an asteroid, and any of us survive, I'd
like us to be able to recover all our cool technology.  It'd be a bummer
if we ended up with CDROMs full of GCC, binutils, bash, kernel etc.
source code, but not have a single surviving binary copy of a working
toolchain!  So close, and yet so far...  Now imagine if all of that
source code were C++ and not C, and you'll see why I don't like the idea
of rewriting GCC in C++.  In fact, C++ (or any language for that matter,
C++ just does more of it) *is* a Thompson-like virus.  When you say
   std::cout << "hello, world!";
the knowledge of what "<<" means is *in the compiler binary*, if you
don't watch out.  Just like the translation of '\n' to 012 can
inadvertently end up *not* being in the compiler source, ever-higher
level languages (which C++ is to C) provide a whole slew of new "machine
learning" opportunities.
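
To make the '\n' point concrete, here is a toy escape translator, with
Python standing in for C.  Both function names are invented for
illustration.  The first version never says which byte a newline is; it
leans on whatever translated this source already knowing.  The second
writes the number down.

```python
def unescape_self_referential(ch):
    # The source says "backslash-n means backslash-n".  The actual byte value
    # lives only in the tool that translated this file.
    if ch == "n":
        return "\n"
    if ch == "\\":
        return "\\"
    return ch


def unescape_explicit(ch):
    # The same translation with the knowledge spelled out in the source.
    if ch == "n":
        return chr(0o12)   # octal 012, decimal 10, ASCII line feed
    if ch == "\\":
        return chr(0o134)  # octal 134, ASCII backslash
    return ch
```

The two behave identically, which is exactly the problem: reading the
first one tells you nothing about where 012 came from.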

> Don't be afraid.  I hope and suspect you are right.  At the same time,
> I think I have a not completely loony fear that you are wrong.  If I
> had to bet a dollar, I'd bet you're right.  If I had to bet a million
> dollars -- mmm.... I'd look for some hedges.  Is any org out there
> betting a million dollars on the security of GCC as deployed across
> the world?

Red Hat, SuSE, IBM, Mandrake, etc.?  What does Microsoft use for building
their kernels?  (It would be a funny day indeed if it ever leaked out
that they used GCC and the GNU binutils for this.)

>     > Without a theory as to how someone could have gotten the same
>     > Thompson hack into Microsoft's compiler, Sun's compiler, HP's
>     > compiler, and gcc, and then made sure that the bug would keep
>     > functioning over the course of years of compiler evolution,
>     > that's simply ridiculous.
> 
> Hrm.  For one thing, I'm not aware of any ongoing effort to compare
> the results of GCC bootstrapping via all those paths.  Are you?

I guess we all sorta hope someone else is doing it. :(
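
For whoever that someone else turns out to be, the bookkeeping is the
easy part.  A minimal sketch, assuming you have already bootstrapped GCC
from several starting compilers and kept each final-stage binary; the
labels and function names are hypothetical, and nothing here drives an
actual GCC build.

```python
import hashlib
from collections import defaultdict


def group_by_digest(final_stage_binaries):
    """Map SHA-256 hex digest -> sorted labels of the paths that produced it.

    `final_stage_binaries` maps a label for the starting compiler
    (hypothetical, e.g. "sun-cc") to the bytes of the final-stage binary
    built along that path.
    """
    groups = defaultdict(list)
    for label, blob in final_stage_binaries.items():
        groups[hashlib.sha256(blob).hexdigest()].append(label)
    return {digest: sorted(labels) for digest, labels in groups.items()}


def all_paths_agree(final_stage_binaries):
    """True when every bootstrap path ended in the same bits."""
    return len(group_by_digest(final_stage_binaries)) <= 1
```

One group means either no compiler in the set has a Thompson bug or they
all do; more than one group means somebody has explaining to do.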

> For another thing: a 3-way attack vector?   That's not huge.  Let's
> compare attack costs vs. attack rewards.   How many gazillions of
> dollars are modulated by GCC-generated code?

BTW don't depend on cross-platform variants giving you combinatorial
explosions: A Thompson hack could quite conceivably operate at GCC's
middle-end, or even front-end, level, transmogrifying the syntax trees
well before any backend code gets to see them.

>     > Remember, for the Thompson hack to work, the compiler has to
>     > recognize that it's compiling the compiler, and hack the output

The general case: it has to recognize that it's compiling *a* compiler.
I don't think I can come up with a reliable scheme to detect compiler
code... maybe look if functions are named gen_code, for example?
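
Here is roughly how naive that scheme would be, as a Python sketch.
Only gen_code comes from my guess above; the other names in the list and
the threshold are invented for illustration.

```python
import re

# Hypothetical telltale names; only gen_code is taken from the text above.
SUSPICIOUS_NAMES = ("gen_code", "parse_expr", "emit_insn", "lex_token")

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def compiler_score(source_text):
    """Count how many telltale names appear as whole identifiers."""
    identifiers = set(IDENTIFIER.findall(source_text))
    return sum(1 for name in SUSPICIOUS_NAMES if name in identifiers)


def looks_like_compiler(source_text, threshold=2):
    """Guess whether the source belongs to a compiler (threshold is arbitrary)."""
    return compiler_score(source_text) >= threshold
```

Rename two functions and the detector goes blind, which is why I doubt
anything this simple would survive years of compiler evolution.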

>     > to reinsert two sets of bugs into the output code.  But Thompson
>     > only had to recognize pcc.  Your hypothetical hack would have to
>     > recognize every C compiler in existence, propagating the bugs
>     > into each one, every time, no matter how they change.
> 
> Yeah, right.   Since the Thompson paper, nobody at all has worked on
> higher-level programming techniques.  Sure.
> 
> We agree about the factual issues -- just not about our guesstimates of
> how they measure up against the economics.   I concede .... I'm
> expressing a paranoia.  I assert: it's not so far fetched as to be
> worth ignoring.

Paranoia can be fun, too!  I see my bootstrap-the-world daydreams as a
way to parody all these Illuminati/Jews/Aliens/Women conspiracy
theorists.

Maybe Dijkstra faked his death, or is playing a big nasty trick on us
from beyond the grave?

P.S. Can anyone send me a couple (tens, at least) of 7400s and 7402s
with date codes indicating manufacture in, say, the 1970s?  Then, I'd
like you to convince me too that you didn't just put an ARM chip in a
14-pin DIP and relabel it as a 1974 7400, so that your ARM can behave
like a NAND gate until it realises it's bit-slicing a compiler.  :)
Nah, I'll just etch off the epoxy and clean my microscope...

- -- 
http://voyager.abite.co.za/~berndj/ (up again for now - yay!)
bernd's stupid blog: http://voyager.abite.co.za/~berndj/blog.php
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQFAdRmy/FmLrNfLpjMRAiD7AKCg13KRtKk4lRUfsKCOaEPfrr22pgCfSft4
6htgxVcN7Wpd5zO/yyYoEVU=
=bba5
-----END PGP SIGNATURE-----

