This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] C undefined behavior fix
- From: mike stump <mrs at windriver dot com>
- To: torvalds at transmeta dot com
- Cc: gcc at gcc dot gnu dot org, rth at redhat dot com, trini at kernel dot crashing dot org
- Date: Sat, 5 Jan 2002 19:35:37 -0800 (PST)
- Subject: Re: [PATCH] C undefined behavior fix
> Date: Sat, 5 Jan 2002 17:01:02 -0800 (PST)
> From: Linus Torvalds <firstname.lastname@example.org>
> To: mike stump <email@example.com>
> the result on dereferencing ends up being undefined
> Which I cannot argue against ;/
Cool, now we are into the realm of pragmatics. I think the
transformation to memcpy is reasonable. I think the end result of the
bogus size is reasonable. I think that having the kernel `hide' what
it is doing in a RELOC is reasonable (single point). I think having
the kernel use memcpy would be obscure (bad).
To hide what is going on, I think it is sufficient to invent a hiding
primitive, and use it. Logically something like:
#define hide(a) (a)
but with enough in there so that the optimizer cannot tell what is
going on. I would suggest an inline asm that just moves between the
source and dest or otherwise constrains the two to be in the same
place. The danger is that some day the compiler will read and
understand the code in the asm("") and do something with it. That
won't happen for at least 10 years. I think it is reasonable to use
that in the kernel, and then fix the kernel in 10 years, when the
optimizer reads and understands asm("").
This avoids the use of an external asm routine, which I would
understand if someone called it unfortunate. This avoids the use of
washing the value through volatile, which may be deemed unfortunate.
Washing the value through volatile is the C-only, it-must-work way of
fixing this. This won't break in 10 years.
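For comparison, a minimal sketch of washing a pointer through volatile in
plain C. The name wash_ptr is made up for illustration. Because the
temporary is a volatile object, the compiler has to perform the store and
the load, and may not assume the loaded value is the one it just stored.

```c
/* Hypothetical helper: launder a pointer through a volatile object.
 * Costs a store and a load, but relies only on standard C semantics. */
static char *wash_ptr(char *p)
{
    char *volatile washed = p;  /* volatile store of the pointer value */
    return washed;              /* volatile load: value is opaque to the optimizer */
}
```

The cost is an extra trip through memory on every use, which is
presumably why it may be deemed unfortunate.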
Questions: does using the inline asm trick fix the problem? Is there
more than one place like RELOC where this can happen? If so, how many?
I'm trying to envision changes to the compiler that we could put in,
to make gcc a better compiler for users and help solve this problem.
The danger of solutions that don't `hide' the value of the pointer is
that the compiler doesn't understand that it doesn't know about the
`value' of the pointer. If the user fails to inform the compiler
about the compiler's lack of knowledge about the pointer, the optimizer
will forever be making mistakes about it. Solutions where we can
reasonably foresee the same old problem aren't solutions. The other
solution is to increase the compiler's ability to know that it doesn't
know about the value without the user's help. The downside of this is
that it will be a never-ending process where users want progressively more
intelligence, in turn leading to even more obscure bugs. I am
not sure I can overcome this downside to come up with a suggestion on
how to make the compiler realize that it doesn't know about the value.
What is the total number of ways in which the compiler should realize
that it doesn't know what is going on? What is the likelihood of
adding new code to the compiler that should also realize when it
doesn't know what is going on but that such required knowledge isn't
I am unsure exactly how to bound these down into small cases. Does
anyone see ways to bound them down into small numbers of cases?
Would it be reasonable for the compiler to only know about a finitely
small number of such cases?
I do sense an age-old theme here: stop lying to the compiler. Do we
have this documented in the manual someplace? If not, maybe we should
consider documenting this, as apparently it isn't a widely known or
understood concept. It sounds trivial, but I think that only with
real examples from real code with an explanation of what went wrong,
will users gain the most out of such documentation.