This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] C undefined behavior fix



On Fri, 4 Jan 2002 dewar@gnat.com wrote:
>
> > But the C language has exactly the notion of "implementation defined"
> > issues that _allow_ the programmer to take advantage of knowing more about
> > the machine and the environment than the language lawyers did. That's the
> > whole _point_ of "implementation defined" vs "undefined".
>
> The notion of implementation defined *allows* but does not *require* a
> compiler to do sensible things in these situations. You can't have a
> requirement that only sensible things are allowed, because no one can
> define formally what sensible means.

Yes. However, you seem to think that "implementation defined" equals
"undefined".

Which it clearly doesn't.

It doesn't require the compiler to do something sane, but apart from the
QoI issue it _does_ require the documentation part. Which is going to be a
tough nut to crack if your behaviour isn't sane.

And yes, "implementation defined" is always a headache. Some languages
simply do not allow it, and lock you into the language model. That was one
of the downfalls of Pascal for example, which made it easier to verify,
but made it unusable for anything outside the scope of the language.

> That means that when we come to something that is implementation defined,
> we have to take a purely pragmatic viewpoint (something I have been arguing
> here for a while, but it is *you* who seems to be fixated on arguing your
> position from the language standard).

I brought up the issue of "implementation defined" simply because the
arguments I saw were bogus. They claimed that the behaviour was undefined,
which it isn't once you make some simple syntactic changes.

I also explicitly pointed out that those syntactic changes make no
difference to the compiler on a lower level, so any optimizations done on
that lower level should be considered very carefully.

> Once again, the C standard *allows* a compiler to do what you want, whatever
> that is (we still don't have a clear statement), because it allows a compiler
> to do whatever it wants.

No, it still requires the documentation of the choices. I don't think you
can say that the source code documents it because gcc is open-source. You
might as well say that a closed-source compiler "documents" its choices by
virtue of the code it generates (ie "just compile the code into assembly
language, you'll see what the choices are"). That's a bogus argument too.

And saying something is "undefined" is _not_ documenting it. It's just
documenting that it _isn't_ documented.

> Furthermore, it seems reasonable to assume that what you want is sensible,
> though I can't be sure, because you have not defined it precisely, and sometimes
> things that seem sensible on first glance turn out to be meaningless nonsense
> when subjected to strenuous semantic scrutiny.

Actually, I think I can argue for the sensible behaviour, but the
problem is that implementing it in gcc may be hard because gcc throws away
type and cast information at a fairly high level.

Take the example that Richard had, which was something like:

	int a[5];
	int b[5];
	int i;

	for (i = 0; i < 10; i++)
		a[i] = b[0];

and Richard (correctly, in my opinion) claims that the compiler can and
should be able to see that b[0] is constant, even if the programmer laid
out "a" and "b" consecutively in memory (and thus changing "a[5]" really
does change the memory location that b[0] has).

I think he is correct, because the above clearly _does_ invoke undefined
behaviour: storing to "a[5]" and beyond runs past the end of "a".

The fact that I agree with him that the optimization is valid does not
mean that I think it's valid in other cases.
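
To make the optimization concrete, here is a minimal sketch of what
hoisting the load of b[0] amounts to. It is an illustration only, not code
from gcc: the function names and the check_filled helper are made up, and
the loop bound is kept at 5 so that the sketch itself stays within defined
behaviour.

```c
#include <assert.h>

#define N 5

static int a[N];
static int b[N];

/* As written in the source: b[0] is loaded again on every iteration. */
static void fill_as_written(void)
{
	for (int i = 0; i < N; i++)
		a[i] = b[0];
}

/* What the optimizer may produce: a single load hoisted out of the loop.
 * This is legal because no in-bounds store to a[] can modify b[0]. */
static void fill_hoisted(void)
{
	int tmp = b[0];

	for (int i = 0; i < N; i++)
		a[i] = tmp;
}

/* Hypothetical helper: every element of a[] must equal v. */
static void check_filled(int v)
{
	for (int i = 0; i < N; i++)
		assert(a[i] == v);
}
```

Both functions leave a[] in the same state for any value of b[0]. They can
only diverge once the loop runs past a[4], which is exactly the case the
standard leaves undefined.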

Because, at the same time, I think that if the for-loop looked like

	for (i = 0; i < 10; i++)
		*(int *)(sizeof(int)*i + (unsigned long)a) = b[0];

then gcc should _not_ assume that "b[0]" is constant, because gcc has no
way of knowing whether the integer expression might alias with it, and
both sides are of the same type (so gcc cannot use type-based aliasing
information either).

(Yes, the above is assuming that you document the historical and
straightforward "bitwise" pointer conversion - but note that that _is_ easy
to document, and gcc actually already does that part).
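
For reference, the "bitwise" conversion can be sketched as follows. This
assumes a flat address space and an implementation that documents
pointer/integer casts as preserving the address bits. uintptr_t from
<stdint.h> stands in for the unsigned long used in the loop above, and the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int arr[8];

/* Address of arr[k] through ordinary pointer arithmetic (0 <= k < 8). */
static int *elem_by_pointer(int k)
{
	return arr + k;
}

/* The same address computed as an integer and converted back.
 * The result of the conversion is implementation-defined. */
static int *elem_by_integer(int k)
{
	return (int *)((uintptr_t)arr + (uintptr_t)k * sizeof(int));
}

/* Hypothetical check: both routes name the same element. */
static void check_same(int k)
{
	assert(elem_by_pointer(k) == elem_by_integer(k));
}
```

Under that assumption the two functions return equal pointers for every
in-range k, so a store through one is visible through the other.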

Does that mean that gcc would create slower code for stuff that uses
computed pointers? Yes. That even makes sense - doing things at a lower
level means that the compiler cannot make the same number of high-level
assumptions and optimizations. That's a very common issue in optimization.
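
A small sketch of that cost, under stated assumptions: both arrays live in
one hypothetical struct so that the computed address stays inside a single
object, uintptr_t replaces unsigned long, and the stored value is b[0] + 1
rather than b[0] so that the effect of the aliasing store becomes visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: "a" and "b" back to back in one object. */
struct pair {
	int a[5];
	int b[5];
};

static void fill_via_integer(struct pair *p)
{
	/* The sketch relies on there being no padding between a and b. */
	assert(offsetof(struct pair, b) == 5 * sizeof(int));
	assert(sizeof(struct pair) == 10 * sizeof(int));

	uintptr_t base = (uintptr_t)p;	/* a[] sits at offset 0 */

	for (int i = 0; i < 10; i++) {
		/* For i == 5 this store lands on b[0], so b[0] has to be
		 * loaded again on each iteration and cannot be cached. */
		*(int *)(base + sizeof(int) * (uintptr_t)i) = p->b[0] + 1;
	}
}
```

Starting from b[0] == 7, the loop ends with every element of a[] equal to
8, b[0] equal to 8, and b[1] through b[4] equal to 9. A compiler that had
cached b[0] before the loop would store 8 everywhere instead.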

Now, the nasty thing for gcc is that, at the code generation level, gcc has
long since forgotten about all the integer/pointer casts by the time it
plays with the RTL, so gcc _cannot_ make the distinction at the RTL level.

Which is really Richard's argument, I think, and is the argument for the
current problematic "definition" (and in my opinion lack thereof) of the
implementation-defined behaviour of pointers.

And I actually _agree_ with Richard on that level. I understand his
problem as a compiler implementor, and I would not disagree with the
technical decision to ignore the definition problem.

However, I do think that it would be sensible to just say:
 - gcc should try to do these optimizations at a tree level, and propagate
   the alias information to a lower level.
 - the current "documentation" on the implementation-defined nature of
   pointer/integer conversions simply _isn't_ documentation of what it
   does.
 - the C language clearly _does_ have different rules for pointer
   arithmetic and for "integer arithmetic with cast pointers" (the first
   one ends up being undefined and allows more aggressive optimization,
   while the second one is implementation-defined and thus tends to
   disallow some of those optimizations).

>   a) stay in that domain, and not try to weasel out wearing your pragmatic
>   T-shirt when the going gets rough.
>
>   b) expect that arguments in this domain will be answered in that domain.

I do try to wear both hats, partly because I have noticed that if I do
_not_ wear the language lawyer hat I will invariably be answered by
language lawyer answers.

In short, I'm not a language lawyer because I want to be one, but because
I have been forced by past experience to be one. And quite often, in my
opinion, the language lawyers have used bogus arguments (ie the patently
untrue "undefined behaviour" argument).

		Linus

