This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] Get rid of awkward semantics for subtypes


> we're almost ready to get rid of the awkward semantics that is
> implemented in the middle-end and the optimizers for subtypes
> (INTEGER_TYPEs with a non-null TREE_TYPE); this should overall simplify
> things, make the support for invalid values in Ada more robust and expose
> more optimization opportunities.

Let me say more about this since the issue here is very subtle.

If we have a type with TYPE_MIN_VALUE of 10 and TYPE_MAX_VALUE of 20, what
we're declaring to the middle-end is "any valid value of this type is
between 10 and 20".  That's consistent with Ada's usage.  The problem is
when you pose the question "what if the value is *invalid*?" (e.g., an
uninitialized variable that happens to be outside the range).  The
assumption of the middle-end, which is the case for C-like languages, is
that it's undefined.  This means the compiler is free to generate code
under the assumption it *is* valid, because *any* result is acceptable if
it isn't valid.

For Ada, there are two issues.  First, there's a language-defined way of
asking "is this object valid?" and it has to work.  If the compiler assumes
all values have to be valid, clearly it can't.  But the bigger problem
is that an invalid value is *not* undefined (what Ada calls "erroneous") in
Ada.  Instead, it's something weaker called a "bounded error" and there are
limits on "how bad it can get".  For example, if a variable is invalid, it
must have *the same* invalid value each time.

The middle-end says "for the result to be defined, the value must be valid,
so we assume it is".  But for Ada, the proper statement is "if I can prove
the value is valid, then we can assume it's in the range for the type".
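
To make the difference concrete, here is a toy sketch in Python (not GCC
code).  It assumes a hypothetical 8-bit base type with range -128 .. 127
and the 10 .. 20 subtype from above; the names folds_to_true and
validity_test_survives are made up for illustration.  It shows why a
validity test disappears under the C-like assumption but has to be kept
under the Ada rule unless validity has been proven.

```python
# Toy model only: ranges are (low, high) tuples, not GCC data structures.
SUBTYPE = (10, 20)     # TYPE_MIN_VALUE / TYPE_MAX_VALUE from the example
BASE = (-128, 127)     # assumption: a hypothetical 8-bit base type


def folds_to_true(known_lo, known_hi, test_lo, test_hi):
    """A test 'test_lo <= x <= test_hi' is constant-true when the range
    the optimizer believes x lies in is contained in the tested range."""
    return test_lo <= known_lo and known_hi <= test_hi


def validity_test_survives(proven_valid, policy):
    """Return True if a test 'x in 10 .. 20' is kept in the program.

    policy "c_like": always trust the subtype bounds.
    policy "ada":    trust them only when x has been proven valid.
    """
    if policy == "c_like" or proven_valid:
        known = SUBTYPE
    else:
        known = BASE
    return not folds_to_true(*known, *SUBTYPE)
```

Under "c_like" the test always folds away, so asking "is this object
valid?" can never answer no.  Under "ada" it folds only when the answer is
already known to be yes.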

There are many cases when you can prove the value can be treated as valid.
One interesting case is based on the fact that suppressing a
language-defined check is erroneous if that check would fail.  So, for

	A := B + 1;

you *can* assume A is valid.  That assignment *requires* a range check.  If
the check is performed and passes, A is valid.  If the check is suppressed
and would have failed, execution is undefined ("erroneous"), which is the
semantics that the middle-end currently has.  There are many other cases
when you can statically determine that a variable must be valid.

So the "proper" way of dealing with this would be to have VRP propagate a
"valid" property using fairly easy-to-state rules and then only assume a
value within the subtype bounds if it can prove it to be valid (otherwise,
the type bounds must be used).
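
As a rough illustration of what such a propagation could look like, here
is a toy sketch in Python over a straight-line list of statements.  It is
not VRP and not a proposal for the actual rules: the three statement kinds,
the 8-bit base range, and the names propagate_valid and assumed_range are
all assumptions made up for the example.  The only rule taken from the
discussion above is that the target of an assignment requiring a range
check can be treated as valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtype:
    lo: int        # subtype bounds (e.g. 10 .. 20)
    hi: int
    base_lo: int   # bounds of the underlying base type (assumed here)
    base_hi: int


def propagate_valid(stmts):
    """Return the set of variables proven valid after the statements.

    Each statement is a (kind, target, source) tuple.  Hypothetical kinds:
      "declare": uninitialized object, validity unknown
      "checked": assignment that requires a range check on the target
                 (performed or suppressed), so the target is valid
      "copy":    same-subtype copy with no check; validity is inherited
    """
    valid = set()
    for kind, target, source in stmts:
        if kind == "declare":
            valid.discard(target)
        elif kind == "checked":
            valid.add(target)
        elif kind == "copy":
            if source in valid:
                valid.add(target)
            else:
                valid.discard(target)
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return valid


def assumed_range(var, subtype, valid):
    """Use the subtype bounds only for variables proven valid;
    otherwise fall back to the bounds of the base type."""
    if var in valid:
        return (subtype.lo, subtype.hi)
    return (subtype.base_lo, subtype.base_hi)
```

For the assignment above, with B declared but never initialized, A ends up
in the valid set and gets the 10 .. 20 range, while B only gets the range
of its base type.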

We've thought about doing this, but it's a lot of work and the benefit
isn't clear.  From a theoretical point of view, the more an optimizer knows
about a program, the better it can do and knowing that valid values are in
specified ranges certainly conveys some information. One could also argue
that such information is even more valuable in an IPA context.

But from a practical point of view, it's much less clear.  How much does
the range information actually buy in practice?  In how many cases do we
know that a variable will always be valid?  Might there be something about
that subset that makes the range information less useful for it?  We just
don't know the answers to these questions.

But in the absence of doing that, the present situation is problematic.
It's true that only the most pedantic programmer would care about the
distinction between a "bounded error" and "erroneous behavior", so one
could argue that this problem isn't that serious, but there do exist such
people in the Ada world.  Similarly, it's possible to find some solutions
to the "test for validity" case that don't involve the full solution above,
but they're also work.  But the existence of both problems together suggests
that the present situation isn't workable and it's not clear that it's
worth fixing "properly" at this time.

Because this decision may be revisited and it may be found worthwhile to
add the mechanism above and "do this right", it's important that we not
remove code that supports the ranges unless absolutely necessary because
doing so would greatly increase the amount of work needed to do this
right and thus make it even less likely.  (And, in any event, these types
*are* needed for array bounds, so must be supported at some level.)

