Bug 43680 - [DR 1022] G++ is too aggressive in optimizing away bounds checking with enums
Summary: [DR 1022] G++ is too aggressive in optimizing away bounds checking with enums
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 4.5.0
: P3 normal
Target Milestone: 4.6.0
Assignee: Jason Merrill
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2010-04-07 21:51 UTC by Jason Merrill
Modified: 2014-02-12 18:55 UTC (History)
7 users (show)

See Also:
Host:
Target: x86_64-unknown-linux-gnu
Build:
Known to work:
Known to fail: 4.3.5, 4.4.3, 4.5.0
Last reconfirmed: 2010-04-07 21:56:00


Attachments
patch (2.96 KB, patch)
2010-04-09 04:44 UTC, Jason Merrill
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jason Merrill 2010-04-07 21:51:27 UTC
With -O2, this testcase aborts on x86_64:

extern "C" void abort ();

enum E { A, B, C, D };

void
CheckE(const E value)
{
  long v = value;
  if (v <= D)
    abort ();
}

int main() {
  CheckE(static_cast<E>(5));
}

Presumably something in the middle- or back-end is believing what the C++ front end sets TYPE_MAX_VALUE to and optimizing based on the assumption that "value" cannot be greater than 3.

I think the C++ standard can definitely be read to allow this optimization, but it means optimizing away bounds checking, which is a serious security problem.  The standard should be fixed.
Comment 1 Wolfgang Bangerth 2010-04-07 23:16:24 UTC
(In reply to comment #0)
> I think the C++ standard can definitely be read to allow this optimization

I would most definitely think so. 7.2/6 specifically says that the values
an enum variable can take on are, in your example, 0...3. See, in particular,
footnote 81 in C++98. Everything outside would invoke undefined behavior,
and I like that the compiler can make inferences from that.

Don't we already have a warning that warns about assigning values outside the
range of an enum?

W.

Comment 2 Jason Merrill 2010-04-08 02:58:46 UTC
Except that the conversion is defined to produce an unspecified value, not undefined behavior.  A strict reading of that suggests to me that the compiler is supposed to truncate on conversion so the resulting value is one of the values of the enumeration, but that's certainly not what we want.

I also raised this issue as

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1022
Comment 3 bangerth@math.tamu.edu 2010-04-08 03:18:24 UTC
Subject: Re:  G++ is too aggressive in optimizing away bounds checking with enums


> Except that the conversion is defined to produce an unspecified
> value, not undefined behavior.  A strict reading of that suggests to me
> that the compiler is supposed to truncate on conversion so the resulting
> value is one of the values of the enumeration, but that's certainly not
> what we want.

I'm really not in a position to argue standardese with you, but my reading 
based on the paragraph I cited is that the enumeration has a certain range 
of values it can represent (which may be larger than the set of values in 
the enumeration). In my interpretation, any assignment of values outside 
the range must necessarily produce (unspecified) values within this range.

I can see no reason to prohibit a compiler from applying a mask every time.

W.
Comment 4 Jason Merrill 2010-04-08 05:19:21 UTC
Yes, that's my interpretation as well.  But it's not what we do now, and requiring a mask on conversion seems like it would hurt performance more than just doing the comparison.
Comment 5 Richard Biener 2010-04-08 08:39:41 UTC
The middle-end trusts the C++ frontends TYPE_MIN/MAX_VALUE which is according
to the C++ standard the range for an (unsigned) integer type with a precision
just as wide as to represent all values in the enum.

I did want to remove that optimization at some point but people complained
that it was highly useful (the C++ FE can still set TYPE_PRECISION
accordingly if it wants to, but TYPE_MIN/MAX_VALUE cannot really be trusted
in the middle-end as conversions between integer types of same precision
but different TYPE_MIN/MAX_VALUE are 1) useless and 2) do not produce code
to truncate the values).

It's VRP btw, that does this optimization.  If I have your support I'll rip
out this optimization from VRP ... (I didn't yet manage to create a wrong-code
bug due to the useless conversion thing, but I believe that is possible).

I also believe that the TYPE_MIN/MAX_VALUE on enums might be needed for
proper debug information.
Comment 6 Richard Biener 2010-04-08 08:40:26 UTC
Oh, and actually I don't think this bug is a wrong-code bug but at most
a quality of implementation issue.
Comment 7 Manuel López-Ibáñez 2010-04-08 09:02:10 UTC
(In reply to comment #1)
> 
> Don't we already have a warning that warns about assigning values outside the
> range of an enum?
> 

pr43680.C:14:26: warning: the result of the conversion is unspecified because ‘5’ is outside the range of type ‘E’ [-Wconversion]

Implemented by yours truly in GCC 4.5. (or perhaps already in 4.4.)
Comment 8 Eric Botcazou 2010-04-08 10:48:33 UTC
> I did want to remove that optimization at some point but people complained
> that it was highly useful (the C++ FE can still set TYPE_PRECISION
> accordingly if it wants to, but TYPE_MIN/MAX_VALUE cannot really be trusted
> in the middle-end as conversions between integer types of same precision
> but different TYPE_MIN/MAX_VALUE are 1) useless and 2) do not produce code
> to truncate the values).

We had the exact same problem in Ada until 4.4, it has been solved by setting TYPE_MIN/MAX_VALUE to their canonical values...

> I also believe that the TYPE_MIN/MAX_VALUE on enums might be needed for
> proper debug information.

...and this one by providing a get_subrange_bounds lang hook.
Comment 9 Wolfgang Bangerth 2010-04-08 12:53:15 UTC
I'm not saying we *should* apply a mask (in fact, I think that would be
silly). But we *could*, and if we did then VRP's actions might lead to
faster but not different code.

All I want to say is that VRP is a powerful tool. Let's not just throw
the baby out with the bath water.

W.
Comment 10 Jason Merrill 2010-04-08 15:57:31 UTC
@7: But -Wconversion only warns if it knows the constant value, it doesn't warn about converting from an arbitrary int variable such as (importantly) a loop counter.

@5: It seems appropriate to me for VRP to optimize based on TYPE_MIN/MAX_VALUE, the problem is that the front end is lying; either we need to apply a mask on conversion to the enum, or we should set TYPE_MIN/MAX_VALUE to the appropriate values for the mode.  My preference is for the latter, and it sounds like Ada made the same choice.

The debugging backends don't seem to care about TYPE_MIN/MAX_VALUE for enums.
Comment 11 Andrew Pinski 2010-04-08 22:30:24 UTC
See also:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42810
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41425
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37281
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36170

There might be others which have been already closed as invalid/duplicate too but I cannot find them with a very simple search :).
Comment 12 Jason Merrill 2010-04-09 04:44:01 UTC
Created attachment 20343 [details]
patch

This patch causes the compiler to avoid this optimization while preserving all the related warnings (except pr33738.C which is specifically testing that this optimization is done).
Comment 13 Mark Mitchell 2010-04-20 20:43:45 UTC
I think this optimization is valuable in some cases, so I think this is a question of defaults, rather than of behavior per se.  While it may be useful for some security-related applications not to eliminate the checks, but it is clearly useful in other contexts to eliminate the checks.  "Optimizing away bounds checking" is certainly not in and of itself a bug.

In some sense, this is similar to strict-aliasing, or signed-does-not-overflow optimization.  There's a lot of code out there that depends on something that's not portable.  It's not useful to break that code, but it is useful to be able to optimize.  (One case where I think this optimization is likely to be valuable is in switch/if statements where you're cascading through values; this optimization can allow you to eliminate the catch-all case at the end, which is certainly a win for code-size.)

I think the standard should say that you get an indeterminate value when you convert an out-or-range input to an enum (and that such value may be outside the bounds of the enum range), but that the program is not allowed to go and reformat your disk.  Ada has a notion of errors like this; the damage is bounded.  This doesn't need to be in the dereference NULL pointer category of behavior, but neither should it have any guaranteed semantics, including that the value is masked down.

So, I think we should have a C++ front-end option that says "interpret enums strictly".  I think it should be off by default for compatibility with existing code.
Comment 14 Jason Merrill 2010-04-20 22:02:44 UTC
Certainly optimizing away bounds checking is good when it is provably redundant, but that clearly doesn't apply to this case.  That said, I'll go ahead and add the option.
Comment 15 Mark Mitchell 2010-04-20 22:18:38 UTC
Subject: Re:  [DR 1022] G++ is too aggressive in optimizing
 away bounds checking with enums

jason at gcc dot gnu dot org wrote:

> Certainly optimizing away bounds checking is good when it is provably
> redundant, but that clearly doesn't apply to this case.

Do you think this is different from signed integer overflow in loops?
To me, it seems quite similar.  That's a situation where the compiler
will now optimize away the check in something like "for (int i = 0; i >=
0; ++i)", leaving us with an infinite loop.

And, of course, that can hit you in a security context too.

  /* Here we know that "i" is positive.  */
  ...
  if (i + 100 <= 0)
    abort();
  /* The check above will make sure this never overflows ...
     <scaryvoice>or will it?</scaryvoice> */
  i += 100;

> That said, I'll go ahead and add the option.

Thanks,

Comment 16 Jason Merrill 2010-04-20 23:10:56 UTC
I agree that it's similar to signed integer overflow.  One significant difference is that this issue doesn't affect C.

One strange thing here is that the VRP optimization only happens when the enum variable is converted to a different type; it doesn't happen if we just say "if (value <= D)."  That seems like a bug.
Comment 17 Jason Merrill 2010-05-03 21:16:55 UTC
Subject: Bug 43680

Author: jason
Date: Mon May  3 21:16:40 2010
New Revision: 159006

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=159006
Log:
	PR c++/43680
gcc:
	* c.opt (-fstrict-enums): New.
	* doc/invoke.texi (C++ Dialect Options): Document -fstrict-enums.
gcc/cp:
	* decl.c (finish_enum): Use the TYPE_MIN_VALUE and TYPE_MAX_VALUE
	from the selected underlying type unless -fstrict-enums.  Set
	ENUM_UNDERLYING_TYPE to	have the restricted range.
	* cvt.c (type_promotes_to): Use ENUM_UNDERLYING_TYPE.
	* class.c (check_bitfield_decl): Likewise.

Added:
    trunk/gcc/testsuite/g++.dg/opt/enum2.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/c.opt
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/class.c
    trunk/gcc/cp/cvt.c
    trunk/gcc/cp/decl.c
    trunk/gcc/doc/invoke.texi
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/warn/Wswitch-1.C
    trunk/gcc/testsuite/g++.dg/warn/pr33738.C

Comment 18 Jason Merrill 2010-05-04 04:44:18 UTC
Fixed for 4.6.0.
Comment 19 Jason Merrill 2014-02-12 18:55:54 UTC
It looks like the committee is making this code undefined again:

http://open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html#1766