[PATCH] c++, v2: Allow indeterminate unsigned char or std::byte in bit_cast - P1272R4

Jakub Jelinek jakub@redhat.com
Tue Nov 30 12:17:07 GMT 2021


On Mon, Nov 29, 2021 at 10:25:58PM -0500, Jason Merrill wrote:
> It's a DR.  Really, it was intended to be part of C++20; at the Cologne
> meeting in 2019 CWG thought byteswap was going to make C++20, so this bugfix
> could go in as part of that paper.

Ok, changed to be done unconditionally now.

> Also, allowing indeterminate values that are never read was in C++20
> (P1331).

Reading P1331R2 again, I'm still puzzled.
Our current behavior (both before and after this patch) is that if
some variable is scalar and has indeterminate value or if an aggregate
variable has some members (possibly nested) with indeterminate values,
in constexpr contexts we allow copying those into other vars of the
same type (e.g. the testcases in the patch below test mere copying
of the whole structures or unsigned char result of __builtin_bit_cast),
but we reject if we actually use them in some other way (e.g. try to
read a member from a variable that has that member indeterminate,
see e.g. bit-cast14.C (f5, f6, f7), even when reading it into an
unsigned char variable.

Then there is P1331R2 which makes the UB on
"an lvalue-to-rvalue conversion that is applied to an object with
indeterminate value ([basic.indet]);"
but isn't even the
  unsigned char a = __builtin_bit_cast (unsigned char, u);
  unsigned char b = a;
case non-constant then when __builtin_bit_cast returns indeterminate value?
__builtin_bit_cast returns rvalue, so no lvalue-to-rvalue conversion happens
in that case, so supposely
  unsigned char a = __builtin_bit_cast (unsigned char, u);
is fine, but on
  unsigned char b = a;
a is lvalue and is converted to rvalue.
Similarly
  T t = { 1, 2 };
  S s = __builtin_bit_cast (S, t);
  S u = s;
where S s = __builtin_bit_cast (S, t); could be ok even when some or all
members are indeterminate, but u = s; does lvalue-to-rvalue conversion?

Or there is http://eel.is/c++draft/basic.indet that has quite clear rules
what is and isn't UB and if C++ wanted to go further and allow all those
valid cases in there as constant...

Anyway, I hope this can be dealt with incrementally.

> I think in all of them the result of the cast has (some) indeterminate
> value.  So f1-3 are OK because the indeterminate value has unsigned char
> type and is never used; f4() is non-constant because S::f has
> non-byte-access type and so the new wording says it's undefined.

Ok, implemented the bitfield handling then.

Here is an updated patch, so far lightly tested.

2021-11-30  Jakub Jelinek <jakub@redhat.com>

	* constexpr.c (clear_uchar_or_std_byte_in_mask): New function.
	(cxx_eval_bit_cast): Don't error about padding bits if target
	type is unsigned char or std::byte, instead return no clearing
	ctor.  Use clear_uchar_or_std_byte_in_mask.

	* g++.dg/cpp2a/bit-cast11.C: New test.
	* g++.dg/cpp2a/bit-cast12.C: New test.
	* g++.dg/cpp2a/bit-cast13.C: New test.
	* g++.dg/cpp2a/bit-cast14.C: New test.

--- gcc/cp/constexpr.c.jj	2021-11-30 09:44:46.531607444 +0100
+++ gcc/cp/constexpr.c	2021-11-30 12:20:29.105251443 +0100
@@ -4268,6 +4268,121 @@ check_bit_cast_type (const constexpr_ctx
   return false;
 }
 
+/* Helper function for cxx_eval_bit_cast.  For unsigned char or
+   std::byte members of CONSTRUCTOR (recursively) if they contain
+   some indeterminate bits (as set in MASK), remove the ctor elts,
+   mark the CONSTRUCTOR as CONSTRUCTOR_NO_CLEARING and clear the
+   bits in MASK.  */
+
+static void
+clear_uchar_or_std_byte_in_mask (location_t loc, tree t, unsigned char *mask)
+{
+  if (TREE_CODE (t) != CONSTRUCTOR)
+    return;
+
+  unsigned i, j = 0;
+  tree index, value;
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (t), i, index, value)
+    {
+      tree type = TREE_TYPE (value);
+      if (TREE_CODE (TREE_TYPE (t)) != ARRAY_TYPE
+	  && DECL_BIT_FIELD_TYPE (index) != NULL_TREE)
+	{
+	  if (is_byte_access_type (DECL_BIT_FIELD_TYPE (index))
+	      && (TYPE_MAIN_VARIANT (DECL_BIT_FIELD_TYPE (index))
+		  != char_type_node))
+	    {
+	      HOST_WIDE_INT fldsz = TYPE_PRECISION (TREE_TYPE (index));
+	      gcc_assert (fldsz != 0);
+	      HOST_WIDE_INT pos = int_byte_position (index);
+	      HOST_WIDE_INT bpos
+		= tree_to_uhwi (DECL_FIELD_BIT_OFFSET (index));
+	      bpos %= BITS_PER_UNIT;
+	      HOST_WIDE_INT end
+		= ROUND_UP (bpos + fldsz, BITS_PER_UNIT) / BITS_PER_UNIT;
+	      gcc_assert (end == 1 || end == 2);
+	      unsigned char *p = mask + pos;
+	      unsigned char mask_save[2];
+	      mask_save[0] = mask[pos];
+	      mask_save[1] = end == 2 ? mask[pos + 1] : 0;
+	      if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
+		sorry_at (loc, "PDP11 bit-field handling unsupported"
+			       " in %qs", "__builtin_bit_cast");
+	      else if (BYTES_BIG_ENDIAN)
+		{
+		  /* Big endian.  */
+		  if (bpos + fldsz <= BITS_PER_UNIT)
+		    *p &= ~(((1 << fldsz) - 1)
+			    << (BITS_PER_UNIT - bpos - fldsz));
+		  else
+		    {
+		      gcc_assert (bpos);
+		      *p &= ~(((1U << BITS_PER_UNIT) - 1) >> bpos);
+		      p++;
+		      fldsz -= BITS_PER_UNIT - bpos;
+		      gcc_assert (fldsz && fldsz < BITS_PER_UNIT);
+		      *p &= ((1U << BITS_PER_UNIT) - 1) >> fldsz;
+		    }
+		}
+	      else
+		{
+		  /* Little endian.  */
+		  if (bpos + fldsz <= BITS_PER_UNIT)
+		    *p &= ~(((1 << fldsz) - 1) << bpos);
+		  else
+		    {
+		      gcc_assert (bpos);
+		      *p &= ~(((1 << BITS_PER_UNIT) - 1) << bpos);
+		      p++;
+		      fldsz -= BITS_PER_UNIT - bpos;
+		      gcc_assert (fldsz && fldsz < BITS_PER_UNIT);
+		      *p &= ~((1 << fldsz) - 1);
+		    }
+		}
+	      if (mask_save[0] != mask[pos]
+		  || (end == 2 && mask_save[1] != mask[pos + 1]))
+		{
+		  CONSTRUCTOR_NO_CLEARING (t) = 1;
+		  continue;
+		}
+	    }
+	}
+      else if (is_byte_access_type (type)
+	       && TYPE_MAIN_VARIANT (type) != char_type_node)
+	{
+	  HOST_WIDE_INT pos;
+	  if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
+	    pos = tree_to_shwi (index);
+	  else
+	    pos = int_byte_position (index);
+	  if (mask[pos])
+	    {
+	      CONSTRUCTOR_NO_CLEARING (t) = 1;
+	      mask[pos] = 0;
+	      continue;
+	    }
+	}
+      if (TREE_CODE (value) == CONSTRUCTOR)
+	{
+	  HOST_WIDE_INT pos;
+	  if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
+	    pos = tree_to_shwi (index)
+		  * tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (t))));
+	  else
+	    pos = int_byte_position (index);
+	  clear_uchar_or_std_byte_in_mask (loc, value, mask + pos);
+	}
+      if (i != j)
+	{
+	  CONSTRUCTOR_ELT (t, j)->index = index;
+	  CONSTRUCTOR_ELT (t, j)->value = value;
+	}
+      ++j;
+    }
+  if (CONSTRUCTOR_NELTS (t) != j)
+    vec_safe_truncate (CONSTRUCTOR_ELTS (t), j);
+}
+
 /* Subroutine of cxx_eval_constant_expression.
    Attempt to evaluate a BIT_CAST_EXPR.  */
 
@@ -4344,12 +4459,28 @@ cxx_eval_bit_cast (const constexpr_ctx *
 
   tree r = NULL_TREE;
   if (can_native_interpret_type_p (TREE_TYPE (t)))
-    r = native_interpret_expr (TREE_TYPE (t), ptr, len);
+    {
+      r = native_interpret_expr (TREE_TYPE (t), ptr, len);
+      if (is_byte_access_type (TREE_TYPE (t))
+	  && TYPE_MAIN_VARIANT (TREE_TYPE (t)) != char_type_node)
+	{
+	  gcc_assert (len == 1);
+	  if (mask[0])
+	    {
+	      memset (mask, 0, len);
+	      r = build_constructor (TREE_TYPE (r), NULL);
+	      CONSTRUCTOR_NO_CLEARING (r) = 1;
+	    }
+	}
+    }
   else if (TREE_CODE (TREE_TYPE (t)) == RECORD_TYPE)
     {
       r = native_interpret_aggregate (TREE_TYPE (t), ptr, 0, len);
       if (r != NULL_TREE)
-	clear_type_padding_in_mask (TREE_TYPE (t), mask);
+	{
+	  clear_type_padding_in_mask (TREE_TYPE (t), mask);
+	  clear_uchar_or_std_byte_in_mask (loc, r, mask);
+	}
     }
 
   if (r != NULL_TREE)
--- gcc/testsuite/g++.dg/cpp2a/bit-cast11.C.jj	2021-11-30 10:42:21.301491218 +0100
+++ gcc/testsuite/g++.dg/cpp2a/bit-cast11.C	2021-11-30 12:09:07.967090417 +0100
@@ -0,0 +1,69 @@
+// P1272R4
+// { dg-do compile { target c++14 } }
+
+struct S { unsigned char a[2]; alignas(sizeof 0) int b; };
+struct T { char a; alignas(sizeof 0) int b; };
+struct U { char a : 1; char : 6; char b : 1; };
+struct V { int a; S b; };
+struct W { unsigned a; T b; };
+
+constexpr bool
+f1 ()
+{
+  T t = { 1, 2 };
+  S s = __builtin_bit_cast (S, t);
+  S u = s;
+  return u.a[0] == 1;
+}
+
+constexpr bool
+f2 ()
+{
+  U u = { 0, 0 };
+  unsigned char a = __builtin_bit_cast (unsigned char, u);
+  unsigned char b = a;
+  return true;
+}
+
+constexpr bool
+f3 ()
+{
+  T t = { 1, 2 };
+  S s = __builtin_bit_cast (S, t);
+  S u = s;
+  return u.a[1] == 0;
+}
+
+constexpr bool
+f4 ()
+{
+  U u = { 0, 0 };
+  unsigned char a = __builtin_bit_cast (unsigned char, u);
+  unsigned char b = a;
+  return b == 0;		// { dg-error "is not a constant expression" }
+}
+
+constexpr bool
+f5 ()
+{
+  W t = { 1, 2 };
+  V s = __builtin_bit_cast (V, t);
+  V u = s;
+  return u.b.a[0] == 1;
+}
+
+constexpr bool
+f6 ()
+{
+  W t = { 1, 2 };
+  V s = __builtin_bit_cast (V, t);
+  V u = s;
+  return u.b.a[1] == 1;
+}
+
+constexpr bool a = f1 ();
+constexpr bool b = f2 ();
+constexpr bool c = f3 ();	// { dg-error "accessing uninitialized array element" }
+constexpr bool d = f4 ();
+constexpr bool e = f5 ();
+constexpr bool f = f6 ();	// { dg-error "accessing uninitialized array element" }
--- gcc/testsuite/g++.dg/cpp2a/bit-cast12.C.jj	2021-11-30 10:42:21.301491218 +0100
+++ gcc/testsuite/g++.dg/cpp2a/bit-cast12.C	2021-11-30 12:09:52.178449320 +0100
@@ -0,0 +1,74 @@
+// P1272R4
+// { dg-do compile { target c++14 } }
+
+namespace std
+{
+  enum class byte : unsigned char {};
+}
+
+struct S { unsigned char a[2]; alignas(sizeof 0) int b; };
+struct T { char a; alignas(sizeof 0) int b; };
+struct U { char a : 1; char : 6; char b : 1; };
+struct V { int a; S b; };
+struct W { unsigned a; T b; };
+
+constexpr bool
+f1 ()
+{
+  T t = { 1, 2 };
+  S s = __builtin_bit_cast (S, t);
+  S u = s;
+  return u.a[0] == 1;
+}
+
+constexpr bool
+f2 ()
+{
+  U u = { 0, 0 };
+  unsigned char a = __builtin_bit_cast (unsigned char, u);
+  unsigned char b = a;
+  return true;
+}
+
+constexpr bool
+f3 ()
+{
+  T t = { 1, 2 };
+  S s = __builtin_bit_cast (S, t);
+  S u = s;
+  return u.a[1] == 0;
+}
+
+constexpr bool
+f4 ()
+{
+  U u = { 0, 0 };
+  unsigned char a = __builtin_bit_cast (unsigned char, u);
+  unsigned char b = a;
+  return b == 0;		// { dg-error "is not a constant expression" }
+}
+
+constexpr bool
+f5 ()
+{
+  W t = { 1, 2 };
+  V s = __builtin_bit_cast (V, t);
+  V u = s;
+  return u.b.a[0] == 1;
+}
+
+constexpr bool
+f6 ()
+{
+  W t = { 1, 2 };
+  V s = __builtin_bit_cast (V, t);
+  V u = s;
+  return u.b.a[1] == 1;
+}
+
+constexpr bool a = f1 ();
+constexpr bool b = f2 ();
+constexpr bool c = f3 ();	// { dg-error "accessing uninitialized array element" }
+constexpr bool d = f4 ();
+constexpr bool e = f5 ();
+constexpr bool f = f6 ();	// { dg-error "accessing uninitialized array element" }
--- gcc/testsuite/g++.dg/cpp2a/bit-cast13.C.jj	2021-11-30 10:42:21.302491204 +0100
+++ gcc/testsuite/g++.dg/cpp2a/bit-cast13.C	2021-11-30 10:42:21.302491204 +0100
@@ -0,0 +1,69 @@
+// P1272R4
+// { dg-do compile { target c++14 } }
+
+struct S { char a[2]; alignas(sizeof 0) int b; };
+struct T { char a; alignas(sizeof 0) int b; };
+struct U { char a : 1; char : 6; char b : 1; };
+struct V { int a; S b; };
+struct W { unsigned a; T b; };
+
+constexpr bool
+f1 ()
+{
+  T t = { 1, 2 };
+  S s = __builtin_bit_cast (S, t);	// { dg-error "accessing uninitialized byte" }
+  S u = s;
+  return u.a[0] == 1;
+}
+
+constexpr bool
+f2 ()
+{
+  U u = { 0, 0 };
+  char a = __builtin_bit_cast (char, u);	// { dg-error "accessing uninitialized byte" }
+  char b = a;
+  return true;
+}
+
+constexpr bool
+f3 ()
+{
+  T t = { 1, 2 };
+  S s = __builtin_bit_cast (S, t);	// { dg-error "accessing uninitialized byte" }
+  S u = s;
+  return u.a[1] == 0;
+}
+
+constexpr bool
+f4 ()
+{
+  U u = { 0, 0 };
+  char a = __builtin_bit_cast (char, u);	// { dg-error "accessing uninitialized byte" }
+  char b = a;
+  return b == 0;
+}
+
+constexpr bool
+f5 ()
+{
+  W t = { 1, 2 };
+  V s = __builtin_bit_cast (V, t);	// { dg-error "accessing uninitialized byte" }
+  V u = s;
+  return u.b.a[0] == 1;
+}
+
+constexpr bool
+f6 ()
+{
+  W t = { 1, 2 };
+  V s = __builtin_bit_cast (V, t);	// { dg-error "accessing uninitialized byte" }
+  V u = s;
+  return u.b.a[1] == 1;
+}
+
+constexpr bool a = f1 ();
+constexpr bool b = f2 ();
+constexpr bool c = f3 ();
+constexpr bool d = f4 ();
+constexpr bool e = f5 ();
+constexpr bool f = f6 ();
--- gcc/testsuite/g++.dg/cpp2a/bit-cast14.C.jj	2021-11-30 12:10:24.413981883 +0100
+++ gcc/testsuite/g++.dg/cpp2a/bit-cast14.C	2021-11-30 12:33:36.343979395 +0100
@@ -0,0 +1,78 @@
+// P1272R4
+// { dg-do compile { target c++14 } }
+
+struct S { unsigned char a : 8, b : 5, c : 3, d, e; unsigned int f : 8, g : 24; };
+struct T1 { unsigned char a : 1, : 7, b : 5, c : 3, d, e; unsigned int f : 8, g : 24; };
+struct T2 { unsigned char a : 8, b : 1, : 4, c : 3, d, e; unsigned int f : 8, g : 24; };
+struct T3 { unsigned char a : 8, b : 5, c : 1, : 2, d, e; unsigned int f : 8, g : 24; };
+struct T4 { unsigned char a : 8, b : 5, c : 3, d, e; unsigned int f : 1, : 7, g : 24; };
+
+constexpr bool
+f1 ()
+{
+  T1 t = { 0, 0, 0, 0, 0, 0, 0 };
+  S s = __builtin_bit_cast (S, t);
+  S u = s;
+  return true;
+}
+
+constexpr bool
+f2 ()
+{
+  T2 t = { 0, 0, 0, 0, 0, 0, 0 };
+  S s = __builtin_bit_cast (S, t);
+  S u = s;
+  return true;
+}
+
+constexpr bool
+f3 ()
+{
+  T3 t = { 0, 0, 0, 0, 0, 0, 0 };
+  S s = __builtin_bit_cast (S, t);
+  S u = s;
+  return true;
+}
+
+constexpr bool
+f4 ()
+{
+  T4 t = { 0, 0, 0, 0, 0, 0, 0 };
+  S s = __builtin_bit_cast (S, t);	// { dg-error "accessing uninitialized byte" }
+  return true;
+}
+
+constexpr bool
+f5 ()
+{
+  T1 t = { 0, 0, 0, 0, 0, 0, 0 };
+  S s = __builtin_bit_cast (S, t);
+  unsigned char a = s.a;
+  return true;
+}
+
+constexpr bool
+f6 ()
+{
+  T2 t = { 0, 0, 0, 0, 0, 0, 0 };
+  S s = __builtin_bit_cast (S, t);
+  unsigned char b = s.b;
+  return true;
+}
+
+constexpr bool
+f7 ()
+{
+  T3 t = { 0, 0, 0, 0, 0, 0, 0 };
+  S s = __builtin_bit_cast (S, t);
+  unsigned char c = s.c;
+  return true;
+}
+
+constexpr bool a = f1 ();
+constexpr bool b = f2 ();
+constexpr bool c = f3 ();
+constexpr bool d = f4 ();
+constexpr bool e = f5 ();	// { dg-error "accessing uninitialized member" }
+constexpr bool f = f6 ();	// { dg-error "accessing uninitialized member" }
+constexpr bool g = f7 ();	// { dg-error "accessing uninitialized member" }


	Jakub



More information about the Gcc-patches mailing list