Implementation-defined behavior or not?
Mon Jun 3 10:41:00 GMT 2019
On Mon, 3 Jun 2019 at 10:49, esoteric escape <email@example.com> wrote:
> Thanks! I see, yes speaking of C++17. Just to make sure I grasped it I'll say how I get it:
> 1. In the std::string's case, we care about bits regardless of the value of the chars inside std::string, so because mapping is precise that makes it well-defined.
> 2. In case of char, the underlying bit representation changes
On most implementations, no. The underlying bit representation is the
same. 0xC8 as an unsigned char is 11001000 and as a char is also
11001000. What is implementation-defined is the value of 11001000 as a
char. For signed char with GCC that value is (char)-56. For an
unsigned char it's (char)200. On a one's complement system the same
bits would have a different value again (-55).
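
As a rough sketch of that point (the function names are made up for this example, and it assumes 8-bit bytes and a two's complement target such as the usual GCC ones), the following compares the stored bits and then reads the value back through plain char:

```cpp
#include <cassert>
#include <climits>
#include <cstring>

// Hypothetical helper: true if a char and an unsigned char holding the
// code unit 0xC8 have the same object representation (the same bits).
bool same_bits_as_unsigned() {
    static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
    unsigned char u = 0xC8;            // bits 11001000, value 200
    char c = static_cast<char>(u);     // value is implementation-defined
                                       // before C++20 if char is signed
    return std::memcmp(&c, &u, 1) == 0;
}

// Hypothetical helper: the numeric value of those bits seen as plain char.
// Expect -56 where char is signed (e.g. GCC on x86) and 200 where char
// is unsigned (e.g. GCC on many ARM targets).
int value_as_char() {
    char c = static_cast<char>(0xC8);
    // Converting back to unsigned char is modular, so on the assumed
    // two's complement targets the code unit is recovered either way.
    assert(static_cast<unsigned char>(c) == 0xC8);
    return c;
}
```

Calling same_bits_as_unsigned() should return true on such targets, while value_as_char() depends on whether char is signed, which is the only part that varies.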
> if value overflows range and its implementation-defined.
There's no difference between #1 and #2. In both cases the UTF-8
encoding produces some (implementation-defined) char value for each
UTF-8 code unit. If you want UTF-8 encoded data then that's what you
get. The resulting chars will work perfectly well with anything that
expects UTF-8 encoded data. The fact that some characters in the
string might have negative values is irrelevant. If converting the
code unit to a char produces a negative value then that's what you
get. You probably don't need to worry about it in any more detail than
that.
If you're using that string somewhere that expects UTF-8 then
everything just works. If you're using it somewhere that expects 7-bit
ASCII values then it might not work, but that's always true of UTF-8
data, it has nothing to do with whether char is signed or unsigned.
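
To illustrate, here is a minimal sketch (code_units and is_ascii are names invented for this example, not anything from the standard library) that reads the chars of a std::string back as unsigned code units and checks whether the data would also pass as 7-bit ASCII:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: view each char of a UTF-8 string as an unsigned
// code unit. The conversion to unsigned char is modular, so the result
// does not depend on whether plain char is signed.
std::vector<unsigned> code_units(const std::string& s) {
    std::vector<unsigned> out;
    for (char ch : s)
        out.push_back(static_cast<unsigned char>(ch));
    assert(out.size() == s.size());
    return out;
}

// Hypothetical helper: true only if every code unit fits in 7 bits.
// Any multi-byte UTF-8 sequence fails this, signed char or not.
bool is_ascii(const std::string& s) {
    for (unsigned u : code_units(s))
        if (u >= 0x80)
            return false;
    return true;
}
```

With GCC, code_units("\xE2\x82\xAC") should give 0xE2, 0x82, 0xAC for the euro sign, and is_ascii on the same string returns false, while is_ascii("abc") returns true.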
> Say, I decide to manually do this:
> std::string s = "\xE2\x82\xAC";
> char c;
> c = 0xE2;
> c = 0x82;
> c = 0xAC;
> Then, I suppose these cases will be more like my #2 above than #1, true?
Case #2 and #1 are the same.
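
If you want to convince yourself, something like the sketch below compares the two approaches (the function name is made up, and the casts are only there to avoid a narrowing diagnostic in the braced initializer). I'd expect it to return true with GCC and other two's complement implementations:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical check: do chars assigned by hand from the integer
// constants compare equal to the chars produced by the string literal?
bool manual_matches_literal() {
    const std::string s = "\xE2\x82\xAC";
    const char manual[] = {
        static_cast<char>(0xE2),
        static_cast<char>(0x82),
        static_cast<char>(0xAC),
    };
    assert(s.size() == sizeof manual);
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] != manual[i])
            return false;
    return true;
}
```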