Bug 88662

Summary: Document trap representations of _Bool
Product: gcc Reporter: gnzlbg <gonzalo.gadeschi>
Component: c Assignee: Not yet assigned to anyone <unassigned>
Status: UNCONFIRMED ---    
Severity: normal CC: jsm28, msebor, vstinner
Priority: P3 Keywords: documentation
Version: 9.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed:

Description gnzlbg 2019-01-02 13:51:44 UTC
Compiling

unsigned int foo(unsigned int x, _Bool b) {
    return x - (unsigned int)b;
}

only produces correct results if the value of `_Bool` is either `0` or `1` [0], see https://gcc.godbolt.org/z/l0DPjc:

foo:
        movzx   esi, sil
        mov     eax, edi
        sub     eax, esi
        ret
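
As a rough illustration of what the assembly above computes: `movzx` zero-extends the whole argument byte and `sub` subtracts it unchanged, with no normalization to 0 or 1. The sketch below models that with an `unsigned char` parameter so the behavior stays well defined; `foo_as_compiled` and `raw_b` are hypothetical names, not anything GCC emits.

```c
/* Hypothetical model of the generated code: the argument byte is
   zero-extended (movzx) and subtracted as-is (sub). Using unsigned char
   instead of _Bool keeps every input value well defined in C. */
unsigned int foo_as_compiled(unsigned int x, unsigned char raw_b)
{
    return x - (unsigned int)raw_b;
}
```

In this model a byte of 2 yields `x - 2`, which is neither `x` nor `x - 1`, consistent with the observation that only 0 and 1 give the expected result.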

This probably means that all other representations of `_Bool` are trap representations, but this does not appear to be documented anywhere. 

From my reading of the C standard, the role that padding bits play for `_Bool` is unclear.

[0] one can construct such a _Bool by writing to it via a char* .
Comment 1 Jonathan Wakely 2019-01-02 14:12:15 UTC
(In reply to gnzlbg from comment #0)
> Compiling
> 
> unsigned int foo(unsigned int x, _Bool b) {
>     return x - (unsigned int)b;
> }
> 
> only produces correct results if the value of `_Bool` is either `0` or `1`

Because (unsigned int)b is undefined otherwise.

> [0], see https://gcc.godbolt.org/z/l0DPjc:
> 
> foo:
>         movzx   esi, sil
>         mov     eax, edi
>         sub     eax, esi
>         ret
> 
> This probably means that all other representations of `_Bool` are trap
> representations, but this does not appear to be documented anywhere. 

The representation of _Bool is unspecified, not implementation-defined, so doesn't need to be documented.
Comment 2 gnzlbg 2019-01-02 14:15:39 UTC
> Because (unsigned int)b is undefined otherwise.

AFAICT this is only undefined behavior iff `b` has a trap representation.
Comment 3 Jonathan Wakely 2019-01-02 14:21:11 UTC
Yes, and an implementation is not required to document which object representations are trap representations.
Comment 4 gnzlbg 2019-01-02 14:23:14 UTC
Without that information, how does one know which values a valid program can write to a `_Bool` via a `char*`?

AFAIK the C standard guarantees that 0x0 must be a valid representation of _Bool, but there are no guarantees about the bit-pattern of true beyond that such a value must exist.
Comment 5 Jonathan Wakely 2019-01-02 14:28:24 UTC
You can copy the bit-pattern from any _Bool with true value, e.g. one initialized with 'true' or an expression like '0==0'.
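
The suggestion above could be sketched roughly as follows: take a `_Bool` known to be true, copy its object representation out with `memcpy`, and compare candidate bytes against it. The helper names `true_representation` and `matches_true_representation` are hypothetical, and the sketch only recognizes the one pattern this implementation happens to produce for true.

```c
#include <string.h>

/* Copy the object representation of a known-true _Bool into out.
   out must have room for sizeof(_Bool) bytes. */
void true_representation(unsigned char *out)
{
    const _Bool t = (0 == 0);   /* a _Bool with the value true */
    memcpy(out, &t, sizeof t);
}

/* Return nonzero if the sizeof(_Bool) bytes at p equal the representation
   obtained above. The bytes are compared as unsigned char, so no _Bool
   object with a possibly invalid representation is ever read. */
int matches_true_representation(const unsigned char *p)
{
    unsigned char ref[sizeof(_Bool)];
    true_representation(ref);
    return memcmp(p, ref, sizeof ref) == 0;
}
```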

Why do you need more than that?
Comment 6 gnzlbg 2019-01-02 14:39:54 UTC
> Why do you need more than that?

I'm reading raw data from a file which supposedly contains _Bool's and I'd like to validate it (the _Bools could have been written to the file by a program compiled with a different C toolchain). 

> You can copy the bit-pattern from any _Bool with true value,

The standard does not guarantee that only one such bit-pattern exists AFAICT, i.e., there might be multiple bit-patterns representing true and false, e.g., if only the first bit is used to represent true and false, and all other bits are ignored (e.g., as opposed to just being zero, like the SysV AMD64 ABI requires).
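
To make the distinction concrete, here is a small sketch of the two schemes described above, operating on raw bytes rather than on `_Bool` objects. Both functions are hypothetical illustrations; neither is claimed to be what any particular toolchain does.

```c
/* Scheme A (hypothetical): only the lowest bit carries the value and all
   other bits are ignored, so many bit patterns mean true. */
int is_true_low_bit_only(unsigned char byte)
{
    return (byte & 1u) != 0;
}

/* Scheme B (hypothetical): the other bits must be zero, so only 0x00 and
   0x01 are valid. Returns 0 for false, 1 for true, -1 for anything else. */
int classify_strict(unsigned char byte)
{
    return byte <= 1 ? (int)byte : -1;
}
```

The two schemes agree on 0x00 and 0x01 but disagree on a byte such as 0x03, which is true under scheme A and invalid under scheme B. Copying a single true pattern cannot tell them apart.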
Comment 7 Martin Sebor 2019-01-13 01:12:19 UTC
(In reply to gnzlbg from comment #2)
> > Because (unsigned int)b is undefined otherwise.
> 
> AFAICT this is only undefined behavior iff `b` has a trap representation.

Not necessarily.  It's undefined if b's value is indeterminate, whether or not it's a trap representation, or whether or not b's type even has a trap representation.  See C Defect Report 451 for some background:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm
I don't think copying arbitrary bits into an object changes that, unless those bits come from an initialized object of the same type in the same program execution.

That said, there has been a lot of confusion about padding bits and trap representations so I'm not completely unsympathetic to the request, even though, as Jonathan says, those aspects of types are unspecified.  But rather than documenting which bits are padding bits I think it should be sufficient to either mention which types have padding bits, or expose some additional Common Predefined Macros to make it possible to determine which ones do (and perhaps even compute how many).
Comment 8 gnzlbg 2019-01-14 08:40:09 UTC
> I think it should be sufficient to either mention which types have padding bits,

I am not sure. An intrinsic that tells me that _Bool has 7 padding bits does not provide me with any new information. The C standard guarantees that _Bool has 1 value bit, so if `sizeof(_Bool)` returns N, then _Bool must have N * CHAR_BIT - 1 padding bits AFAICT.
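
The arithmetic above can be written out as a short sketch. It takes the comment's premise (exactly one value bit in `_Bool`) as an assumption and uses `CHAR_BIT` from `<limits.h>`; the function name `bool_padding_bits` is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* ASSUMPTION (from the comment above): _Bool has exactly one value bit,
   so every other bit in its object representation is a padding bit. */
size_t bool_padding_bits(void)
{
    return sizeof(_Bool) * CHAR_BIT - 1;
}
```

On a target where `sizeof(_Bool)` is 1 and `CHAR_BIT` is 8, this evaluates to 7, which is the figure mentioned in the comment.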

My question is which values are those padding bits allowed to take, which is unspecified in the C standard AFAICT. 

N1356 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1356.htm) stated:

> GCC defines it to have one value bit with the other bits being padding bits and undefined behavior if you access a _Bool representation with any of the padding bits having a nonzero value (such representations being trap representations)

Documenting that this is how GCC defines the value that the padding bits in _Bool are allowed to take would be a useful guarantee, even if the standard does not require GCC to make this guarantee.
Comment 9 Jonathan Wakely 2019-01-14 09:46:20 UTC
But it constrains GCC in future, which leaving it unspecified does not.
Comment 10 gnzlbg 2019-01-14 10:22:29 UTC
> But it constrains GCC in future, which leaving it unspecified does not.

Documenting whether GCC's C _Bool has the same valid and trap representations as the target platform's ABI specifies is a trade-off: it does have a cost as you said, but it also adds value.

The question is whether this trade-off is worth it. 

I am not a compiler expert, but using the same representation of _Bool as the platform's ABI allows GCC to avoid conversions on function arguments, return values, and when passing _Bools through memory. It appears to me that GCC would want to avoid doing these conversions anyway. An alternative here would be to document the current behavior with a disclaimer that it can change, instead of guaranteeing it. So the cost of documenting this could be kept fairly small.

Value-wise, if I want to cast an array of char to an array of _Bool, this guarantee allows me to check whether doing so will introduce undefined behavior, which I think is very valuable. 
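
A minimal sketch of such a check, under assumptions that are not documented guarantees: that `_Bool` occupies one byte and that its only valid object representations are 0x00 and 0x01 (the behavior N1356 attributes to GCC, and what the SysV AMD64 ABI requires). The function name `bytes_are_valid_bools` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: a one-byte _Bool whose only valid representations are
   0x00 (false) and 0x01 (true). This is not guaranteed by the C standard
   or by GCC's documentation. */
_Static_assert(sizeof(_Bool) == 1, "sketch assumes a one-byte _Bool");

/* Return 1 if every byte in buf[0..n) is 0 or 1, else 0. The data is
   inspected only as unsigned char, so no _Bool is read before the check. */
int bytes_are_valid_bools(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (buf[i] > 1) {
            return 0;
        }
    }
    return 1;
}
```

Only after this check succeeds would the caller copy or reinterpret the buffer as `_Bool` values. If the assumptions do not hold for a given compiler or target, the check says nothing about validity.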

So from my pov, documenting current behavior without guaranteeing it has almost zero cost, and adds a lot of value.
Comment 11 Jonathan Wakely 2019-01-14 12:07:14 UTC
I disagree. Once it's documented, people will rely on it and scream if it changes. Caveats about something maybe changing in future don't help. If it's documented to behave one way today, people will depend on that.

It seems you already know what the behaviour is today, so how would documenting it but saying "this might change tomorrow!" help you? It tells you nothing you don't already know.
Comment 12 gnzlbg 2019-01-14 12:41:55 UTC
> I disagree. Once it's documented, people will rely on it and scream if it changes. Caveats about something maybe changing in future don't help. If it's documented to behave one way today, people will depend on that.

That's fair.

> It seems you already know what the behaviour is today

If you tell me that my thoughts about how this currently works are correct then that documents current behavior, and my code will depend on this.

> so how would documenting it but saying "this might change tomorrow!" help you? It tells you nothing you don't already know.

If this was documented somewhere for a particular version of GCC, when my code is compiled with that particular GCC version, I could check inputs for invalid _Bools in my programs and abort reliably without triggering undefined behavior. 

If this is not documented anywhere, I can at best write code that "maybe aborts or maybe has undefined behavior". I find the difference very significant.
Comment 13 Andrew Pinski 2020-12-08 00:31:25 UTC
*** Bug 98190 has been marked as a duplicate of this bug. ***