Bug 108580 - gcc treats shifts as signed operation, does wrong promotion
Summary: gcc treats shifts as signed operation, does wrong promotion
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: 12.2.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-28 07:42 UTC by postmaster
Modified: 2023-01-28 15:36 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description postmaster 2023-01-28 07:42:06 UTC
I have a simple program that fails to compile correctly on any common compiler:

int main()
{
   int bits = 8;
   char* a = (char*)malloc(1 << bits);
   char* b = (char*)malloc(1 << bits);
   memcpy(b, a, 1 << bits);
   return 0;
}

when assembled with "gcc -S", the result is

main:
.LFB6:
        .cfi_startproc
        endbr64
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $32, %rsp
        movl    $8, -20(%rbp)
        movl    -20(%rbp), %eax
        movl    $1, %edx
        movl    %eax, %ecx
        sall    %cl, %edx
        movl    %edx, %eax
        cltq
        movq    %rax, %rdi
        call    malloc@PLT
        movq    %rax, -16(%rbp)
        movl    -20(%rbp), %eax
        movl    $1, %edx
        movl    %eax, %ecx
        sall    %cl, %edx
        movl    %edx, %eax
        cltq
        movq    %rax, %rdi
        call    malloc@PLT
        movq    %rax, -8(%rbp)
        movl    -20(%rbp), %eax
        movl    $1, %edx
        movl    %eax, %ecx
        sall    %cl, %edx
        movl    %edx, %eax
        movslq  %eax, %rdx
        movq    -16(%rbp), %rcx
        movq    -8(%rbp), %rax
        movq    %rcx, %rsi
        movq    %rax, %rdi
        call    memcpy@PLT
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc

The part that is incorrect is:

        sall    %cl, %edx
        movl    %edx, %eax
        cltq
        movq    %rax, %rdi

It should zero-extend before the shift, but instead it sign-extends after the shift... Bit shifting is always unsigned operation. It correctly determines the function requires 64-bit parameter, but fails to determine it's unsigned. Integer promotion rules say that unsigned type in expression must be promoted to larger unsigned type if it can hold the result. As bit shift is unsigned operation, the temporary should also be unsigned.

Stock gcc headers don't have UINTPTR_C() macro which could be used to explicitly cast the constant "1" to pointer size to give hint that the shift is indeed unsigned operation.

gcc version is: gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
Comment 1 Andrew Pinski 2023-01-28 07:53:34 UTC
1 << bits is done as an int and then that is casted to size_t (which in this case is unsigned long). That still does a sign extend.

>[left] Bit shifting is always unsigned operation.
No it is not, it might be zero filling and even shifting into the signed bit might be undefined behavior depending on the C or C++ standard version you compile to.


> Integer promotion rules say that unsigned type in expression must be promoted to larger unsigned type if it can hold the result.

No it does not say that.
What it says is if the type is smaller than int, it will promote to int. NOTE int here is signed.
That is:

unsigned short t = 1;
unsigned long tt = t << 2;
is really:
unsigned short t = 1;
int tmp = (int)t;
int tmp2 = tmp << 2;
unsigned long tt = (unsigned long)tmp2;
Comment 2 postmaster 2023-01-28 08:04:25 UTC
If I try to shift to highest bit of signed type, the compiler will reject the code and that is correct behaviour. The point here is that left-hand side of the shift operation is by default same size as "int", as in 32 bits, which means it can't be promoted to "int" again. 

This behaviour is same with gcc, clang and Visual C++, but Visual C++ correctly gives warning that the code is ambiguous (exact message is "Arithmetic overflow"), however it's also C++ compiler, which might validate the code with C++ rules, not C.
Comment 3 Andrew Pinski 2023-01-28 08:09:40 UTC
(In reply to postmaster from comment #2)
> If I try to shift to highest bit of signed type, the compiler will reject
> the code and that is correct behaviour. The point here is that left-hand
> side of the shift operation is by default same size as "int", as in 32 bits,
> which means it can't be promoted to "int" again. 

You are mixing up different things here.

In the original code, it was not promoting to int. malloc takes size_t as an argument ....
Comment 4 postmaster 2023-01-28 08:55:41 UTC
I'm not mixing things... The assembly code clearly says it's using 32-bit shift. Both with 32-bit and 64-bit architectures by default left-hand side of shift operation is 32 bits (EAX instead of RAX) and right-hand size is 8 bits (CL instead of CX, ECX or RCX). 

Using "1U << bits" to explicitly force unsigned 32-bit shift would be incorrect code. "(size_t)1 << bits", which is also "incorrect" code, would surprisingly result in correct code generation with both 32-bit and 64-bit targets.

Result of any left shift involving negative numbers, including left-shifting non-zero bit to highest bit of signed integer, is undefined.
Comment 5 Andrew Pinski 2023-01-28 09:21:42 UTC
You are still confused as the assembly code is correct.

Let's start over here.
Take:
   char* a = (char*)malloc(1 << bits);

1 << bits is done in int type as the literal 1 has the type of int (because that is the definition of it without any suffix) and there is no promption going on as 1 is already an int type.
so 1 << bits is done in 32bits (as x86_64 is LP64I32 [linux/elf] Or LLP64IL32 [windows] target and x86 is a ILP32 target).
And then it gets casted to size_t as it is an argument to malloc. This casting is a sign extend from 32bit to 64bit (on x86_64 as size_t is unsigned 64bit type, unsigned long on Linux and unsigned long long on Windows) as defined on by the C standard and https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Integers-implementation.html#Integers-implementation .


Also since this is -O0 the expressions are done without optimizations and you get extra load and stores to the stack so bits is on the stack.

If you want the original expression 1<<bits done in unsigned 64bits, use either 1ul<<bits or 1ull<<bits (or even ((size_t)1)<<bits or ((intptr_t)1)<<bits ).

There is still no bug with the compilers you tried, you missed that 1 is of type int.
Comment 6 postmaster 2023-01-28 12:21:15 UTC
There is wrong assumption again... Literal "1" is always unsigned as there is no implicit signed literals, even though there is explicit signed literals... When somebody writes "-1" it is treated as expression "0 - 1", not a literal "negative one"... This is because subtract operator has higher precedence. Empty literal always equals to literal "0".
Comment 7 Andrew Pinski 2023-01-28 13:22:05 UTC
Wrong again.
In c, 1 is int, if you want unsigned int type use 1u.

This forum is not to learn C and it is obvious you don't know the basics of the language.
Comment 8 postmaster 2023-01-28 15:36:45 UTC
I know enough C that you can't write code like:

int i = 0xFFFFFFFF;

This is not equal to:

int i = -1;

or

int i = (-1);


---
Largest literal you can assign to "int" is "0x7FFFFFFF". Any larger value must be either result of expression or another variable, otherwise it will result in "arithmetic" overflow warning.

Some literals and operations are inherently unsigned, no matter what generic rules say. As I already said, writing "1u << bits" would be incorrect, and strict-conforming or "pedantic" compiler would throw warning as the types don't match and implicit conversion doesn't happen with sizes larger than 32 bits. Type modifiers are otherwise case-insensitive, but don't support mixed-case.

C standard doesn't even mention anything about "size_t" or have type modifier for it. Even though printf() and alike support "%z", it is considered extension and will be rejected when using strict/pedantic mode.