Bug 97884 - INT_MIN falsely expanded to 64 bits
Summary: INT_MIN falsely expanded to 64 bits
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 10.2.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2020-11-18 07:08 UTC by s.bauroth
Modified: 2020-11-18 18:06 UTC
CC List: 3 users



Attachments
source file (98 bytes, text/x-csrc), 2020-11-18 07:08 UTC, s.bauroth
preprocessed source (4.98 KB, text/plain), 2020-11-18 07:08 UTC, s.bauroth

Description s.bauroth 2020-11-18 07:08:23 UTC
Created attachment 49583 [details]
source file

arm-none-eabi-gcc: when compiling

printf("%i\n", -2147483648);
printf("%i\n", (int)-2147483648);

INT_MIN in the first call gets recognized as a 64-bit argument and split across r2 and r3; r1 remains untouched. In the second call, INT_MIN is correctly put into r1:

   8:	e3a02102 	mov	r2, #-2147483648	; 0x80000000
   c:	e3e03000 	mvn	r3, #0
  10:	e59f0010 	ldr	r0, [pc, #16]	        ; 28 <start_kernel+0x28>
  14:	ebfffffe 	bl	0 <printf>
  18:	e3a01102 	mov	r1, #-2147483648	; 0x80000000
  1c:	e59f0004 	ldr	r0, [pc, #4]	        ; 28 <start_kernel+0x28>
  20:	ebfffffe 	bl	0 <printf>

Source file attached was compiled with
arm-none-eabi-gcc -v -save-temps -c start.c -o start.o

gcc 10.2.0 was configured with
--target=arm-none-eabi --prefix=$(PREFIX) --enable-interwork --enable-languages="c" --with-newlib --without-headers --disable-nls
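
For reference, here is a minimal sketch of well-defined ways to pass INT_MIN, assuming a target with 32-bit int such as arm-none-eabi (illustration only, not from the attached source):

#include <limits.h>
#include <stdio.h>

int main (void)
{
  printf ("%i\n", INT_MIN);             /* <limits.h> defines this as an int expression */
  printf ("%i\n", -2147483647 - 1);     /* both operands are int, so the result is int */
  printf ("%i\n", (int) -2147483648);   /* long long constant negated, then converted to int */
  return 0;
}

All three calls pass a 32-bit int in r1 under the AAPCS, matching the second call in the disassembly above.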
Comment 1 s.bauroth 2020-11-18 07:08:52 UTC
Created attachment 49584 [details]
preprocessed source
Comment 2 Richard Biener 2020-11-18 08:25:03 UTC
You need to write -2147483647 - 1 to make it 'int'; -2147483648 is two tokens, and the second is too large for int.
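
A minimal sketch of that difference, assuming a target where int and long are 32 bits and long long is 64 bits (as on arm-none-eabi):

#include <stdio.h>

int main (void)
{
  /* 2147483648 fits in neither int nor 32-bit long, so the constant is
     long long; unary minus preserves that type.  */
  printf ("%zu\n", sizeof (-2147483648));     /* prints 8 */
  /* 2147483647 fits in int, and subtracting 1 stays within int.  */
  printf ("%zu\n", sizeof (-2147483647 - 1)); /* prints 4 */
  return 0;
}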
Comment 3 Martin Sebor 2020-11-18 16:04:00 UTC
Compiling with -Wall should issue a warning pointing out the problem:

$ cat pr97884.c && gcc -O2 -S -Wall pr97884.c
void f (void)
{
  __builtin_printf ("%i\n", -2147483648);
  __builtin_printf ("%i\n", (int)-2147483648);
}


pr97884.c: In function 'f':
pr97884.c:3:23: warning: format '%i' expects argument of type 'int', but argument 2 has type 'long long int' [-Wformat=]
    3 |   __builtin_printf ("%i\n", -2147483648);
      |                      ~^     ~~~~~~~~~~~
      |                       |     |
      |                       int   long long int
      |                      %lli
Comment 4 s.bauroth 2020-11-18 16:23:30 UTC
I am aware of the warning; I disagree with its content. INT_MIN is an int, not a long long int. I understand why it is processed as a long long int internally, but that should not be visible from the outside world, at least imho.
Comment 5 Jakub Jelinek 2020-11-18 16:29:36 UTC
But INT_MIN in C is (-2147483647 - 1) for 32-bit ints, not -2147483648. As has been explained, there is a significant difference between the two: the constant 2147483648 is not representable in int, so it gets a different type, and the negation preserves that type.
If something defines INT_MIN to -2147483648, the bug is in whatever defines it that way.
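
For comparison, a representative sketch of the <limits.h> pattern (the exact wording varies between C libraries such as newlib and glibc):

#define INT_MAX 2147483647        /* fits in int, so the constant is int */
#define INT_MIN (-INT_MAX - 1)    /* int operands throughout, result is int */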
Comment 6 Jonathan Wakely 2020-11-18 16:40:10 UTC
The C standard says "The type of an integer constant is the first of the corresponding list in which its value can be represented." The corresponding list for decimal constants with no suffix is int, long int, long long int.

Since 2147483648 doesn't fit in int, it will have a larger type (long long int for your target). The unary - operator changes the value, but it still has type long long int.

If you use INT_MIN from <limits.h> it will work, because that's defined as (-INT_MAX - 1), as Richard and Jakub said. It's defined like that for a good reason.
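
A sketch of how that list plays out, using C11 _Generic to name the deduced types (again assuming 32-bit int and long, 64-bit long long):

#include <limits.h>
#include <stdio.h>

#define TYPE_OF(x) _Generic ((x), int: "int", long: "long", \
                             long long: "long long", default: "other")

int main (void)
{
  puts (TYPE_OF (2147483647));    /* "int": first type in the list that fits */
  puts (TYPE_OF (2147483648));    /* "long long": too large for int and 32-bit long */
  puts (TYPE_OF (-2147483648));   /* "long long": unary minus does not change the type */
  puts (TYPE_OF (INT_MIN));       /* "int": (-INT_MAX - 1) is an int expression */
  return 0;
}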
Comment 7 s.bauroth 2020-11-18 16:51:07 UTC
I do understand that +2147483648 is not an int. I am aware of how two's complement works. It seems to me the reason for INT_MIN being '(-2147483647 - 1)' instead of the mathematically equivalent '-2147483648' is that the parser tokenizes the absolute value of the literal separately from its sign. I can also imagine why that eases parsing. But if absolute value and sign are split, why not treat the absolute value as unsigned? Or maybe do a check 'in the end' (I have no knowledge of the codebase here...) whether one can reduce the size of the literal again?
The fact is that INT_MIN and '-2147483648' are both integers perfectly representable in 32 bits. I understand why gcc treats the second one differently (and clang does too); I just think it's not right (or expectable, for that matter). And if it is right, gcc should maybe warn about a 32-bit literal being expanded to a larger type, not only in format strings.

> The type of an integer constant is the first of the corresponding list in which its value can be represented.
This kind of sentence makes me think gcc's behaviour is wrong. The number can be represented in 32 bits.
Comment 8 Jakub Jelinek 2020-11-18 16:57:30 UTC
If you design your own programming language, you can define it whatever way you want, but for C and C++ it is well defined how the compiler must behave in these cases: -2147483648 is two separate tokens, a unary minus and an integral constant.
Comment 9 Jonathan Wakely 2020-11-18 17:03:34 UTC
(In reply to s.bauroth from comment #7)
> > The type of an integer constant is the first of the corresponding list
> > in which its value can be represented.
> These kind of sentences make me think gcc's behaviour is wrong. The number
> can be represented in 32 bits.

No it can't.

A few paragraphs above the text I quoted last time it says:

"An integer constant begins with a digit, but has no period or exponent part. It may have a prefix that specifies its base and a suffix that specifies its type."

So the - sign is not part of the constant. The constant is 2147483648 and that doesn't fit in 32 bits. So it's a 64-bit type, and then that gets negated.

That has been explained several times now.

"I don't understand C and I won't read the spec" is not a GCC bug.
Comment 10 s.bauroth 2020-11-18 17:24:48 UTC
(In reply to Jonathan Wakely from comment #9)
> (In reply to s.bauroth from comment #7)
> > > The type of an integer constant is the first of the corresponding list
> > > in which its value can be represented.
> > These kind of sentences make me think gcc's behaviour is wrong. The number
> > can be represented in 32 bits.

> So the - sign is not part of the constant. The constant is 2147483648 and
> that doesn't fit in 32 bits. So it's a 64-bit type, and then that gets
> negated.
If the constant is not allowed to have a sign, why try to press it into a signed type? I know it's the standard that does it, but does it make any sense?

> That has been explained several times now.
And I said multiple times that I understand the reasoning.

> "I don't understand C and I won't read the spec" is not a GCC bug.
Not going to comment.

I'm not questioning your reading of the standard, and I'm not saying gcc breaks the standard. From a programmer's perspective it's not intuitive that 'INT_MIN', '-2147483647 - 1' and all the other forms (like 'int a = -2147483648; printf("%i", a);') work, but '-2147483648' does not. And from a technical perspective it is absolutely unnecessary. Despite what the standard says about how to tokenize the number, it fits in 32 bits. It just does.
Comment 11 Andreas Schwab 2020-11-18 18:06:28 UTC
2147483648 does not fit in 32 bits.