Bug 24756 - pointer arithmetic on ia32 uses signed divide
Summary: pointer arithmetic on ia32 uses signed divide
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 2.96 (redhat)
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-09 14:51 UTC by Jon-Paul Sullivan
Modified: 2005-11-10 14:09 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Test source (.i) (190 bytes, text/plain)
2005-11-09 14:51 UTC, Jon-Paul Sullivan
More correct test program (3.57 KB, text/plain)
2005-11-10 13:54 UTC, Jon-Paul Sullivan

Description Jon-Paul Sullivan 2005-11-09 14:51:06 UTC
By using a signed divide for pointer arithmetic, an incorrect value can be obtained given sufficient distance between two pointers.

I have tested this on gcc 3.4 (RedHat EL4 update 1) and the same behaviour persists.

# gcc -v -save-temps -Wall -o test ./test.c
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.2 2.96-118.7.2)
 /usr/lib/gcc-lib/i386-redhat-linux/2.96/cpp0 -lang-c -v -D__GNUC__=2 -D__GNUC_MINOR__=96 -D__GNUC_PATCHLEVEL__=0 -D__ELF__ -Dunix -Dlinux -D__ELF__ -D__unix__ -D__linux__ -D__unix -D__linux -Asystem(posix) -D__NO_INLINE__ -Wall -Acpu(i386) -Amachine(i386) -Di386 -D__i386 -D__i386__ -D__tune_i386__ ./test.c test.i
GNU CPP version 2.96 20000731 (Red Hat Linux 7.2 2.96-118.7.2) (cpplib) (i386 Linux/ELF)
ignoring nonexistent directory "/usr/i386-redhat-linux/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc-lib/i386-redhat-linux/2.96/include
 /usr/include
End of search list.
 /usr/lib/gcc-lib/i386-redhat-linux/2.96/cc1 test.i -quiet -dumpbase test.c -Wall -version -o test.s
GNU C version 2.96 20000731 (Red Hat Linux 7.2 2.96-118.7.2) (i386-redhat-linux) compiled by GNU C version 2.96 20000731 (Red Hat Linux 7.2 2.96-118.7.2).
./test.c:1: warning: initialization makes pointer from integer without a cast
./test.c:2: warning: initialization makes pointer from integer without a cast
./test.c:5: warning: return type defaults to `int'
./test.c: In function `main':
./test.c:6: warning: implicit declaration of function `printf'
./test.c:6: warning: unsigned int format, pointer arg (arg 2)
./test.c:6: warning: unsigned int format, pointer arg (arg 3)
./test.c:9: warning: unsigned int format, long unsigned int arg (arg 3)
./test.c:11: warning: control reaches end of non-void function
 as -V -Qy -o test.o test.s
GNU assembler version 2.11.90.0.8 (i386-redhat-linux) using BFD version 2.11.90.0.8
 /usr/lib/gcc-lib/i386-redhat-linux/2.96/collect2 -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o test /usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crt1.o /usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crti.o /usr/lib/gcc-lib/i386-redhat-linux/2.96/crtbegin.o -L/usr/lib/gcc-lib/i386-redhat-linux/2.96 -L/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../.. test.o -lgcc -lc -lgcc /usr/lib/gcc-lib/i386-redhat-linux/2.96/crtend.o /usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crtn.o



# ./test 
 a = 0xcceb0000 b = 0x24100000
pointer         ( a - b )                                               0xea36c000
non-pointer     ( ( ((unsigned long)a) - ((unsigned long)b) ) / 4)      0x2a36c000

Assembler:
	.file	"test.c"
	.version	"01.01"
gcc2_compiled.:
.globl number
.data
	.align 4
	.type	 number,@object
	.size	 number,4
number:
	.long	-857014272
.globl mem_map
	.align 4
	.type	 mem_map,@object
	.size	 mem_map,4
mem_map:
	.long	605028352
		.section	.rodata
.LC0:
	.string	" a = %#x b = %#x\n"
	.align 32
.LC1:
	.string	"pointer\t\t( a - b )\t\t\t\t\t\t%#x\nnon-pointer\t( ( ((unsigned long)a) - ((unsigned long)b) ) / 4)\t%#x\n"
.text
	.align 4
.globl main
	.type	 main,@function
main:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	subl	$4, %esp
	pushl	mem_map
	pushl	number
	pushl	$.LC0
	call	printf
	addl	$16, %esp
	subl	$4, %esp
	movl	mem_map, %edx
	movl	number, %eax
	subl	%edx, %eax
	shrl	$2, %eax             <==== UNSIGNED (manual)
	pushl	%eax
	movl	mem_map, %edx
	movl	number, %eax
	subl	%edx, %eax
	movl	%eax, %eax
	sarl	$2, %eax             <==== SIGNED  (pointer)
	pushl	%eax
	pushl	$.LC1
	call	printf
	addl	$16, %esp
	leave
	ret
.Lfe1:
	.size	 main,.Lfe1-main
	.ident	"GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-118.7.2)"
Comment 1 Jon-Paul Sullivan 2005-11-09 14:51:54 UTC
Created attachment 10186
Test source (.i)
Comment 2 Andrew Pinski 2005-11-09 15:19:21 UTC
Why do you think this is a bug?  The difference &a[3]-&a[4] had better be -1 (where a is an array).
Comment 3 Richard Biener 2005-11-09 15:53:04 UTC
Note that obtaining the difference of pointers that don't point into the same
object also invokes undefined behavior.
Comment 4 Jon-Paul Sullivan 2005-11-09 17:25:55 UTC
The test case was a simple test case where I tried to show the mathematical behaviour in as simple a way as possible.

The reason I thought that this may be a bug is because the behaviour on a 64-bit system is different as no sign extension would occur during the divide operation, hence the two values would be the same (0x2a36c000) - and I've run the test program on an ia64 to prove this.

Given this fact, the pointer arithmetic in this case gives an answer that is wrong (0xea36c000), and the reason can clearly be shown to be sign extension occurring when possibly it should not.

0xcceb0000 - 0x24100000
1010 1000 1101 1011 0000 0000 0000 0000
^   Highest order bit set

0x2a36c000
0010 1010 0011 0110 1100 0000 0000 0000
^^
0xea36c000
1110 1010 0011 0110 1100 0000 0000 0000
^^

I admit that I hadn't thought about the case of subtracting a larger pointer from a smaller one, so I would agree that the fix isn't as simple as I thought, but that doesn't change the fact that the current answer given is incorrect for the test program I was using.

If you think the test program should be altered in any way to more correctly determine the behaviour but need a large memory system to run it on then I'll be more than happy to run it for you.
Comment 5 Andrew Pinski 2005-11-09 17:30:22 UTC
This is invalid because otherwise you get the incorrect answer for &a[3]-&a[4]:
 a = 0x8049668 b = 0x804966c
pointer         ( a - b )                                               0xffffffff
non-pointer     ( ( ((unsigned long)a) - ((unsigned long)b) ) / 4)      0x3fffffff
Comment 6 Jon-Paul Sullivan 2005-11-10 13:49:47 UTC
I was not intending to show a correct fix for the problem; my simple test program was merely intended to show that, given sufficient distance between 2 pointers, the result of ptr_a - ptr_b can be incorrect.  I am not surprised that it shows the incorrect result where the result relies upon the sign, as it is performing an unsigned operation.

This is not an invalid bug as there is a real, if very rare, problem shown by this.

To put it in as simple and direct a form as possible:

Is this a correct answer, given that the 2 operands are the values of 2 pointers pointing to entities of 4 bytes?
0xcceb0000 - 0x24100000 = 0xea36c000

I believe that this is NOT a correct answer, and that is the basis of this bug, not any other behaviour that is shown by the simple test program.
Comment 7 Jon-Paul Sullivan 2005-11-10 13:54:20 UTC
Created attachment 10201
More correct test program

Here is a test program designed to show the problem more clearly.  When run, the pointer arithmetic in the final printf returns an incorrect result.
Comment 8 Andrew Pinski 2005-11-10 13:58:15 UTC
If pointer1 and pointer2 do not point into the same array (or one element past its end), the behavior is undefined, so what GCC is doing for your case is fine.
Comment 9 Jon-Paul Sullivan 2005-11-10 14:02:06 UTC
Sorry, I don't mean to pester you; I would just like to clarify your final point.

What you are saying is that:

Pointer arithmetic is only valid on elements of the same array that are adjacent, and any other use of pointer arithmetic produces undefined behaviour.
Comment 10 Andrew Pinski 2005-11-10 14:09:29 UTC
(In reply to comment #9)
> What you are saying is that:
> Pointer arithmetic is only valid on elements of the same array that are
> adjacent, and any other use of pointer arithmetic produces undefined
> behaviour.

YES, see also comment #3.