Bug 60747 - the tree-vrp compilation flag produce wrong assembly
Summary: the tree-vrp compilation flag produce wrong assembly
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 4.4.7
: P3 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-03 13:59 UTC by tech
Modified: 2014-04-04 09:35 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Assembly of O2 no tree-vrp (627 bytes, text/plain)
2014-04-03 14:44 UTC, tech
Details
Assembly of O2 WITH tree-vrp (662 bytes, text/plain)
2014-04-03 14:46 UTC, tech
Details

Note You need to log in before you can comment on or make changes to this bug.
Description tech 2014-04-03 13:59:33 UTC
Hi

At optimization level O2 the results of return value of short if are miss calculated i.e wrong results. see example 

Solution:
Don't use the tree-vrp compilation flag i.e add -fno-tree-vrp


Example code:
#include <stdio.h>
#include <string.h>

int hash_it(const char* d, const int prime) __attribute__((noinline));
int hash_it(const char* d, const int prime)
{
      int n = strlen(d);
      int h = n; 
      
      for (int i = 0; i < n; i++, d++)
          h = (777*h) + (*d);
      
      return ((h >= 0) ? (h % prime) : (-h % prime)); 
}

int main ()
{
	int lret = 0;
	const char* lsdata = "ABCD";
	lret = hash_it(lsdata, 10);
	printf("%d, %s\n", lret, lsdata);

	return 0;
}

Results:
correct result (i.e NO tree vrp) :
$ g++ -O2 -fno-tree-vrp main.cpp -o main_2_no_tree_vrp
$ ./main_2_no_tree_vrp
2, ABCD

incorrect result (i.e tree vrp in use) : 
$ g++ -O2 main.cpp -o main_o2_tree_vrp
$ ./main_o2_tree_vrp
-2, ABCD


Version details:
$ cat /etc/*-release
CentOS release 6.5 (Final)
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
CentOS release 6.5 (Final)
CentOS release 6.5 (Final)

$ uname -a
Linux mastercos6 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
Comment 1 Markus Trippelsdorf 2014-04-03 14:09:27 UTC
"h = (777*h) + (*d);" will overflow, therefore your testcase is invalid.
Comment 2 tech 2014-04-03 14:44:17 UTC
Created attachment 32533 [details]
Assembly of O2 no tree-vrp

Assembly of O2 no tree-vrp

	.file	"main.cpp"
	.text
	.p2align 4,,15
.globl _Z7hash_itPKci
	.type	_Z7hash_itPKci, @function
_Z7hash_itPKci:
.LFB26:
	.cfi_startproc
	.cfi_personality 0x3,__gxx_personality_v0
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movl	%esi, %ebp
	pushq	%rbx
	.cfi_def_cfa_offset 24
	.cfi_offset 3, -24
	movq	%rdi, %rbx
	subq	$8, %rsp
	.cfi_def_cfa_offset 32
	call	strlen
	testl	%eax, %eax
	movl	%eax, %edx
	jle	.L3
	leaq	1(%rbx), %rcx
	leal	-1(%rax), %esi
	leaq	(%rcx,%rsi), %rsi
	jmp	.L4
	.p2align 4,,10
	.p2align 3
.L11:
	addq	$1, %rcx
.L4:
	movsbl	(%rbx), %ebx
	imull	$777, %edx, %edx
	addl	%ebx, %edx
	cmpq	%rsi, %rcx
	movq	%rcx, %rbx
	jne	.L11
.L3:
	testl	%edx, %edx
	jns	.L9
	negl	%edx
.L9:
	movl	%edx, %eax
	sarl	$31, %edx
	addq	$8, %rsp
	.cfi_def_cfa_offset 24
	idivl	%ebp
	popq	%rbx
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	movl	%edx, %eax
	ret
Comment 3 tech 2014-04-03 14:46:04 UTC
Created attachment 32534 [details]
Assembly of O2 WITH tree-vrp

Assembly of O2 WITH tree-vrp

	.file	"main.cpp"
	.text
	.p2align 4,,15
.globl _Z7hash_itPKci
	.type	_Z7hash_itPKci, @function
_Z7hash_itPKci:
.LFB26:
	.cfi_startproc
	.cfi_personality 0x3,__gxx_personality_v0
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movl	%esi, %ebp
	pushq	%rbx
	.cfi_def_cfa_offset 24
	.cfi_offset 3, -24
	movq	%rdi, %rbx
	subq	$8, %rsp
	.cfi_def_cfa_offset 32
	call	strlen
	cmpl	$0, %eax
	movl	%eax, %edx
	jle	.L2
	leaq	1(%rbx), %rcx
	leal	-1(%rax), %esi
	leaq	(%rcx,%rsi), %rsi
	jmp	.L3
	.p2align 4,,10
	.p2align 3
.L11:
	addq	$1, %rcx
.L3:
	movsbl	(%rbx), %ebx
	imull	$777, %edx, %edx
	addl	%ebx, %edx
	cmpq	%rsi, %rcx
	movq	%rcx, %rbx
	jne	.L11
.L9:
	movl	%edx, %eax
	sarl	$31, %edx
	addq	$8, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 24
	idivl	%ebp
	popq	%rbx
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	movl	%edx, %eax
	ret
.L2:
	.cfi_restore_state
	jge	.L12
	negl	%edx
	addq	$8, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 24
	movl	%edx, %eax
	sarl	$31, %edx
	idivl	%ebp
	popq	%rbx
	.cfi_def_cfa_offset 16
	popq	%rbp
	.cfi_def_cfa_offset 8
	movl	%edx, %eax
	ret
.L12:
	.cfi_restore_state
	movl	%eax, %edx
	jmp	.L9
	.cfi_endproc
.LFE26:
	.size	_Z7hash_itPKci, .-_Z7hash_itPKci
Comment 4 tech 2014-04-03 14:51:26 UTC
This code is using the SAME INPUTS (hard coded) and generate 2 different results (one positive and the other is negative) depends on the optimization level.

It shouldn't happen even if the variable "h" will overflow.

if you take a look at the assembly you can see the the optimized code "ignore" the negative operation.
Comment 5 Andreas Schwab 2014-04-03 14:58:10 UTC
If you don't want undefined behaviour on overflow you need to use unsigned types.
Comment 6 tech 2014-04-03 15:09:04 UTC
We have already changed our code to use unsigned.
We agree the the variable should be unsigned.
What we don't agree on is the code should behave the same regardless to the optimization level.

Take care thanks for your time.

I have spent 3 days to find this, While converting a working code on AIX and Solaris to Linux.

I still think it is a gcc bug, take care.
Comment 7 Markus Trippelsdorf 2014-04-03 15:12:37 UTC
The upcoming 4.9 has "-fsanitize=undefined", that will help tosave your time
in the future.
Comment 8 Eric Botcazou 2014-04-04 09:35:56 UTC
> What we don't agree on is the code should behave the same regardless to the
> optimization level.

That's not doable, this would mean disabling optimizations at -O2 because there are not enabled at -O0.  You need to write valid C++ instead.