Hi At optimization level O2 the results of return value of short if are miss calculated i.e wrong results. see example Solution: Don't use the tree-vrp compilation flag i.e add -fno-tree-vrp Example code: #include <stdio.h> #include <string.h> int hash_it(const char* d, const int prime) __attribute__((noinline)); int hash_it(const char* d, const int prime) { int n = strlen(d); int h = n; for (int i = 0; i < n; i++, d++) h = (777*h) + (*d); return ((h >= 0) ? (h % prime) : (-h % prime)); } int main () { int lret = 0; const char* lsdata = "ABCD"; lret = hash_it(lsdata, 10); printf("%d, %s\n", lret, lsdata); return 0; } Results: correct result (i.e NO tree vrp) : $ g++ -O2 -fno-tree-vrp main.cpp -o main_2_no_tree_vrp $ ./main_2_no_tree_vrp 2, ABCD incorrect result (i.e tree vrp in use) : $ g++ -O2 main.cpp -o main_o2_tree_vrp $ ./main_o2_tree_vrp -2, ABCD Version details: $ cat /etc/*-release CentOS release 6.5 (Final) LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch CentOS release 6.5 (Final) CentOS release 6.5 (Final) $ uname -a Linux mastercos6 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux $ gcc -v Using built-in specs. Target: x86_64-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
"h = (777*h) + (*d);" will overflow, therefore your testcase is invalid.
Created attachment 32533 [details] Assembly of O2 no tree-vrp Assembly of O2 no tree-vrp .file "main.cpp" .text .p2align 4,,15 .globl _Z7hash_itPKci .type _Z7hash_itPKci, @function _Z7hash_itPKci: .LFB26: .cfi_startproc .cfi_personality 0x3,__gxx_personality_v0 pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movl %esi, %ebp pushq %rbx .cfi_def_cfa_offset 24 .cfi_offset 3, -24 movq %rdi, %rbx subq $8, %rsp .cfi_def_cfa_offset 32 call strlen testl %eax, %eax movl %eax, %edx jle .L3 leaq 1(%rbx), %rcx leal -1(%rax), %esi leaq (%rcx,%rsi), %rsi jmp .L4 .p2align 4,,10 .p2align 3 .L11: addq $1, %rcx .L4: movsbl (%rbx), %ebx imull $777, %edx, %edx addl %ebx, %edx cmpq %rsi, %rcx movq %rcx, %rbx jne .L11 .L3: testl %edx, %edx jns .L9 negl %edx .L9: movl %edx, %eax sarl $31, %edx addq $8, %rsp .cfi_def_cfa_offset 24 idivl %ebp popq %rbx .cfi_def_cfa_offset 16 popq %rbp .cfi_def_cfa_offset 8 movl %edx, %eax ret
Created attachment 32534 [details] Assembly of O2 WITH tree-vrp Assembly of O2 WITH tree-vrp .file "main.cpp" .text .p2align 4,,15 .globl _Z7hash_itPKci .type _Z7hash_itPKci, @function _Z7hash_itPKci: .LFB26: .cfi_startproc .cfi_personality 0x3,__gxx_personality_v0 pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movl %esi, %ebp pushq %rbx .cfi_def_cfa_offset 24 .cfi_offset 3, -24 movq %rdi, %rbx subq $8, %rsp .cfi_def_cfa_offset 32 call strlen cmpl $0, %eax movl %eax, %edx jle .L2 leaq 1(%rbx), %rcx leal -1(%rax), %esi leaq (%rcx,%rsi), %rsi jmp .L3 .p2align 4,,10 .p2align 3 .L11: addq $1, %rcx .L3: movsbl (%rbx), %ebx imull $777, %edx, %edx addl %ebx, %edx cmpq %rsi, %rcx movq %rcx, %rbx jne .L11 .L9: movl %edx, %eax sarl $31, %edx addq $8, %rsp .cfi_remember_state .cfi_def_cfa_offset 24 idivl %ebp popq %rbx .cfi_def_cfa_offset 16 popq %rbp .cfi_def_cfa_offset 8 movl %edx, %eax ret .L2: .cfi_restore_state jge .L12 negl %edx addq $8, %rsp .cfi_remember_state .cfi_def_cfa_offset 24 movl %edx, %eax sarl $31, %edx idivl %ebp popq %rbx .cfi_def_cfa_offset 16 popq %rbp .cfi_def_cfa_offset 8 movl %edx, %eax ret .L12: .cfi_restore_state movl %eax, %edx jmp .L9 .cfi_endproc .LFE26: .size _Z7hash_itPKci, .-_Z7hash_itPKci
This code is using the SAME INPUTS (hard coded) and generate 2 different results (one positive and the other is negative) depends on the optimization level. It shouldn't happen even if the variable "h" will overflow. if you take a look at the assembly you can see the the optimized code "ignore" the negative operation.
If you don't want undefined behaviour on overflow you need to use unsigned types.
We have already changed our code to use unsigned. We agree the the variable should be unsigned. What we don't agree on is the code should behave the same regardless to the optimization level. Take care thanks for your time. I have spent 3 days to find this, While converting a working code on AIX and Solaris to Linux. I still think it is a gcc bug, take care.
The upcoming 4.9 has "-fsanitize=undefined", that will help tosave your time in the future.
> What we don't agree on is the code should behave the same regardless to the > optimization level. That's not doable, this would mean disabling optimizations at -O2 because there are not enabled at -O0. You need to write valid C++ instead.