Bug 49140 - [4.6 regression] wrong code with -O2 and -O3, not with -O3 -no-inline
Summary: [4.6 regression] wrong code with -O2 and -O3, not with -O3 -no-inline
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.6.0
: P3 major
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on: 49330
Blocks:
  Show dependency treegraph
 
Reported: 2011-05-24 13:08 UTC by Matthias Klose
Modified: 2013-04-12 16:31 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.4.6, 4.5.3, 4.7.0
Known to fail: 4.6.0, 4.6.1
Last reconfirmed: 2011-07-18 08:02:02


Attachments
preprocessed source (67.82 KB, application/x-gzip)
2011-05-24 13:08 UTC, Matthias Klose
Details
test case with Salsa20 in Crypto++ (1014 bytes, text/x-c++src)
2011-07-18 19:55 UTC, Sébastien Kunz-Jacques
Details
the preprocessed source of salsa20 from Crypto++ with gcc 4.5.1, option -O2 (105.19 KB, application/octet-stream)
2011-07-19 16:12 UTC, Sébastien Kunz-Jacques
Details
the preprocessed source of Salsa20 from Crypto++, with gcc 4.6.0, option -O2 (101.14 KB, application/octet-stream)
2011-07-19 16:14 UTC, Sébastien Kunz-Jacques
Details
full testcase source with required files from Crypto++ 5.6.1 and build command (83.91 KB, application/x-gzip)
2011-07-20 07:09 UTC, Sébastien Kunz-Jacques
Details
Small test case with invalid code exhibiting the problem (127 bytes, text/x-csrc)
2011-10-16 14:40 UTC, Mans Rullgard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Matthias Klose 2011-05-24 13:08:34 UTC
Created attachment 24342 [details]
preprocessed source

[forwarded from http://bugs.debian.org/627084]

seen with r174102 from the 4.6 branch, works with r173903 from the trunk


Step to reproduce:

wget 'http://pari.math.u-bordeaux.fr/~bill/pari-2.4.3.12000.tar.gz'
tar xf pari-2.4.3.12000.tar.gz
cd pari-2.4.3.alpha
./Configure
make gp
make bench

Result: all test suite fail.

Cause: 

The function pari_init_parser() in the file src/language/parsec.h is miscompiled.
(This file is included by src/language/parse.y).

If you replace the line 43: s_node.n=OPnboperator; by parsestate_reset(); 
(which does the same thing), then all test pass.

It seems that the issue is that the function stack_alloc() is not inlined correctly,
which cause pari_tree to be NULL (or maybe the call to pari_inline inside stack_alloc()
is not inlined correctly.

The command line used is
gcc-4.6  -c -O3 -Wall -fno-strict-aliasing -fomit-frame-pointer    -I. -I../src/headers -fPIC -o
parse.o ../src/language/parse.c

It also happens with -O2, but not with -O3 -fno-inline.

It works fine with gcc 4.3, 4.4 and 4.5.
Comment 1 Richard Biener 2011-05-24 14:53:29 UTC
stack_init computes &pari_tree - &s_node which is undefined, stack_alloc
then re-computes one via stack_base.  That's broken as well.

Not sure if this eventually causes the issue, but certainly the code is
full of C implementation details that you can't capture in standard C.
Comment 2 Richard Biener 2011-06-12 12:41:59 UTC
Maybe related to PR49330.
Comment 3 H.J. Lu 2011-06-12 14:28:32 UTC
It is triggered by revision 158045:

http://gcc.gnu.org/ml/gcc-cvs/2010-04/msg00148.html
Comment 4 Jakub Jelinek 2011-06-27 12:32:48 UTC
GCC 4.6.1 is being released.
Comment 5 Sébastien Kunz-Jacques 2011-07-14 22:09:11 UTC
(In reply to comment #4)
> GCC 4.6.1 is being released.

I see a similar bug with both gcc 4.6.0 and 4.6.1. In the library crypto++ (v. 5.6.1, http://www.cryptopp.com/), the algorithm Salsa20 produces wrong outputs when compiling with -O, -O2, -O3, but the bug dissapears as soon as -fno-inline is added. The output is always correct with gcc 4.5.x. These results were obtained on a 64-bit linux platform (ubuntu 11.04). 

I cannot post the source code that produces the errors; I do not have a reduced test case.
Comment 6 Richard Biener 2011-07-18 08:02:02 UTC
This PR lacks an executable testcase for easy verification of the bug.

Thus, can people try with the fix for PR49651 installed?
Comment 7 Matthias Klose 2011-07-18 10:19:28 UTC
the pari tests still fail
Comment 8 Sébastien Kunz-Jacques 2011-07-18 19:55:29 UTC
Created attachment 24790 [details]
test case with Salsa20 in Crypto++
Comment 9 Sébastien Kunz-Jacques 2011-07-18 19:59:35 UTC
(In reply to comment #8)
> Created attachment 24790 [details]
> test case with Salsa20 in Crypto++

Sorry about my partial comment. Used the test case on source of gcc 4.6.1 + patch for PR49651 (applied the patch as found at http://gcc.gnu.org/viewcvs?view=revision&revision=176274 to tree-ssa-structalias.c), still does not work.
Comment 10 Richard Biener 2011-07-19 08:05:42 UTC
Can you attach preprocessed source of the Salsa20 testcase please?
Comment 11 Sébastien Kunz-Jacques 2011-07-19 16:12:14 UTC
Created attachment 24793 [details]
the preprocessed source of salsa20 from Crypto++ with gcc 4.5.1, option -O2
Comment 12 Sébastien Kunz-Jacques 2011-07-19 16:14:29 UTC
Created attachment 24794 [details]
the preprocessed source of Salsa20 from Crypto++, with gcc 4.6.0, option -O2
Comment 13 Sébastien Kunz-Jacques 2011-07-20 06:07:39 UTC
(In reply to comment #12)
> Created attachment 24794 [details]
> the preprocessed source of Salsa20 from Crypto++, with gcc 4.6.0, option -O2

I just discovered that the bug is present only when crypto++ is compiled with NDEBUG defined, which is not the case in the preprocessed files above. I will re-post updated files (output of the whole compilation of test case with -save-temps).
Comment 14 Sébastien Kunz-Jacques 2011-07-20 07:09:53 UTC
Created attachment 24796 [details]
full testcase source with required files from Crypto++ 5.6.1 and build command

the (slightly modified) testcase with Crypto++ 5.6.1, this time self-contained. All files except gcc_pr49140.cpp are unmodified form Crypto++.
build command is in build.sh, with option -save-temps.
Comment 15 rguenther@suse.de 2011-07-20 08:44:33 UTC
On Tue, 19 Jul 2011, grokbrsm at free dot fr wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49140
> 
> --- Comment #12 from Sébastien Kunz-Jacques <grokbrsm at free dot fr> 2011-07-19 16:14:29 UTC ---
> Created attachment 24794 [details]
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24794
> the preprocessed source of Salsa20 from Crypto++, with gcc 4.6.0, option -O2

Hm, unfortunately these don't seem to be self-contained (they fail to
link).
Comment 16 Richard Biener 2011-07-20 09:22:45 UTC
Confirmed.  Works with -O0, fails with -O[12] at least.  Still fails on the
4.6 branch.

Compiling salsa.cpp with -O1 is enough to trigger the error, compiling
salsa.cpp with -O0 is enough to mitigate it.
Comment 17 Sébastien Kunz-Jacques 2011-07-20 10:09:54 UTC
(In reply to comment #16)
> Confirmed.  Works with -O0, fails with -O[12] at least.  Still fails on the
> 4.6 branch.
> 
> Compiling salsa.cpp with -O1 is enough to trigger the error, compiling
> salsa.cpp with -O0 is enough to mitigate it.

yes, the wrong code is most probably generated in method Salsa20_Policy::OperateKeystream of salsa.cpp.
Comment 18 Jakub Jelinek 2011-07-20 17:26:11 UTC
The inline asm in that function is invalid:
   :
   : "r" (m_rounds), "r" (input), "r" (iterationCount), "r" (m_state.data()), "r" (output), "r" (workspace.m_ptr)
   : "%eax", "%edx", "memory", "cc", "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7", "%xmm8", "%xmm9", "%xmm10", "%xmm11",

It tells the compiler that it only uses the 6 input registers, while it modifies 3 of them, e.g. the asm string contains:
"add %1" ", " "1*16" ";"
"sub %2" ", " "4" ";"
"add %4" ", " "1*16" ";"

GCC can assume that it will find the old content in the register after the inline asm and will find there something completely different.
For the inputs that are clobbered, the pattern should use something like:
void *dummy1;
... asm volatile ("..." : "=r" (dummy1) : "0" (input_value));
to say that it can't be used.
Comment 19 Jakub Jelinek 2011-07-20 17:39:11 UTC
And that isn't the only bug in it, the inline asm also performs calls (which modify the bytes right below the stack), but when this function is compiled with -O1 and above (and not without -fno-inline), the function makes no function calls, therefore it happily uses red-zone and the embedded calls in the inline asm clobber the red-zone.  Adding "sub rsp, 128;" and "add rsp, 128;" around the
whole inline asm content well, inside of the intel syntax, fixes the testcase (of course, as written in the previous comment, it is still broken).
Comment 20 Jakub Jelinek 2011-07-20 17:43:36 UTC
Ah, the crypto++ comments were just hijacking an unrelated bug for which no details have been provided.  Please don't do this.
Comment 21 Sébastien Kunz-Jacques 2011-07-20 21:23:42 UTC
(In reply to comment #20)
> Ah, the crypto++ comments were just hijacking an unrelated bug for which no
> details have been provided.  Please don't do this.

Well, the symptoms looked similar: the error also dissapears with -fno-inline, so that it matches the bug report header pretty well. I'll file a bug report for Crypto++.
Comment 22 Richard Biener 2011-08-01 14:36:43 UTC
Waiting for a testcase.
Comment 23 Mans Rullgard 2011-10-16 14:40:29 UTC
Created attachment 25516 [details]
Small test case with invalid code exhibiting the problem

Here's a small test case with invalid code showing the problem with several gcc versions going back at least to 4.5.  Compiling with -fno-tree-pta makes it behave as "expected".

I do not believe the compiler to be at fault here.  PARI is clearly full of undefined behaviours they really ought to fix rather than complain that doing so would change the ABI and blame the compiler which is only doing a good job following the spec.
Comment 24 Jakub Jelinek 2011-10-26 17:13:34 UTC
GCC 4.6.2 is being released.
Comment 25 Jakub Jelinek 2012-03-01 14:38:25 UTC
GCC 4.6.3 is being released.
Comment 26 Jakub Jelinek 2013-04-12 16:31:36 UTC
The 4.6 branch has been closed, fixed in GCC 4.7.0.