Bug 60429 - [4.7 Regression] Miscompilation (aliasing) with -finline-functions
Summary: [4.7 Regression] Miscompilation (aliasing) with -finline-functions
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.8.2
: P3 normal
Target Milestone: 4.8.3
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2014-03-05 13:12 UTC by Allan Jensen
Modified: 2014-06-12 13:36 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.8.3, 4.9.0
Known to fail: 4.7.3, 4.7.4, 4.8.2
Last reconfirmed: 2014-03-06 00:00:00


Attachments
qregion.cpp intermediate compiled with G++ 4.4 (working) (250.23 KB, application/x-gzip)
2014-03-05 13:13 UTC, Allan Jensen
Details
qregion.cpp intermediate compiled with gcc 4.8 (311.30 KB, application/x-gzip)
2014-03-05 13:17 UTC, Allan Jensen
Details
qregion.cpp assembler compiled with gcc 4.8 (807.80 KB, application/x-gzip)
2014-03-05 13:28 UTC, Allan Jensen
Details
qregion.cpp assembler compiled with gcc 4.4 (578.37 KB, application/x-gzip)
2014-03-05 13:29 UTC, Allan Jensen
Details
Reduced test (59.48 KB, text/plain)
2014-03-07 16:36 UTC, Allan Jensen
Details
Reduced test assembler (2.42 KB, text/plain)
2014-03-07 16:38 UTC, Allan Jensen
Details
reduced testcase (1.48 KB, text/plain)
2014-03-07 20:02 UTC, Markus Trippelsdorf
Details
reduced testcase (1.29 KB, text/plain)
2014-03-07 20:26 UTC, Markus Trippelsdorf
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Allan Jensen 2014-03-05 13:12:15 UTC
After recently trying to build Qt with -O3, I found one of our tests failing. After investigating I narrowed it down to qregion.cpp and the flag -finline-functions (using -O2 -finline-functions).

Specificially the inlining of the function loadAET() in "QRegionPrivate *PolygonRegion()" causes the problem. Adding __attribute__((noinline)) to loadAET() solves the problem. Interestingly compiling without -finline-function and just marking the loadAET as inline or always_inline, does not trigger the issue.

The code is not Qt specific and is used in a other projects, though it is most places compiled as C code.

Further testing shows the issue is also triggered in GCC 4.6, 4.7, and the latest version of 4.9 I had. It is however NOT present in GCC 4.4, so this is a regression.

Looking at the assembler and debug output, I believe the that the variable pAET that should have been reloaded after loadAET() has been eliminated. This might be what breaks the aliasing rules.
Comment 1 Allan Jensen 2014-03-05 13:13:47 UTC
Created attachment 32268 [details]
qregion.cpp intermediate compiled with G++ 4.4 (working)
Comment 2 Allan Jensen 2014-03-05 13:17:08 UTC
Created attachment 32269 [details]
qregion.cpp intermediate compiled with gcc 4.8
Comment 3 Allan Jensen 2014-03-05 13:28:47 UTC
Created attachment 32270 [details]
qregion.cpp assembler compiled with gcc 4.8
Comment 4 Allan Jensen 2014-03-05 13:29:52 UTC
Created attachment 32271 [details]
qregion.cpp assembler compiled with gcc 4.4
Comment 5 Richard Biener 2014-03-06 10:10:48 UTC
Can you identify the inlined call?  Is it

                if (pSLL && y == pSLL->scanline) {
                    loadAET(&AET, pSLL->edgelist);
                    pSLL = pSLL->next;
                }

or

                if (pSLL && y == pSLL->scanline) {
                    loadAET(&AET, pSLL->edgelist);
                    computeWAET(&AET);
                    pSLL = pSLL->next;
                }

?

Can you also show the correct and the invalid assembly snippet?

Not sure if it is possible, but having a "driver" around this (in
a separate compilation-unit) that calls PolygonRegion, verifying correct output for correct input
and thus showing the miscompile would be nice (so there is a runtime
testcase).  It doesn't matter if linking requires Qt.

I am somewhat lost in that large function ;)
Comment 6 Allan Jensen 2014-03-06 19:07:34 UTC
(In reply to Richard Biener from comment #5)
> Can you identify the inlined call?  Is it
> 
>                 if (pSLL && y == pSLL->scanline) {
>                     loadAET(&AET, pSLL->edgelist);
>                     pSLL = pSLL->next;
>                 }
> 
> or
> 
>                 if (pSLL && y == pSLL->scanline) {
>                     loadAET(&AET, pSLL->edgelist);
>                     computeWAET(&AET);
>                     pSLL = pSLL->next;
>                 }
> 
> ?
> 

It is the first one (at least that is the one I have triggered).

> Can you also show the correct and the invalid assembly snippet?

I haven't really been able to identify it except that between code from loadAET and the following while loop, there appears to be no code loading the variable pAET which controls the loop, it appears to be loaded before loadAET, even though loadAET may change it. 

Note that I have made a work-around for the issue by making AET a pointer to an 
EdgeTableEntry on the heap instead of an object on the stack. I believe somehow the aliasing between AET and the pointers in loadAET which may point to it is lost by GCC.

> 
> Not sure if it is possible, but having a "driver" around this (in
> a separate compilation-unit) that calls PolygonRegion, verifying correct
> output for correct input
> and thus showing the miscompile would be nice (so there is a runtime
> testcase).  It doesn't matter if linking requires Qt.
> 
> I am somewhat lost in that large function ;)

I posted like this in case the description would be enough to make someone know where to look. If you need to debug it and dig into it, I will try to make a proper reduced test case.
Comment 7 Richard Biener 2014-03-07 12:18:09 UTC
(In reply to Allan Jensen from comment #6)
> I posted like this in case the description would be enough to make someone
> know where to look. If you need to debug it and dig into it, I will try to
> make a proper reduced test case.

That would be nice.
Comment 8 Allan Jensen 2014-03-07 16:36:48 UTC
Created attachment 32303 [details]
Reduced test
Comment 9 Allan Jensen 2014-03-07 16:38:03 UTC
Created attachment 32304 [details]
Reduced test assembler
Comment 10 Allan Jensen 2014-03-07 16:38:42 UTC
I have uploaded a reduced test. Compiled with -O0 or -O1 it outputs 180, compiled with -O2 or higher it outputs 179.
Comment 11 Allan Jensen 2014-03-07 16:44:37 UTC
Note that to run it, it links against Qt5Core.
Comment 12 Andrew Pinski 2014-03-07 18:47:04 UTC
                tmpPtBlock->pts = reinterpret_cast<QPoint *>(tmpPtBlock->data);

Does this not violate C/C++ aliasing rules later on?

I think data should be char array with the alignment of QPoint instead of an int array.
Comment 13 Allan Jensen 2014-03-07 19:58:21 UTC
(In reply to Andrew Pinski from comment #12)
>                 tmpPtBlock->pts = reinterpret_cast<QPoint
> *>(tmpPtBlock->data);
> 
> Does this not violate C/C++ aliasing rules later on?
> 
> I think data should be char array with the alignment of QPoint instead of an
> int array.

True, the data array is also four times too big for 200 QPoints because of the int type. So fixing that would lower memory consumption, but I can't see how that would break anything, since the this is casting an uninitialized structure. And it doesn't seem to fit in with the symptoms.
Comment 14 Markus Trippelsdorf 2014-03-07 20:02:05 UTC
Created attachment 32305 [details]
reduced testcase

A bit more reduced (doesn't need external lib):

markus@x4 tmp % g++ -O2 test.ii && ./a.out
179
markus@x4 tmp % g++ -O1 test.ii && ./a.out
180
Comment 15 Markus Trippelsdorf 2014-03-07 20:26:22 UTC
Created attachment 32306 [details]
reduced testcase
Comment 16 Jakub Jelinek 2014-03-07 21:40:04 UTC
Started with r179799.
Comment 17 Richard Biener 2014-03-10 09:05:04 UTC
I'll have a look.
Comment 18 Richard Biener 2014-03-10 10:41:57 UTC
Seems to be a PTA issue:

InsertionSort_pETEchase.29_82, points-to vars: { }
InsertionSort_pETEchase.29_86, points-to non-local, points-to escaped, points-to vars: { }
p1_155, points-to NULL, points-to vars: { }

  <bb 30>:
  InsertionSort_pETEchase.29_82->next = p1_84;
  p1_155->next = InsertionSort_pETEchase.28_85;
  InsertionSort_pETEchase.29_86 = InsertionSort_pETEchase.28_85->back;
  InsertionSort_pETEchase.29_86->next = p1_155;
  p1_155->back = &InsertionSort_pETEchaseBackTMP;

taking _155 as example

  # p1_155 = PHI <_35(57), p1_58(60)>

_35, points-to NULL, points-to vars: { }
p1_58, points-to vars: { }

  _35 = MEM[(struct _EdgeTableEntry * *)&AET + 16B];

  # p1_58 = PHI <p1_84(59), p1_84(30), p1_87(34)>

  p1_84 = p1_155->next;
  p1_87 = p1_155->next;

so it's a cycle seeded only by MEM[(struct _EdgeTableEntry * *)&AET + 16B];

_35 = { NULL } same as AET.128+64

_35 = AET.128+64

  # p1_150 = PHI <&AET(47), p1_151(49)>
  # p1_151 = PHI <p1_19(47), p1_45(49)>
  p1_45 = p1_151->next;
  fn3_tmp = p1_45;
  p1_151->next = p1_47;
  p1_151->back = p1_150;

p1_150 = &AET.0+96
*p1_150 + 128 = p1_151


Note that valgrind shows loads of errors (with the reduced testcase at least)
that show invalid reads and writes even at -O0.  So we may just optimistically
optimize based on that undefined behavior.

At least I can't see anything wrong with what PTA derives ...
Comment 19 Markus Trippelsdorf 2014-03-10 11:27:00 UTC
Yes, looks like the reduced testcase is invalid and contains a few
buffer overflows.
Comment 20 Richard Biener 2014-03-10 11:28:58 UTC
As for what Andrew said, yes, the reinterpret_casts<> look bogus, you should
really change

typedef struct _POINTBLOCK {
    int data[200 * sizeof(QPoint)];
    QPoint *pts;
    struct _POINTBLOCK *next;
} POINTBLOCK;

to

typedef struct _POINTBLOCK {
    char data[200 * sizeof(QPoint) * sizeof (int)];
    QPoint *pts;
    struct _POINTBLOCK *next;
} POINTBLOCK;

but that doesn't change the outcome of the testcase.  The reduced testcase
requiring QtCore is valgrind clean for me.

The cause of the issue _is_ what tree PRE does to the function though.

+Replaced AET.next with prephitmp_4 in pPrevAET_44 = AET.next;

in PolygonRegion, with -O2 -fno-ipa-cp.  Still most of the pointers are
computed to point to noting by PTA.

Function calls left in that function after inlining are operator delete[], free, operator new, qBadAlloc and malloc calls.

--param max-fields-for-field-sensitive=0 fixes it as well, so it does point
at a PTA issue.  Still looking ...
Comment 21 Richard Biener 2014-03-10 13:18:33 UTC
AFAIK I can understand the reduced testcase AET is never written to anything
but the initial NULL pointers.  Neither CerateETandAET nor loadAET do anything
to the PolygonRegion local AET.

I have a fix (bah, this function needs a LOT of TLC!)

Index: gcc/tree-ssa-structalias.c
===================================================================
--- gcc/tree-ssa-structalias.c  (revision 208448)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -3218,7 +3218,12 @@ get_constraint_for_component_ref (tree t
                {
                  cexpr.var = curr->id;
                  results->safe_push (cexpr);
-                 if (address_p)
+                 /* If we take the address and the field starts exactly
+                    at the desired position that was all we need to add.  */
+                 if (address_p
+                     && curr->offset == (unsigned HOST_WIDE_INT) bitpos
+                     && bitmaxsize != -1
+                     && bitsize == bitmaxsize)
                    break;
                }
            }
Comment 22 Richard Biener 2014-03-11 12:42:50 UTC
Author: rguenth
Date: Tue Mar 11 12:42:18 2014
New Revision: 208479

URL: http://gcc.gnu.org/viewcvs?rev=208479&root=gcc&view=rev
Log:
2014-03-11  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/60429
	PR tree-optimization/60485
	* tree-ssa-structalias.c (set_union_with_increment): Properly
	take into account all fields that overlap the shifted vars.
	(do_sd_constraint): Likewise.
	(do_ds_constraint): Likewise.
	(get_constraint_for_ptr_offset): Likewise.

	* gcc.dg/pr60485-1.c: New testcase.
	* gcc.dg/pr60485-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/pr60485-1.c
    trunk/gcc/testsuite/gcc.dg/pr60485-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-structalias.c
Comment 23 Richard Biener 2014-03-11 12:59:33 UTC
Should be fixed for 4.9 (sofar).  Would be nice if you can verify that it fixes
the original problem (it fixes your reduced testcase for me).
Comment 24 Allan Jensen 2014-03-11 14:38:20 UTC
I just tested the latest subversion head of gcc 4.9 and can confirm it fixes the original problem (tst_qregion in Qt 5.2.1 compiled with -O3).
Comment 25 Allan Jensen 2014-03-15 13:20:49 UTC
Will it be backported to 4.8?
Comment 26 rguenther@suse.de 2014-03-17 08:30:28 UTC
On Sat, 15 Mar 2014, linux at carewolf dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
> 
> --- Comment #25 from Allan Jensen <linux at carewolf dot com> ---
> Will it be backported to 4.8?

Yes.
Comment 27 Richard Biener 2014-03-17 13:09:13 UTC
Author: rguenth
Date: Mon Mar 17 13:08:41 2014
New Revision: 208615

URL: http://gcc.gnu.org/viewcvs?rev=208615&root=gcc&view=rev
Log:
2014-03-17  Richard Biener  <rguenther@suse.de>

	Backport from mainline
	2014-03-11  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/60429
	PR tree-optimization/60485
	* tree-ssa-structalias.c (set_union_with_increment): Properly
	take into account all fields that overlap the shifted vars.
	(do_sd_constraint): Likewise.
	(do_ds_constraint): Likewise.
	(get_constraint_for_ptr_offset): Likewise.

	* gcc.dg/pr60485-1.c: New testcase.
	* gcc.dg/pr60485-2.c: Likewise.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/pr60485-1.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/pr60485-2.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/tree-ssa-structalias.c
Comment 28 Richard Biener 2014-03-17 13:09:25 UTC
Also fixed on the 4.8 branch.
Comment 29 Mikael Pettersson 2014-03-22 10:57:44 UTC
Richard, your 4.8 backport performs the same call twice in a row on lines 3017 and 3018 in tree-ssa-structalias.c; is that really intentional?

+	  /* We have to include all fields that overlap the current
+	     field shifted by rhsoffset.  And we include at least
+	     the last or the first field of the variable to represent
+	     reachability of off-bound addresses, in particular &object + 1,
+	     conservatively correct.  */
+	  temp = first_or_preceding_vi_for_offset (curr, offset);
+	  temp = first_or_preceding_vi_for_offset (curr, offset);
Comment 30 Richard Biener 2014-03-24 08:43:09 UTC
Author: rguenth
Date: Mon Mar 24 08:42:37 2014
New Revision: 208784

URL: http://gcc.gnu.org/viewcvs?rev=208784&root=gcc&view=rev
Log:
2014-03-24  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/60429
	* tree-ssa-structalias.c (get_constraint_for_ptr_offset): Remove
	duplicated line.

Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/tree-ssa-structalias.c
Comment 31 rguenther@suse.de 2014-03-24 08:43:18 UTC
On Sat, 22 Mar 2014, mikpelinux at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
> 
> Mikael Pettersson <mikpelinux at gmail dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |mikpelinux at gmail dot com
> 
> --- Comment #29 from Mikael Pettersson <mikpelinux at gmail dot com> ---
> Richard, your 4.8 backport performs the same call twice in a row on lines 3017
> and 3018 in tree-ssa-structalias.c; is that really intentional?
> 
> +      /* We have to include all fields that overlap the current
> +         field shifted by rhsoffset.  And we include at least
> +         the last or the first field of the variable to represent
> +         reachability of off-bound addresses, in particular &object + 1,
> +         conservatively correct.  */
> +      temp = first_or_preceding_vi_for_offset (curr, offset);
> +      temp = first_or_preceding_vi_for_offset (curr, offset);

No, fixed.
Comment 32 Richard Biener 2014-06-12 13:36:57 UTC
Fixed for 4.8.3.