After recently trying to build Qt with -O3, I found one of our tests failing. After investigating I narrowed it down to qregion.cpp and the flag -finline-functions (using -O2 -finline-functions). Specificially the inlining of the function loadAET() in "QRegionPrivate *PolygonRegion()" causes the problem. Adding __attribute__((noinline)) to loadAET() solves the problem. Interestingly compiling without -finline-function and just marking the loadAET as inline or always_inline, does not trigger the issue. The code is not Qt specific and is used in a other projects, though it is most places compiled as C code. Further testing shows the issue is also triggered in GCC 4.6, 4.7, and the latest version of 4.9 I had. It is however NOT present in GCC 4.4, so this is a regression. Looking at the assembler and debug output, I believe the that the variable pAET that should have been reloaded after loadAET() has been eliminated. This might be what breaks the aliasing rules.
Created attachment 32268 [details] qregion.cpp intermediate compiled with G++ 4.4 (working)
Created attachment 32269 [details] qregion.cpp intermediate compiled with gcc 4.8
Created attachment 32270 [details] qregion.cpp assembler compiled with gcc 4.8
Created attachment 32271 [details] qregion.cpp assembler compiled with gcc 4.4
Can you identify the inlined call? Is it if (pSLL && y == pSLL->scanline) { loadAET(&AET, pSLL->edgelist); pSLL = pSLL->next; } or if (pSLL && y == pSLL->scanline) { loadAET(&AET, pSLL->edgelist); computeWAET(&AET); pSLL = pSLL->next; } ? Can you also show the correct and the invalid assembly snippet? Not sure if it is possible, but having a "driver" around this (in a separate compilation-unit) that calls PolygonRegion, verifying correct output for correct input and thus showing the miscompile would be nice (so there is a runtime testcase). It doesn't matter if linking requires Qt. I am somewhat lost in that large function ;)
(In reply to Richard Biener from comment #5) > Can you identify the inlined call? Is it > > if (pSLL && y == pSLL->scanline) { > loadAET(&AET, pSLL->edgelist); > pSLL = pSLL->next; > } > > or > > if (pSLL && y == pSLL->scanline) { > loadAET(&AET, pSLL->edgelist); > computeWAET(&AET); > pSLL = pSLL->next; > } > > ? > It is the first one (at least that is the one I have triggered). > Can you also show the correct and the invalid assembly snippet? I haven't really been able to identify it except that between code from loadAET and the following while loop, there appears to be no code loading the variable pAET which controls the loop, it appears to be loaded before loadAET, even though loadAET may change it. Note that I have made a work-around for the issue by making AET a pointer to an EdgeTableEntry on the heap instead of an object on the stack. I believe somehow the aliasing between AET and the pointers in loadAET which may point to it is lost by GCC. > > Not sure if it is possible, but having a "driver" around this (in > a separate compilation-unit) that calls PolygonRegion, verifying correct > output for correct input > and thus showing the miscompile would be nice (so there is a runtime > testcase). It doesn't matter if linking requires Qt. > > I am somewhat lost in that large function ;) I posted like this in case the description would be enough to make someone know where to look. If you need to debug it and dig into it, I will try to make a proper reduced test case.
(In reply to Allan Jensen from comment #6) > I posted like this in case the description would be enough to make someone > know where to look. If you need to debug it and dig into it, I will try to > make a proper reduced test case. That would be nice.
Created attachment 32303 [details] Reduced test
Created attachment 32304 [details] Reduced test assembler
I have uploaded a reduced test. Compiled with -O0 or -O1 it outputs 180, compiled with -O2 or higher it outputs 179.
Note that to run it, it links against Qt5Core.
tmpPtBlock->pts = reinterpret_cast<QPoint *>(tmpPtBlock->data); Does this not violate C/C++ aliasing rules later on? I think data should be char array with the alignment of QPoint instead of an int array.
(In reply to Andrew Pinski from comment #12) > tmpPtBlock->pts = reinterpret_cast<QPoint > *>(tmpPtBlock->data); > > Does this not violate C/C++ aliasing rules later on? > > I think data should be char array with the alignment of QPoint instead of an > int array. True, the data array is also four times too big for 200 QPoints because of the int type. So fixing that would lower memory consumption, but I can't see how that would break anything, since the this is casting an uninitialized structure. And it doesn't seem to fit in with the symptoms.
Created attachment 32305 [details] reduced testcase A bit more reduced (doesn't need external lib): markus@x4 tmp % g++ -O2 test.ii && ./a.out 179 markus@x4 tmp % g++ -O1 test.ii && ./a.out 180
Created attachment 32306 [details] reduced testcase
Started with r179799.
I'll have a look.
Seems to be a PTA issue: InsertionSort_pETEchase.29_82, points-to vars: { } InsertionSort_pETEchase.29_86, points-to non-local, points-to escaped, points-to vars: { } p1_155, points-to NULL, points-to vars: { } <bb 30>: InsertionSort_pETEchase.29_82->next = p1_84; p1_155->next = InsertionSort_pETEchase.28_85; InsertionSort_pETEchase.29_86 = InsertionSort_pETEchase.28_85->back; InsertionSort_pETEchase.29_86->next = p1_155; p1_155->back = &InsertionSort_pETEchaseBackTMP; taking _155 as example # p1_155 = PHI <_35(57), p1_58(60)> _35, points-to NULL, points-to vars: { } p1_58, points-to vars: { } _35 = MEM[(struct _EdgeTableEntry * *)&AET + 16B]; # p1_58 = PHI <p1_84(59), p1_84(30), p1_87(34)> p1_84 = p1_155->next; p1_87 = p1_155->next; so it's a cycle seeded only by MEM[(struct _EdgeTableEntry * *)&AET + 16B]; _35 = { NULL } same as AET.128+64 _35 = AET.128+64 # p1_150 = PHI <&AET(47), p1_151(49)> # p1_151 = PHI <p1_19(47), p1_45(49)> p1_45 = p1_151->next; fn3_tmp = p1_45; p1_151->next = p1_47; p1_151->back = p1_150; p1_150 = &AET.0+96 *p1_150 + 128 = p1_151 Note that valgrind shows loads of errors (with the reduced testcase at least) that show invalid reads and writes even at -O0. So we may just optimistically optimize based on that undefined behavior. At least I can't see anything wrong with what PTA derives ...
Yes, looks like the reduced testcase is invalid and contains a few buffer overflows.
As for what Andrew said, yes, the reinterpret_casts<> look bogus, you should really change typedef struct _POINTBLOCK { int data[200 * sizeof(QPoint)]; QPoint *pts; struct _POINTBLOCK *next; } POINTBLOCK; to typedef struct _POINTBLOCK { char data[200 * sizeof(QPoint) * sizeof (int)]; QPoint *pts; struct _POINTBLOCK *next; } POINTBLOCK; but that doesn't change the outcome of the testcase. The reduced testcase requiring QtCore is valgrind clean for me. The cause of the issue _is_ what tree PRE does to the function though. +Replaced AET.next with prephitmp_4 in pPrevAET_44 = AET.next; in PolygonRegion, with -O2 -fno-ipa-cp. Still most of the pointers are computed to point to noting by PTA. Function calls left in that function after inlining are operator delete[], free, operator new, qBadAlloc and malloc calls. --param max-fields-for-field-sensitive=0 fixes it as well, so it does point at a PTA issue. Still looking ...
AFAIK I can understand the reduced testcase AET is never written to anything but the initial NULL pointers. Neither CerateETandAET nor loadAET do anything to the PolygonRegion local AET. I have a fix (bah, this function needs a LOT of TLC!) Index: gcc/tree-ssa-structalias.c =================================================================== --- gcc/tree-ssa-structalias.c (revision 208448) +++ gcc/tree-ssa-structalias.c (working copy) @@ -3218,7 +3218,12 @@ get_constraint_for_component_ref (tree t { cexpr.var = curr->id; results->safe_push (cexpr); - if (address_p) + /* If we take the address and the field starts exactly + at the desired position that was all we need to add. */ + if (address_p + && curr->offset == (unsigned HOST_WIDE_INT) bitpos + && bitmaxsize != -1 + && bitsize == bitmaxsize) break; } }
Author: rguenth Date: Tue Mar 11 12:42:18 2014 New Revision: 208479 URL: http://gcc.gnu.org/viewcvs?rev=208479&root=gcc&view=rev Log: 2014-03-11 Richard Biener <rguenther@suse.de> PR tree-optimization/60429 PR tree-optimization/60485 * tree-ssa-structalias.c (set_union_with_increment): Properly take into account all fields that overlap the shifted vars. (do_sd_constraint): Likewise. (do_ds_constraint): Likewise. (get_constraint_for_ptr_offset): Likewise. * gcc.dg/pr60485-1.c: New testcase. * gcc.dg/pr60485-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/pr60485-1.c trunk/gcc/testsuite/gcc.dg/pr60485-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-structalias.c
Should be fixed for 4.9 (sofar). Would be nice if you can verify that it fixes the original problem (it fixes your reduced testcase for me).
I just tested the latest subversion head of gcc 4.9 and can confirm it fixes the original problem (tst_qregion in Qt 5.2.1 compiled with -O3).
Will it be backported to 4.8?
On Sat, 15 Mar 2014, linux at carewolf dot com wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429 > > --- Comment #25 from Allan Jensen <linux at carewolf dot com> --- > Will it be backported to 4.8? Yes.
Author: rguenth Date: Mon Mar 17 13:08:41 2014 New Revision: 208615 URL: http://gcc.gnu.org/viewcvs?rev=208615&root=gcc&view=rev Log: 2014-03-17 Richard Biener <rguenther@suse.de> Backport from mainline 2014-03-11 Richard Biener <rguenther@suse.de> PR tree-optimization/60429 PR tree-optimization/60485 * tree-ssa-structalias.c (set_union_with_increment): Properly take into account all fields that overlap the shifted vars. (do_sd_constraint): Likewise. (do_ds_constraint): Likewise. (get_constraint_for_ptr_offset): Likewise. * gcc.dg/pr60485-1.c: New testcase. * gcc.dg/pr60485-2.c: Likewise. Added: branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/pr60485-1.c branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/pr60485-2.c Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/testsuite/ChangeLog branches/gcc-4_8-branch/gcc/tree-ssa-structalias.c
Also fixed on the 4.8 branch.
Richard, your 4.8 backport performs the same call twice in a row on lines 3017 and 3018 in tree-ssa-structalias.c; is that really intentional? + /* We have to include all fields that overlap the current + field shifted by rhsoffset. And we include at least + the last or the first field of the variable to represent + reachability of off-bound addresses, in particular &object + 1, + conservatively correct. */ + temp = first_or_preceding_vi_for_offset (curr, offset); + temp = first_or_preceding_vi_for_offset (curr, offset);
Author: rguenth Date: Mon Mar 24 08:42:37 2014 New Revision: 208784 URL: http://gcc.gnu.org/viewcvs?rev=208784&root=gcc&view=rev Log: 2014-03-24 Richard Biener <rguenther@suse.de> PR tree-optimization/60429 * tree-ssa-structalias.c (get_constraint_for_ptr_offset): Remove duplicated line. Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/tree-ssa-structalias.c
On Sat, 22 Mar 2014, mikpelinux at gmail dot com wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429 > > Mikael Pettersson <mikpelinux at gmail dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |mikpelinux at gmail dot com > > --- Comment #29 from Mikael Pettersson <mikpelinux at gmail dot com> --- > Richard, your 4.8 backport performs the same call twice in a row on lines 3017 > and 3018 in tree-ssa-structalias.c; is that really intentional? > > + /* We have to include all fields that overlap the current > + field shifted by rhsoffset. And we include at least > + the last or the first field of the variable to represent > + reachability of off-bound addresses, in particular &object + 1, > + conservatively correct. */ > + temp = first_or_preceding_vi_for_offset (curr, offset); > + temp = first_or_preceding_vi_for_offset (curr, offset); No, fixed.
Fixed for 4.8.3.