Bug 83201 - [7/8 Regression] SPEC CPU2017 505.mcf_r produces incorrect output when built with -flto and FDO
Summary: [7/8 Regression] SPEC CPU2017 505.mcf_r produces incorrect output when built ...
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: lto (show other bugs)
Version: 8.0
: P3 normal
Target Milestone: 7.3
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2017-11-28 20:12 UTC by Pat Haugen
Modified: 2018-06-10 17:59 UTC (History)
7 users (show)

See Also:
Host: powerpc64le-unknown-linux-gnu, x86_64-linux-gnu
Target: powerpc64le-unknown-linux-gnu, x86_64-linux-gnu
Build: powerpc64le-unknown-linux-gnu, x86_64-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2017-12-19 00:00:00


Attachments
Patch that removes violation of aliasing rules (318 bytes, patch)
2017-12-19 14:50 UTC, Martin Liška
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Pat Haugen 2017-11-28 20:12:25 UTC
505.mcf_f produces incorrect output when built with both LTO/FDO. Using either option separately is fine. GCC trunk r255207 was used. Following are options used.

OPTIMIZE        = -O3 -mcpu=power8 -flto

PASS1_FLAGS   = -fprofile-generate
PASS1_LDFLAGS  = -fprofile-generate
PASS2_FLAGS   = -fprofile-use
PASS2_LDFLAGS  = -fprofile-use


Contents of inp.out.mis (miscompares).

0010:  simplex iterations         : 107102
       simplex iterations         : 107598
                                       ^
0014:  simplex iterations         : 152479
       simplex iterations         : 149876
                                     ^
0016:  erased arcs                : 995716
       erased arcs                : 995702
                                        ^
0017:  new implicit arcs          : 2995716
       new implicit arcs          : 2995702
                                         ^
0019:  simplex iterations         : 253145
       simplex iterations         : 248008
                                     ^
0020:  objective value            : 12161789395
       objective value            : 12171761765
                                       ^
0021:  erased arcs                : 2991635
       erased arcs                : 2991537
                                        ^
0022:  new implicit arcs          : 2991635
       new implicit arcs          : 2991537
                                        ^
0024:  simplex iterations         : 398127
       simplex iterations         : 385785
                                     ^
0025:  objective value            : 11729854482
       objective value            : 11769820561
                                       ^


It appears to work fine with r254943. I'll start a bisect and post results.
Comment 1 Bill Schmidt 2017-11-28 20:20:01 UTC
It may be latent for a while -- the same problem exists with GCC 7.  (Well, technically with branches/ibm/gcc-7-branch.)
Comment 2 Pat Haugen 2017-11-28 22:53:35 UTC
(In reply to Pat Haugen from comment #0)
> 
> It appears to work fine with r254943. I'll start a bisect and post results.

My bisect showed that r254946 was where it started failing on trunk. And yes, it fails with current GCC 7 branch too.
Comment 3 Richard Biener 2017-11-29 08:32:28 UTC
(In reply to Pat Haugen from comment #2)
> (In reply to Pat Haugen from comment #0)
> > 
> > It appears to work fine with r254943. I'll start a bisect and post results.
> 
> My bisect showed that r254946 was where it started failing on trunk. And
> yes, it fails with current GCC 7 branch too.

Probably a bogus bisect point, for a few revisions inlining was broken so it's probably broken a bit earlier as well.
Comment 4 Richard Biener 2017-12-14 13:04:25 UTC
Do we have an indication that it worked with any 6.x or 7.x release?
Comment 5 Pat Haugen 2017-12-15 02:30:57 UTC
Current FSF 6 branch works fine, so I have some bisect points. Will comment further as I find out.
Comment 6 Pat Haugen 2017-12-15 21:39:22 UTC
So I did a bisect of trunk during the GCC 7 development timeframe (r235035-r247017) and it pointed to r236878 as the point where the failure started.


+++ gcc/ChangeLog	(revision 236878)
@@ -1,3 +1,9 @@
+2016-05-30  Jan Hubicka  <hubicka@ucw.cz>
+
+	* tree-ssa-loop-ivcanon.c (try_peel_loop): Correctly set wont_exit
+	for peeled copies; avoid underflow when updating estimates; correctly
+	scale loop profile.
+
Comment 7 Martin Liška 2017-12-19 10:52:21 UTC
Confirmed on x86_64-linux-gnu with current trunk. I'll try to reproduce with the suspected revision.
Comment 8 Martin Liška 2017-12-19 12:09:29 UTC
But my bisection script points to r217827 as first bad revision:

2014-11-20   Richard Biener  <rguenther@suse.de>

	PR tree-optimization/63677
	* tree-ssa-dom.c: Include gimplify.h for unshare_expr.
	(avail_exprs_stack): Make a vector of pairs.
	(struct hash_expr_elt): Replace stmt member with vop member.
	(expr_elt_hasher::equal): Simplify.
	(initialize_hash_element): Adjust.
	(initialize_hash_element_from_expr): Likewise.
	(dom_opt_dom_walker::thread_across_edge): Likewise.
	(record_cond): Likewise.
	(dom_opt_dom_walker::before_dom_children): Likewise.
	(print_expr_hash_elt): Likewise.
	(remove_local_expressions_from_table): Restore previous state
	if requested.
	(record_equivalences_from_stmt): Record &x + CST as constant
	&MEM[&x, CST] for further propagation.
	(vuse_eq): New function.
	(lookup_avail_expr): For loads use the alias oracle to see
	whether a candidate from the expr hash is usable.
	(avail_expr_hash): Do not hash VUSEs.

	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase.
	* gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise.
Comment 9 Martin Liška 2017-12-19 12:21:54 UTC
Using just a single ltrans, I see first divergence in mcf_r.ltrans0.088t.dom1.
Richi, how possible is the revision real culprit?
Comment 10 rguenther@suse.de 2017-12-19 13:05:57 UTC
On Tue, 19 Dec 2017, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201
> 
> --- Comment #9 from Martin Liška <marxin at gcc dot gnu.org> ---
> Using just a single ltrans, I see first divergence in mcf_r.ltrans0.088t.dom1.
> Richi, how possible is the revision real culprit?

Unlikely, it probably triggers a latent issue...
Comment 11 Martin Liška 2017-12-19 13:12:01 UTC
(In reply to rguenther@suse.de from comment #10)
> On Tue, 19 Dec 2017, marxin at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201
> > 
> > --- Comment #9 from Martin Liška <marxin at gcc dot gnu.org> ---
> > Using just a single ltrans, I see first divergence in mcf_r.ltrans0.088t.dom1.
> > Richi, how possible is the revision real culprit?
> 
> Unlikely, it probably triggers a latent issue...

That's not much positive :/ Ideas what to do with that?
Comment 12 rguenther@suse.de 2017-12-19 13:17:44 UTC
On Tue, 19 Dec 2017, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201
> 
> --- Comment #11 from Martin Liška <marxin at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #10)
> > On Tue, 19 Dec 2017, marxin at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201
> > > 
> > > --- Comment #9 from Martin Liška <marxin at gcc dot gnu.org> ---
> > > Using just a single ltrans, I see first divergence in mcf_r.ltrans0.088t.dom1.
> > > Richi, how possible is the revision real culprit?
> > 
> > Unlikely, it probably triggers a latent issue...
> 
> That's not much positive :/ Ideas what to do with that?

Given 505.mcf_r isn't too large figuring out the miscompiled function
shouldn't be too hard (heh, well ...).  The key is probably to somehow
reproduce without FDO ... like just do -fprofile-use?  What other flags
did you use on x86_64?
Comment 13 Richard Biener 2017-12-19 13:38:27 UTC
Likely invalid.  spec_qsort is full of alias violations.  We sort

typedef struct basket
{
    arc_t *a;
    cost_t cost;
    cost_t abs_cost;
    LONG number;
} BASKET;

and spec_qsort does stuff like

        if (n < 7) {
                for (pm = (char *)a + es; pm < (char *)a + n * es; pm += es)
                        for (pl = pm;
                             pl > (char *)a && cmp(pl - es, pl) > 0;
                             pl -= es)
                                swap(pl, pl - es);
                return;

with

#define swap(a, b)                              \
        if (swaptype_long == 0) {               \
                long t = *(long *)(a);          \
                *(long *)(a) = *(long *)(b);    \
                *(long *)(b) = t;               \
        } else if (swaptype_int == 0) {         \
                int t = *(int *)(a);            \
                *(int *)(a) = *(int *)(b);      \
                *(int *)(b) = t;                \
        } else                                  \
                swapfunc((char *)a, (char *)b, es, swaptype_long, swaptype_int)

eh... (no, swapfunc isn't any better).

Anybody up to reporting this to SPEC?
Comment 14 Richard Biener 2017-12-19 13:39:45 UTC
Copies in more benchmarks:

> find benchspec -name spec_qsort.c
benchspec/common/spec_qsort/spec_qsort.c
benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c
benchspec/CPU/502.gcc_r/src/spec_qsort/spec_qsort.c
benchspec/CPU/511.povray_r/src/spec_qsort/spec_qsort.c
benchspec/CPU/527.cam4_r/src/spec_qsort/spec_qsort.c

...
Comment 15 Richard Biener 2017-12-19 13:55:52 UTC
SWAPINIT should end up with swaptype_long == 1 I think and swaptype_int == 1
for the cases in question.  Enforcing swaptype_int = swaptype_long = 2
should make it work (scratch SWAPINIT calls).

#define SWAPINIT(TYPE, a, es) swaptype_ ## TYPE =       \
        ((char *)a - (char *)0) % sizeof(TYPE) ||       \
        es % sizeof(TYPE) ? 2 : es == sizeof(TYPE) ? 0 : 1;

static INLINE void
swapfunc(char *a, char *b, int n, int swaptype_long, int swaptype_int)
{
        if (swaptype_long <= 1)
                swapcode(long, a, b, n)
        else if (swaptype_int <= 1)
                swapcode(int, a, b, n)
        else
                swapcode(char, a, b, n)
}

#define swap(a, b)                              \
        if (swaptype_long == 0) {               \
                long t = *(long *)(a);          \
                *(long *)(a) = *(long *)(b);    \
                *(long *)(b) = t;               \
        } else if (swaptype_int == 0) {         \
                int t = *(int *)(a);            \
                *(int *)(a) = *(int *)(b);      \
                *(int *)(b) = t;                \
        } else                                  \
                swapfunc((char *)a, (char *)b, es, swaptype_long, swaptype_int)

...


loop:   SWAPINIT(long, a, es);
        SWAPINIT(int, a, es);
Comment 16 Martin Liška 2017-12-19 14:00:38 UTC
(In reply to Richard Biener from comment #15)
> SWAPINIT should end up with swaptype_long == 1 I think and swaptype_int == 1
> for the cases in question.  Enforcing swaptype_int = swaptype_long = 2
> should make it work (scratch SWAPINIT calls).

I can confirm that.
Comment 17 Martin Liška 2017-12-19 14:50:33 UTC
Created attachment 42919 [details]
Patch that removes violation of aliasing rules
Comment 18 Pat Haugen 2017-12-19 16:42:07 UTC
(In reply to Martin Liška from comment #16)
> (In reply to Richard Biener from comment #15)
> > SWAPINIT should end up with swaptype_long == 1 I think and swaptype_int == 1
> > for the cases in question.  Enforcing swaptype_int = swaptype_long = 2
> > should make it work (scratch SWAPINIT calls).
> 
> I can confirm that.

Yes, that fixes the problem for me on PowerPC also. I can pass along the info to our SPEC rep.


Richi,
  I'm curious if the alias violations were apparent in a dump file, or did you just happened to spot them looking through the source?
Comment 19 rguenther@suse.de 2017-12-19 17:32:35 UTC
On December 19, 2017 5:42:07 PM GMT+01:00, "pthaugen at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201
>
>--- Comment #18 from Pat Haugen <pthaugen at gcc dot gnu.org> ---
>(In reply to Martin Liška from comment #16)
>> (In reply to Richard Biener from comment #15)
>> > SWAPINIT should end up with swaptype_long == 1 I think and
>swaptype_int == 1
>> > for the cases in question.  Enforcing swaptype_int = swaptype_long
>= 2
>> > should make it work (scratch SWAPINIT calls).
>> 
>> I can confirm that.
>
>Yes, that fixes the problem for me on PowerPC also. I can pass along
>the info
>to our SPEC rep.
>
>
>Richi,
>I'm curious if the alias violations were apparent in a dump file, or
>did you
>just happened to spot them looking through the source?

I just looked at the source seeing *(long *) ptr and bells went off. 

Richard.
Comment 20 Jeffrey A. Law 2018-01-07 04:37:18 UTC
Based on c#18.
Comment 21 john henning 2018-05-09 16:13:49 UTC
Comments from SPEC CPU Documentation Guy:

As described at 
https://www.spec.org/cpu2017/Docs/faq.html#Miscompare.07
https://www.spec.org/cpu2017/Docs/benchmarks/505.mcf_r.html#Portability
SPEC's recommendation is to use
   -fno-strict-aliasing

The patch attached to this bug is not approved by SPEC.

SPEC generally does not change benchmarks after release (no moving targets); more detail is at the above links.

   John Henning
   SPEC CPU Subcommittee Secretary
Comment 22 Martin Liška 2018-06-06 10:06:09 UTC
(In reply to john henning from comment #21)
> Comments from SPEC CPU Documentation Guy:
> 
> As described at 
> https://www.spec.org/cpu2017/Docs/faq.html#Miscompare.07
> https://www.spec.org/cpu2017/Docs/benchmarks/505.mcf_r.html#Portability
> SPEC's recommendation is to use
>    -fno-strict-aliasing
> 
> The patch attached to this bug is not approved by SPEC.
> 
> SPEC generally does not change benchmarks after release (no moving targets);
> more detail is at the above links.
> 
>    John Henning
>    SPEC CPU Subcommittee Secretary

Note that the same function is used by other benchmarks as well: 502.gcc_r, 511.povray_r, 527.cam4_r

That means the same miscompilation can happen for them as well :/