505.mcf_f produces incorrect output when built with both LTO/FDO. Using either option separately is fine. GCC trunk r255207 was used. Following are options used. OPTIMIZE = -O3 -mcpu=power8 -flto PASS1_FLAGS = -fprofile-generate PASS1_LDFLAGS = -fprofile-generate PASS2_FLAGS = -fprofile-use PASS2_LDFLAGS = -fprofile-use Contents of inp.out.mis (miscompares). 0010: simplex iterations : 107102 simplex iterations : 107598 ^ 0014: simplex iterations : 152479 simplex iterations : 149876 ^ 0016: erased arcs : 995716 erased arcs : 995702 ^ 0017: new implicit arcs : 2995716 new implicit arcs : 2995702 ^ 0019: simplex iterations : 253145 simplex iterations : 248008 ^ 0020: objective value : 12161789395 objective value : 12171761765 ^ 0021: erased arcs : 2991635 erased arcs : 2991537 ^ 0022: new implicit arcs : 2991635 new implicit arcs : 2991537 ^ 0024: simplex iterations : 398127 simplex iterations : 385785 ^ 0025: objective value : 11729854482 objective value : 11769820561 ^ It appears to work fine with r254943. I'll start a bisect and post results.
It may be latent for a while -- the same problem exists with GCC 7. (Well, technically with branches/ibm/gcc-7-branch.)
(In reply to Pat Haugen from comment #0) > > It appears to work fine with r254943. I'll start a bisect and post results. My bisect showed that r254946 was where it started failing on trunk. And yes, it fails with current GCC 7 branch too.
(In reply to Pat Haugen from comment #2) > (In reply to Pat Haugen from comment #0) > > > > It appears to work fine with r254943. I'll start a bisect and post results. > > My bisect showed that r254946 was where it started failing on trunk. And > yes, it fails with current GCC 7 branch too. Probably a bogus bisect point, for a few revisions inlining was broken so it's probably broken a bit earlier as well.
Do we have an indication that it worked with any 6.x or 7.x release?
Current FSF 6 branch works fine, so I have some bisect points. Will comment further as I find out.
So I did a bisect of trunk during the GCC 7 development timeframe (r235035-r247017) and it pointed to r236878 as the point where the failure started. +++ gcc/ChangeLog (revision 236878) @@ -1,3 +1,9 @@ +2016-05-30 Jan Hubicka <hubicka@ucw.cz> + + * tree-ssa-loop-ivcanon.c (try_peel_loop): Correctly set wont_exit + for peeled copies; avoid underflow when updating estimates; correctly + scale loop profile. +
Confirmed on x86_64-linux-gnu with current trunk. I'll try to reproduce with the suspected revision.
But my bisection script points to r217827 as first bad revision: 2014-11-20 Richard Biener <rguenther@suse.de> PR tree-optimization/63677 * tree-ssa-dom.c: Include gimplify.h for unshare_expr. (avail_exprs_stack): Make a vector of pairs. (struct hash_expr_elt): Replace stmt member with vop member. (expr_elt_hasher::equal): Simplify. (initialize_hash_element): Adjust. (initialize_hash_element_from_expr): Likewise. (dom_opt_dom_walker::thread_across_edge): Likewise. (record_cond): Likewise. (dom_opt_dom_walker::before_dom_children): Likewise. (print_expr_hash_elt): Likewise. (remove_local_expressions_from_table): Restore previous state if requested. (record_equivalences_from_stmt): Record &x + CST as constant &MEM[&x, CST] for further propagation. (vuse_eq): New function. (lookup_avail_expr): For loads use the alias oracle to see whether a candidate from the expr hash is usable. (avail_expr_hash): Do not hash VUSEs. * gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase. * gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise.
Using just a single ltrans, I see first divergence in mcf_r.ltrans0.088t.dom1. Richi, how possible is the revision real culprit?
On Tue, 19 Dec 2017, marxin at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201 > > --- Comment #9 from Martin Liška <marxin at gcc dot gnu.org> --- > Using just a single ltrans, I see first divergence in mcf_r.ltrans0.088t.dom1. > Richi, how possible is the revision real culprit? Unlikely, it probably triggers a latent issue...
(In reply to rguenther@suse.de from comment #10) > On Tue, 19 Dec 2017, marxin at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201 > > > > --- Comment #9 from Martin Liška <marxin at gcc dot gnu.org> --- > > Using just a single ltrans, I see first divergence in mcf_r.ltrans0.088t.dom1. > > Richi, how possible is the revision real culprit? > > Unlikely, it probably triggers a latent issue... That's not much positive :/ Ideas what to do with that?
On Tue, 19 Dec 2017, marxin at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201 > > --- Comment #11 from Martin Liška <marxin at gcc dot gnu.org> --- > (In reply to rguenther@suse.de from comment #10) > > On Tue, 19 Dec 2017, marxin at gcc dot gnu.org wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201 > > > > > > --- Comment #9 from Martin Liška <marxin at gcc dot gnu.org> --- > > > Using just a single ltrans, I see first divergence in mcf_r.ltrans0.088t.dom1. > > > Richi, how possible is the revision real culprit? > > > > Unlikely, it probably triggers a latent issue... > > That's not much positive :/ Ideas what to do with that? Given 505.mcf_r isn't too large figuring out the miscompiled function shouldn't be too hard (heh, well ...). The key is probably to somehow reproduce without FDO ... like just do -fprofile-use? What other flags did you use on x86_64?
Likely invalid. spec_qsort is full of alias violations. We sort typedef struct basket { arc_t *a; cost_t cost; cost_t abs_cost; LONG number; } BASKET; and spec_qsort does stuff like if (n < 7) { for (pm = (char *)a + es; pm < (char *)a + n * es; pm += es) for (pl = pm; pl > (char *)a && cmp(pl - es, pl) > 0; pl -= es) swap(pl, pl - es); return; with #define swap(a, b) \ if (swaptype_long == 0) { \ long t = *(long *)(a); \ *(long *)(a) = *(long *)(b); \ *(long *)(b) = t; \ } else if (swaptype_int == 0) { \ int t = *(int *)(a); \ *(int *)(a) = *(int *)(b); \ *(int *)(b) = t; \ } else \ swapfunc((char *)a, (char *)b, es, swaptype_long, swaptype_int) eh... (no, swapfunc isn't any better). Anybody up to reporting this to SPEC?
Copies in more benchmarks: > find benchspec -name spec_qsort.c benchspec/common/spec_qsort/spec_qsort.c benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c benchspec/CPU/502.gcc_r/src/spec_qsort/spec_qsort.c benchspec/CPU/511.povray_r/src/spec_qsort/spec_qsort.c benchspec/CPU/527.cam4_r/src/spec_qsort/spec_qsort.c ...
SWAPINIT should end up with swaptype_long == 1 I think and swaptype_int == 1 for the cases in question. Enforcing swaptype_int = swaptype_long = 2 should make it work (scratch SWAPINIT calls). #define SWAPINIT(TYPE, a, es) swaptype_ ## TYPE = \ ((char *)a - (char *)0) % sizeof(TYPE) || \ es % sizeof(TYPE) ? 2 : es == sizeof(TYPE) ? 0 : 1; static INLINE void swapfunc(char *a, char *b, int n, int swaptype_long, int swaptype_int) { if (swaptype_long <= 1) swapcode(long, a, b, n) else if (swaptype_int <= 1) swapcode(int, a, b, n) else swapcode(char, a, b, n) } #define swap(a, b) \ if (swaptype_long == 0) { \ long t = *(long *)(a); \ *(long *)(a) = *(long *)(b); \ *(long *)(b) = t; \ } else if (swaptype_int == 0) { \ int t = *(int *)(a); \ *(int *)(a) = *(int *)(b); \ *(int *)(b) = t; \ } else \ swapfunc((char *)a, (char *)b, es, swaptype_long, swaptype_int) ... loop: SWAPINIT(long, a, es); SWAPINIT(int, a, es);
(In reply to Richard Biener from comment #15) > SWAPINIT should end up with swaptype_long == 1 I think and swaptype_int == 1 > for the cases in question. Enforcing swaptype_int = swaptype_long = 2 > should make it work (scratch SWAPINIT calls). I can confirm that.
Created attachment 42919 [details] Patch that removes violation of aliasing rules
(In reply to Martin Liška from comment #16) > (In reply to Richard Biener from comment #15) > > SWAPINIT should end up with swaptype_long == 1 I think and swaptype_int == 1 > > for the cases in question. Enforcing swaptype_int = swaptype_long = 2 > > should make it work (scratch SWAPINIT calls). > > I can confirm that. Yes, that fixes the problem for me on PowerPC also. I can pass along the info to our SPEC rep. Richi, I'm curious if the alias violations were apparent in a dump file, or did you just happened to spot them looking through the source?
On December 19, 2017 5:42:07 PM GMT+01:00, "pthaugen at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org> wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201 > >--- Comment #18 from Pat Haugen <pthaugen at gcc dot gnu.org> --- >(In reply to Martin Liška from comment #16) >> (In reply to Richard Biener from comment #15) >> > SWAPINIT should end up with swaptype_long == 1 I think and >swaptype_int == 1 >> > for the cases in question. Enforcing swaptype_int = swaptype_long >= 2 >> > should make it work (scratch SWAPINIT calls). >> >> I can confirm that. > >Yes, that fixes the problem for me on PowerPC also. I can pass along >the info >to our SPEC rep. > > >Richi, >I'm curious if the alias violations were apparent in a dump file, or >did you >just happened to spot them looking through the source? I just looked at the source seeing *(long *) ptr and bells went off. Richard.
Based on c#18.
Comments from SPEC CPU Documentation Guy: As described at https://www.spec.org/cpu2017/Docs/faq.html#Miscompare.07 https://www.spec.org/cpu2017/Docs/benchmarks/505.mcf_r.html#Portability SPEC's recommendation is to use -fno-strict-aliasing The patch attached to this bug is not approved by SPEC. SPEC generally does not change benchmarks after release (no moving targets); more detail is at the above links. John Henning SPEC CPU Subcommittee Secretary
(In reply to john henning from comment #21) > Comments from SPEC CPU Documentation Guy: > > As described at > https://www.spec.org/cpu2017/Docs/faq.html#Miscompare.07 > https://www.spec.org/cpu2017/Docs/benchmarks/505.mcf_r.html#Portability > SPEC's recommendation is to use > -fno-strict-aliasing > > The patch attached to this bug is not approved by SPEC. > > SPEC generally does not change benchmarks after release (no moving targets); > more detail is at the above links. > > John Henning > SPEC CPU Subcommittee Secretary Note that the same function is used by other benchmarks as well: 502.gcc_r, 511.povray_r, 527.cam4_r That means the same miscompilation can happen for them as well :/
Sorry for digging up the past, but recently ran into a related implementation issue in OpenBSD's qsort implementation and came across this discussion. While I understand the proposed patch was rejected, it looks like it also degraded the implementation to single byte swaps. Admittedly untested, but a possible solution that I think would make LTO happy (avoids aliasing the pointer) and doesn't change the implementation profile would be to cast it differently since it's always dealing with a simple type alignment check on long/int/char. Something like this: #define SWAPINIT(TYPE, a, es) swaptype_ ## TYPE = (uintptr_t)a % sizeof(TYPE) || es % sizeof(TYPE) ? 2 : es == sizeof(TYPE) ? 0 : 1; Maybe a more acceptable alternative compromise since it would have a nearly identical performance profile with it still swap-optimizing int and long.
(In reply to Sean from comment #23) > Sorry for digging up the past, but recently ran into a related > implementation issue in OpenBSD's qsort implementation and came across this > discussion. While I understand the proposed patch was rejected, it looks > like it also degraded the implementation to single byte swaps. > > Admittedly untested, but a possible solution that I think would make LTO > happy (avoids aliasing the pointer) and doesn't change the implementation > profile would be to cast it differently since it's always dealing with a > simple type alignment check on long/int/char. Something like this: > > #define SWAPINIT(TYPE, a, es) swaptype_ ## TYPE = (uintptr_t)a % > sizeof(TYPE) || es % sizeof(TYPE) ? 2 : es == sizeof(TYPE) ? 0 : 1; > > Maybe a more acceptable alternative compromise since it would have a nearly > identical performance profile with it still swap-optimizing int and long. Note the issue is not the way the type is computed but the memory accesses done via #define swap(a, b) \ if (swaptype_long == 0) { \ long t = *(long *)(a); \ *(long *)(a) = *(long *)(b); \ *(long *)(b) = t; \ } else if (swaptype_int == 0) { \ int t = *(int *)(a); \ *(int *)(a) = *(int *)(b); \ *(int *)(b) = t; \ there exists non-standard ways to preserve the access pattern like typedef int notbaa_int __attribute__((may_alias)); typedef long notbaa_long __attribute__((may_alias)); #define swap(a, b) \ if (swaptype_long == 0) { \ long t = *(notbaa_long *)(a); \ *(notbaa_long *)(a) = *(notbaa_long *)(b); \ *(notbaa_long *)(b) = t; \ } else if (swaptype_int == 0) { \ int t = *(notbaa_int *)(a); \ *(notbaa_int *)(a) = *(notbaa_int *)(b); \ *(notbaa_int *)(b) = t; \ or portable ones like doing long t; memcpy (&t, a, sizeof (long)); memcpy (a, b, sizeof (long)); memcpy (b, &t, sizeof (long)); and hoping for the compiler to optimize this well (GCC will do). Note at time I proposed the change to the originating project (freebsd?) and it was accepted there.
Ah, that makes sense, thank you Richard. I didn't pay as close attention to the actual swap() code and casting going on there. Apparently unrelated, but perhaps worth noting the reason this come up on my radar is because the latest compilers now also issue a warning on the subtraction: error: performing pointer subtraction with a null pointer has undefined behavior [-Werror,-Wnull-pointer-subtraction]