63432 – [5 Regression] profiledbootstrap failure with bootstrap-lto

Bug 63432 - [5 Regression] profiledbootstrap failure with bootstrap-lto

Summary: [5 Regression] profiledbootstrap failure with bootstrap-lto

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	bootstrap (show other bugs)
Version:	5.0

Importance:	P3 normal
Target Milestone:	5.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2014-10-01 16:19 UTC by H.J. Lu
Modified:	2014-11-24 13:36 UTC (History)
CC List:	0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2014-10-03 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description H.J. Lu 2014-10-01 16:19:32 UTC

On Linux/x86-64, r215740 failed to profiledbootstrap when
configured with --with-build-config=bootstrap-lto:

/export/project/git/gcc-regression/gcc/gcc/genhooks.c: In function ‘emit_documentation’:
/export/project/git/gcc-regression/gcc/gcc/genhooks.c:117:1: internal compiler error: in freqs_to_counts_path, at tree-ssa-threadupdate.c:981
 emit_documentation (const char *in_fname)
 ^
0x107902e freqs_to_counts_path
        /export/project/git/gcc-regression/gcc/gcc/tree-ssa-threadupdate.c:981
0x107902e ssa_fix_duplicate_block_edges(redirection_data*, ssa_local_info_t*)
        /export/project/git/gcc-regression/gcc/gcc/tree-ssa-threadupdate.c:1061
0x10791ea ssa_create_duplicates(redirection_data**, ssa_local_info_t*)
        /export/project/git/gcc-regression/gcc/gcc/tree-ssa-threadupdate.c:1275
0x1080149 void hash_table<redirection_data, xcallocator, false>::traverse_noresize<ssa_local_info_t*, &(ssa_create_duplicates(redirection_data**, ssa_local_info_t*))>(ssa_local_info_t*)
        /export/project/git/gcc-regression/gcc/gcc/hash-table.h:942
0x1080149 void hash_table<redirection_data, xcallocator, false>::traverse<ssa_local_info_t*, &(ssa_create_duplicates(redirection_data**, ssa_local_info_t*))>(ssa_local_info_t*)
        /export/project/git/gcc-regression/gcc/gcc/hash-table.h:964
0x10794b3 thread_block_1
        /export/project/git/gcc-regression/gcc/gcc/tree-ssa-threadupdate.c:1515
0x107d556 thread_block
        /export/project/git/gcc-regression/gcc/gcc/tree-ssa-threadupdate.c:1559
0x107d556 thread_through_all_blocks(bool)
        /export/project/git/gcc-regression/gcc/gcc/tree-ssa-threadupdate.c:2279
0x114eecb finalize_jump_threads
        /export/project/git/gcc-regression/gcc/gcc/tree-vrp.c:9856
0x114eecb execute_vrp
        /export/project/git/gcc-regression/gcc/gcc/tree-vrp.c:10010
0x114eecb execute
        /export/project/git/gcc-regression/gcc/gcc/tree-vrp.c:10073
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[4]: *** [/tmp/ccyuxuEg.ltrans0.ltrans.o] Error 1
lto-wrapper: fatal error: make returned 2 exit status
compilation terminated.
/bin/ld: lto-wrapper failed

r215738 is OK.  r215739 may be the cause.

Comment 1 Teresa Johnson 2014-10-01 16:24:55 UTC

Looks like a dup of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63422, due to incoming profile insanities. Will loosen up the new assert.

Teresa

Comment 2 H.J. Lu 2014-10-03 15:59:28 UTC

r215830 doesn't solve the problem.  Now I got

/export/project/git/gcc-regression/gcc/gcc/c/c-parser.c:3898:1: error: verify_flow_info: Wrong probability of edge 238->241 -1819341072 
 c_parser_attributes (c_parser *parser)
 ^
/export/project/git/gcc-regression/gcc/gcc/c/c-parser.c:3898:1: internal compiler error: verify_flow_info failed
0x6b2baa verify_flow_info()
        /export/project/git/gcc-regression/gcc/gcc/cfghooks.c:260
0xdd9bc7 cleanup_tree_cfg_noloop
        /export/project/git/gcc-regression/gcc/gcc/tree-cfgcleanup.c:765
0xdd9bc7 cleanup_tree_cfg()
        /export/project/git/gcc-regression/gcc/gcc/tree-cfgcleanup.c:814
0xbf2b04 execute_function_todo
        /export/project/git/gcc-regression/gcc/gcc/passes.c:1704
0xbff5b7 do_per_function
        /export/project/git/gcc-regression/gcc/gcc/passes.c:1478
0xbff5b7 execute_todo
        /export/project/git/gcc-regression/gcc/gcc/passes.c:1808
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[4]: *** [/tmp/ccxbDLKp.ltrans2.ltrans.o] Error 1

Comment 3 Teresa Johnson 2014-10-03 16:13:56 UTC

Presumably the fix for PR63422 (r215822) fixed the initial problem,
but r215830 must have introduced this. Will take a look right now.

Teresa

On Fri, Oct 3, 2014 at 8:59 AM, hjl.tools at gmail dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>
> --- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
> r215830 doesn't solve the problem.  Now I got
>
> /export/project/git/gcc-regression/gcc/gcc/c/c-parser.c:3898:1: error:
> verify_flow_info: Wrong probability of edge 238->241 -1819341072
>  c_parser_attributes (c_parser *parser)
>  ^
> /export/project/git/gcc-regression/gcc/gcc/c/c-parser.c:3898:1: internal
> compiler error: verify_flow_info failed
> 0x6b2baa verify_flow_info()
>         /export/project/git/gcc-regression/gcc/gcc/cfghooks.c:260
> 0xdd9bc7 cleanup_tree_cfg_noloop
>         /export/project/git/gcc-regression/gcc/gcc/tree-cfgcleanup.c:765
> 0xdd9bc7 cleanup_tree_cfg()
>         /export/project/git/gcc-regression/gcc/gcc/tree-cfgcleanup.c:814
> 0xbf2b04 execute_function_todo
>         /export/project/git/gcc-regression/gcc/gcc/passes.c:1704
> 0xbff5b7 do_per_function
>         /export/project/git/gcc-regression/gcc/gcc/passes.c:1478
> 0xbff5b7 execute_todo
>         /export/project/git/gcc-regression/gcc/gcc/passes.c:1808
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> make[4]: *** [/tmp/ccxbDLKp.ltrans2.ltrans.o] Error 1
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 4 H.J. Lu 2014-10-03 17:21:32 UTC

r215830 introduced:

      /* Scale up the frequency by REG_BR_PROB_BASE, to avoid rounding
         errors applying the edge probability when the frequencies are very
         small.  */
      epath->src->count = epath->src->frequency * REG_BR_PROB_BASE;
      FOR_EACH_EDGE (esucc, ei, epath->src->succs)
        esucc->count = apply_probability (esucc->src->count,
                                          esucc->probability);

Can it guarantee that esucc->src->count has been scaled up by
REG_BR_PROB_BASE from esucc->src->frequency?

Comment 5 Teresa Johnson 2014-10-03 17:27:49 UTC

On Fri, Oct 3, 2014 at 10:21 AM, hjl.tools at gmail dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>
> --- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
> r215830 introduced:
>
>       /* Scale up the frequency by REG_BR_PROB_BASE, to avoid rounding
>          errors applying the edge probability when the frequencies are very
>          small.  */
>       epath->src->count = epath->src->frequency * REG_BR_PROB_BASE;
>       FOR_EACH_EDGE (esucc, ei, epath->src->succs)
>         esucc->count = apply_probability (esucc->src->count,
>                                           esucc->probability);
>
> Can it guarantee that esucc->src->count has been scaled up by
> REG_BR_PROB_BASE from esucc->src->frequency?

I believe so. Since esucc are successor edges of epath->src,
esucc->src == epath->src, and we have just set epath->src->count =
epath->src->frequency * REG_BR_PROB_BASE.

BTW, I am having a little trouble doing an LTO profiledbootstrap on my
linux/x86_64 machine. Here's what I did:

$ /usr/local/google/home/tejohnson/gcc_trunk_7/configure
--prefix=/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/install
--with-build-config=bootstrap-lto
$ make -j8 profiledbootstrap

I first got an error:
---
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/gengtype.c: In
function ‘type* create_user_defined_type(const char*, fileloc*)’:
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/gengtype.c:583:29:
error: ‘next’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
   str += strspn (str, delim);
                             ^
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/gengtype.c:636:13:
note: ‘next’ was declared here
       char *next;
             ^
cc1plus: all warnings being treated as errors
make[3]: *** [gengtype.o] Error 1
---

which I made a simple change in my gcc client to address.

After restarting the "make -j8 profiledbootstrap" I got a bunch of
other errors about undefined references in ltrans.o files. In case the
first error left things in a bad state I have wiped everything out and
restarted the process. Will let you know once I have reproduced.

Thanks,
Teresa

>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 6 H.J. Lu 2014-10-03 17:33:37 UTC

(In reply to Teresa Johnson from comment #5)
> 
> BTW, I am having a little trouble doing an LTO profiledbootstrap on my
> linux/x86_64 machine. Here's what I did:
> 
> $ /usr/local/google/home/tejohnson/gcc_trunk_7/configure
> --prefix=/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/
> install
> --with-build-config=bootstrap-lto
> $ make -j8 profiledbootstrap
> 

You need to configure GCC with --disable-werror.

Comment 7 Teresa Johnson 2014-10-03 18:28:14 UTC

On Fri, Oct 3, 2014 at 10:33 AM, hjl.tools at gmail dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>
> --- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Teresa Johnson from comment #5)
>>
>> BTW, I am having a little trouble doing an LTO profiledbootstrap on my
>> linux/x86_64 machine. Here's what I did:
>>
>> $ /usr/local/google/home/tejohnson/gcc_trunk_7/configure
>> --prefix=/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/
>> install
>> --with-build-config=bootstrap-lto
>> $ make -j8 profiledbootstrap
>>
>
> You need to configure GCC with --disable-werror.

Did that, restarted the whole thing, still seeing undefined ref
errors. Here is an example:

/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/./prev-gcc/xg++
-B/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/./prev-gcc/
-B/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/install/x86_64-unknown-linux-gnu/bin/
-nostdinc++ -B/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
-B/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
 -I/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
 -I/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include
 -I/usr/local/google/home/tejohnson/gcc_trunk_7/libstdc++-v3/libsupc++
-L/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
-L/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
  -g -O2 -flto=jobserver -frandom-seed=1 -fprofile-use -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -DGENERATOR_FILE
-static-libstdc++ -static-libgcc  -o build/genmodes \
            build/genmodes.o build/errors.o .././libiberty/libiberty.a
/usr/local/google/home/tejohnson/tmpdir/cc9adAmE.ltrans0.ltrans.o: In
function `main':
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/genchecksum.c:108:
undefined reference to `md5_init_ctx'
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/genchecksum.c:111:
undefined reference to `md5_finish_ctx'
/usr/local/google/home/tejohnson/tmpdir/cc9adAmE.ltrans0.ltrans.o: In
function `dosum':
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/genchecksum.c:40:
undefined reference to `fopen_unlocked'
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/genchecksum.c:81:
undefined reference to `md5_process_block'
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/genchecksum.c:86:
undefined reference to `md5_process_bytes'
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/genchecksum.c:43:
undefined reference to `xstrerror'
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/genchecksum.c:90:
undefined reference to `xstrerror'
/usr/local/google/home/tejohnson/gcc_trunk_7/gcc/genchecksum.c:50:
undefined reference to `xstrerror'
collect2: error: ld returned 1 exit status
make[3]: *** [build/genchecksum] Error 1

Anything obvious that am I doing wrong that could cause this? I see
that those functions should come from libiberty, and I see libiberty.a
in the above link.

In the meantime, if you have the gcda files and preprocessed source or
whatever I need to reproduce the failure and can attach them to the
bug that would be great.

Thanks,
Teresa

>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 8 H.J. Lu 2014-10-03 19:47:59 UTC

Bug is in

static void 
recompute_probabilities (basic_block bb)
{
  edge esucc;
  edge_iterator ei;
  FOR_EACH_EDGE (esucc, ei, bb->succs)
    {    
      if (bb->count)
        esucc->probability = GCOV_COMPUTE_SCALE (esucc->count,
                                                 bb->count);
      if (esucc->probability > REG_BR_PROB_BASE)
        {

We got

(gdb) whatis esucc->probability
type = int
(gdb) p esucc->count
$7 = 2822215
(gdb) p bb->count
$8 = 1
(gdb) p esucc->probability 
$9 = -1842621072
(gdb) 

When count is set from frequency scaled up by REG_BR_PROB_BASE,
probability may overflow.

Comment 9 H.J. Lu 2014-10-03 19:58:22 UTC

This works for me:

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index d660fdb..c6f3956 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -874,7 +874,8 @@ recompute_probabilities (basic_block bb)
       if (bb->count)
         esucc->probability = GCOV_COMPUTE_SCALE (esucc->count,
                                                  bb->count);
-      if (esucc->probability > REG_BR_PROB_BASE)
+      if (esucc->probability < 0
+    || esucc->probability > REG_BR_PROB_BASE)
         {
     /* Can happen with missing/guessed probabilities, since we
        may determine that more is flowing along duplicated

Comment 10 Teresa Johnson 2014-10-03 20:18:30 UTC

On Fri, Oct 3, 2014 at 12:47 PM, hjl.tools at gmail dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>
> H.J. Lu <hjl.tools at gmail dot com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |NEW
>    Last reconfirmed|                            |2014-10-03
>      Ever confirmed|0                           |1
>
> --- Comment #8 from H.J. Lu <hjl.tools at gmail dot com> ---
> Bug is in
>
> static void
> recompute_probabilities (basic_block bb)
> {
>   edge esucc;
>   edge_iterator ei;
>   FOR_EACH_EDGE (esucc, ei, bb->succs)
>     {
>       if (bb->count)
>         esucc->probability = GCOV_COMPUTE_SCALE (esucc->count,
>                                                  bb->count);
>       if (esucc->probability > REG_BR_PROB_BASE)
>         {
>
> We got
>
> (gdb) whatis esucc->probability
> type = int
> (gdb) p esucc->count
> $7 = 2822215
> (gdb) p bb->count
> $8 = 1
> (gdb) p esucc->probability
> $9 = -1842621072

This points to a bigger problem, since esucc->count should be <=
bb->count. Can you attach the input files and command line and I will
take a closer look?

Thanks,
Teresa

> (gdb)
>
> When count is set from frequency scaled up by REG_BR_PROB_BASE,
> probability may overflow.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 11 H.J. Lu 2014-10-03 20:23:56 UTC

(In reply to Teresa Johnson from comment #10)
>
> This points to a bigger problem, since esucc->count should be <=
> bb->count. Can you attach the input files and command line and I will
> take a closer look?
> 

Since the compressed input file is about 3MB, I can't upload it here.

Comment 12 Teresa Johnson 2014-10-03 20:36:43 UTC

Feel free to email it to me at tejohnson@google.com.
Teresa

On Fri, Oct 3, 2014 at 1:23 PM, hjl.tools at gmail dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>
> --- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Teresa Johnson from comment #10)
>>
>> This points to a bigger problem, since esucc->count should be <=
>> bb->count. Can you attach the input files and command line and I will
>> take a closer look?
>>
>
> Since the compressed input file is about 3MB, I can't upload it here.
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 13 Teresa Johnson 2014-10-03 22:35:47 UTC

Thanks to H.J. for the test case, I have reproduced the issue. It
exposed two separate problems. Cc'ing Honza and Jeff for input on the
profile count and jump threading issues, respectively.

The first is that we are calling my new freqs_to_counts_path in cases
where the functions do have non-zero profile counts. I am invoking
this when either the profile status is != PROFILE_READ, or when the
entry block count is 0. The latter is the case where the read in
profile had zero counts for the function, so we kept the guessed
frequencies (see counts_to_freqs in predict.c). In some cases I am
finding the the profile count on the entry block is 0, but the rest of
the blocks (bb 2 and descendants) have non-zero profile counts. I
thought the way to check for this case was to look at the entry
block's count, and in fact that is what we seem to do in other places.
I could also check the entry block successor counts, or I could simply
check the blocks along the path (to see if they all have 0 counts but
non-0 frequencies). I don't want to check all bbs in the function for
every path. Honza, is there a good way to detect this case (function
is marked PROFILE_READ but has all-0 bb counts and we kept the
estimated frequencies)?

Even when I skip freqs_to_counts_path in this case, we still get the
insane edge probability. I dumped out some more verbose info while
jump threading, and I am seeing a jump threading path that I don't
understand. Since it violates my assumptions, the new profile update
computations go haywire. Here is the path that we are attempting to
thread:

(119, 150) incoming edge;  (150, 155) joiner;  (13, 15) normal;

Notice that the normal edge is not connected to the rest of the path.
This path appears to be constructed during jump threading, as block
155 was created by an earlier threading opportunity. In fact, the
earlier threadings that created bb 155 removed all predecessors of bb
13. We originally had

bb 150 with successors bb 12 and bb 13
bb 12 with successor bb 13.

Then we do:
  Threaded jump 12 --> 13 to 155
  Threaded jump 150 --> 13 to 155
and bb 13 has no more predecessors. So I'm not sure what it means to
have a jump threading path through 150 and 13?

Jeff, is the above type of jump threading path expected and
reasonable? If so, I need to redo some of the profile update code to
handle it better.

Thanks,
Teresa

On Fri, Oct 3, 2014 at 1:36 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Feel free to email it to me at tejohnson@google.com.
> Teresa
>
> On Fri, Oct 3, 2014 at 1:23 PM, hjl.tools at gmail dot com
> <gcc-bugzilla@gcc.gnu.org> wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>>
>> --- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
>> (In reply to Teresa Johnson from comment #10)
>>>
>>> This points to a bigger problem, since esucc->count should be <=
>>> bb->count. Can you attach the input files and command line and I will
>>> take a closer look?
>>>
>>
>> Since the compressed input file is about 3MB, I can't upload it here.
>>
>> --
>> You are receiving this mail because:
>> You are on the CC list for the bug.
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Comment 14 Teresa Johnson 2014-10-04 06:16:24 UTC

On Fri, Oct 3, 2014 at 3:35 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Thanks to H.J. for the test case, I have reproduced the issue. It
> exposed two separate problems. Cc'ing Honza and Jeff for input on the
> profile count and jump threading issues, respectively.
>
> The first is that we are calling my new freqs_to_counts_path in cases
> where the functions do have non-zero profile counts. I am invoking
> this when either the profile status is != PROFILE_READ, or when the
> entry block count is 0. The latter is the case where the read in
> profile had zero counts for the function, so we kept the guessed
> frequencies (see counts_to_freqs in predict.c). In some cases I am
> finding the the profile count on the entry block is 0, but the rest of
> the blocks (bb 2 and descendants) have non-zero profile counts. I
> thought the way to check for this case was to look at the entry
> block's count, and in fact that is what we seem to do in other places.
> I could also check the entry block successor counts, or I could simply
> check the blocks along the path (to see if they all have 0 counts but
> non-0 frequencies). I don't want to check all bbs in the function for
> every path. Honza, is there a good way to detect this case (function
> is marked PROFILE_READ but has all-0 bb counts and we kept the
> estimated frequencies)?
>
> Even when I skip freqs_to_counts_path in this case, we still get the
> insane edge probability. I dumped out some more verbose info while
> jump threading, and I am seeing a jump threading path that I don't
> understand. Since it violates my assumptions, the new profile update
> computations go haywire. Here is the path that we are attempting to
> thread:
>
> (119, 150) incoming edge;  (150, 155) joiner;  (13, 15) normal;
>
> Notice that the normal edge is not connected to the rest of the path.
> This path appears to be constructed during jump threading, as block
> 155 was created by an earlier threading opportunity. In fact, the
> earlier threadings that created bb 155 removed all predecessors of bb
> 13. We originally had
>
> bb 150 with successors bb 12 and bb 13
> bb 12 with successor bb 13.
>
> Then we do:
>   Threaded jump 12 --> 13 to 155
>   Threaded jump 150 --> 13 to 155
> and bb 13 has no more predecessors. So I'm not sure what it means to
> have a jump threading path through 150 and 13?
>
> Jeff, is the above type of jump threading path expected and
> reasonable? If so, I need to redo some of the profile update code to
> handle it better.

I see what happened - 155 is a duplicate of 13 from the earlier
threading. So the path that was originally:

(119, 150) incoming edge;  (150, 13) joiner;  (13, 15) normal;

became

(119, 150) incoming edge;  (150, 155) joiner;  (13, 15) normal;

This happened naturally when we redirected the incoming edge 150->13
to 155 on the earlier jump threading. During the same threading we
created duplicate bb 154 of bb 15, and redirect bb 155 to bb 154. This
redirection does not affect the original 13->15 edge, and thus 13->15
remains on the above path. The 155->154 edge should be the new edge in
the path instead of 13->15. But the edges recorded in other paths are
not explicitly updated, so the normal edge was left alone.

This works out ok from a correctness standpoint, since bb 13 still
exists and has the same outgoing edges as it did before, which were
duplicated onto bb 154. However, those outgoing edges have had their
profile counts updated and it no longer works out to use those profile
counts to update the counts along this new threading path. We really
want to look at 155->154 for the profile to update.

Probably we should be updating remaining jump threading paths when we
perform jump threading. I need to think about the best way to do that.

Teresa

>
> Thanks,
> Teresa
>
>
> On Fri, Oct 3, 2014 at 1:36 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> Feel free to email it to me at tejohnson@google.com.
>> Teresa
>>
>> On Fri, Oct 3, 2014 at 1:23 PM, hjl.tools at gmail dot com
>> <gcc-bugzilla@gcc.gnu.org> wrote:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>>>
>>> --- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
>>> (In reply to Teresa Johnson from comment #10)
>>>>
>>>> This points to a bigger problem, since esucc->count should be <=
>>>> bb->count. Can you attach the input files and command line and I will
>>>> take a closer look?
>>>>
>>>
>>> Since the compressed input file is about 3MB, I can't upload it here.
>>>
>>> --
>>> You are receiving this mail because:
>>> You are on the CC list for the bug.
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Comment 15 Teresa Johnson 2014-10-04 15:34:38 UTC

On Fri, Oct 3, 2014 at 11:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Oct 3, 2014 at 3:35 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> Thanks to H.J. for the test case, I have reproduced the issue. It
>> exposed two separate problems. Cc'ing Honza and Jeff for input on the
>> profile count and jump threading issues, respectively.
>>
>> The first is that we are calling my new freqs_to_counts_path in cases
>> where the functions do have non-zero profile counts. I am invoking
>> this when either the profile status is != PROFILE_READ, or when the
>> entry block count is 0. The latter is the case where the read in
>> profile had zero counts for the function, so we kept the guessed
>> frequencies (see counts_to_freqs in predict.c). In some cases I am
>> finding the the profile count on the entry block is 0, but the rest of
>> the blocks (bb 2 and descendants) have non-zero profile counts. I
>> thought the way to check for this case was to look at the entry
>> block's count, and in fact that is what we seem to do in other places.
>> I could also check the entry block successor counts, or I could simply
>> check the blocks along the path (to see if they all have 0 counts but
>> non-0 frequencies). I don't want to check all bbs in the function for
>> every path. Honza, is there a good way to detect this case (function
>> is marked PROFILE_READ but has all-0 bb counts and we kept the
>> estimated frequencies)?

I am also seeing cases where the entry bb 0 has a non-zero count, but
the rest of the bbs in the function have 0 counts and estimated
frequencies. So I think the safest thing to do here, and what I have
implemented, is if the profile status is PROFILE_READ, then check all
the bbs/edges along the path - if and only if they all have 0 counts
and there is at least one non-zero frequency do we do the
freqs_to_counts.

>>
>> Even when I skip freqs_to_counts_path in this case, we still get the
>> insane edge probability. I dumped out some more verbose info while
>> jump threading, and I am seeing a jump threading path that I don't
>> understand. Since it violates my assumptions, the new profile update
>> computations go haywire. Here is the path that we are attempting to
>> thread:
>>
>> (119, 150) incoming edge;  (150, 155) joiner;  (13, 15) normal;
>>
>> Notice that the normal edge is not connected to the rest of the path.
>> This path appears to be constructed during jump threading, as block
>> 155 was created by an earlier threading opportunity. In fact, the
>> earlier threadings that created bb 155 removed all predecessors of bb
>> 13. We originally had
>>
>> bb 150 with successors bb 12 and bb 13
>> bb 12 with successor bb 13.
>>
>> Then we do:
>>   Threaded jump 12 --> 13 to 155
>>   Threaded jump 150 --> 13 to 155
>> and bb 13 has no more predecessors. So I'm not sure what it means to
>> have a jump threading path through 150 and 13?
>>
>> Jeff, is the above type of jump threading path expected and
>> reasonable? If so, I need to redo some of the profile update code to
>> handle it better.
>
> I see what happened - 155 is a duplicate of 13 from the earlier
> threading. So the path that was originally:
>
> (119, 150) incoming edge;  (150, 13) joiner;  (13, 15) normal;
>
> became
>
> (119, 150) incoming edge;  (150, 155) joiner;  (13, 15) normal;
>
> This happened naturally when we redirected the incoming edge 150->13
> to 155 on the earlier jump threading. During the same threading we
> created duplicate bb 154 of bb 15, and redirect bb 155 to bb 154. This
> redirection does not affect the original 13->15 edge, and thus 13->15
> remains on the above path. The 155->154 edge should be the new edge in
> the path instead of 13->15. But the edges recorded in other paths are
> not explicitly updated, so the normal edge was left alone.
>
> This works out ok from a correctness standpoint, since bb 13 still
> exists and has the same outgoing edges as it did before, which were
> duplicated onto bb 154. However, those outgoing edges have had their
> profile counts updated and it no longer works out to use those profile
> counts to update the counts along this new threading path. We really
> want to look at 155->154 for the profile to update.
>
> Probably we should be updating remaining jump threading paths when we
> perform jump threading. I need to think about the best way to do that.

Looking at the code, I think it attempts to check for this case and
prevent it but that code does not work in this case because of the
order the paths are identified and subsequently iterated in the paths
vec. In mark_threaded_blocks it looks at paths with joiners and if
there are any edges along the path already part of another path it
will skip that path. But in this case, we registered the paths in this
order:

  Registering jump thread: (119, 150) incoming edge;  (150, 13)
joiner;  (13, 15) normal;
...
  Registering jump thread: (150, 13) incoming edge;  (13, 15) joiner;
(15, 17) normal;

For the first path, the path is attached to incoming edge 119->150. So
when we walk edges in the 2nd path we don't see the first path. If we
looked at the paths in the reverse order we would have seen the path
attached to incoming edge 150->13 and skipped the 119->150 path. Note
that we end up doing the actual threading in the reverse order - first
we do the threading through 13 (the second path), then later the
threading through 150 (the first path), leading to the issue I am
seeing.

Jeff, what is intended here - should we not be threading both of these paths?

Thanks,
Teresa

Comment 16 Teresa Johnson 2014-10-04 19:30:01 UTC

On Sat, Oct 4, 2014 at 8:34 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Oct 3, 2014 at 11:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Fri, Oct 3, 2014 at 3:35 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> Thanks to H.J. for the test case, I have reproduced the issue. It
>>> exposed two separate problems. Cc'ing Honza and Jeff for input on the
>>> profile count and jump threading issues, respectively.
>>>
>>> The first is that we are calling my new freqs_to_counts_path in cases
>>> where the functions do have non-zero profile counts. I am invoking
>>> this when either the profile status is != PROFILE_READ, or when the
>>> entry block count is 0. The latter is the case where the read in
>>> profile had zero counts for the function, so we kept the guessed
>>> frequencies (see counts_to_freqs in predict.c). In some cases I am
>>> finding the the profile count on the entry block is 0, but the rest of
>>> the blocks (bb 2 and descendants) have non-zero profile counts. I
>>> thought the way to check for this case was to look at the entry
>>> block's count, and in fact that is what we seem to do in other places.
>>> I could also check the entry block successor counts, or I could simply
>>> check the blocks along the path (to see if they all have 0 counts but
>>> non-0 frequencies). I don't want to check all bbs in the function for
>>> every path. Honza, is there a good way to detect this case (function
>>> is marked PROFILE_READ but has all-0 bb counts and we kept the
>>> estimated frequencies)?
>
> I am also seeing cases where the entry bb 0 has a non-zero count, but
> the rest of the bbs in the function have 0 counts and estimated
> frequencies. So I think the safest thing to do here, and what I have
> implemented, is if the profile status is PROFILE_READ, then check all
> the bbs/edges along the path - if and only if they all have 0 counts
> and there is at least one non-zero frequency do we do the
> freqs_to_counts.
>
>>>
>>> Even when I skip freqs_to_counts_path in this case, we still get the
>>> insane edge probability. I dumped out some more verbose info while
>>> jump threading, and I am seeing a jump threading path that I don't
>>> understand. Since it violates my assumptions, the new profile update
>>> computations go haywire. Here is the path that we are attempting to
>>> thread:
>>>
>>> (119, 150) incoming edge;  (150, 155) joiner;  (13, 15) normal;
>>>
>>> Notice that the normal edge is not connected to the rest of the path.
>>> This path appears to be constructed during jump threading, as block
>>> 155 was created by an earlier threading opportunity. In fact, the
>>> earlier threadings that created bb 155 removed all predecessors of bb
>>> 13. We originally had
>>>
>>> bb 150 with successors bb 12 and bb 13
>>> bb 12 with successor bb 13.
>>>
>>> Then we do:
>>>   Threaded jump 12 --> 13 to 155
>>>   Threaded jump 150 --> 13 to 155
>>> and bb 13 has no more predecessors. So I'm not sure what it means to
>>> have a jump threading path through 150 and 13?
>>>
>>> Jeff, is the above type of jump threading path expected and
>>> reasonable? If so, I need to redo some of the profile update code to
>>> handle it better.
>>
>> I see what happened - 155 is a duplicate of 13 from the earlier
>> threading. So the path that was originally:
>>
>> (119, 150) incoming edge;  (150, 13) joiner;  (13, 15) normal;
>>
>> became
>>
>> (119, 150) incoming edge;  (150, 155) joiner;  (13, 15) normal;
>>
>> This happened naturally when we redirected the incoming edge 150->13
>> to 155 on the earlier jump threading. During the same threading we
>> created duplicate bb 154 of bb 15, and redirect bb 155 to bb 154. This
>> redirection does not affect the original 13->15 edge, and thus 13->15
>> remains on the above path. The 155->154 edge should be the new edge in
>> the path instead of 13->15. But the edges recorded in other paths are
>> not explicitly updated, so the normal edge was left alone.
>>
>> This works out ok from a correctness standpoint, since bb 13 still
>> exists and has the same outgoing edges as it did before, which were
>> duplicated onto bb 154. However, those outgoing edges have had their
>> profile counts updated and it no longer works out to use those profile
>> counts to update the counts along this new threading path. We really
>> want to look at 155->154 for the profile to update.
>>
>> Probably we should be updating remaining jump threading paths when we
>> perform jump threading. I need to think about the best way to do that.
>
> Looking at the code, I think it attempts to check for this case and
> prevent it but that code does not work in this case because of the
> order the paths are identified and subsequently iterated in the paths
> vec. In mark_threaded_blocks it looks at paths with joiners and if
> there are any edges along the path already part of another path it
> will skip that path. But in this case, we registered the paths in this
> order:
>
>   Registering jump thread: (119, 150) incoming edge;  (150, 13)
> joiner;  (13, 15) normal;
> ...
>   Registering jump thread: (150, 13) incoming edge;  (13, 15) joiner;
> (15, 17) normal;
>
> For the first path, the path is attached to incoming edge 119->150. So
> when we walk edges in the 2nd path we don't see the first path. If we
> looked at the paths in the reverse order we would have seen the path
> attached to incoming edge 150->13 and skipped the 119->150 path. Note
> that we end up doing the actual threading in the reverse order - first
> we do the threading through 13 (the second path), then later the
> threading through 150 (the first path), leading to the issue I am
> seeing.
>
> Jeff, what is intended here - should we not be threading both of these paths?

I have a patch to make the mark_threaded_blocks checking of paths work
regardless of the ordering of paths in the vec. This fixes the
failure.

The other approach is whenever we finish threading a path, go through
the vec of remaining paths and update the edges for any that have been
affected by the threading and that should instead include the
duplicated edges.

Teresa

>
> Thanks,
> Teresa
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Comment 17 Teresa Johnson 2014-10-07 05:16:34 UTC

I'm going to finish testing my patch, which passes regular
x86_64-unknown-linux-gnu bootstrap + regression tests. I am still
trying to get the lto profiledbootstrap to work. I found some
workarounds for the undefined reference issue on Honza's blog, and I
am trying to get that working. Then I will send to gcc-patches for
review.

In the meantime, H.J. do you want to try the patch? It is below.

Thanks,
Teresa

2014-10-06  Teresa Johnson  <tejohnson@google.com>

        * tree-ssa-threadupdate.c (estimated_freqs_path): New function.
        (ssa_fix_duplicate_block_edges): Invoke it.
        (mark_threaded_blocks): Make two passes to avoid ordering dependences.

Index: tree-ssa-threadupdate.c
===================================================================
--- tree-ssa-threadupdate.c     (revision 215830)
+++ tree-ssa-threadupdate.c     (working copy)
@@ -959,6 +959,43 @@ update_joiner_offpath_counts (edge epath, basic_bl
 }


+/* Check if the paths through RD all have estimated frequencies but zero
+   profile counts.  This is more accurate than checking the entry block
+   for a zero profile count, since profile insanities sometimes creep in.  */
+
+static bool
+estimated_freqs_path (struct redirection_data *rd)
+{
+  edge e = rd->incoming_edges->e;
+  vec<jump_thread_edge *> *path = THREAD_PATH (e);
+  edge ein;
+  edge_iterator ei;
+  bool non_zero_freq = false;
+  FOR_EACH_EDGE (ein, ei, e->dest->preds)
+    {
+      if (ein->count)
+        return false;
+      non_zero_freq |= ein->src->frequency != 0;
+    }
+
+  for (unsigned int i = 1; i < path->length (); i++)
+    {
+      edge epath = (*path)[i]->e;
+      if (epath->src->count)
+        return false;
+      non_zero_freq |= epath->src->frequency != 0;
+      edge esucc;
+      FOR_EACH_EDGE (esucc, ei, epath->src->succs)
+        {
+          if (esucc->count)
+            return false;
+          non_zero_freq |= esucc->src->frequency != 0;
+        }
+    }
+  return non_zero_freq;
+}
+
+
 /* Invoked for routines that have guessed frequencies and no profile
    counts to record the block and edge frequencies for paths through RD
    in the profile count fields of those blocks and edges.  This is because
@@ -1058,9 +1095,11 @@ ssa_fix_duplicate_block_edges (struct redirection_
      data we first take a snapshot of the existing block and edge frequencies
      by copying them into the empty profile count fields.  These counts are
      then used to do the incremental updates, and cleared at the end of this
-     routine.  */
+     routine.  If the function is marked as having a profile, we still check
+     to see if the paths through RD are using estimated frequencies because
+     the routine had zero profile counts.  */
   bool do_freqs_to_counts = (profile_status_for_fn (cfun) != PROFILE_READ
-                             || !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count);
+                             || estimated_freqs_path (rd));
   if (do_freqs_to_counts)
     freqs_to_counts_path (rd);

@@ -2077,35 +2116,48 @@ mark_threaded_blocks (bitmap threaded_blocks)

   /* Now iterate again, converting cases where we want to thread
      through a joiner block, but only if no other edge on the path
-     already has a jump thread attached to it.  */
+     already has a jump thread attached to it.  We do this in two passes,
+     to avoid situations where the order in the paths vec can hide overlapping
+     threads (the path is recorded on the incoming edge, so we would miss
+     cases where the second path starts at a downstream edge on the same
+     path).  First record all joiner paths, deleting any in the unexpected
+     case where there is already a path for that incoming edge.  */
   for (i = 0; i < paths.length (); i++)
     {
       vec<jump_thread_edge *> *path = paths[i];

       if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK)
+        {
+         /* Attach the path to the starting edge if none is yet recorded.  */
+          if ((*path)[0]->e->aux == NULL)
+            (*path)[0]->e->aux = path;
+         else if (dump_file && (dump_flags & TDF_DETAILS))
+           dump_jump_thread_path (dump_file, *path, false);
+        }
+    }
+  /* Second, look for paths that have any other jump thread attached to
+     them, and either finish converting them or cancel them.  */
+  for (i = 0; i < paths.length (); i++)
+    {
+      vec<jump_thread_edge *> *path = paths[i];
+      edge e = (*path)[0]->e;
+
+      if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && e->aux == path)
        {
          unsigned int j;
-
-         for (j = 0; j < path->length (); j++)
+         for (j = 1; j < path->length (); j++)
            if ((*path)[j]->e->aux != NULL)
              break;

          /* If we iterated through the entire path without exiting the loop,
-            then we are good to go, attach the path to the starting edge.  */
+            then we are good to go, record it.  */
          if (j == path->length ())
-           {
-             edge e = (*path)[0]->e;
-             e->aux = path;
-             bitmap_set_bit (tmp, e->dest->index);
-           }
+           bitmap_set_bit (tmp, e->dest->index);
          else if (dump_file && (dump_flags & TDF_DETAILS))
-           {
-             dump_jump_thread_path (dump_file, *path, false);
-           }
+           dump_jump_thread_path (dump_file, *path, false);
        }
     }

-
   /* If optimizing for size, only thread through block if we don't have
      to duplicate it or it's an otherwise empty redirection block.  */
   if (optimize_function_for_size_p (cfun))

Comment 18 Teresa Johnson 2014-10-07 15:54:07 UTC

On Mon, Oct 6, 2014 at 10:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
> I'm going to finish testing my patch, which passes regular
> x86_64-unknown-linux-gnu bootstrap + regression tests. I am still
> trying to get the lto profiledbootstrap to work. I found some
> workarounds for the undefined reference issue on Honza's blog, and I

Nevermind, that was for something else, the issue was that I needed to
upgrade my binutils from 2.22 to 2.24. Now the LTO profiledbootstrap
is humming along.

Once I get a round of testing in I will send for review.

Thanks,
Teresa

> am trying to get that working. Then I will send to gcc-patches for
> review.
>
> In the meantime, H.J. do you want to try the patch? It is below.
>
> Thanks,
> Teresa
>
> 2014-10-06  Teresa Johnson  <tejohnson@google.com>
>
>         * tree-ssa-threadupdate.c (estimated_freqs_path): New function.
>         (ssa_fix_duplicate_block_edges): Invoke it.
>         (mark_threaded_blocks): Make two passes to avoid ordering dependences.
>
> Index: tree-ssa-threadupdate.c
> ===================================================================
> --- tree-ssa-threadupdate.c     (revision 215830)
> +++ tree-ssa-threadupdate.c     (working copy)
> @@ -959,6 +959,43 @@ update_joiner_offpath_counts (edge epath, basic_bl
>  }
>
>
> +/* Check if the paths through RD all have estimated frequencies but zero
> +   profile counts.  This is more accurate than checking the entry block
> +   for a zero profile count, since profile insanities sometimes creep in.  */
> +
> +static bool
> +estimated_freqs_path (struct redirection_data *rd)
> +{
> +  edge e = rd->incoming_edges->e;
> +  vec<jump_thread_edge *> *path = THREAD_PATH (e);
> +  edge ein;
> +  edge_iterator ei;
> +  bool non_zero_freq = false;
> +  FOR_EACH_EDGE (ein, ei, e->dest->preds)
> +    {
> +      if (ein->count)
> +        return false;
> +      non_zero_freq |= ein->src->frequency != 0;
> +    }
> +
> +  for (unsigned int i = 1; i < path->length (); i++)
> +    {
> +      edge epath = (*path)[i]->e;
> +      if (epath->src->count)
> +        return false;
> +      non_zero_freq |= epath->src->frequency != 0;
> +      edge esucc;
> +      FOR_EACH_EDGE (esucc, ei, epath->src->succs)
> +        {
> +          if (esucc->count)
> +            return false;
> +          non_zero_freq |= esucc->src->frequency != 0;
> +        }
> +    }
> +  return non_zero_freq;
> +}
> +
> +
>  /* Invoked for routines that have guessed frequencies and no profile
>     counts to record the block and edge frequencies for paths through RD
>     in the profile count fields of those blocks and edges.  This is because
> @@ -1058,9 +1095,11 @@ ssa_fix_duplicate_block_edges (struct redirection_
>       data we first take a snapshot of the existing block and edge frequencies
>       by copying them into the empty profile count fields.  These counts are
>       then used to do the incremental updates, and cleared at the end of this
> -     routine.  */
> +     routine.  If the function is marked as having a profile, we still check
> +     to see if the paths through RD are using estimated frequencies because
> +     the routine had zero profile counts.  */
>    bool do_freqs_to_counts = (profile_status_for_fn (cfun) != PROFILE_READ
> -                             || !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count);
> +                             || estimated_freqs_path (rd));
>    if (do_freqs_to_counts)
>      freqs_to_counts_path (rd);
>
> @@ -2077,35 +2116,48 @@ mark_threaded_blocks (bitmap threaded_blocks)
>
>    /* Now iterate again, converting cases where we want to thread
>       through a joiner block, but only if no other edge on the path
> -     already has a jump thread attached to it.  */
> +     already has a jump thread attached to it.  We do this in two passes,
> +     to avoid situations where the order in the paths vec can hide overlapping
> +     threads (the path is recorded on the incoming edge, so we would miss
> +     cases where the second path starts at a downstream edge on the same
> +     path).  First record all joiner paths, deleting any in the unexpected
> +     case where there is already a path for that incoming edge.  */
>    for (i = 0; i < paths.length (); i++)
>      {
>        vec<jump_thread_edge *> *path = paths[i];
>
>        if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK)
> +        {
> +         /* Attach the path to the starting edge if none is yet recorded.  */
> +          if ((*path)[0]->e->aux == NULL)
> +            (*path)[0]->e->aux = path;
> +         else if (dump_file && (dump_flags & TDF_DETAILS))
> +           dump_jump_thread_path (dump_file, *path, false);
> +        }
> +    }
> +  /* Second, look for paths that have any other jump thread attached to
> +     them, and either finish converting them or cancel them.  */
> +  for (i = 0; i < paths.length (); i++)
> +    {
> +      vec<jump_thread_edge *> *path = paths[i];
> +      edge e = (*path)[0]->e;
> +
> +      if ((*path)[1]->type == EDGE_COPY_SRC_JOINER_BLOCK && e->aux == path)
>         {
>           unsigned int j;
> -
> -         for (j = 0; j < path->length (); j++)
> +         for (j = 1; j < path->length (); j++)
>             if ((*path)[j]->e->aux != NULL)
>               break;
>
>           /* If we iterated through the entire path without exiting the loop,
> -            then we are good to go, attach the path to the starting edge.  */
> +            then we are good to go, record it.  */
>           if (j == path->length ())
> -           {
> -             edge e = (*path)[0]->e;
> -             e->aux = path;
> -             bitmap_set_bit (tmp, e->dest->index);
> -           }
> +           bitmap_set_bit (tmp, e->dest->index);
>           else if (dump_file && (dump_flags & TDF_DETAILS))
> -           {
> -             dump_jump_thread_path (dump_file, *path, false);
> -           }
> +           dump_jump_thread_path (dump_file, *path, false);
>         }
>      }
>
> -
>    /* If optimizing for size, only thread through block if we don't have
>       to duplicate it or it's an otherwise empty redirection block.  */
>    if (optimize_function_for_size_p (cfun))
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Comment 19 Jeffrey A. Law 2014-10-07 18:06:34 UTC

On 10/04/14 09:34, Teresa Johnson wrote:
>
> Looking at the code, I think it attempts to check for this case and
> prevent it but that code does not work in this case because of the
> order the paths are identified and subsequently iterated in the paths
> vec. In mark_threaded_blocks it looks at paths with joiners and if
> there are any edges along the path already part of another path it
> will skip that path. But in this case, we registered the paths in this
> order:
>
>    Registering jump thread: (119, 150) incoming edge;  (150, 13)
> joiner;  (13, 15) normal;
> ...
>    Registering jump thread: (150, 13) incoming edge;  (13, 15) joiner;
> (15, 17) normal;
>
> For the first path, the path is attached to incoming edge 119->150. So
> when we walk edges in the 2nd path we don't see the first path. If we
> looked at the paths in the reverse order we would have seen the path
> attached to incoming edge 150->13 and skipped the 119->150 path. Note
> that we end up doing the actual threading in the reverse order - first
> we do the threading through 13 (the second path), then later the
> threading through 150 (the first path), leading to the issue I am
> seeing.
>
> Jeff, what is intended here - should we not be threading both of these paths?
The code in mark_threaded_jumps looks for cases where there is a jump 
threading path with a joiner block is subsumed by a jump threading path 
without a joiner block and it will suppress the jump threading path with 
the joiner.  We had more suppressed paths in the past, but they mostly 
got ripped out as the implementation solidified.

In the case shown above, we have two paths with joiner blocks so the 
suppression code in mark_threaded_blocks does not fire and we attempt to 
realize both jump threading paths.

 From a correctness standpoint the code should handle this case; 
however, there may be strong reasons to suppress one path or the other 
from a code bloat and/or performance standpoint.

I typically have to draw out the CFG and the intermediate steps to gain 
clarity into the desired result and there's not enough info here to do 
that.   My gut tells me suppression of one jump thread path is 
desirable, we aren't going to have anything good to guide which path to 
suppress, the vast majority of the time it won't matter which path is 
suppressed.  So I'm comfortable with suppressing either path.

jeff

Comment 20 Jeffrey A. Law 2014-10-07 18:09:16 UTC

On 10/04/14 13:29, Teresa Johnson wrote:
>> Jeff, what is intended here - should we not be threading both of these paths?
>
> I have a patch to make the mark_threaded_blocks checking of paths work
> regardless of the ordering of paths in the vec. This fixes the
> failure.
This seems like a better solution.  It'll decrease unnecessary block 
copying.

>
> The other approach is whenever we finish threading a path, go through
> the vec of remaining paths and update the edges for any that have been
> affected by the threading and that should instead include the
> duplicated edges.
That'd probably work too, but I suspect there's not much, if any, 
benefit to keeping both paths.

Jeff

Comment 21 Teresa Johnson 2014-10-07 18:58:34 UTC

On Tue, Oct 7, 2014 at 11:08 AM, Jeff Law <law@redhat.com> wrote:
> On 10/04/14 13:29, Teresa Johnson wrote:
>>>
>>> Jeff, what is intended here - should we not be threading both of these
>>> paths?
>>
>>
>> I have a patch to make the mark_threaded_blocks checking of paths work
>> regardless of the ordering of paths in the vec. This fixes the
>> failure.
>
> This seems like a better solution.  It'll decrease unnecessary block
> copying.

Ok, great. The patch I posted had a bug I introduced when cleaning it
up that was caught by my LTO profiledbootstrap in that I wasn't
resetting the e->aux pointer back to NULL when canceling the jump
thread. I am kicking off the testing again and should hopefully have a
final patch to send to gcc-patches later today.

>
>>
>> The other approach is whenever we finish threading a path, go through
>> the vec of remaining paths and update the edges for any that have been
>> affected by the threading and that should instead include the
>> duplicated edges.
>
> That'd probably work too, but I suspect there's not much, if any, benefit to
> keeping both paths.

Yeah, in this case the extra threading worked from a correctness
standpoint but doesn't make sense from a logical or performance
standpoint.

> I typically have to draw out the CFG and the intermediate steps to gain clarity into the desired result and there's not enough info here to do that.

I'm glad I'm not the only one who has to draw it out every time!

Thanks,
Teresa

>
> Jeff

Comment 22 tejohnson 2014-10-09 04:38:56 UTC

Author: tejohnson
Date: Thu Oct  9 04:38:24 2014
New Revision: 216024

URL: https://gcc.gnu.org/viewcvs?rev=216024&root=gcc&view=rev
Log:
2014-10-07  Teresa Johnson  <tejohnson@google.com>

	PR bootstrap/63432.
	* tree-ssa-threadupdate.c (estimated_freqs_path): New function.
	(ssa_fix_duplicate_block_edges): Invoke it.
	(mark_threaded_blocks): Make two passes to avoid ordering dependences.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-threadupdate.c

Comment 23 H.J. Lu 2014-10-09 19:52:29 UTC

With r216039, I still got

../../src-trunk/gcc/cp/parser.c: In function ‘cp_parser_declaration_seq_opt’:
../../src-trunk/gcc/cp/parser.c:11049:0: error: verify_flow_info: Wrong probability of edge 53->54 -1192228368
 cp_parser_declaration_seq_opt (cp_parser* parser)
 ^
../../src-trunk/gcc/cp/parser.c:11049:0: internal compiler error: verify_flow_info failed
0x6b2e3a verify_flow_info()
        ../../src-trunk/gcc/cfghooks.c:260
0xddcd27 cleanup_tree_cfg_noloop
        ../../src-trunk/gcc/tree-cfgcleanup.c:765
0xddcd27 cleanup_tree_cfg()
        ../../src-trunk/gcc/tree-cfgcleanup.c:814
0xbf5804 execute_function_todo
        ../../src-trunk/gcc/passes.c:1704
0xc02287 do_per_function
        ../../src-trunk/gcc/passes.c:1478
0xc02287 execute_todo
        ../../src-trunk/gcc/passes.c:1808
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[7]: *** [/tmp/cchv0APR.ltrans3.ltrans.o] Error 1
make[7]: *** Waiting for unfinished jobs....
lto-wrapper: fatal error: make returned 2 exit status
compilation terminated.
/usr/local/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[6]: *** [cc1plus] Error 1

Comment 24 Teresa Johnson 2014-10-09 20:38:03 UTC

On Thu, Oct 9, 2014 at 12:52 PM, hjl.tools at gmail dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>
> --- Comment #23 from H.J. Lu <hjl.tools at gmail dot com> ---
> With r216039, I still got
>
> ../../src-trunk/gcc/cp/parser.c: In function ‘cp_parser_declaration_seq_opt’:
> ../../src-trunk/gcc/cp/parser.c:11049:0: error: verify_flow_info: Wrong
> probability of edge 53->54 -1192228368
>  cp_parser_declaration_seq_opt (cp_parser* parser)
>  ^
> ../../src-trunk/gcc/cp/parser.c:11049:0: internal compiler error:
> verify_flow_info failed
> 0x6b2e3a verify_flow_info()
>         ../../src-trunk/gcc/cfghooks.c:260
> 0xddcd27 cleanup_tree_cfg_noloop
>         ../../src-trunk/gcc/tree-cfgcleanup.c:765
> 0xddcd27 cleanup_tree_cfg()
>         ../../src-trunk/gcc/tree-cfgcleanup.c:814
> 0xbf5804 execute_function_todo
>         ../../src-trunk/gcc/passes.c:1704
> 0xc02287 do_per_function
>         ../../src-trunk/gcc/passes.c:1478
> 0xc02287 execute_todo
>         ../../src-trunk/gcc/passes.c:1808
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> make[7]: *** [/tmp/cchv0APR.ltrans3.ltrans.o] Error 1
> make[7]: *** Waiting for unfinished jobs....
> lto-wrapper: fatal error: make returned 2 exit status
> compilation terminated.
> /usr/local/bin/ld: lto-wrapper failed
> collect2: error: ld returned 1 exit status
> make[6]: *** [cc1plus] Error 1

Arg, looks very similar so maybe another instance of the duplicate
threading is slipping through? My own lto profiled bootstrap succeeded
with my patch. I will try updating to r216039 and redo it to see if I
can provoke the same failure.

Teresa

>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 25 Teresa Johnson 2014-10-10 02:11:47 UTC

Unfortunately I can't reproduce this failure. Here's what I did:

In my gcc source:
% svn update -r r216039

In my build directory:
% ~/gcc_trunk_7/configure
--prefix=/usr/local/google/home/tejohnson/gcc_trunk_7_validate_lto/bld-gcc/install
--disable-werror --with-build-config=bootstrap-lto --enable-checking
% make -j8 profiledbootstrap

I might need you to email me the input files again. Sorry about that.
Teresa

On Thu, Oct 9, 2014 at 1:37 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Thu, Oct 9, 2014 at 12:52 PM, hjl.tools at gmail dot com
> <gcc-bugzilla@gcc.gnu.org> wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>>
>> --- Comment #23 from H.J. Lu <hjl.tools at gmail dot com> ---
>> With r216039, I still got
>>
>> ../../src-trunk/gcc/cp/parser.c: In function ‘cp_parser_declaration_seq_opt’:
>> ../../src-trunk/gcc/cp/parser.c:11049:0: error: verify_flow_info: Wrong
>> probability of edge 53->54 -1192228368
>>  cp_parser_declaration_seq_opt (cp_parser* parser)
>>  ^
>> ../../src-trunk/gcc/cp/parser.c:11049:0: internal compiler error:
>> verify_flow_info failed
>> 0x6b2e3a verify_flow_info()
>>         ../../src-trunk/gcc/cfghooks.c:260
>> 0xddcd27 cleanup_tree_cfg_noloop
>>         ../../src-trunk/gcc/tree-cfgcleanup.c:765
>> 0xddcd27 cleanup_tree_cfg()
>>         ../../src-trunk/gcc/tree-cfgcleanup.c:814
>> 0xbf5804 execute_function_todo
>>         ../../src-trunk/gcc/passes.c:1704
>> 0xc02287 do_per_function
>>         ../../src-trunk/gcc/passes.c:1478
>> 0xc02287 execute_todo
>>         ../../src-trunk/gcc/passes.c:1808
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> Please include the complete backtrace with any bug report.
>> See <http://gcc.gnu.org/bugs.html> for instructions.
>> make[7]: *** [/tmp/cchv0APR.ltrans3.ltrans.o] Error 1
>> make[7]: *** Waiting for unfinished jobs....
>> lto-wrapper: fatal error: make returned 2 exit status
>> compilation terminated.
>> /usr/local/bin/ld: lto-wrapper failed
>> collect2: error: ld returned 1 exit status
>> make[6]: *** [cc1plus] Error 1
>
> Arg, looks very similar so maybe another instance of the duplicate
> threading is slipping through? My own lto profiled bootstrap succeeded
> with my patch. I will try updating to r216039 and redo it to see if I
> can provoke the same failure.
>
> Teresa
>
>>
>> --
>> You are receiving this mail because:
>> You are on the CC list for the bug.
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Comment 26 H.J. Lu 2014-10-10 15:15:26 UTC

I will find a testcase when it fails again.

Comment 27 H.J. Lu 2014-10-13 15:53:06 UTC

(In reply to Teresa Johnson from comment #24)

> Arg, looks very similar so maybe another instance of the duplicate
> threading is slipping through? My own lto profiled bootstrap succeeded
> with my patch. I will try updating to r216039 and redo it to see if I
> can provoke the same failure.
> 

I sent you another testcase against r216150.

Comment 28 Teresa Johnson 2014-10-13 21:33:27 UTC

On Mon, Oct 13, 2014 at 8:53 AM, hjl.tools at gmail dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>
> --- Comment #27 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to Teresa Johnson from comment #24)
>
>> Arg, looks very similar so maybe another instance of the duplicate
>> threading is slipping through? My own lto profiled bootstrap succeeded
>> with my patch. I will try updating to r216039 and redo it to see if I
>> can provoke the same failure.
>>
>
> I sent you another testcase against r216150.

Thanks for the testcase, I reproduced it. It is a case of garbage in /
garbage out. The fre2 pass is introducing some big profile count
insanities, leading to the probability insanity being introduced when
we try to use the counts to compute the new probability in
recompute_probabilities. There is already handling for really large
probabilities due to this issue, and we need to add the same thing for
negative probabilities - essentially the patch you had originally
suggested for the first problem which wasn't necessary for that one
since that was an actually jump threading induced issue.

Will test that and send for review.

>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.

Comment 29 Teresa Johnson 2014-10-14 14:42:25 UTC

On Mon, Oct 13, 2014 at 2:32 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Mon, Oct 13, 2014 at 8:53 AM, hjl.tools at gmail dot com
> <gcc-bugzilla@gcc.gnu.org> wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63432
>>
>> --- Comment #27 from H.J. Lu <hjl.tools at gmail dot com> ---
>> (In reply to Teresa Johnson from comment #24)
>>
>>> Arg, looks very similar so maybe another instance of the duplicate
>>> threading is slipping through? My own lto profiled bootstrap succeeded
>>> with my patch. I will try updating to r216039 and redo it to see if I
>>> can provoke the same failure.
>>>
>>
>> I sent you another testcase against r216150.
>
> Thanks for the testcase, I reproduced it. It is a case of garbage in /
> garbage out. The fre2 pass is introducing some big profile count
> insanities, leading to the probability insanity being introduced when
> we try to use the counts to compute the new probability in
> recompute_probabilities. There is already handling for really large
> probabilities due to this issue, and we need to add the same thing for
> negative probabilities - essentially the patch you had originally
> suggested for the first problem which wasn't necessary for that one
> since that was an actually jump threading induced issue.

Actually, I traced the initial profile insanity back to inlining. FRE
merely propagated it further.

I have a better overflow test done before the scaling, running it
through LTO profiledbootstrap then will send for review.

Teresa

>
> Will test that and send for review.
>
>>
>> --
>> You are receiving this mail because:
>> You are on the CC list for the bug.
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Comment 30 tejohnson 2014-10-15 15:46:30 UTC

Author: tejohnson
Date: Wed Oct 15 15:45:59 2014
New Revision: 216269

URL: https://gcc.gnu.org/viewcvs?rev=216269&root=gcc&view=rev
Log:
2014-10-15  Teresa Johnson  <tejohnson@google.com>

	PR bootstrap/63432
	* tree-ssa-threadupdate.c (recompute_probabilities): Better
	overflow checking.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-threadupdate.c

Comment 31 Richard Biener 2014-11-24 13:36:20 UTC

Assuming fixed (this bug is horrid to follow - please open new bugs for new issues, not hijack old ones for same symptoms).