gfortran.fortran-testsuite/execute/retarray_2.f90 GCSE2 is deleting instructions that appear to be identical. The problem is that the address register is being updated by a PRE_INC, yet GCSE2 apparently believes the value is unchanging expr: (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8]) hashcode: 8826 list of occurences: (insn:HI 45 44 46 1 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t:REG_DEP_ANTI 113 (insn_list:REG_DEP_OUTPUT 112 (insn_list 111 (insn_list 96 (i nsn_list 44 (insn_list:REG_DEP_ANTI 114 (insn_list:REG_DEP_ANTI 84 (insn_list 13 2 (insn_list:REG_DEP_ANTI 99 (insn_list:REG_DEP_ANTI 25 (nil))))))))))) (nil)) (insn:HI 82 85 81 4 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t 132 (insn_list:REG_DEP_ANTI 25 (nil))) (nil)) (insn:HI 97 96 98 5 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t:REG_DEP_ANTI 83 (insn_list:REG_DEP_OUTPUT 82 (insn_list 96 (insn_list:REG_DEP_ ANTI 84 (insn_list 132 (insn_list:REG_DEP_ANTI 25 (nil))))))) (nil)) (insn:HI 112 111 113 6 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t:REG_DEP_ANTI 98 (insn_list:REG_DEP_OUTPUT 97 (insn_list 96 (insn_list 111 (ins n_list:REG_DEP_ANTI 99 (insn_list 132 (insn_list:REG_DEP_ANTI 84 (insn_list:REG_ DEP_ANTI 25 (nil))))))))) (nil)) generating move from 9 to 9 on edge from 6 to 1 generating move from 9 to 9 on edge from 4 to 5 generating move from 9 to 9 on edge from 5 to 6 deleting insn: (insn/v:HI 45 44 46 1 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t:REG_DEP_ANTI 113 (insn_list:REG_DEP_OUTPUT 112 (insn_list 111 (insn_list 96 (i nsn_list 44 (insn_list:REG_DEP_ANTI 114 (insn_list:REG_DEP_ANTI 84 (insn_list 13 2 (insn_list:REG_DEP_ANTI 99 (insn_list:REG_DEP_ANTI 25 (nil))))))))))) (nil)) deleting insn: (insn/v:HI 97 96 98 5 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t:REG_DEP_ANTI 83 (insn_list:REG_DEP_OUTPUT 82 (insn_list 96 (insn_list:REG_DEP_ ANTI 84 (insn_list 132 (insn_list:REG_DEP_ANTI 25 (nil))))))) (nil)) deleting insn: (insn/v:HI 112 111 113 6 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t:REG_DEP_ANTI 98 (insn_list:REG_DEP_OUTPUT 97 (insn_list 96 (insn_list 111 (ins n_list:REG_DEP_ANTI 99 (insn_list 132 (insn_list:REG_DEP_ANTI 84 (insn_list:REG_ DEP_ANTI 25 (nil))))))))) (nil)) GCSE AFTER RELOAD stats: copies inserted: 0 moves inserted: 3 insns deleted: 3 Deleted 3 trivially dead insns; 2 iterations In the previous pass, 24.postreload, one sees that r29 was incremented: (insn:HI 111 109 112 6 (set (reg:SI 11 11 [136]) (mem/s:SI (pre_inc:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118])) [5 b S4 A8])) 265 {*movsi_internal1} (insn_list:REG_DEP_ANTI 98 (insn_list:REG_DEP_AN TI 97 (insn_list:REG_DEP_ANTI 81 (insn_list:REG_DEP_ANTI 82 (insn_list 96 (insn_ list:REG_DEP_ANTI 99 (insn_list 132 (insn_list:REG_DEP_ANTI 84 (insn_list:REG_DE P_ANTI 25 (nil)))))))))) (expr_list:REG_INC (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (nil))) (insn:HI 112 111 113 6 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t:REG_DEP_ANTI 98 (insn_list:REG_DEP_OUTPUT 97 (insn_list 96 (insn_list 111 (ins n_list:REG_DEP_ANTI 99 (insn_list 132 (insn_list:REG_DEP_ANTI 84 (insn_list:REG_ DEP_ANTI 25 (nil))))))))) (nil))
The problem appears to be the call to reset_opr_set_tables() in eliminate_partially_redundant_loads(). The last set information is computed in compute_hash_table_after_reload in gcse_after_reload_main() and then deleted in eliminate_partially_redundant_loads(), which tests the table.
oprs_unchanged_p() used to test reg_avail_info[] for last_bb versus current_bb, and then first_set/last_set versus CUID. Now it only tests the presence of reg_avail_info[], which now is cleared for each BB. case REG: { struct reg_avail_info *info = ®_avail_info[REGNO (x)]; if (info->last_bb != current_bb) return 1; if (avail_p) return info->last_set < INSN_CUID (insn); else return info->first_set >= INSN_CUID (insn); }
Seems I need to look at this again...
Created attachment 7134 [details] retarray_2.f90.t60.optimized
It seems that we need to look at autoincrement in postreload gcse. Before flow, there are no interesting autoincrements, but after reload, we do. So I think that the bug is that we do not look for autoincs (either in the insn or via REG_INC notes) and mark regs appearing in them as modified in mark_oprs_set.
David, I tried to reproduce this with a cross from amd64 to ppc-aix5.2, but unfortunately that didn't work. What are the exact command line options you passes to the compiler? As I said in an earlier comment for this PR, I think mark_oprs_set just needs to be taught that autoincrements count as register modifying operations. If I am right, the attached patch *should* fix the problem. But as I cannot reproduce the bug myself here so far, I cannot test the patch. Can you please give it a spin and see if it fixes the problem for you? Index: postreload-gcse.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/postreload-gcse.c,v retrieving revision 2.4 diff -c -3 -p -r2.4 postreload-gcse.c *** postreload-gcse.c 10 Sep 2004 11:02:26 -0000 2.4 --- postreload-gcse.c 14 Sep 2004 23:05:24 -0000 *************** oprs_unchanged_p (rtx x, rtx insn, bool *** 524,532 **** case POST_INC: case PRE_MODIFY: case POST_MODIFY: ! if (after_insn) ! return 0; ! break; default: break; --- 524,530 ---- case POST_INC: case PRE_MODIFY: case POST_MODIFY: ! return 0; default: break; *************** mark_clobber (rtx pat, rtx insn) *** 738,746 **** static void mark_oprs_set (rtx insn) { ! rtx pat = PATTERN (insn); int i; if (GET_CODE (pat) == SET) mark_set (pat, insn); --- 736,747 ---- static void mark_oprs_set (rtx insn) { ! rtx pat, note; int i; + /* Look at the pattern to see what REGs and MEMs may be set + by this insn. */ + pat = PATTERN (insn); if (GET_CODE (pat) == SET) mark_set (pat, insn); *************** mark_oprs_set (rtx insn) *** 762,767 **** --- 763,779 ---- else if (GET_CODE (pat) == CALL) mark_call (insn); + + /* Also record autoincremented REGs for this insn as changed. */ + for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) + { + if (REG_NOTE_KIND (note) = REG_INC) + { + reg = XEXP (link, 0); + gcc_assert (REG_P (reg)); + record_last_reg_set_info (insn, REGNO (reg)); + } + } }
The commandline options are -O3 -funroll-loops. One can look at the numerous gfortran failures for AIX on the gcc-testresults mailinglist to see all of the examples that fails in this way. The problem is not whether the analysis recognizes the PRE_INC as modifying a register. The current code already does that. I tried your patch, after fixing the typos, and it does not solve the failure. I can see reg_avail_info[29], for example, being set correctly without your changes. oprs_changed_p() recursively walks into the INSN and sees the change to r29. The problem is that the information is deleted by reset_oprs_set_table() before beginning the eliminate_partially_redundant_loads() phase. The original code, and the version in gcse.c, allocates reg_set_bitmap and reg_avail_info, with the latter containing last_bb, first_set, and last_set. reg_avail_info is persistent in all of the gcse phases. reset_oprs_set_table() only clear reg_set_bitmap in gcse.c, unlike postreload-gcse.c that clears reg_avail_info. Your restructuring and simplification cannot work safely.
Given your dumps, (insn:HI 111 109 112 6 (set (reg:SI 11 11 [136]) (mem/s:SI (pre_inc:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118])) [5 b S4 A8])) 265 {*movsi_internal1} (insn_list:REG_DEP_ANTI 98 (insn_list:REG_DEP_AN TI 97 (insn_list:REG_DEP_ANTI 81 (insn_list:REG_DEP_ANTI 82 (insn_list 96 (insn_ list:REG_DEP_ANTI 99 (insn_list 132 (insn_list:REG_DEP_ANTI 84 (insn_list:REG_DE P_ANTI 25 (nil)))))))))) (expr_list:REG_INC (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (nil))) (insn:HI 112 111 113 6 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (insn_lis t:REG_DEP_ANTI 98 (insn_list:REG_DEP_OUTPUT 97 (insn_list 96 (insn_list 111 (ins n_list:REG_DEP_ANTI 99 (insn_list 132 (insn_list:REG_DEP_ANTI 84 (insn_list:REG_ DEP_ANTI 25 (nil))))))))) (nil)) oprs_unchanged_p should return false for insn 112. Does it?
I should have said, oprs_unchanged_p (PATTERN(insn 112), insn 112, false) shoud be false.
Subject: Re: GCSE after reload replacing changing instructions >>>>> steven at gcc dot gnu dot org writes: steven> oprs_unchanged_p (PATTERN(insn 112), insn 112, false) steven> should return false for insn 112. Does it? No, it returns true. It records that r29 changed during the earlier analysis, but that information is cleared by the time it is called with after_insn=false. David
On Wednesday 15 September 2004 02:49, David Edelsohn wrote: > I already tried the change you suggested and it does not work. > That is not the main problem -- it may be another problem but it is not > the fundamental design flaw in the restructuring. You keep saying that, but I really, honestly believe there is no such restructuring problem. Let me explain. First of all, GCSE2 is misnamed. It is *not* global common sub- expression ellimination. A more appropriate name would be, ehm, only-just-not-local subexpression elimination. If you think it is a truely global pass, that probably fooled you into thinking that this restructuring is fundamentally broken. But that may in fact be not the problem at all (which doesn't mean that the bug wasn't my mistake, of course - but please read on...). The algorithm works as follows: a) Build the hash table of expressions (loads, if you will) that are available at the end of each basic block. Let us give it a name, AVAIL_OUT. The odd thing about postreload gcse is that AVAIL_OUT is only a *local* property. In classic GCSE, AVAIL_OUT is the union of available expressions at the block entry minus the expressions killed in the block: AVAIL_IN(B) = U AVAIL_OUT(S), S E pred(B) AVAIL_OUT(B) = (AVAIL_IN(B) U EXPR_GEN (B)) - KILL(B) This is clearly a global data flow problem, which needs to be solved using a fixed-point iteration. In postreload GCSE, this is not the case. Instead, AVAIL_OUT(B) is now the set of expressions generated in B that are not killed before the end of the block, AVAIL_IN(B) = emtpy AVAIL_OUT(B) = GEN_LOC(B) - KILL_LOC(B) in postreload GCSE This is computed as follows 1. Walk all instructions in the basic block from start to end, and record the last point where each reg is modified. 2. Walk all instructions again from start to end, and add the expressions to the hash table if the set destination of the expression is not set after the current instruction. Let's call the condition in step 2 "condition 1". The CUIDs are used to determine the value of the condition: if CUID(last_set)>CUID(insn) then the set in insn is killed in a later instruction - skip it. else add the set to AVAIL_OUT(B) where B is the basic block for insn. There is no global dataflow involved, it is merely a local property that tells you that the expression generated in insn in block B is not killed before the end of block B. An important consequence of this is that the expression is at least partially available in the successors of B. b) Perform the partial redundancy elimination. Again, postreload GCSE does not actually do classic partial redundancy elimination. Classic PRE uses global data flow analysis to determine AVAIL_OUT and ANTIC_IN. In postreload GCSE, ANTIC_IN(B) is empty (and in fact not even defined, as such). Instead, postreload uses an ad-hoc approach as follows: 1. Walk all instructions INSN in the basic block B, start to end. 2. If any of the operands of the expression generated by INSN have changed since the beginning of the block, ignore the expression, because even if it is in AVAIL_OUT of any pred(B) then it was killed in B before we saw insn. Let's call this "condition 2". 3. If the expression was not killed, see if it is available in in any predecessor of B, ie. in any E in AVAIL_OUT(pred(B)). If so, then E is partially redundant. Try to eliminate it. 4. Finally record all registers/mems killed by INSN, so that for the next instruction INSN in the walk of the instructions in B, we can determine condition 2. c) Delete redundant expressions. This part should not need explaining. Now, there are two important conclusions you can draw from this sketch of how this algorithm is supposed to work: 1) Assume some expression E is generated in block A by insn1, and in block B by insn2. If condition 1 is satisfied in block A, and condition 2 is satisfied in block B, and A is in pred(B), then the expression generated in insn1 is not killed on the path from insn1 to insn2, or, E is in AVAIL_OUT(E) and E is not killed in any insns between the head of block B and insn2. Therefore, the expression in insn1 is either partially or fully redundant, depending on if E is in AVAIL_OUT(E) for all pred(B), or only for some blocks in pred(B). But condition 1 and condition 2 are both locally computed (!), hence postreload gcse is in fact a local optimization. This is why the last_bb and first_set checks, that is necessary for the "real" global cse, are *not* necessary for postreload gcse. 2) In your bug, you have the following two insns: (insn:HI 111 109 112 6 (set (reg:SI 11 11 [136]) (mem/s:SI (pre_inc:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118])) [5 b S4 A8])) 265 {*movsi_internal1} (...)) (expr_list:REG_INC (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (nil))) (insn:HI 112 111 113 6 (set (reg:SI 9 9 [137]) (mem/s:SI (plus:SI (reg/f:SI 29 29 [orig:118 ivtmp.43 ] [118]) (reg:SI 0 0 [135])) [5 S4 A8])) 265 {*movsi_internal1} (...)) (nil)) both insn 111 and insn 112 are in basic block 6. You say that postreload gcse eliminates the load in insn 112, ignoring the pre_inc in insn 111. Now looking at the description of the algorithm given above, you see that condition 2 is not satisfied, since reg 29 has been modified between the start of basic block 6 and insn 112 in the same block. Therefore, even though (apparently, I haven't seen the full dump) the expression in insn 112 is also generated in at least one predecessor of basic block 6, the expression in insn 112 is not redundant. The bug is that condition 2 is somehow not being checked properly. Now, unfortunately my x86_64-suse-linux-gnu x powerpc-ibm-aix5.2 cross compiler refuses to reproduce the bug, so I'm sort of lost here. Could you please attach the full .postreload and .gcse2 dumps to the bug report?
Created attachment 7140 [details] postreload dump file
Created attachment 7141 [details] postreload gcse dump file
Subject: Re: [4.0 Regression] GCSE after reload replacing changing instructions Steven, One thing that stands out from looking at the code is the routines record_last_set_info and mark_set don't appear to be taking any note of any REG_INC notes that may be present within the MEM's Should the existing code: if (REG_P (dest)) record_last_reg_set_info (insn, REGNO (dest)); else if (MEM_P (dest)) record_last_mem_set_info (insn); contain this following check for registers mentioned in REG_INC notes? note = find_reg_note (insn, REG_INC, NULL_RTX); if (note) record_last_reg_set_info (insn, REGNO (XEXP (note, 0))); Graham
Subject: Re: [4.0 Regression] GCSE after reload replacing changing instructions On Wednesday 15 September 2004 21:00, graham dot stott at btinternet dot com wrote: > ------- Additional Comments From graham dot stott at btinternet dot com > 2004-09-15 19:00 ------- Subject: Re: [4.0 Regression] GCSE after reload > replacing changing instructions > > Steven, > > One thing that stands out from looking at the code is the routines > > record_last_set_info and mark_set don't appear to be taking any note > of any REG_INC notes that may be present within the MEM's > > Should the existing code: > > if (REG_P (dest)) > record_last_reg_set_info (insn, REGNO (dest)); > else if (MEM_P (dest)) > record_last_mem_set_info (insn); > > contain this following check for registers mentioned in REG_INC notes? > > note = find_reg_note (insn, REG_INC, NULL_RTX); > if (note) > record_last_reg_set_info (insn, REGNO (XEXP (note, 0))); That is what I thought too, see my patch in comment #6. But apparently that does not fix the bug...
Subject: Re: [4.0 Regression] GCSE after reload replacing changing instructions Note that the patch as written cannot compile. Maybe something else in my attempt to fix the patch prevented it from working correctly. David
Created attachment 7147 [details] Proposed patch I still need to figure out if this is correct or not. But according to David, this patch does cure the bug.
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01783.html Awaiting review.
Subject: Bug 17482 CVSROOT: /cvs/gcc Module name: gcc Changes by: steven@gcc.gnu.org 2004-09-21 07:48:34 Modified files: gcc : ChangeLog postreload-gcse.c Log message: PR rtl-optimization/17482 * postreload-gcse.c (reg_avail_info, oprs_unchanged_p, load_killed_in_block_p): Clarify comments. (record_last_reg_set_info): Make static inline. (mark_call, mark_set, mark_clobber, mark_oprs_set): Remove. (record_opr_changes): New function to replace the above. (compute_hash_table): Clarify comments. Use record_opr_changes. (reg_set_between_after_reload_p): Clean up. (reg_used_between_after_reload_p): Likewise. (eliminate_partially_redundant_load): Clarify comments. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.5545&r2=2.5546 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/postreload-gcse.c.diff?cvsroot=gcc&r1=2.4&r2=2.5
.