Basically these two functions should produce the same code, the check for NotFound is not needed at all. void g(); void h(); void o(); void t(int l, int *y) { int i; int NotFound; o(); for(i = 0, NotFound = 1;NotFound && i<l;i++) { if (y[i]) { NotFound = 0; g(); } } h(); } void t1(int l, int *y) { int i; int NotFound; o(); for(i = 0, NotFound = 1;NotFound && i<l;i++) { if (y[i]) { NotFound = 0; g(); break; } } h(); }
This should be able to be done on the tree-ssa.
This is a problem with jump threading as if you change the fcuntion t to be: void t(int l, int *y) { int i; int NotFound; o(); for(i = 0, NotFound = 1;i<l;i++) { if (y[i]) { NotFound = 0; g(); } if (!NotFound) break; } h(NotFound); } It works correctly but changing it to: void t(int l, int *y) { int i; int NotFound; o(); for(i = 0, NotFound = 1;i<l;i++) { if (!NotFound) break; if (y[i]) { NotFound = 0; g(); } } h(NotFound); } It does not.
Subject: Re: loop not fully optimized In message <20040128220328.24048.qmail@sources.redhat.com>, "pinskia at gcc dot gnu dot org" writes: > >------- Additional Comments From pinskia at gcc dot gnu dot org 2004-01-28 2 >2:03 ------- >This is a problem with jump threading as if you change the fcuntion t to be: >void t(int l, int *y) >{ > int i; > int NotFound; > o(); > for(i = 0, NotFound = 1;i<l;i++) > { > if (y[i]) > { > NotFound = 0; > g(); > } > if (!NotFound) > break; > } > h(NotFound); >} > >It works correctly but changing it to: >void t(int l, int *y) >{ > int i; > int NotFound; > o(); > for(i = 0, NotFound = 1;i<l;i++) > { > if (!NotFound) > break; > if (y[i]) > { > NotFound = 0; > g(); > } > } > h(NotFound); >} >It does not. Right. This is actually something I had already started looking at and addressing it is going to be a reasonable amount of work. To correctly handle this we need to record temporary equivalences for every PHI we pass through during the dominator walk so that a temporary equivalence created by a PHI in an earlier block can be used to thread through a later block. Right now we only record temporary equivalences for the PHI in the block we want to thread through. If we look at the dump if your code we have: # BLOCK 0 # PRED: ENTRY [100.0%] (fallthru,exec) # MT.4_17 = VDEF <MT.4_16>; o (); goto <bb 5> (<L4>); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 1 # PRED: 5 [95.0%] (true,exec) <L0>:; if (NotFound_3 == 0) goto <L5>; else goto <L1>; # SUCC: 2 [95.0%] (false,exec) 6 [5.0%] (true,exec) # BLOCK 2 # PRED: 1 [95.0%] (false,exec) <L1>:; i.0_7 = (unsigned int)i_1; T.1_8 = i.0_7 * 4; T.2_10 = T.1_8 + y_9; # VUSE <MT.4_15>; T.3_11 = *T.2_10; if (T.3_11 != 0) goto <L2>; else goto <L3>; # SUCC: 4 [70.0%] (false,exec) 3 [30.0%] (true,exec) # BLOCK 3 # PRED: 2 [30.0%] (true,exec) <L2>:; # MT.4_19 = VDEF <MT.4_15>; g (); # SUCC: 4 [100.0%] (fallthru,exec) # BLOCK 4 # PRED: 2 [70.0%] (false,exec) 3 [100.0%] (fallthru,exec) # MT.4_14 = PHI <MT.4_15(2), MT.4_19(3)>; # NotFound_2 = PHI <NotFound_3(2), 0(3)>; <L3>:; i_12 = i_1 + 1; # SUCC: 5 [100.0%] (fallthru,dfs_back,exec) # BLOCK 5 # PRED: 4 [100.0%] (fallthru,dfs_back,exec) 0 [100.0%] (fallthru,exec) # MT.4_15 = PHI <MT.4_17(0), MT.4_14(4)>; # NotFound_3 = PHI <1(0), NotFound_2(4)>; # i_1 = PHI <0(0), i_12(4)>; <L4>:; if (i_1 < l_6) goto <L0>; else goto <L5>; # SUCC: 6 [5.0%] (false,exec) 1 [95.0%] (true,exec) # BLOCK 6 # PRED: 5 [5.0%] (false,exec) 1 [5.0%] (true,exec) <L5>:; # MT.4_18 = VDEF <MT.4_15>; h (NotFound_3); return; # SUCC: EXIT [100.0%] } We need to record a temporary equivalence NotFound_3 = 0 when we start processing block #5 so that we can use that equivalence to eliminate the test in block #1. To do this effectively we really need a temporary equivalence table which is maintained through the dominator walk for the sole use by the jump threader -- these equivalences can't be used for const/copy propagation for example since they are specific to a particular path through the dominator tree. Note this will not fix the first testcase you mention in the PR -- something more complex is necessary to analyze the behavior of NotFound. See "Beyond Induction Variables" from Gerlek, Stolz & Wolfe -- IIRC their analysis would be able to determine the behavior of NotFound in the loop and optimize the code appropriately. I think everyone would agree that having the kind of analysis mentioned in that paper needs to be on the "plan". jeff
The orginal testcase in comment #0 is fixed now but not the one in comment #2.
Could it be that this is now fixed?
(In reply to comment #5) > Could it be that this is now fixed? Nope, the second testcase in comment #2 is still very obvious missing an optimization with respect with jump threading: cmpw cr7,r31,r30 li r3,0 cmpwi cr6,r3,0 bne+ cr7,L6 L6: beq- cr6,L4 That is only time I have seen an missed optimization that is so obvious.
This is a loop optimization issue, not a jump threading bug. To really optimize this loop well we will likely need the kinds of analysis found in "Beyond Induction Variables", the classic paper describing loop flip-flops and other interesting loop variables that are not simple induction variables. Anyway, I'm removing this from the set of bugs related to jump threading. Jeff
Loop header copying allows dom to duplicate the latch and thus we can now eliminate the NotFound check: ;; Function t (t) t (l, y) { int i; <bb 2>: o (); if (l > 0) goto <L15>; else goto <L4>; <L15>:; i = 0; Invalid sum of incoming frequencies 7098, should be 9100 <L0>:; if (MEM[base: y, index: (unsigned int) i, step: 4] != 0) goto <L1>; else goto <L2>; <L1>:; g (); goto <bb 7> (<L4>); <L2>:; i = i + 1; if (l > i) goto <L0>; else goto <L4>; Invalid sum of incoming frequencies 2902, should be 900 <L4>:; h () [tail call]; return; } So, fixed. Since 4.1 actually.
> So, fixed. Since 4.1 actually. No, the example in comment #2 is not fixed.
These all are optimized now so closing as fixed.