This is the mail archive of the
`gcc@gcc.gnu.org`
mailing list for the GCC project.

Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|

Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |

Other format: | [Raw text] |

*From*: Kelvin Nilsen <kdnilsen at linux dot vnet dot ibm dot com>*To*: gcc at gcc dot gnu dot org*Date*: Fri, 11 Sep 2015 15:40:47 -0600*Subject*: Planned work on loop unrolling frequencies*Authentication-results*: sourceware.org; auth=none

Iâm writing to alert the community to my intention to implement improvements to the calculations of basic block frequencies during loop unrolling optimizations. My work is sponsored by IBM and is motivated by an observation that certain loop optimizations are being confused by the incorrect block frequencies that are being reported currently by gcc as a result of existing errors in the loop unrolling implementation. Hereâs a very simple program to demonstrate the existing problems: void foo(double *d, unsigned long int n) { unsigned long int i; for (i=0;i<n;i++) d[i*2] = 0.0; } Compile using these options and examine the trace files (or insert your own instrumentation) to verify my findings. gcc -S -m64 -O3 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -da loop.c The trace file loop.c.212r.loop2_invariant displays the intermediate representation (rt.) immediately before loop unrolling and the trace file loop.c.213r.loop2_unroll shows the intermediate representation immediately following loop unrolling. Before unrolling, we have the following (iâm providing meaningful names to some of the compiler-generated temporary variables): B0: /* the method prologue */ /* fall through to B2 (100% probability) */ B2: /* frequency: 900 */ d[reg 183] = ap[48]; /* fetch incoming argument into register 183 */ n[reg 184] = ap[56]; ivtmp.6[reg 178] = d; end_of_d[reg 182] = ivtmp.6 + t; if (o < n) goto B3; /* conditional jump to B3 with probability 91% */ else goto B4; /* âfall through" to B4 with probability 9% */ B3: /* frequency: 819 */ zero_preload[reg 187] = 0.0d; /* fall through to B5 with probability 100% */ /* aside: the back edge of a loop is assumed to have probability 91% This has the * effect of treating the body of the loop as if its frequency is the incoming edgeâs * frequency divided by 0.09. */ B5: /* frequency 9100 = 819 (from B3) + 8281 (from B6) */ mem[ivtmp.6] = zero_preload; ivtmp.6 += 16; if (ivtmp.6 != end_of_d) goto B6; /* probability 91%, frequency 8281 = 91% of 9100 */ else goto B7; /* probability 9%, frequency 819 = 9% of 9100 */ B6: /* frequency 8281 */ /* no code */ goto B5; /* "fall through" with 100% probability, frequency 8281 */ B7: /* frequency 819 */ /* no code */ goto B4; /* âfall throughâ with 100% probability, frequency 819 */ B4: /* frequency 900 = 819 (from B7) + 81 (from B2) */ /* no code */ goto B1; /* âfall throughâ with 100% probability, frequency 900 */ B1: /* the method epilogue */ Note that for each basic block with multiple outgoing flows, the sum of the probabilities on those outgoing flows always equals 100%. Also, for any block that has multiple predecessors, the sum of the frequencies associated with the incoming edges always equals the frequency of that block. Hereâs what the transformed program looks like following loop unrolling: B0: /* the method prologue */ /* fall through to B2 (100% probability) */ B2: /* frequency 900 */ _16 = n << 4; ivtmp.6 = d; _17 = ivtmp.6 + _16; if (0 < n) goto B3 /* fall-through, probability = 91%, frequency 819 */ else goto B4; /* probability 9%, frequency 81 */ B3: /* frequency 819 */ zero_preload = 0.0; goto B8; /* fall through, probability 100%, frequency 819 */ B8: /* frequency 819 */ d_length = _17 - ivtmp.6; extended_length = d_length - 16; extended_element_count = extended_length >> 4; mod4_elements = extended_element_count & 0x03; goto B9; /* fall through, probability 100%, frequency 819 */ B9: /* frequency 819 */ *((double *) ivtmp.6) = zero_preload; ivtmp.6 += 16; if (ivtmp.6 == _17) goto B10; /* probability 91%, frequency 745 */ else goto B7; /* fall through, probability 9%, frequency 74 */ B10: /* frequency 745 */ /* no code */ goto BB23; /* fall through, probability 75%, frequency 559 */ /* WRONG: probability should be 100%, frequency should be 745 */ B23: /* frequency 497: WRONG: B10 is the only predecessor, frequency should be 559 based on existing B10 information */ if (mod4_elements == 0) goto B22; /* 25% probability, frequency 124 */ else goto BB19; /* fall through, probability 100%, frequency 497. WRONG: should be probability 75%, frequency 373 */ B19: /* frequency: 373. INCONSISTENT with B23 data, but this is a correct value by some standard. */ if (mod4_elements == 1) goto BB18; /* 33% probability, frequency 123 */ else goto B15; /* fall-through, probability 100%, frequency 373. WRONG: should be probability 67%, frequency 250 */ B15: /* frequency: 745. WRONG: only predecessor is B19, should be 373 as B19 is currently described */ if (mod4_elements == 2) goto B14; /* probability 50%, frequency = 373 */ else goto B11; /* fall through, probability 100%, frequency 745. WRONG: should be probability 50%, frequency 373 */ B11: /* frequency 745 */ /* note: mod4_elements == 3 */ goto B12; /* fall through, probability 100%, frequency = 745 */ B12: /* frequency 745 */ *((double *) ivtmp.6) = zero_preload; ivtmp.6 += 16; goto B13; /* fall through, probability 100%, frequency 745 */ B13: /* frequency 745 */ /* no code */ goto B14; */ fall through, probability 100%, frequency 745 */ B14: /* frequency 745 WRONG: predecessors are B13 and B15, should be 1118 = 745 + 373 */ /* no code */ goto B16; /* fall through, probability 100%, frequency 745 */ B16: /* frequency 745 */ *((double *) ivtmp.6) = zero_preload; ivtmp.6 += 16; goto B17; /* fall through, probability 100%, frequency 745 */ B17: /* frequency 745 */ goto B18; /* fall through, probability 100%, frequency 745 */ B18: /* frequency 745. WRONG: predecessors are B19 and B17. should be 868 = 123 + 745 */ goto B20; /* fall through, probability 100%, frequency 745 */ B20: /* frequency 745 */ *((double *) ivtmp.6) = zero_preload; ivtmp.6 += 16; if (ivtmp.6 != _17) goto B21; /* probability 91%, frequency 678 */ else goto B7; /* fall through, 9%, frequency 67 */ B21: /* frequency 678 */ /* no code */ goto B22; /* fall through, probability 100%, frequency 678 */ B22: /* frequency 678. WRONG: should be 802 = 678 (predecessor B21) + 124 (predecessor B23) */ goto B5; /* fall through, probability 100%, frequency 678 */ B24: /* frequency 1884 */ *((double *) ivtmp.6) = zero_preload; ivtmp.6 = unrolled_ivtmp_base + 16; goto B25; /* fall through, probability 100%, frequency 1884 */ B25: /* frequency 1884 */ /* no code */ goto B26; /* fall through, probability 100%, frequency 1884 */ B26: /* frequency 1884 */ *((double *) ivtmp.6) = zero_preload; ivtmp.6 = unrolled_ivtmp_base + 32; goto B27; B27: /* frequency 1884 */ /* no code */ goto B28; /* fall through, probability 100%, frequency 1884 */ B28: /* frequency 1884 */ *((double *) ivtmp.6) = zero_preload; ivtmp.6 = unrolled_ivtmp_base + 48; if (ivtmp.6 != _17) goto B29; /* probability 91%, frequency 1715 */ else goto B7; /* fall through, probability 9%, frequency 170 */ B29: /* frequency 1715 */ /* no code */ goto B5; /* fall through, probability 100%, frequency 1715 */ B5: /* frequency 1884. WRONG: should be 2393 = 678 (predecessor B22) + 1715 (predecessor B29) */ /* Note: as represented currently, the frequency of the original loop body has been divided by 4 as a consequence of * unrolling the loop 4 times. As part of our planned improvements, we intend to not divide the original frequency by * the loop unroll factor. This is because the same heuristic for calculating loop body frequencies can be applied both * to the original and unrolled loop. Also, it has been our experience that dividing the loop body frequencies by the loop * unroll factor causes certain subsequent optimization passes to not recognize the unrolled loop body as âhot codeâ * which is deserving of further optimization. * * When we eventually get around to doing similar improvements for the count field associated with basic blocks, we would * expect to scale count values according to the loop unroll factor. This is because count values are based on actual * measurements rather than heuristic approximations. */ *((double *) ivtmp.6) = zero_preload; unrolled_ivtmp_base = ivtmp.6 + 16; ivtmp.6 = unrolled_ivtmp_base; goto B6; /* fall through, probability 100%, frequency 1884 */ B6: /* frequency 1884 */ /* no code */ goto B29; /* fall through, probability 100%, frequency 1884 */ B7: /* frequency 819. WRONG: should be 311 = 67 (predecessor B20) + 170 (predecessor B28) + 74 (predecessor B9) */ /* Note: by a different argument, 819 is the right value, because this is the only exit from the loop, and there are 819 * entries into the loop. */ /* no code */ goto B4; /* fall through, probability 100%, frequency 819 */ B4: /* frequency 900 */ /* no code */ goto B1; /* fall through, probability 100%, frequency 900 */ B1: /* method epilogue */ Though this very simple program is sufficient to demonstrate the problems, I will be testing my solution with a variety of other test programs to make sure I handle distinctions between loops that iterate over a constant number of iterations vs. those that iterate over a number of iterations known only at run time, and also will look at various conditional control structures embedded within the loops, and will also look at an inner-nested loop, and at loops with multiple exits. Iâm guessing that loops with multiple entries are not unrolled, but Iâll confirm this with testing and make sure that the implementation does not explode in the presence of unnatural loops. At the moment, I am not planning to make any changes to the maintenance of the count fields associated with each basic block during loop unrolling. After the frequency fixes are completed, tested, and integrated, I plan to examine whether accounting associated with the count fields also deserves attention, and will address that separately. Also, we have noticed that there are some opportunities to improve the code generation templates that are used for unrolling loops and may address those with a future patch, but I am not planning to make any changes to code generation at the current time. I invite any guidance or suggestions on how to proceed. Iâm also hoping that if someone else is already in the middle of addressing these shortcomings, youâll alert me so that we can avoid redundant efforts. Thanks much.

Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|

Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |