This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug c/68212] New: Loop unroller breaks basic block frequencies


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212

            Bug ID: 68212
           Summary: Loop unroller breaks basic block frequencies
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kelvin.nilsen at gmail dot com
  Target Milestone: ---

Hereâs a very simple program to demonstrate the existing problems:

void foo(double *d, unsigned long int n) {
  unsigned long int i;

  for (i=0;i<n;i++)
      d[i*2] =  0.0;
}

Compile using these options and examine the trace files (or insert your own
instrumentation) to verify my findings.

        gcc -S -m64 -O3 -fno-tree-vectorize -funroll-loops --param
max-unroll-times=4 -da loop.c

The trace file loop.c.212r.loop2_doloop shows the intermediate representation
immediately following loop unrolling.

Before unrolling, we have the following (iâm providing meaningful names to some
of the compiler-generated temporary variables):

 B0: /* the method prologue */
       /* fall through to B2 (100% probability) */

 B2: /* frequency: 900 */
       d[reg 183] = ap[48];  /* fetch incoming argument into register 183 */
       n[reg 184] = ap[56];
       ivtmp.6[reg 178] = d;
       end_of_d[reg 182] = ivtmp.6 + t;
       if (o < n) goto B3;   /* conditional jump to B3 with probability 91% */
       else goto B4;  /* âfall through" to B4 with probability 9% */

 B3: /* frequency: 819 */
       zero_preload[reg 187] = 0.0d;
       /* fall through to B5 with probability 100% */

        /* aside: the back edge of a loop is assumed to have probability 91%.
         * This has the effect of treating the body of the loop as if its 
         * frequency is the incoming edgeâs frequency divided by 0.09.
         */

 B5: /* frequency 9100 = 819 (from B3) + 8281 (from B6) */
       mem[ivtmp.6] = zero_preload;
       ivtmp.6 += 16;
       if (ivtmp.6 != end_of_d) goto B6;  /* probability 91%, frequency 8281 =
91% of 9100 */
       else goto B7;  /* probability 9%, frequency 819 = 9% of 9100 */

 B6: /* frequency 8281 */
       /* no code */
       goto B5;  /* "fall through" with 100% probability, frequency 8281 */

 B7: /* frequency 819 */
       /* no code */
       goto B4;  /* âfall throughâ with 100% probability, frequency 819 */

 B4: /* frequency 900 = 819 (from B7) + 81 (from B2) */
       /* no code */
       goto B1;  /* âfall throughâ with 100% probability, frequency 900 */

 B1: /* the method epilogue */

Note that for each basic block with multiple outgoing flows, the sum of the
probabilities on those outgoing flows always equals 100%.

Also, for any block that has multiple predecessors, the sum of the frequencies
associated with the incoming edges always equals the frequency of that block.

Hereâs what the transformed program looks like following loop unrolling:

B0: /* the method prologue */
      /* fall through to B2 (100% probability) */

 B2: /* frequency 900 */
       _16 = n << 4;
       ivtmp.6 = d;
       _17 = ivtmp.6 + _16;
       if (0 < n) goto B3 /* fall-through, probability = 91%, frequency 819 */
       else goto B4; /* probability 9%, frequency 81 */

 B3: /* frequency 819 */
       zero_preload = 0.0;
       goto B8;  /* fall through, probability 100%, frequency 819 */

 B8: /* frequency 819 */
       d_length = _17 - ivtmp.6;
       extended_length = d_length - 16;
       extended_element_count = extended_length >> 4;
       mod4_elements = extended_element_count & 0x03;
      goto B9;  /* fall through, probability 100%, frequency 819 */

 B9: /* frequency 819 */
       *((double *) ivtmp.6) = zero_preload;
       ivtmp.6 += 16;
       if (ivtmp.6 == _17) goto B10; /* probability 91%, frequency 745 */
      else goto B7; /* fall through, probability 9%, frequency 74 */

 B10: /* frequency 745 */
         /* no code */
         goto BB23;  /* fall through, probability 75%, frequency 559 */
         /* WRONG: probability should be 100%, frequency should be 745 */

 B23: /* frequency 497: WRONG: B10 is the only predecessor, frequency 
       * should be 559 based on existing B10 information 
       */
        if (mod4_elements == 0) goto B22; /* 25% probability, frequency 124 */
        else goto BB19; /* fall through, probability 100%, frequency 497. 
WRONG: should be probability 75%, frequency 373 */

 B19: /* frequency: 373.  INCONSISTENT with B23 data, but this is a correct
value by some standard. */
         if (mod4_elements == 1) goto BB18; /* 33% probability, frequency 123
*/
         else goto B15; /* fall-through, probability 100%, frequency 373. 
WRONG: should be probability 67%, frequency 250 */

 B15: /* frequency: 745.  WRONG: only predecessor is B19, should be 373 as B19
is currently described */
         if (mod4_elements == 2) goto B14; /* probability 50%, frequency = 373
*/
         else goto B11; /* fall through, probability 100%, frequency 745. 
WRONG: should be probability 50%, frequency 373 */

 B11: /* frequency 745 */
         /* note: mod4_elements == 3 */
         goto B12;  /* fall through, probability 100%, frequency = 745 */

 B12: /* frequency 745 */
         *((double *) ivtmp.6) = zero_preload;
         ivtmp.6 += 16;
         goto B13;  /* fall through, probability 100%, frequency 745 */

 B13: /* frequency 745 */
          /* no code */
          goto B14; */ fall through, probability 100%, frequency 745 */

 B14: /* frequency 745  WRONG: predecessors are B13 and B15, should be 1118 =
745 + 373 */
         /* no code */
         goto B16;  /* fall through, probability 100%, frequency 745 */

 B16: /* frequency 745 */
         *((double *) ivtmp.6) = zero_preload;
         ivtmp.6 += 16;
         goto B17; /* fall through, probability 100%, frequency 745 */

 B17: /* frequency 745 */
         goto B18; /* fall through, probability 100%, frequency 745 */

 B18: /* frequency 745. WRONG: predecessors are B19 and B17.  should be 868 =
123 + 745 */
         goto B20; /* fall through, probability 100%, frequency 745 */

 B20: /* frequency 745 */
         *((double *) ivtmp.6) = zero_preload;
         ivtmp.6 += 16;
         if (ivtmp.6 != _17) goto B21;  /* probability 91%, frequency 678 */
         else goto B7; /* fall through, 9%, frequency 67 */

 B21: /* frequency 678 */
         /* no code */
         goto B22; /* fall through, probability 100%, frequency 678 */

 B22: /* frequency 678. WRONG: should be 802 = 678 (predecessor B21) + 124
(predecessor B23) */
         goto B5; /* fall through, probability 100%, frequency 678 */

 B24: /* frequency 1884 */
         *((double *) ivtmp.6) = zero_preload;
         ivtmp.6 = unrolled_ivtmp_base + 16;
         goto B25; /* fall through, probability 100%, frequency 1884 */

 B25: /* frequency 1884 */
         /* no code */
         goto B26; /* fall through, probability 100%, frequency 1884 */

 B26: /* frequency 1884 */
         *((double *) ivtmp.6) = zero_preload;
         ivtmp.6 = unrolled_ivtmp_base + 32;
         goto B27;

 B27: /* frequency 1884 */
         /* no code */
         goto B28; /* fall through, probability 100%, frequency 1884 */

 B28: /* frequency 1884 */
         *((double *) ivtmp.6) = zero_preload;
         ivtmp.6 = unrolled_ivtmp_base + 48;
         if (ivtmp.6 != _17) goto B29; /* probability 91%, frequency 1715 */
         else goto B7;  /* fall through, probability 9%, frequency 170 */

 B29: /* frequency 1715 */
         /* no code */
         goto B5; /* fall through, probability 100%, frequency 1715 */

 B5: /* frequency 1884. WRONG: should be 2393 = 678 (predecessor B22) + 1715
(predecessor B29) */
       /* Note: as represented currently, the frequency of the original loop 
        * body has been divided by 4 as a consequence of unrolling the loop 
        * 4 times.  As part of our planned improvements, we intend to not 
        * divide the original frequency by the loop unroll factor.  This is 
        * because the rationale for the heuristic that calculates loop body 
        * frequencies applies just as well to the original and unrolled loops. 
        * 
        * It has been our experience at IBM that dividing the loop body 
        * frequencies by the loop unroll factor causes certain subsequent 
        * optimization passes to not recognize the unrolled loop body as 
        * âhot code" which would be deserving of further optimization.
        */
       *((double *) ivtmp.6) = zero_preload;
       unrolled_ivtmp_base = ivtmp.6 + 16;
       ivtmp.6 = unrolled_ivtmp_base;
       goto B6; /* fall through, probability 100%, frequency 1884 */

 B6: /* frequency 1884 */
       /* no code */
       goto B29; /* fall through, probability 100%, frequency 1884 */

 B7: /* frequency 819. WRONG: should be 311 = 67 (predecessor B20) + 170
(predecessor B28) + 74 (predecessor B9) */
       /* Note: by a different argument, 819 is the right value, because this
is the only exit from the loop, and there are 819
        * entries into the loop.
        */
       /* no code */
       goto B4; /* fall through, probability 100%, frequency 819 */

 B4: /* frequency 900 */
       /* no code */
       goto B1; /* fall through, probability 100%, frequency 900 */

 B1: /* method epilogue */

Though this very simple program is sufficient to demonstrate the problems, a
correct resolution should properly handle additional tests, such as:

  1. Loops that are bounded by compile-time constants that may or may not be 
     a multiple of the unroll factor
  2. Loops that are nested within other loops (with the understanding that
     the current implementation only unrolls the inner-most loop)
  3. Loops with multiple exits
  4. Loops with multiple entries (may not unroll, but solution should be
robust)

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]