[Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3

hubicka at gcc dot gnu dot org gcc-bugzilla@gcc.gnu.org
Wed Feb 6 13:29:00 GMT 2008



------- Comment #17 from hubicka at gcc dot gnu dot org  2008-02-06 13:28 -------
One problem is the following:
  do {
        ;
        match = window + cur_match;
        if (match[best_len] != scan_end ||
            match[best_len-1] != scan_end1 ||
            *match != *scan ||
            *++match != scan[1]) continue;
        scan += 2, match++;
        do {
        } while (*++scan == *++match && *++scan == *++match &&
                 *++scan == *++match && *++scan == *++match &&
                 *++scan == *++match && *++scan == *++match &&
                 *++scan == *++match && *++scan == *++match &&
                 scan < strend);

....

The internal loop is the string comparsion thingy, while the branch prediction
logic completely misses it: the continue statement looks like it is forming 4
nested loops, so it concludes that this is the internal loop.

We used to have prediction heuristic guessing that continue statement is not
used to form a loop.  This was killed when gimplification was introduced. 
Perhaps we should bring it back, since this is resonably common scenario.

Looking at longest_match in not unrolled version, the "loops" formed by
continue statement has frequencies: 298, 961, 2139, 3100, 6900, 1000
so every loop is predicted to iterate about twice.

The outer real loop now gets frequency 92, ie small enough to be predicted as
cold.  The string comparsion loop now get freuqnecy 344, predicted to iterate 3
times (quite realistically).  But because the frequency is so small we end up
allocating one of the two pointers in memory:
.L9:
        leal    1(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  1(%ecx), %eax
        cmpb    1(%edx), %al
        jne     .L8
        leal    2(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  2(%ecx), %eax
        cmpb    2(%edx), %al
        jne     .L8
        leal    3(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  3(%ecx), %eax
        cmpb    3(%edx), %al
        jne     .L8
        leal    4(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  4(%ecx), %eax 
        cmpb    4(%edx), %al
        jne     .L8
        leal    5(%ecx), %eax
        movl    %eax, -16(%ebp)
        movzbl  5(%ecx), %eax
        cmpb    5(%edx), %al
        jne     .L8

This happens in offline copy of longest_match. The inline gets this detail
right, but frequencies of the deflate functions are all crazy, naturally.

I guess I should revive the patch for language scope branch predictors.
Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761



More information about the Gcc-bugs mailing list