This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug tree-optimization/33761] non-optimal inlining heuristics pessimizes gzip SPEC score at -O3



------- Comment #24 from hubicka at gcc dot gnu dot org  2008-02-08 15:11 -------
Hi,
the tonight runs with continue heuristics shows again improvements on 64bit
scores , but degradation on 32bit scores.  Looking into the loop, the real
trouble seems to be that the main loop has 6 loop carried variables:

scan_end, scan_end1, best_len, scan, chain_length, cur_match

plus few temporaries are needed too. Obviously we can't fit in registers on
i386. Making profile more realistic sometime helps sometimes hurts pretty much
at random basis.

One case where I think register presure is increased is the fact that different
SSA names of both scan_end and scan_end1 variables are actually not fully
coalesced in out-of-SSA.  This is result of optimizing:

        if (match[best_len] != scan_end ||
            match[best_len-1] != scan_end1 ||
            *match != *scan ||
            *++match != scan[1]) continue;
       ...later code sometimes modifying scan_end....

into computing match[best_len] into name of scan_end that is sometimes assigned
int the later code on the path not modifying scan_end.  As a result we do have
two scan_ends live at once.  I wonder if we can avoid this behaviour, though it
looks all right on SSA form, it would save 2 "global" registers: there is no
need at all to cache match[best_len]/match[best_len1] in register unless I
missed something. Those two vars are manipulated on the hot paths through the
loop.

Now the RA is driven by frequencies (bit confused by fact that two of loop
carried vars are split) and by their "liveranges" that is actually number of
instructions in bettween first and last occurence.  Since we are bit carelless
on BB ordering moving some code to the very end of function, this heuristics is
not realistic at all.  It would probably make more sense to replace it by
number of inssn it is live across, but this is probably ninsn*npseudos to
compute. Other idea would be degree in conflict graph, but I am not sure we
want to start such experiemtns in parallel with YARA.

I tested YARA and it does not handle this situation much better. Perhaps
Vladimir can help?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]