This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [tree-ssa] cfg.texi needs reviewing by a native speaker (Was: Re: "Documentation by paper")


> In message <20040209231001.GQ20341@kam.mff.cuni.cz>, Jan Hubicka writes:
>  >> And if you insert it (carefully making sure that it only applies to user
>  >> written gotos) then it still loses badly.  I haven't investigated why.
>  >
>  >How did you get the trick?
> Ignore jumps to targets with DECL_ARTIFICIAL set.

That still mess things up for example in 

if (cond)
{
  ...
  goto bb;
}

where the artifical goto coming from conditional will be forwarded
across the user specified goto and similar cases, so it won't do
particularly good.  But for perl this shall be moot and I will dig into
this later anyway, so lets skip it for now.  This thing help little (or
not at all) to SPEC, kernel is probably major consumer here.
> 
>  >The high level predictions needs to be lowered before CFG is cleaned up
>  >and optimized in general, so if we do some pretty aggressive
>  >optimizations before doing so, we can expect surprises.
>  >It don't lose on mainline at all (I dimply recall that perl has been one
>  >of benchmarks benefiting a lot from highlevel predictors, but it is long
>  >ago and my memory is biassed)
> About 40% of perl's dynamic branches are _not_ predicted.  Those fall into
> a few key categories:

This is weird.  Andreas tester has:

HEURISTICS                  BRANCHES  (REL)  HITRATE             COVERAGE  (REL)
DS theory                       8101  64.1%  80.11%/ 94.05%   3453275146  47.9%
combined                       12643 100.0%  74.78%/ 92.52%   7203748247 100.0%
call                            4143  32.8%  82.65%/ 96.30%   2130740141  29.6%
opcode values positive           429   3.4%  88.34%/ 89.09%    304001983   4.2%
opcode values nonequal          3034  24.0%  66.94%/ 90.08%    811368857  11.3%
loop header                      954   7.5%  79.77%/ 93.88%    410490743   5.7%
first match                     1370  10.8%  82.45%/ 90.29%   1904508787  26.4%
loop exit                        812   6.4%  67.56%/ 91.09%    524736878   7.3%
no prediction                   3172  25.1%  56.89%/ 91.95%   1845964314  25.6%
loop branch                      556   4.4%  88.11%/ 89.99%   1379795296  19.1%
pointer                         1242   9.8%  93.06%/ 98.11%    587903278   8.2%
early return                     117   0.9%  90.55%/ 94.79%     33862048   0.5%
goto                             170   1.3%  79.02%/ 94.22%     19885611   0.3%
nil return                        95   0.8%  96.44%/ 99.63%     37988786   0.5%
continue                          94   0.7%   9.38%/ 90.67%     17406599   0.2%
negative return                  194   1.5%  99.97%/ 99.99%      6983805   0.1%
const return                     144   1.1%  80.19%/ 93.11%      6485456   0.1%
loop iterations                   20   0.2%  98.60%/ 98.60%        12201   0.0%
noreturn call                      8   0.1% 100.00%/100.00%            9   0.0%

So only 25% of branches are not predicted.  All the major players
(loop/pointer/opcode/call shall be the same on tree-SSA)
> 
>   1. We need to predict the branch which checks for the range on a switch
>      table.  My gut tells me that this branch is going to be rarely taken
>      (ie, choose the fallthru path with the indirect jump for the switch rather
>      than the branch around the switch).  That gut feeling certainly
>      applies to perl, but I haven't checked it on a wider codebase and I
>      haven't seen any papers which discuss this issue.

I had code to do so (as well as predicting even distribution of the
cases in the decision tree), but the patch never got in.
I can dig it out.  It gained few % but nothing radical.
In mid term I would like to rewrite the jumptable expansion into tree
level and use profile (in fact it will be probably one of first steps
once I get the profile on trees from prenatal stages), then we won't
have this prediction heuristics at all (we will introduce the branches
at the time we already have profile), so perhaps it don't make sense to
do this right now
(of course if this won't turn out to be problem for tree-ssa merge
acceptance, but I won't expect so)
> 
>   2. We need to predict bit comparisons -- basically you predict that any
>      specific bit is off.  This is discussed in the paper from Deitrich,
>      Cheng and Hwu.    I cobbled together some code to do this and it's
>      reasonably effective.

You mean "restricted AND heuristics" or similarly called one?  I recall
seeing it in one paper and claims in other paper that it is wrong, so I
concluded that it is dificult to get this pattern on RTL and never got
across to implement it.  Look forward for your patch :)
> 
>   3. We need to handle FP comparisons.  Right now GT, GE, LT, LE against
>      the common constants such as -1, 0, 1 are only predicted for integer
>      comparisons.  This is made somewhat more difficult by the fact that
>      some architectures (ia32) don't include the FP constant in the
>      comparison -- you have to walk backwards to determine that the
>      second register in the comparison is actually holding one of these
>      special values (and that tidbit of knowledge might be buried in a
>      REG_EQUAL note).  I cobbled together some code to do this and it's
>      quite effective.

I also had patch for this one, that somehow vanished, but it was kind of
hit or miss on SPECfp (the claim that all rationals are positive is more
a lie than the same claim about integers apparently :)
> 
> 
> Those changes radically improve the coverage of the predictors and also raise
> also raise the overall hit rate from ~60% to ~75%.

Mainline reports 74.48%, I wonder what makes the tree-ssa code so
different, but in general your code shall help in other scenarios too.
> 
> There's a tree-ssa specific issue that also needs to be addressed.  tree-ssa
> marks a lot more things with REG_POINTER -- particularly registers holding
> pointer types which were loaded from structure fields, but which are
> never dereferenced.  These are then checked against null quite often, problem
> is these values are null.  So the pointer-nonnull heuristic needs some work --
> note this dovetails with the findings of Deitrich, that pointers loaded from
> arrays have a much higher likelihood of being null.

This sounds interesting.  Perhaps we can work this out via REG_ATTRs
that points to original DECL node, that, well points to tmp that says
nothing...
but it is probably place where we shall go.  For scalarized structures
and temporaries holding array fields we shall feed in the proper
DECL+offset so Josef's code can be extended to handle aggregates.
> 
> Now if we wanted to get really good hit rates and coverage, there is a guarded
> store into an array that we do not predict, but which is taken 99% of the time.
> The trick to predicting this is that the guard is a bounds check.    While
> the store heuristics are rather controversial in the general form, I
> wouldn't be terribly surprised to find that a bounds check guard can be
> accurately predicted.

Yep, I just skipped the guard heuristics as it has very low reported
hitrate and in both directions across the papers.  We can try it out.
The current scheme of feeding the values from analyze_brprob into DS
combining allows us to take even weak heuristics as a hints.
> 
> 
>  >But probably we need to analyze_brprob both mainline and tree-ssa to see
>  >how scores looks like.  Perhaps just editing predictors.def and feeding
>  >in the tree-ssa values will make scores better.
> Well, the biggest issue for tree-ssa at this time is the pointer heuristic
> and simply having more branches which are not predictable than mainline.
> Having analyze_brbrob has been quite useful.

sometimes the magic is easier when one has some numbers :)
It is real shame that we don't have profiled SPEC tester that can keep
track of the predictor analysis like we do have for mainline.  I will
branch the profiling branch at the end of week and try to arrange
something for it.
> 
> I've got a change I'm testing which cuts down on the number of dynamic
> branches for tree-ssa (making it better than mainline).  Then I've got
> to figure out how to deal with the pointer problem.

Thanks for working on this!
(i meant to play around this for a while, but didn't quite get into
that)

Honza
> 
> jeff
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]