This is the mail archive of the
mailing list for the GCC project.
Re: [RFA] optimizing predictable branches on x86
On Monday 03 March 2008 22:38, Jan Hubicka wrote:
> I had to tweak the testcase a bit to not compute minimum: GCC optimizes
> this early into MIN_EXPR throwing away any profile information. If we
> get serious here we can maintain it via histogram, but I am not sure it
> is worth the effort at least until IL is sanitized and expansion cleaned
> up with the tuples branch.
> I also had to fix a bug in branch prediction ignoring __builtin_expect of
> any early inlined function and update your testcase to not use
> __builtin_expect in predictable case.
I guess you mean, not to use it in the _unpredictable_ case?
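For illustration, a minimal sketch of the kind of select being discussed, assuming a GCC-compatible compiler. The macro and function names are hypothetical and are not taken from the actual testcase. The select returns unrelated values rather than the smaller operand, so it is not a minimum that could fold into MIN_EXPR, and only the predictable variant carries the hint.

```c
/* Hypothetical helper macro, modelled on the kernel-style likely(). */
#define LIKELY(x) __builtin_expect(!!(x), 1)

/* Predictable case: the hint marks the condition as almost always true,
   which gives the compiler a reason to prefer a conditional jump. */
long select_hinted(long a, long b, long x, long y)
{
    if (LIKELY(a < b))
        return x;
    return y;
}

/* Unpredictable case: no hint, so the compiler is free to choose
   between cmov and a conditional jump on its own heuristics. */
long select_plain(long a, long b, long x, long y)
{
    if (a < b)
        return x;
    return y;
}
```

Both functions return the same values; the hint only influences code generation, and whether a jump or a cmov is actually emitted depends on the compiler version and flags.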
> However this is what I get on AthlonXP:
> no deps, predictable -- C code took 13.71ns per iteration
> no deps, predictable -- cmov code took 13.83ns per iteration
> no deps, predictable -- jmp code took 13.94ns per iteration
> has deps, predictable -- C code took 15.54ns per iteration
> has deps, predictable -- cmov code took 22.21ns per iteration
> has deps, predictable -- jmp code took 16.55ns per iteration
> no deps, unpredictable -- C code took 13.99ns per iteration
> no deps, unpredictable -- cmov code took 13.76ns per iteration
> no deps, unpredictable -- jmp code took 26.12ns per iteration
> has deps, unpredictable -- C code took 120.37ns per iteration
> has deps, unpredictable -- cmov code took 120.76ns per iteration
> has deps, unpredictable -- jmp code took 165.82ns per iteration
At least for the __builtin_expect case, I guess this is showing
that gcc now does exactly what we'd like of it.
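One plausible reading of "no deps" versus "has deps" in the numbers above is whether the selected value feeds the next iteration. The sketch below only illustrates that distinction and is an assumption about the benchmark, not its actual source; the function names are hypothetical. With cmov the select becomes a data dependency on both inputs, while a correctly predicted jump lets the CPU run ahead of it.

```c
#include <stddef.h>

/* No deps: each comparison uses only v[i] and a fixed limit, so the
   conditions of different iterations are independent of each other. */
long no_deps(const long *v, size_t n, long limit)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (v[i] < limit) ? 3 : 7;
    return total;
}

/* Has deps: the selected value is compared against in the next
   iteration, forming a loop-carried dependency chain. */
long has_deps(const long *v, size_t n, long start)
{
    long cur = start;
    for (size_t i = 0; i < n; i++)
        cur = (v[i] < cur) ? v[i] + 1 : cur + 3;
    return cur;
}
```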
> The patch is quite SPEC neutral, saving 190Kb in FDO binaries. Still I
> think it is worthwhile to have especially because I do believe that all
> the target COST predicates should be populated by hotness argument so we
> get same results for -Os or -O2 with profile feedback specifying that
> nothing is executed or if one marks all functions cold.
> At the moment profile feedback with all functions not executed leads to
> code smaller than -O2 but closer to -O2 than -Os so there is quite some
> fruit here. With LTO or for codebases with more __builtin_expect and
> cold hints like kernel or libstdc++ we can get a lot of these benefits
> without FDO too.
I hope so too. For the kernel we have some parts where
__builtin_expect is used quite a lot and noticeably helps, and could
help even more if we cut down the use of cmov too. I guess on
architectures with even more predicated instructions it could be
even more useful too.
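As a rough sketch of the kind of hints meant here, assuming a GCC-compatible compiler that supports the cold function attribute: the macro and function names below are hypothetical and only mimic the kernel-style unlikely() pattern.

```c
#include <stdio.h>

/* Hypothetical helper macro, modelled on the kernel-style unlikely(). */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Marked cold: the compiler may treat this function, and the paths
   leading to it, as rarely executed. */
__attribute__((cold, noinline))
static int report_error(int code)
{
    fprintf(stderr, "error %d\n", code);
    return -1;
}

/* Hot path: the error branch is hinted as unlikely, so the common
   case can be laid out as the fall-through path. */
int process(int value)
{
    if (UNLIKELY(value < 0))
        return report_error(value);
    return value * 2;
}
```

Neither hint changes what the program computes; both only give the compiler hotness information of the sort that profile feedback would otherwise provide.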