This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patches] Re: i386 multiple-jumps fp comparisons


> 
> You should pass in the branch prediction note and re-distribute
> the probabilities across the two branches.  Leave that for
> another patch though.
That's a good catch!
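
To make the redistribution concrete, here is a rough sketch of how one taken probability could be split over a two-jump sequence. It covers only the case where the first jump is a bypass around the second one (as with EQ). The names PROB_BASE, split_prob and split_probability are made up for illustration and are not part of the patch; the fixed-point scale of 10000 is an assumption as well.

```c
/* Assumed fixed-point scale for probabilities (hypothetical name). */
#define PROB_BASE 10000

/* Probabilities for the two emitted jumps (hypothetical type). */
struct split_prob {
    int first;   /* bypass jump: skips the second jump */
    int second;  /* second jump: goes to the original target */
};

/* taken:  probability that the original single branch is taken.
   bypass: estimated probability that the bypass jump is taken.
   The target is reached with probability (1 - bypass) * second,
   so second is chosen to make that product equal to taken.  */
static struct split_prob split_probability(int taken, int bypass)
{
    struct split_prob r;
    int remaining = PROB_BASE - bypass;

    r.first = bypass;
    if (remaining <= 0) {
        /* The second jump is never reached.  */
        r.second = 0;
    } else {
        long scaled = (long) taken * PROB_BASE / remaining;
        /* Clamp in case the two estimates are inconsistent.  */
        r.second = scaled > PROB_BASE ? PROB_BASE : (int) scaled;
    }
    return r;
}
```

With taken = 4500 and bypass = 1000, the second jump gets 5000, since 0.9 * 0.5 = 0.45. A sequence where both jumps go to the same target would need a different formula.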
> 
> > +   /* AMD Athlon and probably other CPUs too have fast bypass path
> >	 between the comparison and first branch.  The second branch
> >	 takes longer to execute so place first branch the worse
> >	 predicable one if possible.  */
> 
> I'd think you'd want to place first the branch that is more likely
> to be taken, so that more of the time you don't have to execute the
> second branch at all.
This condition is true for the current code too, since only ORDERED
branches can be swapped. The UNORDERED test is always the bypass, which we
unfortunately can't swap. I am not aware of any way of avoiding the bypass
tests (luckily enough, EQ is the only condition code where they happen).
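
For readers who do not have the flag layout in their head, here is a small C model of why EQ needs the bypass. After fucomi/comisd, the unordered result sets ZF, PF and CF together, so a plain "je" would also fire on NaN operands. The struct and function names below are illustrative only, not code from the patch.

```c
#include <math.h>
#include <stdbool.h>

/* Model of the flags set by fucomi/comisd (illustrative type). */
struct fp_flags {
    bool zf;  /* set when equal or unordered */
    bool pf;  /* set when unordered */
    bool cf;  /* set when less than or unordered */
};

static struct fp_flags fp_compare(double a, double b)
{
    struct fp_flags f;
    bool unordered = isnan(a) || isnan(b);

    f.pf = unordered;
    f.zf = unordered || a == b;
    f.cf = unordered || a < b;
    return f;
}

/* Two-jump sequence for "a == b":
       jp  skip      ; bypass: unordered operands must not reach je
       je  target
     skip:
   Returns true when control would reach the target.  */
static bool branch_if_equal(double a, double b)
{
    struct fp_flags f = fp_compare(a, b);

    if (f.pf)
        return false;  /* bypass jump taken */
    return f.zf;       /* second jump */
}
```

The bypass has to come first because it guards the second jump, which is why the two cannot simply be swapped.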

Perhaps we can simply reverse the branch and fix it up with unconditional
branches. The jump pass should repair the damage and reorganize the code to
avoid those unconditional ones.
But this would break the fast path through the code and looks like a deadly
trick too.

I am about to implement the conditional move/setcc patterns next
(then I will be ready for SSE :) and then we can turn such a jump with bypass
into a conditional move made from two setCC instructions. I am not sure
whether this is a cheaper alternative even on PPro though (it is not on
Athlon nor P4).
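
Roughly, the two-setCC idea looks like this when written out in C. One setcc captures "not unordered" (setnp), the other captures ZF (sete), and their AND drives the conditional move. This is only a sketch of the idea with made-up function names, not the patterns themselves.

```c
#include <math.h>

/* Branch-free "a == b" built from two flag values. */
static int fp_equal_two_setcc(double a, double b)
{
    /* setnp: 1 when the operands are ordered (PF clear).  */
    int ordered = !(isnan(a) || isnan(b));
    /* sete: ZF is set when the operands are equal or unordered.  */
    int zf = !(a < b) && !(a > b);

    return ordered & zf;
}

/* Conditional move: pick x when a == b, otherwise y, using a mask
   instead of a jump.  */
static long select_if_equal(double a, double b, long x, long y)
{
    long mask = -(long) fp_equal_two_setcc(a, b);  /* all ones or zero */

    return (x & mask) | (y & ~mask);
}
```

This trades the two jumps for two setcc instructions, an AND and the move, so whether it wins depends on how well the branches would have been predicted.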
> 
> I was not aware that these CPUs had static predictors based on the
> branch test.
They do not, IMO (at least the Athlon does not).
What I am shooting for in the reordering, 
> They do, however, have large branch tag buffers, which
> allows them to learn how a particular branch behaves.
> 
> Moreover, some CPUs (I don't know about x86 implementations,
> unfortunately) can only apply this BTB data to the first branch in
> a cache line.  Which would again imply that the first branch should
> be the one most likely to succeed.
This is a good point! At least the AMD manual starts by claiming that
every fifth instruction is a branch, so I guess they optimized
for more frequent branches. Similarly, the Pentium definitely associated
branch prediction with the exact address of the pair.
I will check the PPro manual and ask AMD people for details.
> 
> I would think UNORDERED would be the least likely, ORDERED would be
> the most likely, and everything else mostly in between.
True.

Honza
> 
> 
> 
> r~