i386 multiple-jumps fp comparisons

Richard Henderson rth@redhat.com
Fri Jan 12 11:14:00 GMT 2001


On Fri, Jan 12, 2001 at 06:35:31PM +0100, Jan Hubicka wrote:
> 	* i386.c (ix86_fp_comparison_arithmetics_cost,
> 	ix86_fp_comparison_fcomi_cost, ix86_fp_comparison_sahf_cost,
> 	ix86_fp_comparison_cost): New functions.
> 	(ix86_expand_fp_compare): Use the costs to choose best method; add
> 	two new parameters SECOND_TEST and BYPASS_TEST; allow generating
> 	two-branch sequences; make static.
> 	(ix86_use_fcomi_compare): Do decision according to the costs.
> 	(split_fp_branch): New.
> 	* i386.md (compare-and-branch patterns): Use split_fp_branch.
> 	* i386-protos.h (ix86_expand_fp_compare): Remove
> 	(ix86_split_fp_branch): Declare.

Ok.

> +   /* Return arbitarily high cost when instruction is not supported - this
> +      avoids gcc from using it.  */

"prevents", not "avoids".

> !   /* Do fcomi/sahf based test when profitable.  */
> !   if ((bypass_code == NIL || bypass_test) && (second_code == NIL || second_test)
> !       && ix86_fp_comparison_arithmetics_cost (code) > cost)

Please wrap this.

> + /* Split branch based on floating point condition.  */
> + void
> + ix86_split_fp_branch (condition, op1, op2, target1, target2, tmp)
> +      rtx condition, op1, op2, target1, target2, tmp;

You should pass in the branch prediction note and re-distribute
the probabilities across the two branches.  Leave that for
another patch though.
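
A rough sketch of the redistribution arithmetic, not taken from the
patch.  It covers only the case where both new branches jump to the
same target: if the original branch is taken with probability p, then
p = p1 + (1 - p1) * p2, so once p1 is chosen for the first branch the
second must get p2 = (p - p1) / (1 - p1).  The PROB_BASE scale and the
function name are assumptions made for illustration.

```c
/* Assumed fixed-point scale for probabilities; illustrative only.  */
#define PROB_BASE 10000

/* P is the probability (scaled by PROB_BASE) that the original branch
   is taken.  P1 is the share given to the first of two branches that
   jump to the same target.  Return the probability that the second
   branch is taken, given that the first one fell through.  */
static int
second_branch_prob (int p, int p1)
{
  /* The first branch cannot be taken more often than the original.  */
  if (p1 > p)
    p1 = p;
  /* If the first branch is always taken, the second is never reached.  */
  if (p1 >= PROB_BASE)
    return 0;
  return (int) ((long) (p - p1) * PROB_BASE / (PROB_BASE - p1));
}
```

For example, an original probability of 60% with 50% given to the
first branch leaves 20% for the second, since 0.5 + 0.5 * 0.2 = 0.6.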

> +   /* AMD Athlon and probably other CPUs too have fast bypass path
>	 between the comparison and first branch.  The second branch
>	 takes longer to execute so place first branch the worse
>	 predicable one if possible.  */

I'd think you'd want to place first the branch that is more likely
to be taken, so that more of the time you don't have to execute the
second branch at all.
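
The argument can be put in numbers with a small sketch.  It assumes
the second branch is reached only when the first is not taken, and the
function name is hypothetical.

```c
/* Expected number of conditional branches executed for a two-branch
   sequence in which the second branch is reached only when the first
   is not taken.  P_FIRST is the probability that the first branch is
   taken.  */
static double
expected_branches (double p_first)
{
  return 1.0 + (1.0 - p_first);
}
```

With a 75% likely branch placed first the sequence executes 1.25
branches on average; with a 25% likely branch first it executes 1.75.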

I was not aware that these CPUs had static predictors based on the
branch test.  They do, however, have large branch target buffers, which
allow them to learn how a particular branch behaves.

Moreover, some CPUs (I don't know about x86 implementations,
unfortunately) can only apply this BTB data to the first branch in
a cache line, which would again imply that the first branch should
be the one most likely to succeed.

I would think UNORDERED would be the least likely, ORDERED would be
the most likely, and everything else mostly in between.
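
One way to encode that ordering, as a sketch only.  The enum and both
function names are hypothetical and are not GCC's rtx codes; the
ranking is the guess stated above, not measured data.

```c
/* Hypothetical comparison classes; these are not GCC's rtx codes.  */
enum fp_test { FP_UNORDERED, FP_OTHER, FP_ORDERED };

/* Rank a test by how likely it is assumed to be true: higher means
   more likely.  The ranking is a guess, not measured data.  */
static int
likelihood_rank (enum fp_test t)
{
  switch (t)
    {
    case FP_UNORDERED: return 0;
    case FP_ORDERED:   return 2;
    default:           return 1;
    }
}

/* Return nonzero if test A should be emitted before test B.  */
static int
emit_first_p (enum fp_test a, enum fp_test b)
{
  return likelihood_rank (a) >= likelihood_rank (b);
}
```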



r~

