This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: ifcvt conditional execution and rtx_cost

From: Roger Sayle <roger at eyesopen dot com>
To: David Edelsohn <dje at watson dot ibm dot com>
Cc: gcc-patches at gcc dot gnu dot org, Eric Christopher <echristo at redhat dot com>, Uros Bizjak <uros at kss-loka dot si>
Date: Wed, 7 Jul 2004 07:08:49 -0600 (MDT)
Subject: Re: ifcvt conditional execution and rtx_cost

On Tue, 6 Jul 2004, David Edelsohn wrote:
> 	With all of your recent rtx_cost changes and your patch for
> ifcvt.c, I would like to get your opinion on the appended ifcvt.c patch.

Interesting!  The problem is that half of the tests updated by your
patch really need to be testing the instruction count and comparing
that against MAX_CONDITIONAL_EXECUTE and the other half need to be
summing the the rtx_costs and comparing that against BRANCH_COST.
Neither the current code that always tests instruction counts, nor
your patch that only compares rtx_costs is ideal, as these two different
types of tests really require checking different metrics.  But I agree
some of the aspects of your patch are needed in ifcvt.c, and can be
used to address Uros' PR rtl-optimization/15187.  There's also another
bug in this code...

The first class of transformations concerns the conversion of
if-statements (with and without else clauses) into conditionally
executed instructions, i.e. cond_exec_process_if_block.  In these
transformations it is the *number* of instructions that is significant.

Consider the case without an else-clause.  If the condition is true,
the conditional instructions are executed, so the performance of this
path should be almost identical to before (as conditional instructions
that execute cost about the same as their unconditional variants),
less the overhead of the conditional branch.  When the condition is
false, there's a performance penalty of not executing N conditional
instructions less the branch overhead.  In these cases, its fair to
assume that the rtx_cost of not executing a conditional instruction
is constant, hence its the number of instructions that's significant.
This number is then compared against the backends MAX_CONDITIONAL_EXECUTE
value which specifies the cost ratio on unexecuted conditional
instructions to the target's branch cost.

There's actually a bug in the with an else-clause logic, which sums
the number of instructions in the then and else clauses and compares
this to twice the MAX_CONDITIONAL_EXECUTE.  I believe for consistency
this should be against MAX_CONDITIONAL_EXEUTE (i.e. the max *= 2 is
bogus).  Without the else, consider the case where the conditional
is true/false 50% of the time.  The worst case overhead of a patch
is MAX_CONDITIONAL_EXECUTE and the average case overhead is half of
that.  In the if-the-else case, the worst case overhead is currently
2*MAX_CONDITIONAL_EXECUTE-1 and the average is MAX_CONDITIONAL_EXECUTE.
Given that in both cases we only save a single conditional jump, the
measures we're using are inconsistent.  Removing the "max *= 2" allows
the backend to give a single well-defined value for the target macro
MAX_CONDITIONAL_EXECUTE.

The second class of transformations affect by your patch are those
performed in find_if_case_1 and find_if_case_2.  In these cases, we
take a number of instructions that we conditionally executed
previously, hoist them before the condition and save ourselves the
cost of a single condition jump.  In these cases, clearly, without
a shadow of doubt, we need to use rtx_costs as done in your patch!
I was going to ask if you could reduce a convenient testcase from
Apple's Skidmarks benchmarks, but fortunately Uros has pointed out
PR 15187:

double test(double x) {
        if (x > 0.0)
                return cos(x);
        else
                return sin(x);
}

Where we pull the evaluation of an fcos instruction outside of the
if-stmt, and executes it unconditionally.  As you can imagine that's
*lots* of cycles.

Hence the only bits of your patch that are required at the last two
hunks, where instead of calling "count_bb_insns (bb) > BRANCH_COST"
these need to be changed to "total_bb_rtx_cost (bb) >
COSTS_N_INSNS (BRANCH_COST)", introducing a new total_bb_rtx_cost
function based upon your changes to count_bb_insns.  Notice in this
case you should retain the comparison against BRANCH_COST as it means
something completely different to MAX_CONDITIONAL_EXECUTE (as I've
described above).

This approach also fixes a "wart" with your patch, that you were
changing the semantics of count_bb_insns, n_insns and max_insns
in various functions without changing their names.  Although you
did change the commentry, this would have been confusing for
someone coming to the code in future.

Any chance that I could ask you to draw up a patch based upon my
suggestions above, and test whether it affects the Skidmarks performance
degradation you're seeing?  If you're busy, I'd be happy to fix this
ifcvt.c bug for you and check that it resolves PR optimization/15187
(which will need x86_rtx_costs changes for the x86's UNSPEC_FCOS and
UNSPEC_FSIN).

Many thanks again,

Roger
--

Follow-Ups:
- Re: ifcvt conditional execution and rtx_cost
  - From: Richard Earnshaw
- Re: ifcvt conditional execution and rtx_cost
  - From: David Edelsohn

References:
- ifcvt conditional execution and rtx_cost
  - From: David Edelsohn

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]