This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [4.0 and mainline] Fix multiplication by constant expansion



Hi Jan,

On Mon, 2 Jan 2006, Jan Hubicka wrote:
> > I'm fairly certain that disabling the splitting of "cost" ties based
> > on latency in synth_mult to work around an rtx_costs problem with
> > -march=athlon isn't the correct solution.
>
> It is not.  The code produced by the rule is equivalent of
>
> tmp = tmp << 5
> tmp += multiplicant
>
> This sequence has latency of shift and add operation, not the shift
> operation as expected.


There appear to be two things you're misunderstanding.  The first
is that as far as the backend is concerned, all synthetic multiplication
sequences are selected by the middle-end based upon cost.  The term
"latency" is a theoretical/hypothetical value used by synth_mult to
split ties between sequences of the same cost.  One possible
implementation is for the middle-end to select between equal cost
sequences randomly.  If the athlon backend claims that shift-and-add
and lea have identically the same rtx_cost, it is up to expmed.c to
choose deterministically between the alternatives.  Currently this
calculation uses an internal measure of parallelism.  It could
equally well use rand().  If a backend wants one sequence over
another, it needs to tweaks the cost, and not just rely on the order
in which synth_mult considers different decompositions.  i.e. changes
to latency calculations are red-herrings for target maintainers.

Secondly, is understanding how "latency" is calculated.  You are
quite right that in your example above:

	tmp = (tmp << 5) + multiplicant

the latency is min(shift_cost+add_cost, shiftadd_cost) + latency (tmp)

The concurrency spotted in synth_mult is for the other use of
shift_add instructions:

	tmp = (multiplicant << 5) + tmp

where latency may be max(shift_cost,latency(tmp)) + add_cost

i.e. in this situation the multiplicant may be shifted in parallel
with the calculation of "tmp".  Please refer to the original patch
descriptions to see why one is considered better than the other.


Of course, latency calculation has absolutely nothing to do with your
problem that you want a shift followed by an addition to be more expensive
than an lea.  In fact, you can remove the latency calculation altogether
and simply reorder the recursive search to attempt lea before
shift-and-add (or use <= instead of < in the code), to reintroduce the
same non-determinism.


Please just try the timings.  If a simple one or two line tweak, gets
you 10% (or more) on six-track, that is the best possible outcome.

Roger
--



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]