This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()
- From: Roger Sayle <roger at eyesopen dot com>
- To: Denis Vlasenko <vda dot linux at googlemail dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Wed, 2 Aug 2006 22:55:38 -0600 (MDT)
- Subject: Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()
On Mon, 31 Jul 2006, Jim Wilson wrote:
> At the moment, there is probably no one who understands this code as
> well as you do, so you may not get much help from others.
In my defence, I'm struggling to get up to speed with all of the
issues. The first and obvious comments are that patches should be
submitted to email@example.com rather than firstname.lastname@example.org,
we're currently in stage 3, which is a code freeze for regression
fixes, and therefore missed optimization improvements such as yours
can't be committed until gcc 4.2 has branched, which means reviewers
tend not to review complex patches until then, and finally you really
should take a look at GCC's coding standards documentation as
the formatting of your submission is just all wrong.
In the meantime, however, I have been running your example code,
and it does indeed find better correct multipliers than the algorithm
currently used by GCC.
As mentioned by Jim, the current GCC algorithm is based upon the
paper by Torbjorn Granlund and Peter Montgomery, "Division by
Invariant Integers using Multiplication", PLDI-94.
Another excellent reference describing division by integer constant
algorithms is Henry S. Warren Jr.'s "Hacker's Delight", Addision-Wesley
The strange thing is that all three of GCC, the paper and the book
agree on the a preferred multiplier, which differs from the one
selected by your code. Following through both sets of correctness
proofs is tedious, but it looks like Granlund's work attempts to
find multipliers whose residual error is less than 1/n, which is
sufficient by not necessary. Your code relies on round to zero
semantics of truncating division, so that provided the residual is
less than 1, we'll always truncate to the correct result.
One issue is that Grunlund's derivation starts with signed division
and is then extended/tweaked to handle unsigned division. Your
work on the other hand, starts its derivation with unsigned division.
So currently I'm trying to find the catch. For which divisions
are 33-bit operations required. Your proof looks reasonable and
the code does the correct thing for the divisions in your examples,
though the coding style issues prevent it being used in the compiler
without being cleaned up. There's also a return clause where it gives
up after failing to find a suitable multiplier, where falling back
to the exisiting code might be a better compromise. I've not played
with your algorithm enough to determine how often it triggers.
Congratulations for the impressive work! You'll forgive me if it
takes a while to understand why your method is better than previous
works, i.e. what it is they overlooked (or that you've overlooked).
To be honest all this discrete math makes my head hurt. Indeed,
when its appreciated that its not the smallest multiplier that we'd
prefer, but the cheapest (using shifts/adds etc...) and that
choose_multiplier is potentially intertwined with synth_mult, the
world starts spinning and I need to reach for the headache tablets.