This is the mail archive of the
mailing list for the GCC project.
Re: revise -mfpmath=sse comparisons
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Richard Henderson <rth at redhat dot com>, gcc-patches at gcc dot gnu dot org
- Date: Fri, 14 Jan 2005 15:16:20 +0100
- Subject: Re: revise -mfpmath=sse comparisons
- References: <20050114003945.GA15157@redhat.com>
> This fixes three open PRs that we have. I'm beginning to think that
> Uros is right when he says that -mfpmath=sse,387 is worse than useless.
> At least without a much much better register allocator in place. I
> won't remove the option, but I think we can safely discourage its use.
It was meant for that.
I originally hoped that the register allocator would catch up within a reasonable
time horizon, but unfortunately it didn't. I simply didn't want to
constrain the md more than necessary when there is a chance that we might use
I did some experiments in this direction and it seems to be doable in
a few basic steps. I gave up, as I hoped new-ra would solve this better, but
this is not going to happen anytime soon, so this might be a nice thing to
revisit in the 4.1 timeframe.
Basically, it seems to be doable in three basic steps:
1) Modify regclass to be optimistic rather than pessimistic.
For multiple-alternative insns we currently expect the worst-case
scenario to happen, which leads to unrealistic costs. This is avoided by
adding '#' to the constraint alternatives of each pattern, which makes the
instruction look as if it had no alternatives and allowed all possible
combinations, so we get what we deserve.
I made a patch for regclass to expect that the cheapest alternative will
match and to set register preferences accordingly. This allowed us to get
rid of the '#' signs and seemed mostly performance-neutral (I didn't have
SPEC at that time), so it might be interesting in its own right, but we
never reached a conclusion concerning the properties of the cost model.
2) Add a sort of dataflow over the def-use graph to propagate the preferences
across alternatives. This might sound expensive, but the observation
about the fixed-width lattice applies here too, so it should not be
terribly expensive. I implemented it iteratively and it seemed to
work just fine in the past, but I probably never posted the patch.
Now with du chains we might propagate along the def-use edges.
3) Teach regalloc that once some register is assigned, the register class
preferences need to be propagated in the same way as in 2).
The time complexity is slightly trickier here, as the linearity
proof from 2) won't apply directly: sometimes we allocate a worse
class, which might lead to wider alternatives elsewhere. But both 2)
and 3) could be cut off after some threshold in the worst case.
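The propagation in 2), together with the cut-off suggested for 3), can be sketched as a worklist pass over a def-use graph. This is a hypothetical toy, not code from GCC or new-ra: `struct du_graph`, `propagate_prefs`, `PREF_X87` and `PREF_SSE` are invented for illustration, and the edge scan is quadratic where a real pass would walk per-node adjacency lists. Preferences are bitmasks over a fixed set of classes and may only lose bits, so each node can change at most once per class; that is the fixed-width lattice argument bounding the work. The `max_steps` parameter stands in for the threshold.

```c
#include <stdbool.h>

/* Hypothetical class bits for the toy lattice.  */
#define PREF_X87 (1u << 0)
#define PREF_SSE (1u << 1)

#define MAX_NODES 16
#define MAX_EDGES 64

typedef unsigned int class_mask;   /* bit set = class acceptable */

struct du_graph
{
  int n_nodes;                  /* must not exceed MAX_NODES */
  int n_edges;                  /* must not exceed MAX_EDGES */
  int src[MAX_EDGES];           /* edge e goes from src[e] ... */
  int dst[MAX_EDGES];           /* ... to dst[e] */
  class_mask pref[MAX_NODES];   /* acceptable classes per node */
};

/* Narrow each node's preference to what its predecessors accept.
   Returns the number of edge visits, or -1 once MAX_STEPS is exceeded
   (the worst-case cut-off).  */
static int
propagate_prefs (struct du_graph *g, int max_steps)
{
  int queue[MAX_NODES];         /* circular worklist */
  bool queued[MAX_NODES];
  int head = 0, count, steps = 0;

  /* Start with every node on the worklist.  */
  for (int i = 0; i < g->n_nodes; i++)
    {
      queue[i] = i;
      queued[i] = true;
    }
  count = g->n_nodes;

  while (count > 0)
    {
      int u = queue[head];
      head = (head + 1) % MAX_NODES;
      count--;
      queued[u] = false;

      for (int e = 0; e < g->n_edges; e++)
        {
          if (g->src[e] != u)
            continue;
          if (++steps > max_steps)
            return -1;
          int v = g->dst[e];
          class_mask meet = g->pref[v] & g->pref[u];
          /* Only narrow, never widen, and never empty a set: each change
             removes at least one bit, which bounds the iteration.  */
          if (meet != 0 && meet != g->pref[v])
            {
              g->pref[v] = meet;
              if (!queued[v])
                {
                  queue[(head + count) % MAX_NODES] = v;
                  count++;
                  queued[v] = true;
                }
            }
        }
    }
  return steps;
}
```

With edges 0->1 and 1->2 and node 0 restricted to `PREF_SSE`, a generous `max_steps` narrows nodes 1 and 2 to SSE after two edge visits; with `max_steps` set to 1 the pass gives up and returns -1, leaving node 2 untouched. Conflicting preferences (an empty meet) are simply left alone in this sketch.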
I still tend to think that 1) would be a winner, since it avoids a lot of
kludges (the '#' noise in particular). I might revisit the idea and the
patch if the original seems unclear or wrong.
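The optimistic rule from 1) can be illustrated with a toy model of one operand of a two-alternative insn. Everything here is hypothetical and much simpler than GCC's regclass.c: the classes, the alternatives and the cost numbers are made up. Under the worst-case rule both classes cost 10, so the operand gets no useful preference; assuming the cheapest alternative will match gives 2 versus 1, and the operand prefers SSE.

```c
/* Hypothetical toy model, not GCC's regclass.c.  */
enum toy_class { TOY_X87, TOY_SSE, TOY_N_CLASSES };

#define TOY_N_ALTS 2

/* alt_cost[alt][cls]: cost of the operand living in class CLS when the
   insn ends up using alternative ALT (a high number stands for a
   reload or a move through memory).  */
static const int alt_cost[TOY_N_ALTS][TOY_N_CLASSES] = {
  { 2, 10 },   /* alternative 0: x87 form */
  { 10, 1 },   /* alternative 1: SSE form */
};

/* Pessimistic: assume the worst alternative will be used.  */
static int
pessimistic_cost (int cls)
{
  int worst = alt_cost[0][cls];
  for (int a = 1; a < TOY_N_ALTS; a++)
    if (alt_cost[a][cls] > worst)
      worst = alt_cost[a][cls];
  return worst;
}

/* Optimistic: assume the cheapest alternative will match.  */
static int
optimistic_cost (int cls)
{
  int best = alt_cost[0][cls];
  for (int a = 1; a < TOY_N_ALTS; a++)
    if (alt_cost[a][cls] < best)
      best = alt_cost[a][cls];
  return best;
}

/* Preferred class under COST_FN; ties go to the lower class number.  */
static int
preferred_class (int (*cost_fn) (int))
{
  int best_cls = 0;
  for (int c = 1; c < TOY_N_CLASSES; c++)
    if (cost_fn (c) < cost_fn (best_cls))
      best_cls = c;
  return best_cls;
}
```

Here `preferred_class (optimistic_cost)` yields `TOY_SSE`, while `preferred_class (pessimistic_cost)` only returns `TOY_X87` through the tie-break, which is the "unrealistic costs" situation described in 1).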
I am not sure about 2) and 3), but they do not look terribly
difficult or expensive to implement, and they seem worth it given the good
amount of kludges in the compiler, so perhaps it is not too bad even if it
doesn't pay back that significantly (it would probably kill the degenerate
cases we are running into now, but I tend to guess it won't buy that much
on average, though I have no
What do you think?
> Tested on i686 and x86-64 linux, and with a povray 3.6 benchmark run.
> The runtime of the povray benchmark improved 2.0% on the p4, and 7.9%
> on the athlon64. The latter is surely helped by the amd64 abi being
This sounds rather cool ;) I did similar benchmarks in the distant past
with the opposite results (hence my reservations in the original
discussion), but apparently we have tweaked SSE since then, so we now get
consistently better scores than with i387.