This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: revise -mfpmath=sse comparisons


> This fixes three open PRs that we have.  I'm beginning to think that
> Uros is right when he says that -mfpmath=sse,387 is worse than useless.
> At least without a much much better register allocator in place.  I
> won't remove the option, but I think we can safely discourage its use.
It was meant for that.
I originally hoped that the register allocator would catch up within a
reasonable horizon, but unfortunately it didn't.  I simply didn't want
to constrain the md more than necessary when there was a chance that we
might use it.

I did some experiments in this direction and it seems to be doable in
a few basic steps.  I gave up as I hoped new-ra would solve this better,
but this is not going to happen anytime soon, so this might be a nice
thing to revisit in the 4.1 timeframe.
Basically, the three steps are:
1) Modify regclass to make optimistic rather than pessimistic
   assumptions.
   For insns with multiple alternatives we currently expect the worst
   case scenario to happen, which leads to unrealistic costs.  This is
   currently worked around by adding '#' alternatives in each pattern,
   which make the instruction look like it had no alternatives and
   allowed all possible combinations, so we get what we deserve.
   I made a patch for regclass to expect that the cheapest alternative
   will match and to set register preferences accordingly.  This allowed
   us to get rid of the '#' signs and seemed mostly performance neutral
   (I didn't have SPEC at that time), so it might be interesting in its
   own right, but we didn't reach a conclusion concerning the properties
   of the cost model
   (http://gcc.gnu.org/ml/gcc-patches/2001-02/msg00874.html).
2) Add a sort of dataflow over the def-use graph to propagate the
   preferences across alternatives.  This might be expensive, but the
   observation about the fixed width lattice applies here too, so it
   should not be terribly expensive.  I implemented it iteratively and
   it seemed to work just fine in the past, but I probably never posted
   the patch.  Now with du we might propagate across the edges.
3) Teach regalloc that once some register is assigned, the register
   class preferences need to be propagated across similarly as in 2).
   The time complexity is slightly more tricky here as the linearity
   proof from 2) won't apply directly - sometimes we allocate a worse
   class, which might lead to wider alternatives elsewhere.  But both 2)
   and 3) might be cut off after some threshold in the worst case.
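
To make the difference in 1) concrete, here is a toy sketch in Python.
It is not GCC's regclass code: the class names are borrowed from the
i386 port, but the cost numbers, the helper names and the modelling of
"worst case" as a maximum over alternatives are all assumptions made
for illustration.

```python
# Toy model only; not GCC's regclass.  All costs are invented.
# One operand of one insn, with two alternatives.  Each alternative
# gives the cost of placing the operand in a given register class.
ALTERNATIVES = [
    {"SSE_REGS": 0, "FLOAT_REGS": 8, "GENERAL_REGS": 6},  # SSE form
    {"SSE_REGS": 8, "FLOAT_REGS": 0, "GENERAL_REGS": 6},  # x87 form
]


def class_costs(alternatives, combine):
    """Fold the per-alternative costs into one cost per register class."""
    classes = alternatives[0].keys()
    return {c: combine(alt[c] for alt in alternatives) for c in classes}


def preferred_class(costs):
    """Return the cheapest class (first one wins on a tie)."""
    return min(costs, key=costs.get)


# Pessimistic: assume the most expensive alternative will be used.
pessimistic = class_costs(ALTERNATIVES, max)
# Optimistic: assume the cheapest alternative will match.
optimistic = class_costs(ALTERNATIVES, min)
```

With these made-up numbers the pessimistic view ends up preferring
GENERAL_REGS, although no alternative actually wants it, while the
optimistic view rates SSE_REGS and FLOAT_REGS as equally free.  That tie
is what 2) and 3) would then have to resolve.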

I still tend to think that 1) would be a winner that avoids a lot of
kludges ('#' noise in particular).  I might revisit the idea and the
patch if the original seems incomprehensible or wrong.
I am not sure about 2) and 3), but they do not look terribly
difficult/expensive to implement, and they are worth a good amount of
kludges in the compiler, so perhaps it is not too bad even if it doesn't
pay back that significantly (it would probably kill the degenerate cases
we are running into now, but won't buy that much on average, I tend to
guess, but I have no data).
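
A rough idea of the propagation in 2), again only as a toy Python
sketch: the pseudo names, the class sets and the edge list are
hypothetical, and the real patch was never posted, so this is not a
reconstruction of it.  Each pseudo carries a set of acceptable classes,
and the sets only ever shrink, which is how I read the fixed width
lattice remark: every pseudo can change only a bounded number of times.

```python
# Toy model only; pseudos, classes and edges are invented.
def propagate(prefs, edges):
    """Narrow each pseudo's candidate classes to agree with neighbours.

    prefs: dict mapping pseudo -> set of acceptable register classes
    edges: list of (a, b) pairs of pseudos linked by a def-use relation
    """
    prefs = {p: set(s) for p, s in prefs.items()}  # work on a copy
    worklist = list(edges)
    while worklist:
        a, b = worklist.pop()
        common = prefs[a] & prefs[b]
        if not common:
            # Conflicting preferences: leave both alone; a move between
            # classes would be needed here.
            continue
        for p in (a, b):
            if prefs[p] != common:
                prefs[p] = common
                # The set shrank, so re-examine every edge touching p.
                worklist.extend(e for e in edges if p in e)
    return prefs


prefs = {
    "r1": {"SSE", "X87"},
    "r2": {"SSE", "X87"},
    "r3": {"SSE"},             # e.g. used by an SSE-only insn
    "r4": {"X87", "GENERAL"},
}
edges = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
result = propagate(prefs, edges)
```

Here r3's SSE-only preference flows back to r2 and then to r1, while r4
stays as it was because it shares no class with r3.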
What do you think?
> 
> Tested on i686 and x86-64 linux, and with a povray 3.6 benchmark run.
> The runtime of the povray benchmark improved 2.0% on the p4, and 7.9%
> on the athlon64.  The later is surely helped by the amd64 abi being

This sounds rather cool ;) I did similar benchmarks in the distant past
with reversed results (thus my reservations over the original
discussion), but apparently we tweaked SSE since then so we get more
consistent scores over i387.

Honza

