This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Floating point trouble with x86's extended precision


On 21 Aug, Jim Wilson wrote:
> Volker Reichelt wrote:
> > Just to make sure I get this right: The register is spilled with
> > the last bits truncated (cut off) instead of rounded, right?
> 
> It is just an fst/fstp instruction.  According to the manual I have, it 
> does a rounding operation before the store.
> 
> > And one question out of curiosity. What happens to values in the FPU
> > that finally get written into memory *not* because the floating point
> > stack is filled, but because of other reasons (like the variable y in my
> > example). Do they also suffer from not being rounded but truncated?
> > I'd think so, right?

I got confused by your usage of the word truncated. In the example with the
full FP stack you used "truncated" as a synonym for "rounded", right?
And you wanted to say that after some intermediate results no rounding is
done (no rounding to 64 bits that is, you have to round to 80 bits of course),
and after some results (apparently correct) rounding to 64 bit is done.
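To make the two kinds of rounding concrete, here is a minimal Python model. It is an illustration, not GCC's actual code generation. `round_sig` and `mul_sub` are hypothetical helpers written for this sketch. `round_sig` rounds an exact rational to `p` significand bits with round-to-nearest-even, using 64 bits for the x87 80-bit extended format and 53 bits for a 64-bit double, and it ignores the exponent range (no overflow, underflow or denormals). `mul_sub` evaluates `a*b - c` twice: once with the product kept in a stack register, and once with the product spilled to a double slot, which is what an fst/fstp store does.

```python
from fractions import Fraction

EXT_BITS = 64  # significand bits of the x87 80-bit extended format
DBL_BITS = 53  # significand bits of an IEEE 754 64-bit double


def round_sig(x: Fraction, p: int) -> Fraction:
    """Round x to p significand bits, round-to-nearest, ties-to-even.
    Exponent range is unbounded (no overflow, underflow or denormals)."""
    if x == 0:
        return Fraction(0)
    sign, x = (-1 if x < 0 else 1), abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** p:          # normalise so that 2**(p-1) <= scaled < 2**p
        e, scaled = e + 1, scaled / 2
    while scaled < 2 ** (p - 1):
        e, scaled = e - 1, scaled * 2
    q, r = divmod(scaled.numerator, scaled.denominator)
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q % 2 == 1):
        q += 1                       # round up; on a tie only if q is odd
    return sign * q * Fraction(2) ** e


def mul_sub(a: Fraction, b: Fraction, c: Fraction, spill: bool) -> Fraction:
    """Model a*b - c on the x87 stack, optionally spilling the product."""
    t = round_sig(a * b, EXT_BITS)   # fmul leaves an extended-precision result
    if spill:
        t = round_sig(t, DBL_BITS)   # fstp to a double stack slot rounds to 53 bits
    r = round_sig(t - c, EXT_BITS)   # fsub in extended precision
    return round_sig(r, DBL_BITS)    # final store of the double result


a = b = 1 + Fraction(1, 2**30)
c = 1 + Fraction(1, 2**29)
print(mul_sub(a, b, c, spill=False))  # 1/1152921504606846976, i.e. 2**-60
print(mul_sub(a, b, c, spill=True))   # 0
```

The values are contrived so that the 2**-60 term of the product survives only in extended precision; which of the two paths real code takes depends on register allocation.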

> It is the same fst/fstp instruction, so it is the same rounding.
> 
> > But I think the difference between rounding and truncating is not the
> > problem for the users.
> 
> The problem isn't that the value is rounded.  The problem is that the 
> value is rounded at unpredictable locations,

Agreed. But that's not a bug IMHO.

> making it impossible to compensate for the rounding.

What do you mean by "compensate"?
I don't quite see which goal you really want to achieve.
As I understand the word "compensate", you now want to get rid of rounding
- which would match my definition ii) of accuracy.
Later on, you argue for rounding after each step - which would match
definition i).

IMHO the current behavior of GCC matches definition ii) of accuracy:
Postpone rounding to 64 bit as long as possible (if it is not possible
due to a full FP stack for example, resort to spilling stuff which will
cause rounding).
I completely agree with you that it's bad that the user cannot decide
which result is spilled and which one is not. But IMHO i) is an even
worse solution: you can reproduce the results, but at the expense of
rounding operations everywhere, which
1) lead to larger rounding errors in total
2) cause program slowdown

The second point could be tackled by the hardware (and actually is in AMD's
and Intel's 64 bit architectures, if I understood you correctly), but the
first point still holds.

It's a matter of which goals you are trying to reach.
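The trade-off between i) and ii) can be sketched with an exact rounding model. Assumptions: `round_sig` and `accumulate` are hypothetical helpers written for this illustration; `round_sig` rounds to `p` significand bits (ties to even, exponent range ignored), and the input is deliberately contrived: 10000 additions of 2**-53 to 1. Rounding every partial sum to a double (definition i) loses each addend to a tie, while keeping the partial sums at the 64-bit x87 significand and rounding once at the end (definition ii) recovers the exact sum.

```python
from fractions import Fraction

EXT_BITS, DBL_BITS = 64, 53  # x87 extended and IEEE double significand widths


def round_sig(x: Fraction, p: int) -> Fraction:
    """Round x to p significand bits, round-to-nearest, ties-to-even.
    Exponent range is unbounded (no overflow, underflow or denormals)."""
    if x == 0:
        return Fraction(0)
    sign, x = (-1 if x < 0 else 1), abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** p:          # normalise so that 2**(p-1) <= scaled < 2**p
        e, scaled = e + 1, scaled / 2
    while scaled < 2 ** (p - 1):
        e, scaled = e - 1, scaled * 2
    q, r = divmod(scaled.numerator, scaled.denominator)
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q % 2 == 1):
        q += 1                       # round up; on a tie only if q is odd
    return sign * q * Fraction(2) ** e


def accumulate(start: Fraction, term: Fraction, n: int, acc_bits: int) -> Fraction:
    """Add term to start n times, rounding every partial sum to acc_bits,
    then round the final result to a double."""
    s = start
    for _ in range(n):
        s = round_sig(s + term, acc_bits)
    return round_sig(s, DBL_BITS)


start, term, n = Fraction(1), Fraction(1, 2**53), 10000
exact = start + n * term
every_step = accumulate(start, term, n, DBL_BITS)  # definition i)
postponed = accumulate(start, term, n, EXT_BITS)   # definition ii)
print(every_step == 1, postponed == exact)         # True True
```

Less contrived inputs show much smaller differences, and postponing the rounding is not guaranteed to be more accurate for every input, but the effect goes in the direction described above.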

> > You'll get much better results with the extended precision than without
> > (since I get 1 rounding to 64 bits instead of 10000).
> 
> No one here is saying that extended precision is bad.  It is very 
> important for correct results for some algorithms.  That is why the IEEE 
> FP standards require its presence.
> 
> However, the problem with the x86 fp reg-stack is that you can't control 
> when the extended precision is used.  It is always used, whether you 
> want it or not.  This makes it very hard for a compiler to get correct 
> results without sacrificing performance, accuracy, or access to long 
> double.

IMHO this is not true if your definition of accuracy is ii).

> There is really no good solution to this problem that makes everyone happy.

> This is a design from the 1980's.  It seemed like a good idea at the 
> time, but experience has shown it was a mistake, and no one designs FP 
> hardware like that anymore.  People who did design FP hardware like that 
> have since fixed the mistake.  The original m68k FPU (68881) had this 
> mistake, and Motorola fixed it in the 68040/68060.  As I mentioned 
> before, Intel and AMD have both fixed the x86 mistake with Itanium and 
> AMD64 respectively.
> 
> > i) You could say that accurate results are obtained, if you do the
> >    rounding to 64 bit after each computation.
> >    (That has the consequence that the result does not change with
> >    optimization, if computations aren't rearranged using commutativity
> >    or associativity etc.)
> 
> This is the ideal situation.

Only if your goal is sacrificing accuracy in terms of ii) for reproducibility
of the results. If you want to achieve ii), then it's far from ideal.

>  All gcc targets work this way except x86 
> (if using the reg-stack not the SSE registers) and m68k (pre-68040).
> 
> > ii) You could say that accurate results are obtained, if the final
> >     result is close to the exact arithmetic result. This means, that you
> >     should postpone rounding to 64 bit as long as possible.
> >     (In my understanding this is what GCC currently does - if one
> >     ignores the fact that it doesn't round but truncates in several
> >     cases, right?)
> 
> Modulo the gcc bugs, yes, this is what we get.  This isn't very useful 
> unless we fix the bugs though.

I still don't get what the bugs are (when you want to achieve ii).
(I thought it was truncating instead of rounding after your first mail,
but that was a misunderstanding on my part). As I understand it, the
compiler tries to use as few rounding operations as possible, and the
user just lacks the control when the rounding happens.

> > But you said it yourself: "In this case, I think we have to admit that both
> > viewpoints are valid, and then agree to disagree."
> 
> Yes, I think that is a good way to state my position.
> 
> > Your proposed fixes all try to enforce definition i) of accuracy, but
> > I think definition ii) is also a valid position.
> 
> That is a valid criticism.
> 
> > The only thing GCC can really be blamed for is that there's no option to turn
> > on the workaround in c). Therefore, I'd rather call this a missing feature.
> 
> There are two things missing.  The ability to turn on the workaround in 
>   c (i.e. emit an FP rounding instruction after every operation), and a bug 
> fix for the register spills.

I don't get what kind of bug that is (see above).

> Even with those things, I think we are still in trouble.  In the first 
> case, having explicit rounding instructions eliminates the excess 
> precision problem, but it introduces a double-rounding problem.  So we 
> have lost again there.

What do you mean by double rounding? If you round to 64 bit and then
round to 64 bit once again, the value doesn't change the second time. If you
first round to 80 bit and then to 64 bit, you can get a different result
in rare cases, so that the error is bounded by slightly more than 0.5 ulps
instead of 0.5 ulps. I don't think that this really is a problem.
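The double-rounding case can be checked exactly. The sketch below uses a hypothetical helper `round_sig`, written for this illustration, which rounds to `p` significand bits with ties to even and ignores the exponent range. The input `x` is chosen to sit just above the midpoint between the doubles 1 and 1 + 2**-52: rounding it directly to 53 bits goes up, but rounding it first to the 64-bit x87 significand lands exactly on the midpoint, and the tie then goes to the even neighbour 1.

```python
from fractions import Fraction

EXT_BITS, DBL_BITS = 64, 53  # x87 extended and IEEE double significand widths


def round_sig(x: Fraction, p: int) -> Fraction:
    """Round x to p significand bits, round-to-nearest, ties-to-even.
    Exponent range is unbounded (no overflow, underflow or denormals)."""
    if x == 0:
        return Fraction(0)
    sign, x = (-1 if x < 0 else 1), abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** p:          # normalise so that 2**(p-1) <= scaled < 2**p
        e, scaled = e + 1, scaled / 2
    while scaled < 2 ** (p - 1):
        e, scaled = e - 1, scaled * 2
    q, r = divmod(scaled.numerator, scaled.denominator)
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q % 2 == 1):
        q += 1                       # round up; on a tie only if q is odd
    return sign * q * Fraction(2) ** e


x = 1 + Fraction(1, 2**53) + Fraction(1, 2**65)  # just above a double midpoint
ulp = Fraction(1, 2**52)                          # spacing of doubles in [1, 2)

direct = round_sig(x, DBL_BITS)                      # a single rounding to double
twice = round_sig(round_sig(x, EXT_BITS), DBL_BITS)  # x87 register, then fst

print(direct == 1 + ulp, twice == 1)                  # True True
print(abs(direct - x) / ulp, abs(twice - x) / ulp)    # 4095/8192 4097/8192
```

The twice-rounded result is off by 4097/8192 ulp, i.e. 0.5 + 2**-13 ulp: slightly more than half an ulp, while the directly rounded result stays within half an ulp.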

> This is probably not fixable without changing 
> the hardware.  In the second case, fixing the problem with reload spills 
> eliminates one source of unpredictable rounding, however, we still have 
> the problem that local variables get rounded if they are allocated to 
> the stack and not rounded if they are allocated to an FP reg-stack 
> register.  Thus we still have the problem that we get different results 
> at different optimization levels.  So we still lose there again also.

Of course, losing control is not nice, but that's not really a bug, IMHO.
It's the price we pay for the higher accuracy of the intermediate results.
Less rounding is better, even when I can't control when rounding is done,
if I want to reach ii).
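The dependence on where a local variable lives can be modelled the same way. In this hypothetical sketch, `round_sig` (written for this illustration) rounds to `p` significand bits, and `machine_epsilon` runs the familiar loop that halves `eps` until `1 + eps/2`, stored in a local, compares equal to 1. With the local in a double stack slot (53 bits) the loop stops at 2**-52, the double epsilon; with the local held in an x87 register or promoted to long double (64 bits) it stops at 2**-63. Whether a real compiler keeps such a local in a register depends on the optimization level and flags, so this models the effect rather than predicting what any particular GCC version emits.

```python
from fractions import Fraction

EXT_BITS, DBL_BITS = 64, 53  # x87 extended and IEEE double significand widths


def round_sig(x: Fraction, p: int) -> Fraction:
    """Round x to p significand bits, round-to-nearest, ties-to-even.
    Exponent range is unbounded (no overflow, underflow or denormals)."""
    if x == 0:
        return Fraction(0)
    sign, x = (-1 if x < 0 else 1), abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** p:          # normalise so that 2**(p-1) <= scaled < 2**p
        e, scaled = e + 1, scaled / 2
    while scaled < 2 ** (p - 1):
        e, scaled = e - 1, scaled * 2
    q, r = divmod(scaled.numerator, scaled.denominator)
    if 2 * r > scaled.denominator or (2 * r == scaled.denominator and q % 2 == 1):
        q += 1                       # round up; on a tie only if q is odd
    return sign * q * Fraction(2) ** e


def machine_epsilon(local_bits: int) -> Fraction:
    """Halve eps while 1 + eps/2, stored with local_bits significand bits,
    still differs from 1; local_bits models where the temporary lives."""
    one, eps = Fraction(1), Fraction(1)
    while round_sig(one + eps / 2, local_bits) != one:
        eps /= 2
    return eps


print(machine_epsilon(DBL_BITS) == Fraction(1, 2**52))  # True: local on the stack
print(machine_epsilon(EXT_BITS) == Fraction(1, 2**63))  # True: local in a register
```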

> This might be fixable by promoting all stack locals to long double to 
> avoid unexpected rounding, but that will probably cause other problems 
> in turn.  It will break all programs that expect assignments to doubles 
> to round to double, for instance.  If we don't promote stack locals, 
> then we need explicit rounding for them, and then we have 
> double-rounding again.
> 
> I really see no way to fix this problem other than by fixing the 
> hardware.  The hardware has to have explicit float and double operations.
> -- 
> Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com

To sum it up: At present you cannot reach i) with GCC on x86. That's a
missing feature IMHO as long as you are able to reach ii).
If it is the other way round with other architectures, that's fine with me,
too. It would of course be even nicer if you could reach both goals with
any architecture (depending on a command line switch).

If GCC's implementation for ii) on x86 is buggy, then it's of course a bug.
But, as stated above, I still fail to see where there's a bug.
Could you please give it another try?

Thanks for your time,
Volker


