This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: PATCH for loop.c, SIGFPE with bad integer operands on host (linux-)ix86
- To: Jeffrey A Law <law at cygnus dot com>
- Subject: Re: PATCH for loop.c, SIGFPE with bad integer operands on host (linux-)ix86
- From: Linus Torvalds <torvalds at transmeta dot com>
- Date: Sat, 7 Aug 1999 10:18:20 -0700 (PDT)
- cc: egcs-patches at egcs dot cygnus dot com
Heh. What irony. The shoe is on the other foot now, and Jeffrey is the one
arguing for new features.
Anyway, instead of flaming you for invoking non-standard behaviour, I'll
just give the technical arguments for why the kernel is not going to
change unless you can come up with _much_ stronger arguments. I'm happy to
give reasonable extensions to people (linux isn't exactly known for being
plain), but they do have to be reasonable.
Feel free to refute them on technical grounds (and quite frankly, in this
issue you are very very weak on any non-technical grounds, so if I were
you I wouldn't even try).
On Sat, 7 Aug 1999, Jeffrey A Law wrote:
>
> In message <7oghkk$h5i$1@palladium.transmeta.com> you write:
> > I don't think so. What would you suggest? We silently ignore an integer
> > division overflow?
>
> Yes. Precisely. While it is not mandated that one must ignore integer
> overflows in C,
"Not mandated". In fact, it is _expressly_ said to be undefined behaviour.
Including signed overflow for add and subtract, as I saw that somebody was
wondering about that.
And the reason it is undefined behaviour is that different hardware does
different things. I certainly agree with you that for add and subtract it
is extremely uncommon to trap, although some systems have conditional
traps even there (but people don't tend to use them, because the
nontrapping version is the more common, and it is basically never slower).
> the vast majority of systems do ignore them,
I disagree. "vast majority" is purely in your head, not in reality.
When it comes to signed integer divide, the vast majority does indeed
trap. Count them. It's every PC out there. Sure, there are non-trapping
architectures, but they are losing ground rather than making progress.
> why should we
> treat integer division overflow any different from addition, subtraction or
> multiplication?
We don't. We treat them all exactly the same: we do what the hardware does
best. That's how much of C was designed.
Differences in hardware behaviour are where most of the original C
"undefined behaviour" comes from in the first place. You should know that,
but it seems you choose to ignore the fact when you want to.
> > That would be rather stupid, don't you think?
>
> Nope. It is what is expected on most systems.
I'll tell you why it IS stupid, and I'll give you the technical arguments
for it (and I already told you off about your "most systems"
exaggeration):
- it doesn't buy you any real performance.
In portable C, it buys you no performance at all, because it's an
undefined and fairly uninteresting condition anyway.
In unportable C or in Java, it adds about two cycles to an operation
that otherwise would take about 16 cycles anyway to just test the thing
beforehand. Do it well, and you can combine it with the check for zero.
In short, it's something that is slow in the first place, and the
overhead of doing proper checking is basically lost in the noise. Very
few programs are divide-bound, and the ones that are are usually MUCH
better optimized by trying to avoid the divides in the first place
rather than trying to make them 10% faster for an uncommon case.
For example, if you really _are_ concerned about divide performance,
you would sure as hell not want to trap anyway. Either you know that
it's not going to trap (so you don't have to slow your code down by
testing), or you're concerned about it trapping so you might as well
add the two cycles in order to avoid the 1000+ cycles for the trap
overhead.
More importantly, you try to turn the divide into multiplies or
something like that. Very few problems end up really being divide-
bound for cases that might overflow (at least on the integer side).
For the patch to gcc in question, I don't think the patch makes _any_
performance difference what-so-ever, and the patch will make gcc use
standards-defined behaviour. In short, it's certainly the right thing
to do to change gcc rather than the kernel.
- It can look horribly bad on benchmarks
Let's say that you were somebody trying to concoct a benchmark showing
how bad the competition was. It's been done before, with software
fixups on IEEE behaviour, for example.
So you choose a problem set that traps all the time on one
architecture, and the other architecture just does a simple test and
does the divide in 2 cycles.
- it's hard as hell.
I bet that on PA-RISC, you got a divide overflow fault, and the fault
handler did the equivalent of something like this:
add $pc,$pc,4
ret
which is trivial. When it is that trivial, it is probably worth doing
just to avoid the headache, and then you tout it as a feature. Good
marketing, and not bad engineering.
In contrast, on the x86, it is HARD to do right. I'll tell you why,
just so that you can understand why you had better come up with better
arguments to convince me that it's a good idea.
(a) You get exactly the same error for both a divide by zero and a
divide overflow. You and I both agree that divide by zero MUST
trap on any reasonable machine. In short, you have to find out
by hand which case it was.
(b) In order to know whether it was an overflow or a divide by
zero, you need to look at the arguments. HOWEVER, that's where
it gets nasty. In order to see what you divided by, you have
to disassemble the divide instruction, because the arguments
come from the modrm byte and can be in memory etc.
Now, a divide-specific disassembler can't be too bad, can it?
Wrong. It's simpler than the generic case for sure, but it's
still got 90% of the difficulty:
Finding the instruction. This may sound trivial ("look at
eip"), but it isn't. You have to look at the flags at the
time of the fault (virtual x86 mode or not?), and then at
the value of CS (flat segment or not?), and get the right
base, which includes looking it up in the local descriptor
table if necessary.
Decoding the instruction. Again, it's certainly more than
a few lines of code: you have to handle data and address
size overrides, take the default instruction encoding into
account (you get that from the CS information above), and
you have to decode the addressing mode (register vs
memory, and what memory address?).
Note that you need to decode the instruction anyway,
because you need to know how long it was in order to jump
over it. It's not as simple as just adding four to the
program counter.
Fetching the value from memory: you have to check what the
data segment was (any segment overrides? It may not be
DS), do all the segment base adds, and all the limit
checks etc by hand. Nontrivial in the extreme.
Basically, you probably have on the order of a few hundred
lines of C code if you want to do it right. And doing it right
is the only option, because if you don't, then you'll end up
taking faults or randomly ignoring divide-by-zero faults
inside programs that do something strange (like DOSEMU or
Wine - all the world is NOT plain gcc-generated C).
- Finally, there are security implications.
The straightforward implementation of the above is insecure as hell.
You end up with maybe two hundred lines of C that are basically never
invoked in real life, and thus almost never tested. What's worse, it's
almost certainly going to be buggy in subtle ways. Let me count the
ways:
Hardware "race conditions" with the instruction prefetcher.
A bad user that finds out about the bugs above does the
following on a Pentium or i486 computer: he executes a
divide instruction that divides by zero (or overflows),
and in the preceding instructions he CHANGES the
instruction. The instruction prefetch means that the
divide will actually be executed (and will trap), but by
the time the trap handler starts decoding the instruction,
the instruction (or the modrm) byte is no longer the same.
This means, for example, that you _DO_ have to check all
the segment limits by hand. They were checked the first
time around by hardware, but if the divide is still there,
you have to check again. And you have to remember that
when you do the actual access to read the value from
memory, you're now doing it from supervisor mode, which
means that you have to check that the clever cracker isn't
trying to feed you a kernel address by changing the modrm
field. You could end up hanging the machine if you read
from a kernel-mapped IO range, for example.
You also have to have logic in your instruction decoder to
make sure that nobody is trying to do anything funny like
having infinite amounts of prefixes and making the
instruction decoder go into a very very long loop.
Race conditions with the local descriptor tables.
The same bad user above creates two threads, one that
traps, and one that modifies the LDT. This effectively
gives him the same kind of race condition as above even on
machines like a PII that have very strict prefetch
coherency.
Software bugs that would have been caught if the overflow trapped.
Enough said. A divide overflow might show a real bug, and
the thing would disable that checking.
Have I convinced you yet that it's a really stupid idea? I bet you didn't
realize just how complex it is. That, coupled with the fact that the trap
IS actually quite standard - not rare at all as you try to make it out to
be - basically means that there is no point at all in trying to implement
anything like the above. I've become very good over the years at seeing
security issues, but maybe I missed something...
The other approach is to say "Oh, all the world is flat gcc-generated C,
and you don't need to do any of the above complexity, because we'll limit
the decoding to the subset we're interested in". That approach is
fundamentally suspect, in my opinion: suddenly you get different behaviour
for the same code depending on what mode you happen to run it in.
This is why I really think that if you feel strongly about divide overflow
not trapping, you should do it in user mode with a signal handler. In user
mode you (a) do not have any of the security implications (trivially
proven: the signal handler does not have any special privileges) and (b)
user mode CAN validly know about what mode it was executing in, so a pure
gcc-compiled binary doesn't have to even consider the non-flat modes.
Feel free to try to convince me. I _can_ be convinced by technical
arguments, no question about that. I personally consider it to be
extremely unlikely that you'll ever come up with a compelling enough
argument, but hey, I'm open to suggestions.
Linus