PATCH for loop.c, SIGFPE with bad integer operands on host (linux-)ix86

Linus Torvalds torvalds@transmeta.com
Sat Aug 7 10:17:00 GMT 1999


Heh. What irony. The shoe is on the other foot now, and Jeffrey is the one
arguing for new features.

Anyway, instead of flaming you for invoking non-standard behaviour, I'll
just give the technical arguments for why the kernel is not going to
change unless you can come up with _much_ stronger arguments. I'm happy to
give reasonable extensions to people (linux isn't exactly known for being
plain), but they do have to be reasonable.

Feel free to refute them on technical grounds (and quite frankly, in this
issue you are very very weak on any non-technical grounds, so if I were
you I wouldn't even try).

On Sat, 7 Aug 1999, Jeffrey A Law wrote:
> 
>   In message <7oghkk$h5i$1@palladium.transmeta.com> you write:
>   > I don't think so. What would you suggest? We silently ignore a integer
>   > division overflow?
>
> Yes.  Precisely.  While it is not mandated that one must ignore integer
> overflows in C,

"Not mandated". In fact, it is _expressly_ said to be undefined behaviour.
Including signed overflow for add and subtract, as I saw that somebody was
wondering about that.

And the reason it is undefined behaviour is that different hardware does
different things. I certainly agree with you that for add and subtract it
is extremely uncommon to trap, although some systems have conditional
traps even there (but people don't tend to use them, because the
nontrapping version is the more common, and it is basically never slower).
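
Since signed overflow on add is undefined, portable code that cares has to test before it adds. A minimal sketch of such a test follows; the helper name add_overflows is made up for illustration and is not from any library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b would overflow a signed int.
 * It only compares against the limits, so it never performs the
 * overflowing add itself and stays within defined behaviour. */
static bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    return a < INT_MIN - b;       /* b <= 0, so INT_MIN - b cannot either */
}
```

The caller decides what to do when the helper returns true; the point is only that the check itself is cheap and fully defined.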

>	 the vast majority of systems do ignore them,

I disagree. "vast majority" is purely in your head, not in reality.

When it comes to signed integer divide, the vast majority does indeed
trap. Count them. It's every PC out there. Sure, there are non-trapping
architectures, but they are losing ground rather than making progress.

>							 why should we
> treat integer division overflow any different from addition, subtraction or
> multiplication?

We don't. We treat them all exactly the same: we do what the hardware does
best. That's how much of C was designed.

Differences in hardware behaviour are where most of the original C
"undefined behaviour" comes from in the first place. You should know that,
but it seems you choose to ignore the fact when you want to. 

>   > That would be rather stupid, don't you think?
>
> Nope.  It is what is expected on most systems.

I'll tell you why it IS stupid, and I'll give you the technical arguments
for it (and I already told you off about your "most systems"
exaggeration):

 - it doesn't buy you any real performance.

   In portable C, it buys you no performance at all, because it's an
   undefined and fairly uninteresting condition anyway.

   In unportable C or in Java, it adds about two cycles to an operation
   that otherwise would take about 16 cycles anyway to just test the thing
   beforehand. Do it well, and you can combine it with the check for zero.
   In short, it's something that is slow in the first place, and the
   overhead of doing proper checking is basically lost in the noise. Very
   few programs are divide-bound, and the ones that are are usually MUCH
   better optimized by trying to avoid the divides in the first place
   rather than trying to make them 10% faster for an uncommon case.

   For example, if you really _are_ concerned about divide performance,
   you would sure as hell not want to trap anyway. Either you know that
   it's not going to trap (so you don't have to slow your code down by
   testing), or you're concerned about it trapping so you might as well
   add the two cycles in order to avoid the 1000+ cycles for the trap
   overhead.

   More importantly, you try to turn the divide into multiplies or
   something like that. Very few problems end up really being divide-
   bound for cases that might overflow (at least on the integer side).

   For the patch to gcc in question, I don't think the patch makes _any_
   performance difference what-so-ever, and the patch will make gcc use
   standards-defined behaviour. In short, it's certainly the right thing
   to do to change gcc rather than the kernel.
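
The "test the thing beforehand" idea from this item, with the overflow test folded into the check for zero, could look roughly like the sketch below. The name checked_div and its return convention are assumptions for illustration. No cycle counts are claimed for it.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: divide only when the result is defined.
 * Returns false for division by zero and for INT_MIN / -1. */
static bool checked_div(int num, int den, int *quot)
{
    /* (unsigned)den + 1 is 0 or 1 only when den is -1 or 0, so a single
     * compare filters out both risky divisors on the common path. */
    if ((unsigned)den + 1u <= 1u) {
        if (den == 0 || num == INT_MIN)
            return false;
    }
    *quot = num / den;
    return true;
}
```

The slow inner test only runs for divisors of 0 and -1, so ordinary divides pay for one compare and one branch.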

 - It can look horribly bad on benchmarks

   Let's say that you were somebody trying to concoct a benchmark showing
   how bad the competition was. It's been done before, with software
   fixups on IEEE behaviour, for example.

   So you choose a problem set that traps all the time on one
   architecture, and the other architecture just does a simple test and
   does the divide in 2 cycles.

 - it's hard as hell.

   I bet that on PA-RISC, you got a divide overflow fault, and the fault
   handler did the equivalent of something like this:

		add $pc,$pc,4
		ret

   which is trivial. When it is that trivial, it is probably worth doing
   just to avoid the headache, and then you tout it as a feature. Good
   marketing, and not bad engineering.

   In contrast, on the x86, it is HARD to do right. I'll tell you why,
   just so that you can understand why you had better come up with better
   arguments to convince me that it's a good idea.

	(a) You get exactly the same error for both a divide by zero and a
	    divide overflow. You and I both agree that divide by zero MUST
	    trap on any reasonable machine. In short, you have to find out
	    by hand which case it was.

	(b) In order to know whether it was an overflow or a divide by
	    zero, you need to look at the arguments. HOWEVER, that's where
	    it gets nasty. In order to see what you divided by, you have
	    to disassemble the divide instruction, because the arguments
	    come from the modrm byte and can be in memory etc.

	    Now, a divide-specific disassembler can't be too bad, can it?

	    Wrong. It's simpler than the generic case for sure, but it's
	    still got 90% of the difficulty:

		Finding the instruction. This may sound trivial ("look at
		eip"), but it isn't. You have to look at the flags at the
		time of the fault (virtual x86 mode or not?), and then at
		the value of CS (flat segment or not?), and get the right
		base, which includes looking it up in the local descriptor
		table if necessary.

		Decoding the instruction. Again, it's certainly more than
		a few lines of code: you have to handle data and address
		size overrides, take the default instruction encoding into
		account (you get that from the CS information above), and
		you have to decode the addressing mode (register vs
		memory, and what memory address?).

		Note that you need to decode the instruction anyway,
		because you need to know how long it was in order to jump
		over it. It's not as simple as just adding four to the
		program counter.

		Fetching the value from memory: you have to check what the
		data segment was (any segment overrides? It may not be
		DS), do all the segment base adds, and all the limit
		checks etc by hand. Nontrivial in the extreme.

	    Basically, you probably have on the order of a few hundred
	    lines of C code if you want to do it right. And doing it right
	    is the only option, because if you don't, then you'll end up
	    taking faults or randomly ignoring divide-by-zero faults
	    inside programs that do something strange (like DOSEMU or
	    Wine - all the world is NOT plain gcc-generated C).
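
To give a feel for even the easiest slice of the decoding step, here is a rough sketch that only recognises a 32-bit-era DIV/IDIV (opcodes 0xF6/0xF7 with modrm reg field /6 or /7) behind legacy prefixes. It deliberately leaves out everything hard: finding the instruction through CS, computing the memory operand from modrm/SIB/displacement, segment bases and limits, and the instruction length. The struct and function names are hypothetical, and the prefix scan is capped at the 15-byte architectural instruction limit so a run of prefixes cannot keep it looping.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_MAX_INSN_LEN 15   /* architectural cap on x86 instruction length */

/* Hypothetical result type for this sketch. */
struct div_info {
    int is_signed;        /* 1 for IDIV (/7), 0 for DIV (/6) */
    int byte_operand;     /* 1 for opcode 0xF6 (8-bit divisor) */
    int opsize_prefix;    /* 0x66 seen: toggles 16/32-bit operand size */
    int addrsize_prefix;  /* 0x67 seen: toggles 16/32-bit addressing */
    int reg_operand;      /* 1 if modrm.mod == 3, divisor is a register */
    unsigned rm;          /* modrm.rm field */
    size_t modrm_off;     /* offset of the modrm byte within buf */
};

static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2E: case 0x36: case 0x3E: /* ES, CS, SS, DS overrides */
    case 0x64: case 0x65:                       /* FS, GS overrides */
    case 0x66: case 0x67:                       /* operand/address size */
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP */
        return 1;
    }
    return 0;
}

/* Returns 0 and fills *out if buf starts with a DIV/IDIV, -1 otherwise. */
static int decode_div(const uint8_t *buf, size_t len, struct div_info *out)
{
    size_t limit = len < X86_MAX_INSN_LEN ? len : X86_MAX_INSN_LEN;
    size_t i = 0;
    int opsize = 0, addrsize = 0;

    /* Bounded prefix scan: never looks past the 15-byte limit. */
    while (i < limit && is_legacy_prefix(buf[i])) {
        if (buf[i] == 0x66) opsize = 1;
        if (buf[i] == 0x67) addrsize = 1;
        i++;
    }
    if (i + 2 > limit)                      /* need opcode + modrm */
        return -1;
    if (buf[i] != 0xF6 && buf[i] != 0xF7)   /* not the divide opcode group */
        return -1;

    uint8_t modrm = buf[i + 1];
    unsigned reg = (modrm >> 3) & 7;
    if (reg != 6 && reg != 7)               /* /6 = DIV, /7 = IDIV */
        return -1;

    out->is_signed = (reg == 7);
    out->byte_operand = (buf[i] == 0xF6);
    out->opsize_prefix = opsize;
    out->addrsize_prefix = addrsize;
    out->reg_operand = ((modrm >> 6) == 3);
    out->rm = modrm & 7;
    out->modrm_off = i + 1;
    return 0;
}
```

Everything this sketch skips is where the few hundred lines go, and the bytes it reads would first have to be fetched safely from the faulting context.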

 - Finally, there are security implications.

   The straightforward implementation of the above is insecure as hell.
   You end up with maybe two hundred lines of C that is basically never
   invoked in real life, and thus almost never tested. What's worse, it's
   almost certainly going to be buggy in subtle ways. Let me count the
   ways:

	Hardware "race conditions" with the instruction prefetcher.

		A bad user that finds out about the bugs above does the
		following on a Pentium or i486 computer: he executes a
		divide instruction that divides by zero (or overflows),
		and in the preceding instructions he CHANGES the
		instruction. The instruction prefetch means that the
		divide will actually be executed (and will trap), but by
		the time the trap handler starts decoding the instruction,
		the instruction (or the modrm byte) is no longer the same.

		This means, for example, that you _DO_ have to check all
		the segment limits by hand. They were checked the first
		time around by hardware, but if the divide is still there,
		you have to check again. And you have to remember that
		when you do the actual access to read the value from
		memory, you're now doing it from supervisor mode, which
		means that you have to check that the clever cracker isn't
		trying to feed you a kernel address by changing the modrm
		field. You could end up hanging the machine if you read
		from a kernel-mapped IO range, for example.

		You also have to have logic in your instruction decoder to
		make sure that nobody is trying to do anything funny like
		having infinite amounts of prefixes and making the
		instruction decoder go into a very very long loop.

	Race conditions with the local descriptor tables.

		The same bad user above creates two threads, one that
		traps, and one that modifies the LDT. This effectively
		gives him the same kind of race condition as above even on
		machines like a PII that have very strict prefetch
		coherency.

	Software bugs that would have been caught if the overflow trapped.

		Enough said. A divide overflow might show a real bug, and
		the thing would disable that checking.
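
As one small piece of the checking listed above, the "is this really a user address?" test for the supervisor-mode read might be sketched like this. The 3 GB boundary is an assumption (the common i386 Linux user/kernel split), and USER_SPACE_END and user_range_ok are hypothetical names. It only covers the final linear address, so the segment base and limit checks would still have to happen before it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: user space ends at 3 GB. */
#define USER_SPACE_END UINT32_C(0xC0000000)

/* True if [addr, addr + size) lies entirely below USER_SPACE_END.
 * Written as a subtraction so that addr + size can never wrap around. */
static bool user_range_ok(uint32_t addr, uint32_t size)
{
    return size <= USER_SPACE_END && addr <= USER_SPACE_END - size;
}
```

A naive `addr + size <= USER_SPACE_END` would accept an address near 4 GB because the sum wraps, which is exactly the kind of subtle bug that rarely-run code tends to keep.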

Have I convinced you yet that it's a really stupid idea? I bet you didn't
realize just how complex it is. That, coupled with the fact that the trap
IS actually quite standard - not rare at all as you try to make it out to
be - basically means that there is no point at all in trying to implement
anything like the above. I've become very good over the years in seeing
security issues, but maybe I missed something..

The other approach is to say "Oh, all the world is flat gcc-generated C,
and you don't need to do any of the above complexity, because we'll limit
the decoding to the subset we're interested in". That approach is
fundamentally suspect, in my opinion: suddenly you get different behaviour
for the same code depending on what mode you happen to run it in.

This is why I really think that if you feel strongly about divide overflow
not trapping, you should do it in user mode with a signal handler. In user
mode you (a) do not have any of the security implications (trivially
proven: the signal handler does not have any special privileges) and (b)
user mode CAN validly know about what mode it was executing in, so a pure
gcc-compiled binary doesn't have to even consider the non-flat modes.
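
A minimal user-mode sketch of that idea, assuming a POSIX system: install a SIGFPE handler around the divide and bail out with siglongjmp if it traps. The helper name div_notrap is made up, the single global jump buffer makes it unsuitable for threads, and it abandons the divide rather than skipping the instruction (skipping would need the register context and the instruction length). Dividing by zero is still undefined in C, so this relies on the compiler emitting a real divide for the volatile operands.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fpe_recover;   /* single global target: not thread-safe */

static void on_sigfpe(int sig)
{
    (void)sig;
    siglongjmp(fpe_recover, 1);  /* unwind back into div_notrap() */
}

/* Hypothetical helper: returns 0 and stores num / den in *quot,
 * or returns -1 if the divide raised SIGFPE. */
static int div_notrap(int num, int den, int *quot)
{
    struct sigaction sa, old;
    volatile int n = num, d = den;  /* keep the divide from being folded away */
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigfpe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so the signal mask is restored after the jump. */
    if (sigsetjmp(fpe_recover, 1) == 0) {
        *quot = n / d;
        rc = 0;
    } else {
        rc = -1;                    /* we got here via the handler */
    }

    sigaction(SIGFPE, &old, NULL);  /* put the previous handler back */
    return rc;
}
```

All of this runs with the process's own privileges, so a bug here can only hurt the program that asked for the behaviour.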

Feel free to try to convince me. I _can_ be convinced by technical
arguments, no question about that. I personally consider it to be
extremely unlikely that you'll ever come up with a compelling enough
argument, but hey, I'm open to suggestions.

			Linus


