This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Add support for the Win32 hook prologue (try 3)


On 09/12/2009 02:17 PM, Stefan Dösinger wrote:
> On Saturday 12 September 2009 17:26:47, Dave Korn wrote:
>>    I think that should probably be considered a binutils bug, shouldn't it?
>> I don't think it should relax an explicit ".align X,0x90" any more than it
>> should relax "dc.b 0x90,0x90,0x90" IMO.
> I agree.

I agree, but it would be a while before we could rely on this being fixed. Better to use 0xcc between functions, since that seems to be well supported in your target environment.


> Out of curiosity, what is the advantage of a 3-byte, 4-byte or 5-byte nop over
> the same number of 0x90's? I know MSFT uses the two-byte nop inside the
> hookable function to allow atomic replacement. But what's the advantage if
> the code is never executed? Or if it is executed, but never intended to be
> replaced, like alignment before a jump label?

Multi-byte nops are intended for use when the nop *is* executed, as in alignment before a jump label. They're faster than single byte nops in passing through the instruction decoder.
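
To make the distinction concrete, here is a minimal sketch of filling a padding gap with as few nop instructions as possible. The byte sequences are the recommended multi-byte NOP encodings listed in the Intel manuals, not anything taken from this patch or from what binutils emits; the helper name `pad_with_nops` and the 7-byte cap are assumptions made only for this example.

```python
# Recommended multi-byte NOP encodings (Intel SDM, NOP instruction),
# keyed by instruction length in bytes.
NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
}


def pad_with_nops(n, max_len=7):
    """Hypothetical helper: return a list of nop encodings totalling n bytes.

    Greedily picks the longest nop allowed (max_len is an assumed cap),
    so the padding uses as few instructions as possible.
    """
    if n < 0:
        raise ValueError("padding length must be non-negative")
    out = []
    while n > 0:
        size = min(n, max_len)
        out.append(NOPS[size])
        n -= size
    return out
```

With this sketch, `pad_with_nops(5)` yields one 5-byte instruction, while filling the same gap with 0x90 would take five separate instructions.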


Each x86 implementation has a maximum instruction size that can go through its fast-path instruction decoder. This maximum is at least 7 bytes. Further, each x86 implementation has a maximum number of instructions that can be decoded within a single cycle via its fast-path decoders. The number of decoders is usually between 2 and 4. So if you use five 0x90 bytes, it can take between 2 and 3 cycles to execute those nops, whereas a single 5-byte nop can be decoded and discarded in a single cycle, on a single decoder.
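
The arithmetic behind the "2 to 3 cycles" figure can be sketched as follows. This is a deliberately simplified model that assumes every nop fits the fast-path decoder and that decode width is the only limit; the function name `decode_cycles` is made up for the illustration and does not describe any particular CPU.

```python
import math


def decode_cycles(num_instructions, decoders):
    """Cycles needed to decode num_instructions when at most `decoders`
    instructions can be decoded per cycle (simplified, assumed model)."""
    if decoders < 1:
        raise ValueError("need at least one decoder")
    return math.ceil(num_instructions / decoders)


# Five single-byte nops versus one 5-byte nop, for 2 to 4 decoders.
for width in (2, 3, 4):
    print(width, decode_cycles(5, width), decode_cycles(1, width))
```

Under this model five single-byte nops cost 3 cycles with 2 decoders and 2 cycles with 3 or 4, while the single 5-byte nop always costs 1.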


r~


