parameters to _mm_mwait intrinsic

Kumar, Venkataramanan
Wed Jun 3 12:47:00 GMT 2015


I was going through the "monitor" and "mwait" builtin implementation.
I need clarification on the parameters passed to the _mm_mwait intrinsic.

We have the following defined in "pmmintrin.h":

extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_monitor (void const * __P, unsigned int __E, unsigned int __H)
{
  __builtin_ia32_monitor (__P, __E, __H);
}

extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_mwait (unsigned int __E, unsigned int __H)
{
  __builtin_ia32_mwait (__E, __H);
}

I assume the parameter names indicate:
P -> Address
E -> Extensions
H -> Hints

MWAIT as per the AMD ISA manual:
EAX specifies optional hints for the MWAIT instruction. There are currently no hints defined and all
bits should be 0. Setting a reserved bit in EAX is ignored by the processor.
ECX specifies optional extensions for the MWAIT instruction. The only extension currently defined is
ECX bit 0, which allows interrupts to wake MWAIT, even when eFLAGS.IF = 0. Support for this
extension is indicated by a feature flag returned by the CPUID instruction. Setting any unsupported
bit in ECX results in a #GP exception. 
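
As a minimal sketch of the CPUID check the manual text refers to, the following assumes the commonly documented layout of CPUID leaf 5, where ECX bit 0 says the MONITOR/MWAIT extensions are enumerated and ECX bit 1 says interrupts can act as break events even when masked. The macro names and the helper mwait_ibe_supported are hypothetical, and the leaf 5 ECX value is passed in as a plain integer so the decode can be read without executing CPUID. The bit positions should be checked against the vendor manual before relying on them.

```c
/* Assumed layout of CPUID leaf 5, ECX (verify against the vendor manual). */
#define MWAIT_LEAF5_ECX_EMX 0x1u  /* bit 0: MONITOR/MWAIT extensions enumerated */
#define MWAIT_LEAF5_ECX_IBE 0x2u  /* bit 1: interrupt break event with IF = 0 */

/* Hypothetical helper: returns 1 if setting ECX[0] for MWAIT looks safe,
   0 otherwise.  leaf5_ecx is the ECX value returned by CPUID leaf 5. */
static int mwait_ibe_supported(unsigned int leaf5_ecx)
{
    /* Without the enumeration bit, the other bits carry no meaning. */
    if (!(leaf5_ecx & MWAIT_LEAF5_ECX_EMX))
        return 0;
    return (leaf5_ecx & MWAIT_LEAF5_ECX_IBE) != 0;
}
```

A caller would only pass a non-zero extensions argument to _mm_mwait when this helper returns 1, since an unsupported ECX bit raises #GP according to the text above.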

MWAIT as defined in the Intel ISA manual:
This instruction's operation is the same in non-64-bit modes and 64-bit mode.
ECX specifies optional extensions for the MWAIT instruction. EAX may contain hints such as the preferred optimized
state the processor should enter. The first processors to implement MWAIT supported only the zero value for
EAX and ECX. Later processors allowed setting ECX[0] to enable masked interrupts as break events for MWAIT
(see below). Software can use the CPUID instruction to determine the extensions and hints supported by the

So if a user calls _mm_mwait (__E, __H), __E should go into ECX and __H should go into EAX.

However, I see this implementation in GCC:

      arg0 = CALL_EXPR_ARG (exp, 0);
      arg1 = CALL_EXPR_ARG (exp, 1);
      op0 = expand_normal (arg0);
      op1 = expand_normal (arg1);
      if (!REG_P (op0))
        op0 = copy_to_mode_reg (SImode, op0);
      if (!REG_P (op1))
        op1 = copy_to_mode_reg (SImode, op1);
      emit_insn (gen_sse3_mwait (op0, op1));
      return 0;

(define_insn "sse3_mwait"
  [(unspec_volatile [(match_operand:SI 0 "register_operand" "a")
                     (match_operand:SI 1 "register_operand" "c")]
                    UNSPECV_MWAIT)]
  "TARGET_SSE3"
;; 64bit version is "mwait %rax,%rcx". But only lower 32bits are used.
;; Since 32bit register operands are implicitly zero extended to 64bit,
;; we only need to set up 32bit registers.
  "mwait"
  [(set_attr "length" "3")])

Here the first argument __E is moved to EAX and __H is moved to ECX.
Should the constraints be swapped for the operands in the pattern?
Or is my understanding wrong?
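
To make the question concrete, here is a small model in plain C. It does not execute MWAIT (which normally needs privileged mode); it only records which register each argument would land in, first as the ISA text describes and then as the constraints quoted above ("a" for operand 0, "c" for operand 1) would assign them. The struct and function names are hypothetical, chosen only for illustration.

```c
/* Register values seen by MWAIT (model only; nothing is executed). */
struct mwait_regs {
    unsigned int eax;  /* hints, per the ISA manuals */
    unsigned int ecx;  /* extensions, per the ISA manuals */
};

/* Mapping the manuals imply for _mm_mwait (__E, __H): E -> ECX, H -> EAX. */
static struct mwait_regs mwait_regs_expected(unsigned int e, unsigned int h)
{
    struct mwait_regs r = { .eax = h, .ecx = e };
    return r;
}

/* Mapping implied by the quoted pattern: operand 0 -> "a", operand 1 -> "c". */
static struct mwait_regs mwait_regs_as_quoted(unsigned int e, unsigned int h)
{
    struct mwait_regs r = { .eax = e, .ecx = h };
    return r;
}

/* Returns 1 when both mappings put the same values in EAX and ECX. */
static int mwait_mappings_agree(unsigned int e, unsigned int h)
{
    struct mwait_regs x = mwait_regs_expected(e, h);
    struct mwait_regs y = mwait_regs_as_quoted(e, h);
    return x.eax == y.eax && x.ecx == y.ecx;
}
```

With E = 1 (interrupt break event) and H = 0, the first mapping gives ECX = 1 and EAX = 0, while the second gives EAX = 1 and ECX = 0. In this model the two agree only when E == H, which would explain why the common call _mm_mwait (0, 0) shows no difference.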

