[PATCH] x86 interrupt attribute

Sun Oct 4 10:29:00 GMT 2015

On Sun, Oct 4, 2015 at 7:23 AM, Yulia Koval <vaalfreja@gmail.com> wrote:
> Hi,
>
> Here is the last version of the patch. Regtested/bootstraped for
> Linux/i686 and Linux/x86_64.
>
> Date: Fri, 4 Sep 2015 08:53:23 -0700
> Subject: [PATCH] Implement x86 interrupt attribute
>
> The interrupt and exception handlers are called by x86 processors.  X86
> hardware pushes information onto stack and calls the handler.  The
> requirements are
>
> 1. Both interrupt and exception handlers must use the 'IRET' instruction,
> instead of the 'RET' instruction, to return from the handlers.
> 2. All registers are callee-saved in interrupt and exception handlers.
> 3. The difference between interrupt and exception handlers is the
> exception handler must pop 'ERROR_CODE' off the stack before the 'IRET'
> instruction.
>
> The design goals of interrupt and exception handlers for x86 processors
> are:
>
> 1. Support both 32-bit and 64-bit modes.
> 2. Flexible for compilers to optimize.
> 3. Easy to use by programmers.
>
> To implement interrupt and exception handlers for x86 processors, a
> compiler should support:
>
> 'interrupt' attribute
>
> Use this attribute to indicate that the specified function with
> mandatory arguments is an interrupt or exception handler.  The compiler
> generates function entry and exit sequences suitable for use in an
> interrupt handler when this attribute is present.  The 'IRET' instruction,
> instead of the 'RET' instruction, is used to return from interrupt or
> exception handlers.  All registers, except for the EFLAGS register which
> is restored by the 'IRET' instruction, are preserved by the compiler.
>
> Any interruptible-without-stack-switch code must be compiled with
> -mno-red-zone since interrupt handlers can and will, because of the
> hardware design, touch the red zone.
>
> 1. interrupt handler must be declared with a mandatory pointer argument:
>
> struct interrupt_frame;
>
> __attribute__ ((interrupt))
> void
> f (struct interrupt_frame *frame)
> {
> ...
> }
>
> and user must properly define the structure the pointer pointing to.
>
> 2. exception handler:
>
> The exception handler is very similar to the interrupt handler with
> a different mandatory function signature:
>
> typedef unsigned long long int uword_t;
> typedef unsigned int uword_t;
>
> struct interrupt_frame;
>
> __attribute__ ((interrupt))
> void
> f (struct interrupt_frame *frame, uword_t error_code)
> {
> ...
> }
>
> and compiler pops the error code off stack before the 'IRET' instruction.
>
> The exception handler should only be used for exceptions which push an
> error code and all other exceptions must use the interrupt handler.
> The system will crash if the wrong handler is used.
>
> To be feature complete, compiler may implement the optional
> 'no_caller_saved_registers' attribute:
>
> Use this attribute to indicate that the specified function has no
> caller-saved registers.  That is, all registers are callee-saved.
> The compiler generates proper function entry and exit sequences to
> save and restore any modified registers.
>
> The user can call functions specified with 'no_caller_saved_registers'
> attribute from an interrupt handler without saving and restoring all
> call clobbered registers.

Looking a bit deeper into the code, it looks that we want to realign
the stack in the interrupt handler. Let's assume that interrupt
handler is calling some other function that saves SSE vector regs to
the stack. According to the x86 ABI, incoming stack of the called
function is assumed to be aligned to 16 bytes. But, interrupt handler
violates this assumption, since the stack could be aligned to only 4
bytes for 32bit and 8 bytes for 64bit targets. Entering the called
function with stack, aligned to less than 16 bytes will certainly
violate ABI.

So, it looks to me that we need to realign the stack in the interrupt
handler unconditionally to 16bytes. In this case, we also won't need
the following changes:

@@ -11284,9 +11474,29 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
       /* The only ABI that has saved SSE registers (Win64) also has a
          16-byte aligned default stack, and thus we don't need to be
      within the re-aligned local stack frame to save them.  */
-      gcc_assert (INCOMING_STACK_BOUNDARY >= 128);
-      offset = ROUND_UP (offset, 16);
-      offset += frame->nsseregs * 16;
+      unsigned int incoming_stack_boundary
+    = MAX (crtl->parm_stack_boundary, ix86_incoming_stack_boundary);
+      unsigned int vec_regsize;
+
+      /* We must save full vector registers if there are no
+     caller-saved registers.  */
+      if (cfun->machine->no_caller_saved_registers)
+    vec_regsize = (TARGET_AVX512F ? 64 : (TARGET_AVX ? 32 : 16));
+      else
+    vec_regsize = 16;
+
+      if (cfun->machine->func_type == TYPE_NORMAL)
+    {
+      /* Incoming stack may be 32-bit aligned in 32-bit mode.  */
+      gcc_assert (!TARGET_64BIT || incoming_stack_boundary >= 128);
+      /* Don't over-align vector register save area.  */
+      incoming_stack_boundary /= BITS_PER_UNIT;
+      if (incoming_stack_boundary > vec_regsize)
+        incoming_stack_boundary = vec_regsize;
+      offset = ROUND_UP (offset, incoming_stack_boundary);
+    }
+
+      offset += frame->nsseregs * vec_regsize;
     }
   frame->sse_reg_save_offset = offset;

...

@@ -11513,8 +11727,28 @@ ix86_emit_save_reg_using_mov (machine_mode
mode, unsigned int regno,
   addr = choose_baseaddr (cfa_offset);
   mem = gen_frame_mem (mode, addr);

-  /* For SSE saves, we need to indicate the 128-bit alignment.  */
-  set_mem_align (mem, GET_MODE_ALIGNMENT (mode));
+  /* For SSE saves, we need to indicate the 128-bit alignment.  In
+     interrupt handler, stack is aligned to word_mode.  We can't use
+     gen_sse_storeups since RTX_FRAME_RELATED_P is set and
+     dwarf2out_flush_queued_reg_saves doesn't like UNSPEC_STOREU.
+     Also gen_sse_storeups doesn't cover AVX nor AVX512.  */
+  unsigned int incoming_stack_boundary;
+  if (cfun->machine->func_type == TYPE_NORMAL)
+    {
+      incoming_stack_boundary
+    = MAX (crtl->parm_stack_boundary, ix86_incoming_stack_boundary);
+      if (!TARGET_64BIT || incoming_stack_boundary >= 128)
+    {
+      /* Don't over-align vector register save slot.  */
+      if (incoming_stack_boundary > GET_MODE_ALIGNMENT (mode))
+        incoming_stack_boundary = GET_MODE_ALIGNMENT (mode);
+    }
+      else
+    gcc_unreachable ();
+    }
+  else
+    incoming_stack_boundary = GET_MODE_ALIGNMENT (word_mode);
+  set_mem_align (mem, incoming_stack_boundary);

   insn = emit_move_insn (mem, reg);
   RTX_FRAME_RELATED_P (insn) = 1;

and

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -875,10 +875,18 @@
     case MODE_V16SF:
     case MODE_V8SF:
     case MODE_V4SF:
-      if (TARGET_AVX
+      /* We must support misaligned SSE load and store in interrupt
+         handler or there are no caller-saved registers and we are
+         in 32-bit mode since ix86_emit_save_reg_using_mov generates
+         the normal *mov<mode>_internal pattern to save and restore
+         SSE registers with misaligned stack.  */
+      if ((TARGET_AVX
+           || cfun->machine->func_type != TYPE_NORMAL
+           || (!TARGET_64BIT
+           && cfun->machine->no_caller_saved_registers))
           && (misaligned_operand (operands[0], <MODE>mode)
           || misaligned_operand (operands[1], <MODE>mode)))
-        return "vmovups\t{%1, %0|%0, %1}";
+        return "%vmovups\t{%1, %0|%0, %1}";
       else
         return "%vmovaps\t{%1, %0|%0, %1}";

If we want realignment to work for x86_64, we can also enable
-mstack-realign also for x86_64, as was recently requested from
Windows people. (I can't find the relevant PR ATM...)


What also bothers me is the following change:

@@ -11248,11 +11433,16 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
        = !expensive_function_p (count);
     }

+  /* We must use move to save bound and mask registers.  */
   frame->save_regs_using_mov
-    = (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue
-       /* If static stack checking is enabled and done with probes,
-      the registers need to be saved before allocating the frame.  */
-       && flag_stack_check != STATIC_BUILTIN_STACK_CHECK);
+    = (frame->nbndregs > 0
+       || frame->nmaskregs > 0
+       || (TARGET_PROLOGUE_USING_MOVE
+       && cfun->machine->use_fast_prologue_epilogue
+       /* If static stack checking is enabled and done with probes,
+          the registers need to be saved before allocating the
+          frame.  */
+       && flag_stack_check != STATIC_BUILTIN_STACK_CHECK));

In the above code, we force the whole frame to save_regs_using_mov,
when we have to save BND and MASK registers. However, AFAICS,
cygwin/mingw is able to save SSE registers (that also don't have PUSH
insns) without this requirement. We should use the same approach to
save BND and MASK registers.

Uros.