[PATCH 1/4] x86: Add -mindirect-branch=

H.J. Lu hjl.tools@gmail.com
Sat Jan 13 15:50:00 GMT 2018


On Fri, Jan 12, 2018 at 9:47 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> gcc/
>>
>>       * config/i386/i386-opts.h (indirect_branch): New.
>>       * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
>>       * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
>>       with local indirect jump when converting indirect call and jump.
>>       (ix86_set_indirect_branch_type): New.
>>       (ix86_set_current_function): Call ix86_set_indirect_branch_type.
>>       (indirectlabelno): New.
>>       (indirect_thunk_needed): Likewise.
>>       (indirect_thunk_bnd_needed): Likewise.
>>       (indirect_thunks_used): Likewise.
>>       (indirect_thunks_bnd_used): Likewise.
>>       (INDIRECT_LABEL): Likewise.
>>       (indirect_thunk_name): Likewise.
>>       (output_indirect_thunk): Likewise.
>>       (output_indirect_thunk_function): Likewise.
>>       (ix86_output_indirect_branch): Likewise.
>>       (ix86_output_indirect_jmp): Likewise.
>>       (ix86_code_end): Call output_indirect_thunk_function if needed.
>>       (ix86_output_call_insn): Call ix86_output_indirect_branch if
>>       needed.
>>       (ix86_handle_fndecl_attribute): Handle indirect_branch.
>>       (ix86_attribute_table): Add indirect_branch.
>>       * config/i386/i386.h (machine_function): Add indirect_branch_type
>>       and has_local_indirect_jump.
>>       * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
>>       to true.
>>       (tablejump): Likewise.
>>       (*indirect_jump): Use ix86_output_indirect_jmp.
>>       (*tablejump_1): Likewise.
>>       (simple_return_indirect_internal): Likewise.
>>       * config/i386/i386.opt (mindirect-branch=): New option.
>>       (indirect_branch): New.
>>       (keep): Likewise.
>>       (thunk): Likewise.
>>       (thunk-inline): Likewise.
>>       (thunk-extern): Likewise.
>>       * doc/extend.texi: Document indirect_branch function attribute.
>>       * doc/invoke.texi: Document -mindirect-branch= option.
>>
>> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
>> index f245c1573cf..f14cbeee7a1 100644
>> --- a/gcc/config/i386/i386-opts.h
>> +++ b/gcc/config/i386/i386-opts.h
>> @@ -106,4 +106,12 @@ enum prefer_vector_width {
>>      PVW_AVX512
>>  };
>>
>> +enum indirect_branch {
>> +  indirect_branch_unset = 0,
>> +  indirect_branch_keep,
>> +  indirect_branch_thunk,
>> +  indirect_branch_thunk_inline,
>> +  indirect_branch_thunk_extern
>> +};
> I think it would make sense to simply place the body of your introduction email
> here as a comment explaining what enum indirect_branhc does.

Will do.

>> -/* Return true if a red-zone is in use.  */
>> +/* Return true if a red-zone is in use.  We can't use red-zone when
>> +   there are local indirect jumps, like "indirect_jump" or "tablejump",
>> +   which jumps to another place in the function, since "call" in the
>> +   indirect thunk pushes the return address onto stack, destroying
>> +   red-zone.  */
>
> Technically we can use red-zone if we reserve space for the call address, right?
> Perhaps mention it as TODO, that should even not be too hard to implement
> but is probably not being that important code quality wise in practice.

Will do,

>>  bool
>>  ix86_using_red_zone (void)
>>  {
>> -  return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
>> +  return (TARGET_RED_ZONE
>> +       && !TARGET_64BIT_MS_ABI
>> +       && (!cfun->machine->has_local_indirect_jump
>> +           || cfun->machine->indirect_branch_type == indirect_branch_keep));
>>  }
>>
>>  /* Return a string that documents the current -m options.  The caller is
>> @@ -5797,6 +5804,37 @@ ix86_set_func_type (tree fndecl)
>>      }
>>  }
>>
>> @@ -10639,6 +10681,191 @@ ix86_setup_frame_addresses (void)
>>  # endif
>>  #endif
>>
>> +static int indirectlabelno;
>> +static bool indirect_thunk_needed = false;
>> +static bool indirect_thunk_bnd_needed = false;
>> +
>> +static int indirect_thunks_used;
>> +static int indirect_thunks_bnd_used;
>
> Add comments for variables.

Will do.

>> +
>> +#ifndef INDIRECT_LABEL
>> +# define INDIRECT_LABEL "LIND"
>> +#endif
>> +
>> +/* Fills in the label name that should be used for the indirect thunk.  */
>> +
>> +static void
>> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
>> +{
>> +  if (USE_HIDDEN_LINKONCE)
>> +    {
>> +      const char *bnd = need_bnd_p ? "_bnd" : "";
>> +      if (regno >= 0)
>> +     {
>> +       const char *reg_prefix;
>> +       if (LEGACY_INT_REGNO_P (regno))
>> +         reg_prefix = TARGET_64BIT ? "r" : "e";
>> +       else
>> +         reg_prefix = "";
>> +       sprintf (name, "__x86_indirect_thunk%s_%s%s",
>> +                bnd, reg_prefix, reg_names[regno]);
>> +     }
>> +      else
>> +     sprintf (name, "__x86_indirect_thunk%s", bnd);
>> +    }
>> +  else
>> +    {
>> +      if (regno >= 0)
>> +     {
>> +       if (need_bnd_p)
>> +         ASM_GENERATE_INTERNAL_LABEL (name, "LITBR", regno);
>> +       else
>> +         ASM_GENERATE_INTERNAL_LABEL (name, "LITR", regno);
>> +     }
>> +      else
>> +     {
>> +       if (need_bnd_p)
>> +         ASM_GENERATE_INTERNAL_LABEL (name, "LITB", 0);
>> +       else
>> +         ASM_GENERATE_INTERNAL_LABEL (name, "LIT", 0);
>> +     }
>> +    }
>> +}
>> +
>> +/* Output a call and return thunk for indirect branch.  If BND_P is
>> +   true, the BND prefix is needed.   If REGNO != -1,  the function
>> +   address is in REGNO.  Otherwise, the function address is on the
>> +   top of stack.  */
>
> Add here comment which gives example of full thunk body so it is easier
> to see what is going on here.

Sure.

>> +
>> +static void
>> +output_indirect_thunk (bool need_bnd_p, int regno)
>> +{
>> +  char indirectlabel1[32];
>> +  char indirectlabel2[32];
>> +
>> +  ASM_GENERATE_INTERNAL_LABEL (indirectlabel1, INDIRECT_LABEL,
>> +                            indirectlabelno++);
>> +  ASM_GENERATE_INTERNAL_LABEL (indirectlabel2, INDIRECT_LABEL,
>> +                            indirectlabelno++);
>> +
>> +  /* Call */
>> +  if (need_bnd_p)
>> +    fputs ("\tbnd call\t", asm_out_file);
>> +  else
>> +    fputs ("\tcall\t", asm_out_file);
>> +  assemble_name_raw (asm_out_file, indirectlabel2);
>> +  fputc ('\n', asm_out_file);
>> +
>> +  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1);
>> +
>> +  /* Pause .  */
>> +  fprintf (asm_out_file, "\tpause\n");
>> +
>> +  /* Jump.  */
>> +  fputs ("\tjmp\t", asm_out_file);
>> +  assemble_name_raw (asm_out_file, indirectlabel1);
>> +  fputc ('\n', asm_out_file);
>> +
>> +  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2);
>> +
>> +  if (regno >= 0)
>> +    {
>> +      /* MOV.  */
>> +      rtx xops[2];
>> +      xops[0] = gen_rtx_MEM (word_mode, stack_pointer_rtx);
>> +      xops[1] = gen_rtx_REG (word_mode, regno);
>> +      output_asm_insn ("mov\t{%1, %0|%0, %1}", xops);
>> +    }
>> +  else
>> +    {
>> +      /* LEA.  */
>> +      rtx xops[2];
>> +      xops[0] = stack_pointer_rtx;
>> +      xops[1] = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD);
>> +      output_asm_insn ("lea\t{%E1, %0|%0, %E1}", xops);
>> +    }
>> +
>> +  if (need_bnd_p)
>> +    fputs ("\tbnd ret\n", asm_out_file);
>> +  else
>> +    fputs ("\tret\n", asm_out_file);
>> +}
>> +
>> +/* Output a funtion with a call and return thunk for indirect branch.
>> +   If BND_P is true, the BND prefix is needed.   If REGNO != -1,  the
>> +   function address is in REGNO.  Otherwise, the function address is
>> +   on the top of stack.  */
>> +
>> +static void
>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
>> +{
>> +  char name[32];
>> +  tree decl;
>> +
>> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
>> +  indirect_thunk_name (name, regno, need_bnd_p);
>> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>> +                  get_identifier (name),
>> +                  build_function_type_list (void_type_node, NULL_TREE));
>> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
>> +                                NULL_TREE, void_type_node);
>> +  TREE_PUBLIC (decl) = 1;
>> +  TREE_STATIC (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 1;
>
> DECL_ARTIFICIAL as well?

This is done exactly the same way as PIC thunk.  I don't think we
should change it here.

>
>> +
>> +#if TARGET_MACHO
>> +  if (TARGET_MACHO)
>> +    {
>> +      switch_to_section (darwin_sections[picbase_thunk_section]);
>> +      fputs ("\t.weak_definition\t", asm_out_file);
>> +      assemble_name (asm_out_file, name);
>> +      fputs ("\n\t.private_extern\t", asm_out_file);
>> +      assemble_name (asm_out_file, name);
>> +      putc ('\n', asm_out_file);
>> +      ASM_OUTPUT_LABEL (asm_out_file, name);
>> +      DECL_WEAK (decl) = 1;
>> +    }
>> +  else
>> +#endif
>> +    if (USE_HIDDEN_LINKONCE)
>> +      {
>> +     cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME (decl));
>> +
>> +     targetm.asm_out.unique_section (decl, 0);
>> +     switch_to_section (get_named_section (decl, NULL, 0));
>> +
>> +     targetm.asm_out.globalize_label (asm_out_file, name);
>> +     fputs ("\t.hidden\t", asm_out_file);
>> +     assemble_name (asm_out_file, name);
>> +     putc ('\n', asm_out_file);
>> +     ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
>> +      }
>> +    else
>> +      {
>> +     switch_to_section (text_section);
>> +     ASM_OUTPUT_LABEL (asm_out_file, name);
>> +      }
>> +
>
> Why do you need to output asm visibility directives by hand when you create
> symbol for the function anyway?

This is done exactly the same way as PIC thunk.  I don't think we
should change it here.

> I would expect that you can just use similar code as cgraph_node::expand_thunk
> when calls output_mi_thunk and get this done in a way that is independent of
> target assembler.

I took a look.  I don't see an easy to do it.   I'd like to keep it exactly the
same as PIC thunk.  And we change both thunks together later if needed.

>>
>> +/* Output indirect branch via a call and return thunk.  CALL_OP is
>> +   the branch target.  XASM is the assembly template for CALL_OP.
>> +   Branch is a tail call if SIBCALL_P is true.  */
>
> Again please add example of code sequences output here to make code easier to follow.

Will do.

>> +
>> +static void
>> +ix86_output_indirect_branch (rtx call_op, const char *xasm,
>> +                          bool sibcall_p)
>> +{
>> +  char thunk_name_buf[32];
>> +  char *thunk_name;
>> +  char push_buf[64];
>> +  bool need_bnd_p = ix86_bnd_prefixed_insn_p (current_output_insn);
>> +  int regno;
>> +
>> +  if (REG_P (call_op))
>> +    regno = REGNO (call_op);
>> +  else
>> +    regno = -1;
>> +
>> +  if (cfun->machine->indirect_branch_type
>> +      != indirect_branch_thunk_inline)
>> +    {
>> +      if (cfun->machine->indirect_branch_type == indirect_branch_thunk)
>> +     {
>> +       if (regno >= 0)
>> +         {
>> +           int i = regno;
>> +           if (i >= FIRST_REX_INT_REG)
>> +             i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1);
>> +           if (need_bnd_p)
>> +             indirect_thunks_bnd_used |= 1 << i;
>> +           else
>> +             indirect_thunks_used |= 1 << i;
>> +         }
>> +       else
>> +         {
>> +           if (need_bnd_p)
>> +             indirect_thunk_bnd_needed = true;
>> +           else
>> +             indirect_thunk_needed = true;
>> +         }
>> +     }
>> +      indirect_thunk_name (thunk_name_buf, regno, need_bnd_p);
>> +      thunk_name = thunk_name_buf;
>> +    }
>> +  else
>> +    thunk_name = NULL;
>> +
>> +  snprintf (push_buf, sizeof (push_buf), "push{%c}\t%s",
>> +         TARGET_64BIT ? 'q' : 'l', xasm);
>> +
>> +  if (sibcall_p)
>> +    {
>> +      if (regno < 0)
>> +     output_asm_insn (push_buf, &call_op);
>> +      if (thunk_name != NULL)
>> +     {
>> +       if (need_bnd_p)
>> +         fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name);
>> +       else
>> +         fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name);
>> +     }
>> +      else
>> +     output_indirect_thunk (need_bnd_p, regno);
>> +    }
>> +  else
>> +    {
>> +      if (regno >= 0 && thunk_name != NULL)
>> +     {
>> +       if (need_bnd_p)
>> +         fprintf (asm_out_file, "\tbnd call\t%s\n", thunk_name);
>> +       else
>> +         fprintf (asm_out_file, "\tcall\t%s\n", thunk_name);
>> +       return;
>> +     }
>> +
>> +      char indirectlabel1[32];
>> +      char indirectlabel2[32];
>> +
>> +      ASM_GENERATE_INTERNAL_LABEL (indirectlabel1,
>> +                                INDIRECT_LABEL,
>> +                                indirectlabelno++);
>> +      ASM_GENERATE_INTERNAL_LABEL (indirectlabel2,
>> +                                INDIRECT_LABEL,
>> +                                indirectlabelno++);
>> +
>> +      /* Jump.  */
>> +      if (need_bnd_p)
>> +     fputs ("\tbnd jmp\t", asm_out_file);
>> +      else
>> +     fputs ("\tjmp\t", asm_out_file);
>> +      assemble_name_raw (asm_out_file, indirectlabel2);
>> +      fputc ('\n', asm_out_file);
>> +
>> +      ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1);
>> +
>> +      if (MEM_P (call_op))
>> +     {
>> +       struct ix86_address parts;
>> +       rtx addr = XEXP (call_op, 0);
>> +       if (ix86_decompose_address (addr, &parts)
>> +           && parts.base == stack_pointer_rtx)
>> +         {
>> +           /* Since call will adjust stack by -UNITS_PER_WORD,
>> +              we must convert "disp(stack, index, scale)" to
>> +              "disp+UNITS_PER_WORD(stack, index, scale)".  */
>> +           if (parts.index)
>> +             {
>> +               addr = gen_rtx_MULT (Pmode, parts.index,
>> +                                    GEN_INT (parts.scale));
>> +               addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> +                                    addr);
>> +             }
>> +           else
>> +             addr = stack_pointer_rtx;
>> +
>> +           rtx disp;
>> +           if (parts.disp != NULL_RTX)
>> +             disp = plus_constant (Pmode, parts.disp,
>> +                                   UNITS_PER_WORD);
>> +           else
>> +             disp = GEN_INT (UNITS_PER_WORD);
>> +
>> +           addr = gen_rtx_PLUS (Pmode, addr, disp);
>> +           call_op = gen_rtx_MEM (GET_MODE (call_op), addr);
>> +         }
>> +     }
>> +
>> +      if (regno < 0)
>> +     output_asm_insn (push_buf, &call_op);
>> +
>> +      if (thunk_name != NULL)
>> +     {
>> +       if (need_bnd_p)
>> +         fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name);
>> +       else
>> +         fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name);
>> +     }
>> +      else
>> +     output_indirect_thunk (need_bnd_p, regno);
>> +
>> +      ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2);
>> +
>> +      /* Call.  */
>> +      if (need_bnd_p)
>> +     fputs ("\tbnd call\t", asm_out_file);
>> +      else
>> +     fputs ("\tcall\t", asm_out_file);
>> +      assemble_name_raw (asm_out_file, indirectlabel1);
>> +      fputc ('\n', asm_out_file);
>> +    }
>> +}
>
> It is really not very pretty that the whole sequence is injected into insn stream
> as a single blob.  How opaque it is? Does it need to be patched at dynamic link time?

It must be very opaque to optimizers.   No, we don't patch at load-time.

> I suppose it may make sense to split the insn and at least explicitly represent
> the fact that we (sometimes) push the target to stack.

Did you mean "split the function"?  I will break it into
ix86_output_indirect_branch_via_reg and
ix86_output_indirect_branch_via_push.

> Why the memory variant exists at first place?

When the function address is in memory, we can push it onto stack
to save a register.   Also it is needed to cover "call foo" to "call [foo@GOT]"
conversion.

>>  (define_insn "*indirect_jump"
>>    [(set (pc) (match_operand:W 0 "indirect_branch_operand" "rBw"))]
>>    ""
>> -  "%!jmp\t%A0"
>> +  "* return ix86_output_indirect_jmp (operands[0], false);"
>>    [(set_attr "type" "ibr")
>>     (set_attr "length_immediate" "0")
>>     (set_attr "maybe_prefix_bnd" "1")])
>
> I think you also want to update type to "many" when we do more than just indirect branch.

Did you mean "multi"?  I will change to "multi".

> We do not care much about this, but it feels wrong to have attributes off reality.
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index f3e4a63ab46..ddb6035be96 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -5754,6 +5754,16 @@ Specify which floating-point unit to use.  You must specify the
>>  @code{target("fpmath=sse+387")} because the comma would separate
>>  different options.
>>
>> +@item indirect_branch("@var{choice}")
>> +@cindex @code{indirect_branch} function attribute, x86
>> +On x86 targets, the @code{indirect_branch} attribute causes the compiler
>> +to convert indirect call and jump with @var{choice}.  @samp{keep}
>> +keeps indirect call and jump unmodified.  @samp{thunk} converts indirect
>> +call and jump to call and return thunk.  @samp{thunk-inline} converts
>> +indirect call and jump to inlined call and return thunk.
>> +@samp{thunk-extern} converts indirect call and jump to external call
>> +and return thunk provided in a separate object file.
>
> Please expand the documentation in a way that random user who is not aware of the
> issue will understand that those are security features that come at a cost.
>
>> +@opindex -mindirect-branch
>> +Convert indirect call and jump with @var{choice}.  The default is
>> +@samp{keep}, which keeps indirect call and jump unmodified.
>> +@samp{thunk} converts indirect call and jump to call and return thunk.
>> +@samp{thunk-inline} converts indirect call and jump to inlined call
>> +and return thunk.  @samp{thunk-extern} converts indirect call and jump
>> +to external call and return thunk provided in a separate object file.
>> +You can control this behavior for a specific function by using the
>> +function attribute @code{indirect_branch}.  @xref{Function Attributes}.
>
> Similarly here.

I will update documentation with user guide info  after Intel white
paper is published.

> Rest of the patch seems OK. We may want incrementally represent more of the
> indirect jump/call seqeuence in RTL, but at this point probably keeping things
> simple and localized is not a bad idea. This can be done incrementally.
>
> Please make updated patch and I would like to give others chance to comment today.
>
> Honza



-- 
H.J.



More information about the Gcc-patches mailing list