[PATCH 1/4] x86: Add -mindirect-branch=
H.J. Lu
hjl.tools@gmail.com
Sat Jan 13 15:50:00 GMT 2018
On Fri, Jan 12, 2018 at 9:47 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> gcc/
>>
>> * config/i386/i386-opts.h (indirect_branch): New.
>> * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
>> * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
>> with local indirect jump when converting indirect call and jump.
>> (ix86_set_indirect_branch_type): New.
>> (ix86_set_current_function): Call ix86_set_indirect_branch_type.
>> (indirectlabelno): New.
>> (indirect_thunk_needed): Likewise.
>> (indirect_thunk_bnd_needed): Likewise.
>> (indirect_thunks_used): Likewise.
>> (indirect_thunks_bnd_used): Likewise.
>> (INDIRECT_LABEL): Likewise.
>> (indirect_thunk_name): Likewise.
>> (output_indirect_thunk): Likewise.
>> (output_indirect_thunk_function): Likewise.
>> (ix86_output_indirect_branch): Likewise.
>> (ix86_output_indirect_jmp): Likewise.
>> (ix86_code_end): Call output_indirect_thunk_function if needed.
>> (ix86_output_call_insn): Call ix86_output_indirect_branch if
>> needed.
>> (ix86_handle_fndecl_attribute): Handle indirect_branch.
>> (ix86_attribute_table): Add indirect_branch.
>> * config/i386/i386.h (machine_function): Add indirect_branch_type
>> and has_local_indirect_jump.
>> * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
>> to true.
>> (tablejump): Likewise.
>> (*indirect_jump): Use ix86_output_indirect_jmp.
>> (*tablejump_1): Likewise.
>> (simple_return_indirect_internal): Likewise.
>> * config/i386/i386.opt (mindirect-branch=): New option.
>> (indirect_branch): New.
>> (keep): Likewise.
>> (thunk): Likewise.
>> (thunk-inline): Likewise.
>> (thunk-extern): Likewise.
>> * doc/extend.texi: Document indirect_branch function attribute.
>> * doc/invoke.texi: Document -mindirect-branch= option.
>>
>> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
>> index f245c1573cf..f14cbeee7a1 100644
>> --- a/gcc/config/i386/i386-opts.h
>> +++ b/gcc/config/i386/i386-opts.h
>> @@ -106,4 +106,12 @@ enum prefer_vector_width {
>> PVW_AVX512
>> };
>>
>> +enum indirect_branch {
>> + indirect_branch_unset = 0,
>> + indirect_branch_keep,
>> + indirect_branch_thunk,
>> + indirect_branch_thunk_inline,
>> + indirect_branch_thunk_extern
>> +};
> I think it would make sense to simply place the body of your introduction email
> here as a comment explaining what enum indirect_branhc does.
Will do.
>> -/* Return true if a red-zone is in use. */
>> +/* Return true if a red-zone is in use. We can't use red-zone when
>> + there are local indirect jumps, like "indirect_jump" or "tablejump",
>> + which jumps to another place in the function, since "call" in the
>> + indirect thunk pushes the return address onto stack, destroying
>> + red-zone. */
>
> Technically we can use red-zone if we reserve space for the call address, right?
> Perhaps mention it as TODO, that should even not be too hard to implement
> but is probably not being that important code quality wise in practice.
Will do,
>> bool
>> ix86_using_red_zone (void)
>> {
>> - return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
>> + return (TARGET_RED_ZONE
>> + && !TARGET_64BIT_MS_ABI
>> + && (!cfun->machine->has_local_indirect_jump
>> + || cfun->machine->indirect_branch_type == indirect_branch_keep));
>> }
>>
>> /* Return a string that documents the current -m options. The caller is
>> @@ -5797,6 +5804,37 @@ ix86_set_func_type (tree fndecl)
>> }
>> }
>>
>> @@ -10639,6 +10681,191 @@ ix86_setup_frame_addresses (void)
>> # endif
>> #endif
>>
>> +static int indirectlabelno;
>> +static bool indirect_thunk_needed = false;
>> +static bool indirect_thunk_bnd_needed = false;
>> +
>> +static int indirect_thunks_used;
>> +static int indirect_thunks_bnd_used;
>
> Add comments for variables.
Will do.
>> +
>> +#ifndef INDIRECT_LABEL
>> +# define INDIRECT_LABEL "LIND"
>> +#endif
>> +
>> +/* Fills in the label name that should be used for the indirect thunk. */
>> +
>> +static void
>> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
>> +{
>> + if (USE_HIDDEN_LINKONCE)
>> + {
>> + const char *bnd = need_bnd_p ? "_bnd" : "";
>> + if (regno >= 0)
>> + {
>> + const char *reg_prefix;
>> + if (LEGACY_INT_REGNO_P (regno))
>> + reg_prefix = TARGET_64BIT ? "r" : "e";
>> + else
>> + reg_prefix = "";
>> + sprintf (name, "__x86_indirect_thunk%s_%s%s",
>> + bnd, reg_prefix, reg_names[regno]);
>> + }
>> + else
>> + sprintf (name, "__x86_indirect_thunk%s", bnd);
>> + }
>> + else
>> + {
>> + if (regno >= 0)
>> + {
>> + if (need_bnd_p)
>> + ASM_GENERATE_INTERNAL_LABEL (name, "LITBR", regno);
>> + else
>> + ASM_GENERATE_INTERNAL_LABEL (name, "LITR", regno);
>> + }
>> + else
>> + {
>> + if (need_bnd_p)
>> + ASM_GENERATE_INTERNAL_LABEL (name, "LITB", 0);
>> + else
>> + ASM_GENERATE_INTERNAL_LABEL (name, "LIT", 0);
>> + }
>> + }
>> +}
>> +
>> +/* Output a call and return thunk for indirect branch. If BND_P is
>> + true, the BND prefix is needed. If REGNO != -1, the function
>> + address is in REGNO. Otherwise, the function address is on the
>> + top of stack. */
>
> Add here comment which gives example of full thunk body so it is easier
> to see what is going on here.
Sure.
>> +
>> +static void
>> +output_indirect_thunk (bool need_bnd_p, int regno)
>> +{
>> + char indirectlabel1[32];
>> + char indirectlabel2[32];
>> +
>> + ASM_GENERATE_INTERNAL_LABEL (indirectlabel1, INDIRECT_LABEL,
>> + indirectlabelno++);
>> + ASM_GENERATE_INTERNAL_LABEL (indirectlabel2, INDIRECT_LABEL,
>> + indirectlabelno++);
>> +
>> + /* Call */
>> + if (need_bnd_p)
>> + fputs ("\tbnd call\t", asm_out_file);
>> + else
>> + fputs ("\tcall\t", asm_out_file);
>> + assemble_name_raw (asm_out_file, indirectlabel2);
>> + fputc ('\n', asm_out_file);
>> +
>> + ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1);
>> +
>> + /* Pause . */
>> + fprintf (asm_out_file, "\tpause\n");
>> +
>> + /* Jump. */
>> + fputs ("\tjmp\t", asm_out_file);
>> + assemble_name_raw (asm_out_file, indirectlabel1);
>> + fputc ('\n', asm_out_file);
>> +
>> + ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2);
>> +
>> + if (regno >= 0)
>> + {
>> + /* MOV. */
>> + rtx xops[2];
>> + xops[0] = gen_rtx_MEM (word_mode, stack_pointer_rtx);
>> + xops[1] = gen_rtx_REG (word_mode, regno);
>> + output_asm_insn ("mov\t{%1, %0|%0, %1}", xops);
>> + }
>> + else
>> + {
>> + /* LEA. */
>> + rtx xops[2];
>> + xops[0] = stack_pointer_rtx;
>> + xops[1] = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD);
>> + output_asm_insn ("lea\t{%E1, %0|%0, %E1}", xops);
>> + }
>> +
>> + if (need_bnd_p)
>> + fputs ("\tbnd ret\n", asm_out_file);
>> + else
>> + fputs ("\tret\n", asm_out_file);
>> +}
>> +
>> +/* Output a funtion with a call and return thunk for indirect branch.
>> + If BND_P is true, the BND prefix is needed. If REGNO != -1, the
>> + function address is in REGNO. Otherwise, the function address is
>> + on the top of stack. */
>> +
>> +static void
>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
>> +{
>> + char name[32];
>> + tree decl;
>> +
>> + /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd. */
>> + indirect_thunk_name (name, regno, need_bnd_p);
>> + decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>> + get_identifier (name),
>> + build_function_type_list (void_type_node, NULL_TREE));
>> + DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
>> + NULL_TREE, void_type_node);
>> + TREE_PUBLIC (decl) = 1;
>> + TREE_STATIC (decl) = 1;
>> + DECL_IGNORED_P (decl) = 1;
>
> DECL_ARTIFICIAL as well?
This is done exactly the same way as PIC thunk. I don't think we
should change it here.
>
>> +
>> +#if TARGET_MACHO
>> + if (TARGET_MACHO)
>> + {
>> + switch_to_section (darwin_sections[picbase_thunk_section]);
>> + fputs ("\t.weak_definition\t", asm_out_file);
>> + assemble_name (asm_out_file, name);
>> + fputs ("\n\t.private_extern\t", asm_out_file);
>> + assemble_name (asm_out_file, name);
>> + putc ('\n', asm_out_file);
>> + ASM_OUTPUT_LABEL (asm_out_file, name);
>> + DECL_WEAK (decl) = 1;
>> + }
>> + else
>> +#endif
>> + if (USE_HIDDEN_LINKONCE)
>> + {
>> + cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME (decl));
>> +
>> + targetm.asm_out.unique_section (decl, 0);
>> + switch_to_section (get_named_section (decl, NULL, 0));
>> +
>> + targetm.asm_out.globalize_label (asm_out_file, name);
>> + fputs ("\t.hidden\t", asm_out_file);
>> + assemble_name (asm_out_file, name);
>> + putc ('\n', asm_out_file);
>> + ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
>> + }
>> + else
>> + {
>> + switch_to_section (text_section);
>> + ASM_OUTPUT_LABEL (asm_out_file, name);
>> + }
>> +
>
> Why do you need to output asm visibility directives by hand when you create
> symbol for the function anyway?
This is done exactly the same way as PIC thunk. I don't think we
should change it here.
> I would expect that you can just use similar code as cgraph_node::expand_thunk
> when calls output_mi_thunk and get this done in a way that is independent of
> target assembler.
I took a look. I don't see an easy to do it. I'd like to keep it exactly the
same as PIC thunk. And we change both thunks together later if needed.
>>
>> +/* Output indirect branch via a call and return thunk. CALL_OP is
>> + the branch target. XASM is the assembly template for CALL_OP.
>> + Branch is a tail call if SIBCALL_P is true. */
>
> Again please add example of code sequences output here to make code easier to follow.
Will do.
>> +
>> +static void
>> +ix86_output_indirect_branch (rtx call_op, const char *xasm,
>> + bool sibcall_p)
>> +{
>> + char thunk_name_buf[32];
>> + char *thunk_name;
>> + char push_buf[64];
>> + bool need_bnd_p = ix86_bnd_prefixed_insn_p (current_output_insn);
>> + int regno;
>> +
>> + if (REG_P (call_op))
>> + regno = REGNO (call_op);
>> + else
>> + regno = -1;
>> +
>> + if (cfun->machine->indirect_branch_type
>> + != indirect_branch_thunk_inline)
>> + {
>> + if (cfun->machine->indirect_branch_type == indirect_branch_thunk)
>> + {
>> + if (regno >= 0)
>> + {
>> + int i = regno;
>> + if (i >= FIRST_REX_INT_REG)
>> + i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1);
>> + if (need_bnd_p)
>> + indirect_thunks_bnd_used |= 1 << i;
>> + else
>> + indirect_thunks_used |= 1 << i;
>> + }
>> + else
>> + {
>> + if (need_bnd_p)
>> + indirect_thunk_bnd_needed = true;
>> + else
>> + indirect_thunk_needed = true;
>> + }
>> + }
>> + indirect_thunk_name (thunk_name_buf, regno, need_bnd_p);
>> + thunk_name = thunk_name_buf;
>> + }
>> + else
>> + thunk_name = NULL;
>> +
>> + snprintf (push_buf, sizeof (push_buf), "push{%c}\t%s",
>> + TARGET_64BIT ? 'q' : 'l', xasm);
>> +
>> + if (sibcall_p)
>> + {
>> + if (regno < 0)
>> + output_asm_insn (push_buf, &call_op);
>> + if (thunk_name != NULL)
>> + {
>> + if (need_bnd_p)
>> + fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name);
>> + else
>> + fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name);
>> + }
>> + else
>> + output_indirect_thunk (need_bnd_p, regno);
>> + }
>> + else
>> + {
>> + if (regno >= 0 && thunk_name != NULL)
>> + {
>> + if (need_bnd_p)
>> + fprintf (asm_out_file, "\tbnd call\t%s\n", thunk_name);
>> + else
>> + fprintf (asm_out_file, "\tcall\t%s\n", thunk_name);
>> + return;
>> + }
>> +
>> + char indirectlabel1[32];
>> + char indirectlabel2[32];
>> +
>> + ASM_GENERATE_INTERNAL_LABEL (indirectlabel1,
>> + INDIRECT_LABEL,
>> + indirectlabelno++);
>> + ASM_GENERATE_INTERNAL_LABEL (indirectlabel2,
>> + INDIRECT_LABEL,
>> + indirectlabelno++);
>> +
>> + /* Jump. */
>> + if (need_bnd_p)
>> + fputs ("\tbnd jmp\t", asm_out_file);
>> + else
>> + fputs ("\tjmp\t", asm_out_file);
>> + assemble_name_raw (asm_out_file, indirectlabel2);
>> + fputc ('\n', asm_out_file);
>> +
>> + ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1);
>> +
>> + if (MEM_P (call_op))
>> + {
>> + struct ix86_address parts;
>> + rtx addr = XEXP (call_op, 0);
>> + if (ix86_decompose_address (addr, &parts)
>> + && parts.base == stack_pointer_rtx)
>> + {
>> + /* Since call will adjust stack by -UNITS_PER_WORD,
>> + we must convert "disp(stack, index, scale)" to
>> + "disp+UNITS_PER_WORD(stack, index, scale)". */
>> + if (parts.index)
>> + {
>> + addr = gen_rtx_MULT (Pmode, parts.index,
>> + GEN_INT (parts.scale));
>> + addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>> + addr);
>> + }
>> + else
>> + addr = stack_pointer_rtx;
>> +
>> + rtx disp;
>> + if (parts.disp != NULL_RTX)
>> + disp = plus_constant (Pmode, parts.disp,
>> + UNITS_PER_WORD);
>> + else
>> + disp = GEN_INT (UNITS_PER_WORD);
>> +
>> + addr = gen_rtx_PLUS (Pmode, addr, disp);
>> + call_op = gen_rtx_MEM (GET_MODE (call_op), addr);
>> + }
>> + }
>> +
>> + if (regno < 0)
>> + output_asm_insn (push_buf, &call_op);
>> +
>> + if (thunk_name != NULL)
>> + {
>> + if (need_bnd_p)
>> + fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name);
>> + else
>> + fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name);
>> + }
>> + else
>> + output_indirect_thunk (need_bnd_p, regno);
>> +
>> + ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2);
>> +
>> + /* Call. */
>> + if (need_bnd_p)
>> + fputs ("\tbnd call\t", asm_out_file);
>> + else
>> + fputs ("\tcall\t", asm_out_file);
>> + assemble_name_raw (asm_out_file, indirectlabel1);
>> + fputc ('\n', asm_out_file);
>> + }
>> +}
>
> It is really not very pretty that the whole sequence is injected into insn stream
> as a single blob. How opaque it is? Does it need to be patched at dynamic link time?
It must be very opaque to optimizers. No, we don't patch at load-time.
> I suppose it may make sense to split the insn and at least explicitly represent
> the fact that we (sometimes) push the target to stack.
Did you mean "split the function"? I will break it into
ix86_output_indirect_branch_via_reg and
ix86_output_indirect_branch_via_push.
> Why the memory variant exists at first place?
When the function address is in memory, we can push it onto stack
to save a register. Also it is needed to cover "call foo" to "call [foo@GOT]"
conversion.
>> (define_insn "*indirect_jump"
>> [(set (pc) (match_operand:W 0 "indirect_branch_operand" "rBw"))]
>> ""
>> - "%!jmp\t%A0"
>> + "* return ix86_output_indirect_jmp (operands[0], false);"
>> [(set_attr "type" "ibr")
>> (set_attr "length_immediate" "0")
>> (set_attr "maybe_prefix_bnd" "1")])
>
> I think you also want to update type to "many" when we do more than just indirect branch.
Did you mean "multi"? I will change to "multi".
> We do not care much about this, but it feels wrong to have attributes off reality.
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index f3e4a63ab46..ddb6035be96 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -5754,6 +5754,16 @@ Specify which floating-point unit to use. You must specify the
>> @code{target("fpmath=sse+387")} because the comma would separate
>> different options.
>>
>> +@item indirect_branch("@var{choice}")
>> +@cindex @code{indirect_branch} function attribute, x86
>> +On x86 targets, the @code{indirect_branch} attribute causes the compiler
>> +to convert indirect call and jump with @var{choice}. @samp{keep}
>> +keeps indirect call and jump unmodified. @samp{thunk} converts indirect
>> +call and jump to call and return thunk. @samp{thunk-inline} converts
>> +indirect call and jump to inlined call and return thunk.
>> +@samp{thunk-extern} converts indirect call and jump to external call
>> +and return thunk provided in a separate object file.
>
> Please expand the documentation in a way that random user who is not aware of the
> issue will understand that those are security features that come at a cost.
>
>> +@opindex -mindirect-branch
>> +Convert indirect call and jump with @var{choice}. The default is
>> +@samp{keep}, which keeps indirect call and jump unmodified.
>> +@samp{thunk} converts indirect call and jump to call and return thunk.
>> +@samp{thunk-inline} converts indirect call and jump to inlined call
>> +and return thunk. @samp{thunk-extern} converts indirect call and jump
>> +to external call and return thunk provided in a separate object file.
>> +You can control this behavior for a specific function by using the
>> +function attribute @code{indirect_branch}. @xref{Function Attributes}.
>
> Similarly here.
I will update documentation with user guide info after Intel white
paper is published.
> Rest of the patch seems OK. We may want incrementally represent more of the
> indirect jump/call seqeuence in RTL, but at this point probably keeping things
> simple and localized is not a bad idea. This can be done incrementally.
>
> Please make updated patch and I would like to give others chance to comment today.
>
> Honza
--
H.J.
More information about the Gcc-patches
mailing list