[PATCH, i386, Pointer Bounds Checker 31/x] Pointer Bounds Checker builtins for i386 target

Uros Bizjak ubizjak@gmail.com
Wed Sep 17 18:07:00 GMT 2014


On Wed, Sep 17, 2014 at 6:31 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:

>> >> I don't like the way arguments are prepared. For the case above,
>> >> bnd_ldx should have index_register_operand predicate in its pattern,
>> >> and this predicate (and its mode) should be checked in the expander
>> >> code. There are many examples of argument expansion in
>> >> ix86_expand_builtin function, including how Pmode is handled.
>> >>
>> >> Also, please see how target is handled there. Target can be null, so
>> >> REG_P predicate will crash.
>> >>
>> >> You should also select insn patterns depending on BNDmode, not TARGET_64BIT.
>> >>
>> >> Please use assign_386_stack_local so stack slots can be shared.
>> >> SLOT_TEMP is intended for short-lived temporaries; you can introduce
>> >> new slots if you need more live values at once.
>> >>
>> >> Uros.
>> >
>> > Thanks for the comments!  Here is a new version in which I have addressed all your concerns.
>>
>> Unfortunately, it doesn't. The patch only fixed one instance of
>> target handling, the one I referred to as an example. You still have
>> an unchecked target, at least in IX86_BUILTIN_BNDMK.
>>
>> However, you have a general problem in your builtin expansion code,
>> so please look at how other builtins are handled. E.g.:
>>
>>   if (optimize || !target
>>       || GET_MODE (target) != tmode
>>       || !register_operand (target, tmode))
>>     target = gen_reg_rtx (tmode);
>>
>> also, here is an example how input operands are prepared:
>>
>>       op0 = expand_normal (arg0);
>>       op1 = expand_normal (arg1);
>>       op2 = expand_normal (arg2);
>>       if (!register_operand (op0, Pmode))
>>         op0 = ix86_zero_extend_to_Pmode (op0);
>>       if (!register_operand (op1, SImode))
>>         op1 = copy_to_mode_reg (SImode, op1);
>>       if (!register_operand (op2, SImode))
>>         op2 = copy_to_mode_reg (SImode, op2);
>>
>> So, Pmode is handled in a special way, even when x32 is not considered.
>>
>> BTW: I wonder if word_mode is needed here; Pmode can be SImode with
>> an address prefix (x32).
>>
>> Inside the expanders, please use expand_simple_binop and expand_unop
>> on RTX, not on tree expressions. Again, there are many examples to follow.
>
> Thank you for the additional explanations.  I hope this time I have addressed your concerns correctly :)

Yes, this version is MUCH better. Further comments are inline below.

> 2014-09-17  Ilya Enkovich  <ilya.enkovich@intel.com>
>
>         * config/i386/i386-builtin-types.def (BND): New.
>         (ULONG): New.
>         (BND_FTYPE_PCVOID_ULONG): New.
>         (VOID_FTYPE_BND_PCVOID): New.
>         (VOID_FTYPE_PCVOID_PCVOID_BND): New.
>         (BND_FTYPE_PCVOID_PCVOID): New.
>         (BND_FTYPE_PCVOID): New.
>         (BND_FTYPE_BND_BND): New.
>         (PVOID_FTYPE_PVOID_PVOID_ULONG): New.
>         (PVOID_FTYPE_PCVOID_BND_ULONG): New.
>         (ULONG_FTYPE_VOID): New.
>         (PVOID_FTYPE_BND): New.
>         * config/i386/i386.c: Include tree-chkp.h, rtl-chkp.h.
>         (ix86_builtins): Add
>         IX86_BUILTIN_BNDMK, IX86_BUILTIN_BNDSTX,
>         IX86_BUILTIN_BNDLDX, IX86_BUILTIN_BNDCL,
>         IX86_BUILTIN_BNDCU, IX86_BUILTIN_BNDRET,
>         IX86_BUILTIN_BNDNARROW, IX86_BUILTIN_BNDINT,
>         IX86_BUILTIN_SIZEOF, IX86_BUILTIN_BNDLOWER,
>         IX86_BUILTIN_BNDUPPER.
>         (builtin_isa): Add leaf_p and nothrow_p fields.
>         (def_builtin): Initialize leaf_p and nothrow_p.
>         (ix86_add_new_builtins): Handle leaf_p and nothrow_p
>         flags.
>         (bdesc_mpx): New.
>         (bdesc_mpx_const): New.
>         (ix86_init_mpx_builtins): New.
>         (ix86_init_builtins): Call ix86_init_mpx_builtins.
>         (ix86_emit_cmove): New.
>         (ix86_emit_move_max): New.
>         (ix86_expand_builtin): Expand IX86_BUILTIN_BNDMK,
>         IX86_BUILTIN_BNDSTX, IX86_BUILTIN_BNDLDX,
>         IX86_BUILTIN_BNDCL, IX86_BUILTIN_BNDCU,
>         IX86_BUILTIN_BNDRET, IX86_BUILTIN_BNDNARROW,
>         IX86_BUILTIN_BNDINT, IX86_BUILTIN_SIZEOF,
>         IX86_BUILTIN_BNDLOWER, IX86_BUILTIN_BNDUPPER.
>         * config/i386/i386.h (ix86_stack_slot): Added SLOT_BND_STORED.

..

> +       /* We need to move bounds to memory before any computations.  */
> +       if (!MEM_P (op1))
> +         {
> +           m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +           emit_move_insn (m1, op1);
> +         }
> +       else
> +         m1 = op1;

No negative conditions, please. Just swap the arms of the if statement;
it is much more readable.
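For example, the hunk above would become:

  if (MEM_P (op1))
    m1 = op1;
  else
    {
      m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
      emit_move_insn (m1, op1);
    }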

> +
> +       /* Generate mem expression to be used for access to LB and UB.  */
> +       m1h1 = gen_rtx_MEM (Pmode, XEXP (m1, 0));
> +       m1h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m1, 0),
> +                                                 GET_MODE_SIZE (Pmode)));

Please use adjust_address instead of manually producing MEMs.
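Something like this (untested sketch):

  /* Low and high halves of the bound in memory.  */
  m1h1 = adjust_address (m1, Pmode, 0);
  m1h2 = adjust_address (m1, Pmode, GET_MODE_SIZE (Pmode));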

> +
> +       t1 = gen_reg_rtx (Pmode);
> +
> +       /* Compute LB.  */
> +       emit_move_insn (t1, m1h1);
> +       ix86_emit_move_max (t1, lb);
> +       emit_move_insn (m1h1, t1);
> +
> +       /* Compute UB.  UB is stored in 1's complement form.  Therefore
> +          we also use max here.  */
> +       emit_move_insn (t1, m1h2);
> +       ix86_emit_move_max (t1, ub);
> +       emit_move_insn (m1h2, t1);
> +
> +       op2 = gen_reg_rtx (BNDmode);
> +       emit_move_insn (op2, m1);
> +
> +       return chkp_join_splitted_slot (lb, op2);
> +      }
> +
> +    case IX86_BUILTIN_BNDINT:

The handling in this builtin looks strange.

I suggest doing it serially, like this:

if (MEM_P (op0))
  m0 = op0;
else
  {
    m0 = stack (SLOT_TEMP);
    move (op0, m0);
  }

move parts of m0 to temporaries.

if (MEM_P (op1))
  m1 = op1;
else
  {
    m1 = stack (SLOT_TEMP);
    move (op1, m1);
  }

move parts of m1 to other temporaries.

Process temporaries.

res = stack (SLOT_BND_STORED);

move the calculated values to res.

This will ensure that nobody clobbers the stack slot holding your
result, assuming the consumer will use it soon. SLOT_TEMP is a
short-lived slot and can be reused.
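
In RTL terms the above would look roughly like this (untested sketch,
using the helpers that are already in your patch):

  rtx m0, m1, res, lb0, lb1, ub0, ub1;
  unsigned psize = GET_MODE_SIZE (Pmode);

  if (MEM_P (op0))
    m0 = op0;
  else
    {
      m0 = assign_386_stack_local (BNDmode, SLOT_TEMP);
      emit_move_insn (m0, op0);
    }

  /* Copy both halves out of SLOT_TEMP before it can be reused.  */
  lb0 = copy_to_mode_reg (Pmode, adjust_address (m0, Pmode, 0));
  ub0 = copy_to_mode_reg (Pmode, adjust_address (m0, Pmode, psize));

  if (MEM_P (op1))
    m1 = op1;
  else
    {
      m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
      emit_move_insn (m1, op1);
    }

  lb1 = copy_to_mode_reg (Pmode, adjust_address (m1, Pmode, 0));
  ub1 = copy_to_mode_reg (Pmode, adjust_address (m1, Pmode, psize));

  /* UB is stored in one's complement form, so max works for both
     halves.  */
  ix86_emit_move_max (lb0, lb1);
  ix86_emit_move_max (ub0, ub1);

  /* Only now write the result, into the long-lived slot.  */
  res = assign_386_stack_local (BNDmode, SLOT_BND_STORED);
  emit_move_insn (adjust_address (res, Pmode, 0), lb0);
  emit_move_insn (adjust_address (res, Pmode, psize), ub0);

  return res;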

> +      {
> +       unsigned bndsize = GET_MODE_SIZE (BNDmode);
> +       unsigned psize = GET_MODE_SIZE (Pmode);
> +       rtx res, m1, m2, m1h1, m1h2, m2h1, m2h2, t1, t2, rh1, rh2;
> +
> +       arg0 = CALL_EXPR_ARG (exp, 0);
> +       arg1 = CALL_EXPR_ARG (exp, 1);
> +
> +       op0 = expand_normal (arg0);
> +       op1 = expand_normal (arg1);
> +
> +       /* We need to move bounds to memory before any computations.  */
> +       if (!MEM_P (op0))
> +         {
> +           m1 = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +           emit_move_insn (m1, op0);
> +         }
> +       else
> +         m1 = op0;
> +
> +       if (!MEM_P (op1))
> +         {
> +           m2 = assign_386_stack_local (BNDmode,
> +                                        MEM_P (op0)
> +                                        ? SLOT_TEMP
> +                                        : SLOT_BND_STORED);
> +           emit_move_insn (m2, op1);
> +         }
> +       else
> +         m2 = op1;
> +
> +       if (!MEM_P (op0))
> +         res = m1;
> +       else if (!MEM_P (op1))
> +         res = m2;
> +       else
> +         res = assign_386_stack_local (BNDmode, SLOT_TEMP);

> +       /* Generate mem expression to be used for access to LB and UB.  */
> +       m1h1 = gen_rtx_MEM (Pmode, XEXP (m1, 0));
> +       m1h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m1, 0), psize));
> +       m2h1 = gen_rtx_MEM (Pmode, XEXP (m2, 0));
> +       m2h2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (m2, 0), psize));
> +       rh1 = gen_rtx_MEM (Pmode, XEXP (res, 0));
> +       rh2 = gen_rtx_MEM (Pmode, plus_constant (Pmode, XEXP (res, 0), psize));

Please use adjust_address here.

> +
> +       /* Allocate temporaries.  */
> +       t1 = gen_reg_rtx (Pmode);
> +       t2 = gen_reg_rtx (Pmode);
> +
> +       /* Compute LB.  */
> +       emit_move_insn (t1, m1h1);
> +       emit_move_insn (t2, m2h1);
> +       ix86_emit_move_max (t1, t2);
> +       emit_move_insn (rh1, t1);
> +
> +       /* Compute UB.  UB is stored in 1's complement form.  Therefore
> +          we also use max here.  */
> +       emit_move_insn (t1, m1h2);
> +       emit_move_insn (t2, m2h2);
> +       ix86_emit_move_max (t1, t2);
> +       emit_move_insn (rh2, t1);
> +
> +       return res;
> +      }
> +
> +    case IX86_BUILTIN_SIZEOF:
> +      {
> +       enum machine_mode mode = Pmode;

No need for the above temporary; just use Pmode directly.

> +       tree name;
> +       rtx temp;
> +
> +       if (!target
> +           || GET_MODE (target) != Pmode
> +           || !register_operand (target, Pmode))
> +         target = gen_reg_rtx (Pmode);
> +
> +       arg0 = CALL_EXPR_ARG (exp, 0);
> +       gcc_assert (TREE_CODE (arg0) == VAR_DECL);
> +
> +       name = DECL_ASSEMBLER_NAME (arg0);
> +       temp = gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (name));
> +       temp = gen_rtx_UNSPEC (mode, gen_rtvec (1, temp), UNSPEC_SIZEOF);

Call the expander directly and use target as operand 0. You won't need the move below.
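Something along these lines (sketch only; icode and pat follow the
usual conventions elsewhere in ix86_expand_builtin, and what exactly
goes in operand 1 depends on how your .md pattern is written):

  pat = GEN_FCN (icode) (target, temp);
  if (pat)
    emit_insn (pat);
  return target;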

> +       emit_move_insn (target, temp);
> +
> +       return target;
> +      }
> +
> +    case IX86_BUILTIN_BNDLOWER:
> +      {
> +       rtx mem, hmem;
> +
> +       if (!target
> +           || GET_MODE (target) != Pmode
> +           || !register_operand (target, Pmode))
> +         target = gen_reg_rtx (Pmode);
> +
> +       arg0 = CALL_EXPR_ARG (exp, 0);
> +       op0 = expand_normal (arg0);
> +
> +       /* We need to move bounds to memory first.  */
> +       if (!MEM_P (op0))
> +         {
> +           mem = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +           emit_move_insn (mem, op0);
> +         }
> +       else
> +         mem = op0;

No negative conditions.

> +       /* Generate mem expression to access LB and load it.  */
> +       hmem = gen_rtx_MEM (Pmode, XEXP (mem, 0));

adjust_address again.

> +       emit_move_insn (target, hmem);
> +
> +       return target;
> +      }
> +
> +    case IX86_BUILTIN_BNDUPPER:
> +      {
> +       rtx mem, hmem;
> +
> +       if (!target
> +           || GET_MODE (target) != Pmode
> +           || !register_operand (target, Pmode))
> +         target = gen_reg_rtx (Pmode);
> +
> +       arg0 = CALL_EXPR_ARG (exp, 0);
> +       op0 = expand_normal (arg0);
> +
> +       /* We need to move bounds to memory first.  */
> +       if (!MEM_P (op0))
> +         {
> +           mem = assign_386_stack_local (BNDmode, SLOT_TEMP);
> +           emit_move_insn (mem, op0);
> +         }
> +       else
> +         mem = op0;

No negative conditions.

> +       /* Generate mem expression to access UB and load it.  */
> +       hmem = gen_rtx_MEM (Pmode,
> +                           gen_rtx_PLUS (Pmode, XEXP (mem, 0),
> +                                         GEN_INT (GET_MODE_SIZE (Pmode))));

adjust_address again.

> +       emit_move_insn (target, hmem);
> +
> +       /* We need to inverse all bits of UB.  */
> +       emit_move_insn (target, gen_rtx_NOT (Pmode, target));

Use expand_simple_unop here.
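I.e. something like:

  /* One's complement of the upper bound, computed directly on RTL.  */
  target = expand_simple_unop (Pmode, NOT, target, target, 1);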

> +
> +       return target;
> +      }
> +
>      case IX86_BUILTIN_MASKMOVQ:
>      case IX86_BUILTIN_MASKMOVDQU:
>        icode = (fcode == IX86_BUILTIN_MASKMOVQ
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index a38c5d1..ededa67 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2317,6 +2317,7 @@ enum ix86_stack_slot
>    SLOT_CW_FLOOR,
>    SLOT_CW_CEIL,
>    SLOT_CW_MASK_PM,
> +  SLOT_BND_STORED,
>    MAX_386_STACK_LOCALS
>  };
>

Uros.


