[PATCH] x86-64: Load external function address via GOT slot

Uros Bizjak ubizjak@gmail.com
Tue Jun 21 18:22:00 GMT 2016


On Tue, Jun 21, 2016 at 2:40 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Jun 20, 2016 at 12:46 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Uros Bizjak <ubizjak@gmail.com> writes:
>>> On Mon, Jun 20, 2016 at 9:19 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>> On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This patch implements the alternate code sequence recommended in
>>>>>>
>>>>>> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
>>>>>>
>>>>>> to load external function address via GOT slot with
>>>>>>
>>>>>> movq func@GOTPCREL(%rip), %rax
>>>>>>
>>>>>> so that linker won't create an PLT entry for extern function
>>>>>> address.
>>>>>>
>>>>>> Tested on x86-64.  OK for trunk?
>>>>>
>>>>>> +  else if (ix86_force_load_from_GOT_p (op1))
>>>>>> +    {
>>>>>> +      /* Load the external function address via the GOT slot to
>>>>>> +        avoid PLT.  */
>>>>>> +      op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
>>>>>> +                           (TARGET_64BIT
>>>>>> +                            ? UNSPEC_GOTPCREL
>>>>>> +                            : UNSPEC_GOT));
>>>>>> +      op1 = gen_rtx_CONST (Pmode, op1);
>>>>>> +      op1 = gen_const_mem (Pmode, op1);
>>>>>> +      /* This symbol must be referenced via a load from the Global
>>>>>> +        Offset Table.  */
>>>>>> +      set_mem_alias_set (op1, ix86_GOT_alias_set ());
>>>>>> +      op1 = convert_to_mode (mode, op1, 1);
>>>>>> +      op1 = force_reg (mode, op1);
>>>>>> +      emit_insn (gen_rtx_SET (op0, op1));
>>>>>> +      /* Generate a CLOBBER so that there will be no REG_EQUAL note
>>>>>> +        on the last insn to prevent cse and fwprop from replacing
>>>>>> +        a GOT load with a constant.  */
>>>>>> +      rtx tmp = gen_reg_rtx (Pmode);
>>>>>> +      emit_clobber (tmp);
>>>>>> +      return;
>>>>>
>>>>> Jeff, is this the recommended way to prevent CSE, as far as RTL
>>>>> infrastructure is concerned? I didn't find any example of this
>>>>> approach with other targets.
>>>>>
>>>>
>>>> FWIW, the similar approach is used in ix86_expand_vector_move_misalign,
>>>> ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general
>>>> as well as other targets:
>>>>
>>>> frv/frv.c:  emit_clobber (op0);
>>>> frv/frv.c:  emit_clobber (op1);
>>>> im32c/m32c.c:  /*  emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */
>>>> s390/s390.c:  emit_clobber (addr);
>>>> s390/s390.md:  emit_clobber (reg0);
>>>> s390/s390.md:  emit_clobber (reg1);
>>>> s390/s390.md:  emit_clobber (reg0);
>>>> s390/s390.md:  emit_clobber (reg0);
>>>> s390/s390.md:  emit_clobber (reg1);
>>>
>>> These usages mark the whole register as being "clobbered"
>>> (=undefined), before only a part of register is written, e.g.:
>>>
>>>       emit_clobber (int_xmm);
>>>       emit_move_insn (gen_lowpart (DImode, int_xmm), input);
>>>
>>> They aren't used to prevent unwanted CSE.
>>
>> Since it's being called in the move expander, I thought the normal
>> way of preventing the constant being rematerialised would be to reject
>> it in the move define_insn predicates.
>>
>> FWIW, I agree that using a clobber for this is going to be fragile.
>>
>
> Here is the patch without clobber.  Tested on x86-64.  OK for
> trunk?

No, your patch has multiple problems:

1. It won't work for e.g. &bar+1, since you have to legitimize the
symbol in two places of ix86_expand_move. Also, why use TARGET_64BIT
in

+      op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
+    (TARGET_64BIT
+     ? UNSPEC_GOTPCREL
+     : UNSPEC_GOT));

when ix86_force_load_from_GOT_p rejects non-64bit targets?

2. New check should be added to ix86_legitimate_constant_p, not to
predicates of move insn patterns. Unfortunately, we still have to
change x86_64_immediate_operand in two places.

I have attached my version of the patch. It handles all your
testcases, plus &foo+1 case. Bootstrap is still running.

Does the patch work for you?

Uros.
-------------- next part --------------
Index: i386-protos.h
===================================================================
--- i386-protos.h	(revision 237653)
+++ i386-protos.h	(working copy)
@@ -70,6 +70,7 @@ extern bool ix86_expand_set_or_movmem (rtx, rtx, r
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
+extern bool ix86_force_load_from_GOT_p (rtx);
 extern void print_reg (rtx, int, FILE*);
 extern void ix86_print_operand (FILE *, rtx, int);
 
Index: i386.c
===================================================================
--- i386.c	(revision 237653)
+++ i386.c	(working copy)
@@ -15120,6 +15120,19 @@ darwin_local_data_pic (rtx disp)
 	  && XINT (disp, 1) == UNSPEC_MACHOPIC_OFFSET);
 }
 
+/* True if operand X should be loaded from GOT.  */
+
+bool
+ix86_force_load_from_GOT_p (rtx x)
+{
+  return (TARGET_64BIT && !TARGET_PECOFF && !TARGET_MACHO
+	  && !flag_plt && !flag_pic
+	  && ix86_cmodel != CM_LARGE
+	  && GET_CODE (x) == SYMBOL_REF
+	  && SYMBOL_REF_FUNCTION_P (x)
+	  && !SYMBOL_REF_LOCAL_P (x));
+}
+
 /* Determine if a given RTX is a valid constant.  We already know this
    satisfies CONSTANT_P.  */
 
@@ -15188,6 +15201,12 @@ ix86_legitimate_constant_p (machine_mode mode, rtx
       if (MACHO_DYNAMIC_NO_PIC_P)
 	return machopic_symbol_defined_p (x);
 #endif
+
+      /* External function address should be loaded
+	 via the GOT slot to avoid PLT.  */
+      if (ix86_force_load_from_GOT_p (x))
+	return false;
+
       break;
 
     CASE_CONST_SCALAR_INT:
@@ -15596,6 +15615,9 @@ ix86_legitimate_address_p (machine_mode, rtx addr,
 	    return false;
 
 	  case UNSPEC_GOTPCREL:
+	    if (ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)))
+	      goto is_legitimate_pic;
+	    /* FALLTHRU */
 	  case UNSPEC_PCREL:
 	    gcc_assert (flag_pic);
 	    goto is_legitimate_pic;
@@ -18164,6 +18186,12 @@ ix86_print_operand_address_as (FILE *file, rtx add
 	    fputs ("ds:", file);
 	  fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (disp));
 	}
+      /* Load the external function address via the GOT slot to avoid PLT.  */
+      else if (GET_CODE (disp) == CONST
+	       && GET_CODE (XEXP (disp, 0)) == UNSPEC
+	       && XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL
+	       && ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)))
+	output_pic_addr_const (file, disp, 0);
       else if (flag_pic)
 	output_pic_addr_const (file, disp, 0);
       else
@@ -19406,6 +19434,15 @@ ix86_expand_move (machine_mode mode, rtx operands[
 	    return;
 	  op1 = convert_to_mode (mode, op1, 1);
 	}
+      else if (ix86_force_load_from_GOT_p (op1))
+	{
+	  /* Load the external function address via GOT slot to avoid PLT.  */
+	  op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
+				UNSPEC_GOTPCREL);
+	  op1 = gen_rtx_CONST (Pmode, op1);
+	  op1 = gen_const_mem (Pmode, op1);
+	  set_mem_alias_set (op1, ix86_GOT_alias_set ());
+	}
       else if ((tmp = legitimize_pe_coff_symbol (op1, false)) != NULL_RTX)
 	op1 = tmp;
     }
@@ -19420,6 +19457,15 @@ ix86_expand_move (machine_mode mode, rtx operands[
       model = SYMBOL_REF_TLS_MODEL (symbol);
       if (model)
 	tmp = legitimize_tls_address (symbol, model, true);
+      else if (ix86_force_load_from_GOT_p (symbol))
+	{
+	  /* Load the external function address via GOT slot to avoid PLT.  */
+	  tmp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, symbol),
+				UNSPEC_GOTPCREL);
+	  tmp = gen_rtx_CONST (Pmode, tmp);
+	  tmp = gen_const_mem (Pmode, tmp);
+	  set_mem_alias_set (tmp, ix86_GOT_alias_set ());
+	}
       else
         tmp = legitimize_pe_coff_symbol (symbol, true);
 
Index: predicates.md
===================================================================
--- predicates.md	(revision 237653)
+++ predicates.md	(working copy)
@@ -160,13 +160,18 @@
         return trunc_int_for_mode (val, SImode) == val;
       }
     case SYMBOL_REF:
+      /* TLS symbols are not constant.  */
+      if (SYMBOL_REF_TLS_MODEL (op))
+	return false;
+
+      /* Load the external function address via the GOT slot.  */
+      if (ix86_force_load_from_GOT_p (op))
+	return false;
+
       /* For certain code models, the symbolic references are known to fit.
 	 in CM_SMALL_PIC model we know it fits if it is local to the shared
 	 library.  Don't count TLS SYMBOL_REFs here, since they should fit
 	 only if inside of UNSPEC handled below.  */
-      /* TLS symbols are not constant.  */
-      if (SYMBOL_REF_TLS_MODEL (op))
-	return false;
       return (ix86_cmodel == CM_SMALL || ix86_cmodel == CM_KERNEL
 	      || (ix86_cmodel == CM_MEDIUM && !SYMBOL_REF_FAR_ADDR_P (op)));
 
@@ -207,6 +212,11 @@
 	      /* TLS symbols are not constant.  */
 	      if (SYMBOL_REF_TLS_MODEL (op1))
 		return false;
+
+	      /* Load the external function address via the GOT slot.  */
+	      if (ix86_force_load_from_GOT_p (op))
+	        return false;
+
 	      /* For CM_SMALL assume that latest object is 16MB before
 		 end of 31bits boundary.  We may also accept pretty
 		 large negative constants knowing that all objects are
@@ -273,10 +283,11 @@
       return !(INTVAL (op) & ~(HOST_WIDE_INT) 0xffffffff);
 
     case SYMBOL_REF:
-      /* For certain code models, the symbolic references are known to fit.  */
       /* TLS symbols are not constant.  */
       if (SYMBOL_REF_TLS_MODEL (op))
 	return false;
+
+     /* For certain code models, the symbolic references are known to fit.  */
       return (ix86_cmodel == CM_SMALL
 	      || (ix86_cmodel == CM_MEDIUM
 		  && !SYMBOL_REF_FAR_ADDR_P (op)));


More information about the Gcc-patches mailing list