This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=


On Tue, Jun 2, 2015 at 1:56 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Tue, Jun 2, 2015 at 7:15 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Mon, Jun 1, 2015 at 1:33 PM, Ramana Radhakrishnan
>> <ramana.gcc@googlemail.com> wrote:
>>> On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
>>>> <ramana.gcc@googlemail.com> wrote:
>>>>> On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
>>>>>> <ramana.radhakrishnan@arm.com> wrote:
>>>>>>>
>>>>>>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>>>>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>>>>>>> different on each target that mandates this to be in the target backend.
>>>>>>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>>>>>>> check
>>>>>>>>> for the absence of a relocation so that folks at least see these as being
>>>>>>>>> UNSUPPORTED on their target.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> To be even more explicit, shouldn't this be handled similar to the way in
>>>>>>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>>>>>>> can handle this for the command line, doing the same for a function which
>>>>>>> has been decorated with attribute((noplt)) should be simple.
>>>>>>
>>>>>> -fno-plt does not work for non-PIC code, having non-PIC code not use
>>>>>> PLT was my primary motivation.  Infact, if you go back in this thread,
>>>>>> I suggested to HJ if I should piggyback on -fno-plt.  I tried using
>>>>>> the -fno-plt implementation to do this by removing the flag_pic check
>>>>>> in calls.c, but that does not still work for non-PIC code.
>>>
>>> If you want __attribute__ ((noplt)) to work for non-PIC code, we
>>> should look to code it in the same place surely by making all
>>> __attribute__((noplt)) calls, indirect calls irrespective of whether
>>> it's fpic or not.
>>>
>>>
>>>>>
>>>>> You're missing my point, unless I'm missing something basic here - I
>>>>> should have been even more explicit and said -fPIC was a given in all
>>>>> this discussion.
>>>>>
>>>>> calls.c:229 has
>>>>>
>>>>> else if (flag_pic && !flag_plt && fndecl_or_type
>>>>>            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>>>>>            && !targetm.binds_local_p (fndecl_or_type))
>>>>>
>>>>> why can't we merge the check in here for the attribute noplt ?
>>>>
>>>> We can and and please see this thread, that is the exact patch I proposed :
>>>> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html
>>>>
>>>> However, there was one caveat.  I want this working without -fPIC too.
>>>> non-PIC code also generates PLT calls and I want them eliminated.
>>>>
>>>>>
>>>>> If a new attribute is added to the "GNU language" in this case, why
>>>>> isn't this being treated in the same way as the command line option
>>>>> has been treated ? All this means is that we add an attribute and a
>>>>> command line option to common code and then not implement it in a
>>>>> proper target agnostic fashion.
>>>>
>>>> You are right.  This is the way I wanted it too but I also wanted the
>>>> attribute to work without PIC. PLT calls are generated without -fPIC
>>>> and -fPIE too and I wanted a solution for that.  On looking at the
>>>> code in more detail,
>>>>
>>>> * -fno-plt is made to work with -fPIC, is there a reason to not make
>>>> it work for non-PIC code?  I can remove the flag_pic check from
>>>> calls.c
>>>
>>> I don't think that's right, you probably have to allow that along with
>>> (flag_pic || (decl && attribute_no_plt (decl)) - however it seems odd
>>> to me that the language extension allows this but the flag doesn't.
>>>
>>>> * Then, I add the generic attribute "noplt" and everything is fine.
>>>>
>>>> There is just one caveat with the above approach, for x86_64
>>>> (*call_insn) will not generate indirect-calls for *non-PIC* code
>>>> because constant_call_address_operand in predicates.md will evaluate
>>>> to false.  This can be fixed appropriately in ix86_output_call_insn in
>>>> i386.c.
>>>
>>> Yes, targets need to massage that into place but that's essentially
>>> the mechanics of retaining indirect calls in each backend. -fno-plt
>>> doesn't work for ARM / AArch64 with optimizers currently (and I
>>> suspect on most other targets) because our predicates are too liberal,
>>> fixed by treating "noplt" or -fno-plt as the equivalent of
>>> -mlong-calls.
>>>
>>>>
>>>>
>>>> Is this alright?  Sorry for the confusion, but the primary reason why
>>>> I did not do it the way you suggested is because we wanted "noplt"
>>>> attribute to work for non-PIC code also.
>>>
>>> If that is the case, then this is a slightly more complicated
>>> condition in the same place. We then always have indirect calls for
>>> functions that are marked noplt and just have target generate this
>>> appropriately.
>>
>> I have now modified this patch.
>
> Thanks for taking care of this. I'll have a read through tomorrow
> morning when I'm at my normal work machine.
>
>>
>> This patch does two things:
>>
>> 1) Adds new generic function attribute "no_plt" that is similar in
>> functionality  to -fno-plt except that it applies only to calls to
>> functions that are marked  with this attribute.
>> 2) For x86_64, it makes -fno-plt(and the attribute) also work for
>> non-PIC code by  directly generating an indirect call via a GOT entry.
>
> I'm sorry I'm going to push back again for the same reason.

Let me describe the problem I am having in a little more detail:

For the PIC case, I think there is no confusion. Both of us agree on
what is being done. Attribute no_plt exactly shadows -fno-plt and is
completely target independent.

For the non-PIC case, this is where some target dependent portions are
needed.  This is because I simply cannot remove the flag_pic check in
calls.c and force the address onto a register. Lets say I did that
with this patch:

Index: calls.c
===================================================================
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,8 +226,10 @@ prepare_call_address (tree fndecl_or_type, rtx fun
        && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
       ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
       : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic && !flag_plt && fndecl_or_type
+  else if (fndecl_or_type
    && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+   && (!flag_plt
+       || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type)))
    && !targetm.binds_local_p (fndecl_or_type))
     {
       funexp = force_reg (Pmode, funexp);

what would the code look like for this example below in the non-PIC case:

__attribute__((no_plt))
extern int foo();

int main ()
{
  return foo();
}


Without -O2:

mov _Z3foov, %eax
call *%eax

The indirect call is there but this is wrong because this will force
the linker to still create a PLT entry for foo and use that address.
This is worse than calling the PLT directly as we end up calling the
PLT indirectly.

Now, with -O2:
call *_Z3foov

and again same story.  The linker creates a PLT entry for foo and
calls foo_plt indirectly.

What we really need to do in the non-PIC case, if we need a target
independent solution, is pretend that the call to foo is like a PIC
call when we see the attribute.  I looked at how to do this and the
change to me seems pretty hairy and that is why it seemed like it is
better to handle this in the target directly.

Thanks
Sri


>
> Other than forcing targets to tweak their call insn patterns, the act
> of generating the indirect call should remain in target independent
> code. Sorry, not having the same behaviour on all platforms for
> something like this is just a recipe for confusion.
>
> regards
> Ramana
>
>>
>> For PIC code, no_plt merely shadows the implementation of -fno-plt, no
>> surprises here.
>>
>> * c-family/c-common.c (no_plt): New attribute.
>> (handle_no_plt_attribute): New handler.
>> * calls.c (prepare_call_address): Check for no_plt
>> attribute.
>> * config/i386/i386.c (ix86_function_ok_for_sibcall): Check
>> for no_plt attribute.
>> (ix86_expand_call):  Ditto.
>> (nopic_no_plt_attribute): New function.
>> (ix86_output_call_insn): Output indirect call for non-pic
>> no plt calls.
>> * doc/extend.texi (no_plt): Document new attribute.
>> * testsuite/gcc.target/i386/noplt-1.c: New test.
>> * testsuite/gcc.target/i386/noplt-2.c: New test.
>> * testsuite/gcc.target/i386/noplt-3.c: New test.
>> * testsuite/gcc.target/i386/noplt-4.c: New test.
>>
>>
>> Please review.
>>
>> Thanks
>> Sri
>>
>>
>>>
>>> To be honest, this is trivial to implement in the ARM backend as one
>>> would just piggy back on the longcalls work - despite that, IMNSHO
>>> it's best done in a target independent manner.
>>>
>>> regards
>>> Ramana
>>>
>>>>
>>>> Thanks
>>>> Sri
>>>>
>>>>>
>>>>> regards
>>>>> Ramana
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> I am not familiar with PLT calls for other targets.  I can move the
>>>>>>>> tests to gcc.dg but what relocation are you suggesting I check for?
>>>>>>>
>>>>>>>
>>>>>>> Move the test to gcc.dg, add a target_support_no_plt function in
>>>>>>> testsuite/lib/target-supports.exp and mark this as being supported only on
>>>>>>> x86 and use scan-assembler to scan for PLT relocations for x86. Other
>>>>>>> targets can add things as they deem fit.
>>>>>>
>>>>>>>
>>>>>>> In any case, on a large number of elf/ linux targets I would have thought
>>>>>>> the absence of a JMP_SLOT relocation would be good enough to check that this
>>>>>>> is working correctly.
>>>>>>>
>>>>>>> regards
>>>>>>> Ramana
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Sri
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ramana
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Also I think the PLT calls have EBX in call fusage wich is added by
>>>>>>>>>>> ix86_expand_call.
>>>>>>>>>>>    else
>>>>>>>>>>>      {
>>>>>>>>>>>        /* Static functions and indirect calls don't need the pic
>>>>>>>>>>> register.  */
>>>>>>>>>>>        if (flag_pic
>>>>>>>>>>>            && (!TARGET_64BIT
>>>>>>>>>>>                || (ix86_cmodel == CM_LARGE_PIC
>>>>>>>>>>>                    && DEFAULT_ABI != MS_ABI))
>>>>>>>>>>>            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>>>>>>>            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>>>>>>>          {
>>>>>>>>>>>            use_reg (&use, gen_rtx_REG (Pmode,
>>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>>>>>>>            if (ix86_use_pseudo_pic_reg ())
>>>>>>>>>>>              emit_move_insn (gen_rtx_REG (Pmode,
>>>>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>>>>>>>                              pic_offset_table_rtx);
>>>>>>>>>>>          }
>>>>>>>>>>>
>>>>>>>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>>>>>>>> for
>>>>>>>>>>> local calls
>>>>>>>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>>>>>>>> suppose.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Done that now and patch attached.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Sri
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Honza
>>>>>>>>
>>>>>>>>
>>>>>>>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]