This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
- From: Ramana Radhakrishnan <ramana dot radhakrishnan at arm dot com>
- To: Sriraman Tallam <tmsriram at google dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, "H.J. Lu" <hjl dot tools at gmail dot com>, Pedro Alves <palves at redhat dot com>, Michael Matz <matz at suse dot de>, David Li <davidxl at google dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 03 Jun 2015 15:27:11 +0100
- Subject: Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=
- Authentication-results: sourceware.org; auth=none
- References: <CAMe9rOrCh++A=jLOcDVXGM+UOyn31kuJ+R5ZmZkw8E_eHmCTHQ at mail dot gmail dot com> <CAMe9rOqVHuFbddkdKrkn1zaA9ASvT76sTn3enKcYo6K_4_Z77Q at mail dot gmail dot com> <CAAs8HmzRztBefZKx7KoB2LwvVWydRXDDdrw5FnexcphvXPR7gQ at mail dot gmail dot com> <20150529193552 dot GA52215 at kam dot mff dot cuni dot cz> <CAAs8Hmxg2a77HdJuedSkO8EkfROvSx_V2VOcZtWUAEiXLC3G6g at mail dot gmail dot com> <CAJA7tRYsMiq7rx34c=z6KwRdwYxxaeP6Z6qzA4XEwnJSMT7z=Q at mail dot gmail dot com> <CAAs8HmxB9NyJQHRxTLj4gKntDgwFfri0VvwSR6vfA1HDTpFHaQ at mail dot gmail dot com> <556C16B1 dot 5080606 at arm dot com> <CAAs8Hmy3jZDkE1hNmojx9rBBAZ9tjCYHXHevojMTAYHZ5kp8hA at mail dot gmail dot com> <CAJA7tRbHAHi7i1xboTZadrJLE_Ry628pwLot6f3wdK4KzawqCQ at mail dot gmail dot com> <CAAs8Hmy6gFfoKrVWJ2tCWQZ6Lkez2DCikWozVgwFcY-peDtOhw at mail dot gmail dot com> <CAJA7tRYUMvHt04m-+uM9irXRXC57iV9pP0jqQKoj9AJ+rwuo2g at mail dot gmail dot com> <CAAs8HmzcD4-t2bbxVL93PSU-1Mp-AZ=Qd-JN8mmjw6pC1Q+h9g at mail dot gmail dot com> <CAJA7tRYJxSGNyrrVw4JCBaUF2bgM0UaJ_+Md-DNLDdKZHUxAYg at mail dot gmail dot com> <CAAs8HmxdA5Fa7Lu4UzQS_nkFuQNW=4zvv21ucAS3iMQp=OM+8w at mail dot gmail dot com>
Hi Sriraman,
Thanks for the detailed explanation, that was useful.
I'm sorry I'm going to push back again for the same reason.
Let me describe the problem I am having in a little more detail:
For the PIC case, I think there is no confusion. Both of us agree on
what is being done. Attribute no_plt exactly shadows -fno-plt and is
completely target independent.
Agreed.
For the non-PIC case, this is where some target dependent portions are
needed. This is because I simply cannot remove the flag_pic check in
calls.c and force the address onto a register. Lets say I did that
with this patch:
Of-course I should have realized this earlier - sorry for being a pain.
We need to load the value from the GOT (or an equivalent position
independent manner) and that is entirely handled by the backends,
there's no easy interface to do this from the mid-end.
I tried a horrible hack in calls.c which was -
int old_flag_pic = flag_pic;
flag_pic = 1;
funexp = force_reg (Pmode, funexp);
flag_pic = old_flag_pic;
We then have to relax quite a lot of checks in a number of places across
backends to handle !flag_plt which ain't worth it.
I agree now that it will be much cleaner just to punt this into the
backend, so it may be worth noting that making this work properly for
the non-PIC case requires quite a degree of massaging in the backends.
Objections withdrawn.
Thanks,
Ramana
Index: calls.c
===================================================================
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,8 +226,10 @@ prepare_call_address (tree fndecl_or_type, rtx fun
&& targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
? force_not_mem (memory_address (FUNCTION_MODE, funexp))
: memory_address (FUNCTION_MODE, funexp));
- else if (flag_pic && !flag_plt && fndecl_or_type
+ else if (fndecl_or_type
&& TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+ && (!flag_plt
+ || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type)))
&& !targetm.binds_local_p (fndecl_or_type))
{
funexp = force_reg (Pmode, funexp);
what would the code look like for this example below in the non-PIC case:
__attribute__((no_plt))
extern int foo();
int main ()
{
return foo();
}
Without -O2:
mov _Z3foov, %eax
call *%eax
The indirect call is there but this is wrong because this will force
the linker to still create a PLT entry for foo and use that address.
This is worse than calling the PLT directly as we end up calling the
PLT indirectly.
Now, with -O2:
call *_Z3foov
and again same story. The linker creates a PLT entry for foo and
calls foo_plt indirectly.
What we really need to do in the non-PIC case, if we need a target
independent solution, is pretend that the call to foo is like a PIC
call when we see the attribute. I looked at how to do this and the
change to me seems pretty hairy and that is why it seemed like it is
better to handle this in the target directly.
Thanks
Sri
Other than forcing targets to tweak their call insn patterns, the act
of generating the indirect call should remain in target independent
code. Sorry, not having the same behaviour on all platforms for
something like this is just a recipe for confusion.
regards
Ramana
For PIC code, no_plt merely shadows the implementation of -fno-plt, no
surprises here.
* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for no_plt attribute.
(ix86_expand_call): Ditto.
(nopic_no_plt_attribute): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.
Please review.
Thanks
Sri
To be honest, this is trivial to implement in the ARM backend as one
would just piggy back on the longcalls work - despite that, IMNSHO
it's best done in a target independent manner.
regards
Ramana
Thanks
Sri
regards
Ramana
I am not familiar with PLT calls for other targets. I can move the
tests to gcc.dg but what relocation are you suggesting I check for?
Move the test to gcc.dg, add a target_support_no_plt function in
testsuite/lib/target-supports.exp and mark this as being supported only on
x86 and use scan-assembler to scan for PLT relocations for x86. Other
targets can add things as they deem fit.
In any case, on a large number of elf/ linux targets I would have thought
the absence of a JMP_SLOT relocation would be good enough to check that this
is working correctly.
regards
Ramana
Thanks
Sri
Ramana
Also I think the PLT calls have EBX in call fusage wich is added by
ix86_expand_call.
else
{
/* Static functions and indirect calls don't need the pic
register. */
if (flag_pic
&& (!TARGET_64BIT
|| (ix86_cmodel == CM_LARGE_PIC
&& DEFAULT_ABI != MS_ABI))
&& GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
&& ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
{
use_reg (&use, gen_rtx_REG (Pmode,
REAL_PIC_OFFSET_TABLE_REGNUM));
if (ix86_use_pseudo_pic_reg ())
emit_move_insn (gen_rtx_REG (Pmode,
REAL_PIC_OFFSET_TABLE_REGNUM),
pic_offset_table_rtx);
}
I think you want to take that away from FUSAGE there just like we do
for
local calls
(and in fact the code should already check flag_pic && flag_plt I
suppose.
Done that now and patch attached.
Thanks
Sri
Honza