This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Expand PIC calls without PLT with -fno-plt


On Mon, 22 Jun 2015, Jiong Wang wrote:
> Have done a quick experiment: -fno-plt doesn't work on AArch64.
> 
> Although this patch forces the function address into a register, the combine
> pass runs later and combines it back, because AArch64 defines an insn pattern
> that allows it.
> 
> On x86 it is not combined back.  From the RTL dumps, that is because the RTL
> PRE pass has moved the address load into another basic block, and the combine
> pass does not combine across basic blocks.  Also, the x86 backend checks
> flag_plt in the newly added ix86_nopic_noplt_attribute_p, which helps it
> generate the correct insns.
> 
> The fix I can think of for AArch64 is to allow calling a symbol directly only
> when "flag_plt == true", so that a call via register cannot be combined back
> into a direct call to the symbol.
> 
> Or would it be better to prevent the combine pass from doing this kind of
> combination?  A generic fix in combine might fix other broken targets as well.

My colleagues at ISP RAS (CC'ed) have been looking at arm (and aarch64) no-plt
codegen.  We also saw the problem with the combine pass that you describe.  I think
your description of why it's not observed on x86 is incorrect; the newly added
ix86_nopic_noplt_attribute_p should not have anything to do with that.  It's
just that the GOT load insn has a REG_EQUAL note, and the combine pass can use
it to replace the register in the indirect branch, producing a direct branch
to a symbol (i.e. a PLT jump).
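
To make that concrete, here is a minimal test case; the file name and the
exact assembly (register choice included) are only illustrative, not taken
from an actual dump:

/* pltcall.c -- hypothetical minimal example.  Built with -O2 -fpic -fno-plt
   on aarch64, the call should stay indirect through the GOT, roughly:

       adrp    x0, :got:foo
       ldr     x0, [x0, :got_lo12:foo]
       blr     x0                        // call via register

   but once combine substitutes the symbol from the REG_EQUAL note, it
   degenerates into

       bl      foo                       // direct call, i.e. a PLT jump
*/
extern void foo (void);

void
bar (void)
{
  foo ();
}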

Actually, it is only by pure luck that we are not hitting the same problem on
x86: early RTL passes happen to lose the REG_EQUAL note, so by the time combine
runs the annotation is gone.  The arm/aarch64 problem can be reproduced on x86
with -fno-gcse and the following hack:

diff --git a/gcc/cse.c b/gcc/cse.c
index 2a33827..88cff96 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -6634,6 +6634,9 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs)
   int *rc_order = XNEWVEC (int, last_basic_block_for_fn (cfun));
   int i, n_blocks;
 
+  if (!flag_gcse)
+    return 0;
+
   df_set_flags (DF_LR_RUN_DCE);
   df_note_add_problem ();
   df_analyze ();
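
With that hack applied, building a small PIC test case (e.g. the one above)
with something like -O2 -fpic -fno-plt -fno-gcse should be enough to see the
direct call reappear on x86 as well.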

Regarding fixing the issue, I also think that the combine pass might be a better
place for the fix than the backends.  I'd appreciate comments from maintainers.


If you try disabling the REG_EQUAL note generation [*], you will probably see a
performance regression on arm32 (and possibly on aarch64 as well; we have only
tried arm32 so far).  The main reason is that GCC emits pretty bad code for a
GOT load.  Instead of using two add instructions and one ldr for the GOT slot
access, like the PLT stubs do, it uses three(!) ldr instructions and one add,
roughly as sketched below.  The first ldr loads the GOT address, and the second
loads the offset of the GOT slot.  As I understand it, fixing that requires
teaching GCC to use the GOT_PREL relocation type.
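
For reference, the current arm32 sequence looks roughly like this (an
illustrative sketch, not a verbatim compiler dump; labels and registers are
made up):

        ldr     r3, .L2           @ PC-relative offset of the GOT base, from the literal pool
.LPIC0:
        add     r3, pc, r3        @ r3 = GOT base
        ldr     r2, .L2+4         @ offset of foo's GOT slot, from the literal pool
        ldr     r3, [r3, r2]      @ load foo's address from its GOT slot
        ...
.L2:
        .word   _GLOBAL_OFFSET_TABLE_-(.LPIC0+8)
        .word   foo(GOT)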

[*] To do that, we hacked arm legitimize_pic_address not to emit the REG_EQUAL
note under !flag_plt.

Alexander

