[PATCH] Expand PIC calls without PLT with -fno-plt

Ramana Radhakrishnan ramana.gcc@googlemail.com
Tue Jun 23 08:41:00 GMT 2015


On Mon, Jun 22, 2015 at 7:11 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Mon, 22 Jun 2015, Jiong Wang wrote:
>> I have done a quick experiment: -fno-plt doesn't work on AArch64.
>>
>> That's because, although this patch forces the function address into a
>> register, the combine pass, which runs later, combines it back, since
>> AArch64 defines such an insn pattern.
>>
>> On x86, it's not combined back. From the RTL dump, that's because the RTL
>> PRE pass has moved the address load instruction into another basic block,
>> and the combine pass doesn't combine across basic blocks. Also, the x86
>> backend does some checks on flag_plt in the newly added
>> ix86_nopic_noplt_attribute_p, which could help generate correct insns.
>>
>> One fix I can think of on AArch64 is to restrict the call-symbol patterns
>> to "flag_plt == true" only, so that a call via a register can't be combined
>> into a direct call to the symbol.
>>
>> Or would it be better to prohibit the combine pass from doing this kind of
>> combining? A generic fix in combine may also fix other broken targets.
>
> My colleagues at ISP RAS (CC'ed) have been looking at arm (and aarch64) no-plt
> codegen.  We also saw the problem with the combine pass you describe.  I think
> your description of why it's not observed on x86 is incorrect; the newly added
> ix86_nopic_noplt_attribute_p should not have anything to do with that.  It's
> just that the GOT load insn has a REG_EQUAL note, and the combine pass can use
> it to replace the register in the indirect branch, producing a direct branch
> to a symbol (i.e. a PLT jump).


>
> Actually, it is pure luck that we are not hitting the same problem on x86.
> Early RTL passes manage to lose the REG_EQUAL note, so by the time combine
> runs, the register annotation is gone.  It's possible to reproduce the
> arm/aarch64 problem on x86 with -fno-gcse and the following hack:
>
> diff --git a/gcc/cse.c b/gcc/cse.c
> index 2a33827..88cff96 100644
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -6634,6 +6634,9 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs)
>    int *rc_order = XNEWVEC (int, last_basic_block_for_fn (cfun));
>    int i, n_blocks;
>
> +  if (!flag_gcse)
> +    return 0;
> +
>    df_set_flags (DF_LR_RUN_DCE);
>    df_note_add_problem ();
>    df_analyze ();
>
> Regarding fixing the issue, I also think that the combine pass might be a
> better place (than the backends).  I'd appreciate comments from maintainers.
>
>

Note that on AArch64 the GOT slot can be accessed with a single PC-relative
instruction followed by a load, so I don't expect there to be any more
work to be done in the AArch64 backend other than massaging this into
an indirect call in the "call"-related patterns.

So you'd get something like

adrp x0, :got:a
ldr x0, [x0, :got_lo12:a]
blr x0

and in the tiny model

ldr x0, :got:a
blr x0

if your ELF module is small enough.
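
To make "small enough" concrete, here is a C sketch of the address arithmetic behind the two sequences above. It assumes the standard AArch64 encodings (ADRP takes a signed 21-bit page offset, roughly +/-4 GiB; LDR literal takes a signed 19-bit word offset, roughly +/-1 MiB). The function names and the pc/slot inputs are hypothetical, and none of this is GCC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: "pc" is the address of the adrp/ldr instruction and "slot"
 * is the address of the GOT entry for "a"; both are hypothetical inputs. */

/* Small model: "adrp x0, :got:a" encodes a signed 21-bit page offset, so the
 * 4 KiB page holding the GOT slot must lie within +/-4 GiB of the PC's page. */
static bool small_model_reachable(uint64_t pc, uint64_t slot)
{
    int64_t delta = (int64_t)((slot & ~UINT64_C(0xFFF)) - (pc & ~UINT64_C(0xFFF)));
    int64_t pages = delta / 4096; /* exact: delta is a multiple of 4096 */
    return pages >= -(INT64_C(1) << 20) && pages < (INT64_C(1) << 20);
}

/* Address the adrp/ldr pair dereferences: ADRP yields the PC's page plus the
 * encoded page offset, and :got_lo12: supplies the low 12 bits of the slot. */
static uint64_t small_model_slot_address(uint64_t pc, uint64_t slot)
{
    int64_t pages = (int64_t)((slot & ~UINT64_C(0xFFF)) - (pc & ~UINT64_C(0xFFF))) / 4096;
    uint64_t x0 = (pc & ~UINT64_C(0xFFF)) + (uint64_t)(pages * 4096); /* adrp */
    return x0 + (slot & UINT64_C(0xFFF));                              /* ldr offset */
}

/* Tiny model: "ldr x0, :got:a" is a PC-relative literal load with a signed
 * 19-bit word offset, so the slot must be 4-byte aligned relative to the
 * instruction and within about +/-1 MiB of it. */
static bool tiny_model_reachable(uint64_t pc, uint64_t slot)
{
    int64_t delta = (int64_t)(slot - pc);
    return delta % 4 == 0
        && delta >= -(INT64_C(1) << 20)
        && delta <= (INT64_C(1) << 20) - 4;
}
```

For example, with pc = 0x400000 a slot at 0x600000 (2 MiB away) is out of range for the tiny model but well within reach of the adrp/ldr pair.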

> If you try disabling the REG_EQUAL note generation [*], you'll probably find a
> performance regression on arm32 (and probably on aarch64 as well? we only

IMHO disabling the REG_EQUAL note generation is the wrong way to go about this.

> tried arm32 so far).  The main reason for that is that GCC emits pretty bad
> code for a GOT load.  Instead of using two add instructions and one ldr for
> the GOT slot access, like the PLT stubs do, it uses three(!) ldr instructions
> and one add.  The first ldr is for loading the GOT address, the second is
> for the offset of the GOT slot, and the third loads the slot itself.  As I
> understand it, to fix that, GCC has to learn to use the GOT_PREL relocation
> type.
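
For reference, a C sketch of the arm32 arithmetic behind the "three ldr" observation, assuming the ARM ELF relocation definitions R_ARM_GOT_BREL = GOT(S) + A - GOT_ORG and R_ARM_GOT_PREL = GOT(S) + A - P. Addresses are hypothetical, the PC read bias is assumed to be folded into the literals by the assembler, and the instruction sequences in the comments are illustrative rather than exact GCC output.

```c
#include <stdint.h>

/* Result of one GOT access: the slot address computed and how many ldr
 * instructions the (illustrative) sequence needs. arm32 arithmetic wraps
 * modulo 2^32, which uint32_t models directly. */
struct got_access {
    uint32_t slot;
    int ldr_count;
};

/* GOT-relative sequence (illustrative):
 *   ldr rA, [literal]   @ GOT_ORG - pc
 *   ldr rB, [literal]   @ R_ARM_GOT_BREL: slot - GOT_ORG
 *   add rA, pc, rA      @ GOT base
 *   ldr rA, [rA, rB]    @ load the GOT slot                     */
static struct got_access via_got_brel(uint32_t pc, uint32_t got_org, uint32_t slot)
{
    uint32_t lit_got = got_org - pc;   /* first literal */
    uint32_t lit_off = slot - got_org; /* second literal */
    uint32_t base = pc + lit_got;      /* the single add */
    return (struct got_access){ base + lit_off, 3 };
}

/* PC-relative sequence with R_ARM_GOT_PREL (illustrative):
 *   ldr rA, [literal]   @ slot - pc
 *   add rA, pc, rA      @ slot address
 *   ldr rA, [rA]        @ load the GOT slot                     */
static struct got_access via_got_prel(uint32_t pc, uint32_t slot)
{
    uint32_t lit = slot - pc; /* the only literal */
    return (struct got_access){ pc + lit, 2 };
}
```

Both functions compute the same slot address; the difference is one literal-pool load per access, which is what GOT_PREL would save, subject to working out when it is actually the better choice.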

Irrespective of combine, as a first step we should fix the predicates
and the call expanders in the backends to prevent this sort of
replacement. Tightening the predicates in the call patterns will achieve
the same for you, and then we can investigate the use of GOT_PREL. My
recollection is that you need to work out when it's more beneficial to
use GOT_PREL over GOT, but it's been a while since I looked in that
area.
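
A deliberately simplified C model of the interaction discussed in this thread: combine tries substituting the REG_EQUAL value (the symbol) recorded on the GOT load into the indirect call, and keeps the result only if the call pattern's operand predicate still accepts it. The types, call_operand_ok() and try_combine() are hypothetical stand-ins, not GCC's rtx API; the model only shows that a predicate rejecting symbols under !flag_plt blocks the rewrite, while losing the note (the x86 case) avoids it by luck.

```c
#include <stdbool.h>

/* Hypothetical toy model, NOT GCC's rtx data structures. */
enum operand_kind { OP_REG, OP_SYMBOL };

struct call_insn {
    enum operand_kind target; /* what the call currently goes through */
};

struct got_load {
    bool has_reg_equal_note; /* REG_EQUAL (symbol_ref "a") on the GOT load */
};

/* Hypothetical stand-in for the call pattern's operand predicate.
 * A register is always accepted; a symbol is accepted unconditionally by
 * the current predicate, but only when flag_plt is set once tightened. */
static bool call_operand_ok(enum operand_kind k, bool flag_plt, bool tightened)
{
    if (k == OP_REG)
        return true;
    return !tightened || flag_plt;
}

/* Toy combine step: replace the register with the REG_EQUAL value and keep
 * the change only if the predicate still matches the resulting insn. */
static void try_combine(struct call_insn *call, const struct got_load *load,
                        bool flag_plt, bool tightened)
{
    if (call->target == OP_REG && load->has_reg_equal_note
        && call_operand_ok(OP_SYMBOL, flag_plt, tightened))
        call->target = OP_SYMBOL; /* direct "bl a", i.e. back through the PLT */
}
```

Under this model, tightening the predicate keeps the call indirect with -fno-plt while leaving the default (flag_plt) behaviour unchanged.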

>
> [*] To do that, we hacked arm legitimize_pic_address not to emit REG_EQUAL
> note under !flag_plt.
>
> Alexander


