[PATCH] s390: Add -fsplit-stack support

Marcin Kościelnicki koriakin@0x04.net
Tue Feb 2 20:11:00 GMT 2016

On 02/02/16 19:33, Ulrich Weigand wrote:
> Marcin Kościelnicki wrote:
>> Here we go.  I've also removed the "see below", since I don't really
>> see anything below...
> The "see below" refers to this code (which I agree isn't really obvious):
>      {
>        /* Generate a BAS instruction to serve as a function
>           entry intercept to facilitate the use of tracing
>           algorithms located at the branch target.  */
>        emit_insn (gen_prologue_tpf ());
> What is not explicitly called out here is that this tracing function
> actually refers to some hard registers, in particular r14, and assumes
> they still have the original contents as at function entry.
> That is why the prolog code avoid using r14 as temporary if the TPF
> tracing mechanism is in use.  Now I think this doesn't apply to r12,
> so this part of your patch should still be fine.  (In addition, TPF
> is not going to support split stacks --or indeed the Go language--
> anyway, so it doesn't really matter all that much.)

Very well, I'll improve the comment.
> I do have two other issues; sorry for bringing those up again although
> they've been discussed up in the past, but I still think we can find
> some improvements here ...
> The first is the question Andreas brought up, why we need the extra
> set of insns introduced by s390_reorg.  I think this may really have
> been necessary for the ESA case where data elements had to be intermixed
> into code at a specific location.  But since we no longer support ESA,
> we now just have a data block that can be placed anywhere.  For example,
> we could just have an insn (at any point in the prolog stream) that
> simply emits the full data block during final output, along the lines of
> (note: needs to be updated for SImode vs. DImode.):
> (define_insn "split_stack_data"
>    [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
>                       (match_operand 1 "bras_sym_operand" "X")
>                       (match_operand 2 "consttable_operand" "X")
>                       (match_operand 3 "consttable_operand" "X")]
>                      UNSPECV_SPLIT_STACK_DATA)]
>    ""
> {
>    switch_to_section (targetm.asm_out.function_rodata_section
>                        (current_function_decl));
>    output_asm_insn (\".align 3", operands);
>    (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
>                                       CODE_LABEL_NUMBER (operands[0]));
>    output_asm_insn (\".quad %2\", operands);
>    output_asm_insn (\".quad %3\", operands);
>    output_asm_insn (\".quad %1-%0\", operands);
>    switch_to_section (current_function_section ());
>    return "";
> }
>    [(set_attr "length" "0")])
> Or possibly even cleaner, we can simply define the data block at the
> tree level as if it were an initialized global variable of a certain
> struct type, and just leave it to common code to emit it as usual.
> Then we just have the code bits, but I don't really see much
> difference between the split_stack_call and split_stack_sibcall
> patterns (apart from the data block), so if code flow is OK with
> the former insns, it should be OK with the latter too ..
> [ Or else, if there *are* code flow issues, the other alternative
> would be to emit the full call sequence, code and data, from a
> single insn pattern during final output.  This might have the extra
> benefit that the assembler sequence is fully fixed, and thus easier
> to detect in the linker.  ]
> Getting rid of the extra transformation in s390_reorg would not
> just remove a bunch of code from the back-end (always good!),
> it would also speed up compile time a bit.

When I wasn't using reorg, I had problems with gcc deleting the label in 
.rodata, since it wasn't used by any jump instruction.  I guess having a 
whole-block instruction that emits the label on its own should solve the 
issue, though - let's try that.
> The second issue I'm still not sure about is the magic nop marker
> for frameless functions.  In an earlier mail you wrote:
>> Both currently supported
>> architectures always emit split-stack code on every function.
> At least for rs6000 this doesn't appear to be true; in
> rs6000_expand_split_stack_prologue we have:
>    if (!info->push_p)
>      return;
> so it does nothing for frameless routines.
> Now on i386 we do indeed generate code for frameless routines;
> in fact, the *same* full stack check is generated as for any
> other routine.  Now I'm wondering: is there are reason why
> this check would be necessary (and there's simply a bug in
> the rs6000 implementation)?  Then we obviously should do the
> same on s390.

Try that on powerpc64(le):

$ cat a.c
#include <stdio.h>

void f(void) {

typedef void (*fptr)(void);

fptr g(void);

int main() {
         printf("%p\n", g());

$ cat b.c
void f(void);

typedef void (*fptr)(void);

fptr g(void) {
         return f;

$ gcc -O3 -fsplit-stack -c b.c
$ gcc -O3 -c a.c
$ gcc a.o b.o -fuse-ld=gold

I don't have a recent enough gcc for powerpc, but from what I've seen in 
the code, this should explode with a linker error.

Of course, mixing split-stack and non-split-stack code when function 
pointers are involved is sketchy anyway, so what's one more bug...

That said, for s390, we can avoid the above problem by checking the 
relocation in gold now that ESA paths are gone - for direct function 
calls (the only ones we care about), we should be seeing a relocation in 
brasl.  So I'll remove the nopmark thing and add proper recognition in gold.

> On the other hand, if rs6000 works fine *without* any code
> in frameless routines, why wouldn't that work for s390 too?
> Emitting a nop (that is always executed) still looks weird to me.
> Bye,
> Ulrich

More information about the Gcc-patches mailing list