[PATCH] s390: Add -fsplit-stack support

Ulrich Weigand uweigand@de.ibm.com
Tue Feb 2 18:34:00 GMT 2016

Marcin KoÃ
›cielnicki wrote:

> Here we go.  I've also removed the "see below", since I don't really
> see anything below...

The "see below" refers to this code (which I agree isn't really obvious):

      /* Generate a BAS instruction to serve as a function
         entry intercept to facilitate the use of tracing
         algorithms located at the branch target.  */
      emit_insn (gen_prologue_tpf ());

What is not explicitly called out here is that this tracing function
actually refers to some hard registers, in particular r14, and assumes
they still have the original contents as at function entry.

That is why the prolog code avoid using r14 as temporary if the TPF
tracing mechanism is in use.  Now I think this doesn't apply to r12,
so this part of your patch should still be fine.  (In addition, TPF
is not going to support split stacks --or indeed the Go language--
anyway, so it doesn't really matter all that much.)

I do have two other issues; sorry for bringing those up again although
they've been discussed up in the past, but I still think we can find
some improvements here ...

The first is the question Andreas brought up, why we need the extra
set of insns introduced by s390_reorg.  I think this may really have
been necessary for the ESA case where data elements had to be intermixed
into code at a specific location.  But since we no longer support ESA,
we now just have a data block that can be placed anywhere.  For example,
we could just have an insn (at any point in the prolog stream) that
simply emits the full data block during final output, along the lines of
(note: needs to be updated for SImode vs. DImode.):

(define_insn "split_stack_data"
  [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
                     (match_operand 1 "bras_sym_operand" "X")
                     (match_operand 2 "consttable_operand" "X")
                     (match_operand 3 "consttable_operand" "X")]
  switch_to_section (targetm.asm_out.function_rodata_section

  output_asm_insn (\".align 3", operands);
  (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
                                     CODE_LABEL_NUMBER (operands[0]));
  output_asm_insn (\".quad %2\", operands);
  output_asm_insn (\".quad %3\", operands);
  output_asm_insn (\".quad %1-%0\", operands);

  switch_to_section (current_function_section ());
  return "";
  [(set_attr "length" "0")])

Or possibly even cleaner, we can simply define the data block at the
tree level as if it were an initialized global variable of a certain
struct type, and just leave it to common code to emit it as usual.

Then we just have the code bits, but I don't really see much
difference between the split_stack_call and split_stack_sibcall
patterns (apart from the data block), so if code flow is OK with
the former insns, it should be OK with the latter too ..

[ Or else, if there *are* code flow issues, the other alternative
would be to emit the full call sequence, code and data, from a
single insn pattern during final output.  This might have the extra
benefit that the assembler sequence is fully fixed, and thus easier
to detect in the linker.  ]

Getting rid of the extra transformation in s390_reorg would not
just remove a bunch of code from the back-end (always good!),
it would also speed up compile time a bit.

The second issue I'm still not sure about is the magic nop marker
for frameless functions.  In an earlier mail you wrote:

> Both currently supported 
> architectures always emit split-stack code on every function.

At least for rs6000 this doesn't appear to be true; in
rs6000_expand_split_stack_prologue we have:

  if (!info->push_p)

so it does nothing for frameless routines.

Now on i386 we do indeed generate code for frameless routines;
in fact, the *same* full stack check is generated as for any
other routine.  Now I'm wondering: is there are reason why
this check would be necessary (and there's simply a bug in
the rs6000 implementation)?  Then we obviously should do the
same on s390.

On the other hand, if rs6000 works fine *without* any code
in frameless routines, why wouldn't that work for s390 too?

Emitting a nop (that is always executed) still looks weird to me.


  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain

More information about the Gcc-patches mailing list