This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFA: Add Epiphany port


Quoting Richard Henderson <rth@redhat.com>:

(define_special_predicate "any_gpr_operand"
  (match_code "subreg,reg")
{
  return gpr_operand (op, mode);
})

(define_special_predicate "any_gpr_operand" (match_operand 0 "gpr_operand"))

though, I'm not sure what this achieves...  while any_gpr_operand
will ignore its mode, passing on that same mode to gpr_operand wont.

It does get rid of the warning in the md file about a 'missing' mode, when I really want to match registers in various modes. When you look at the SH port, you'll see that I have used this trick of having an "any_*" predicate to silence a warning about using VOIDmode in a couple of places.

(define_insn_and_split "move_frame"
  [(set (match_operand:SI 0 "gpr_operand" "=r")
        (match_operand:SI 1 "register_operand" "r"))
   (clobber (reg:CC CC_REGNUM))]
  "operands[1] == frame_pointer_rtx || operands[1] == arg_pointer_rtx"

It looks to me that there are several places that could be tidied up if you have a frame_register_operand predicate to use here instead of testing operands[] in the extra_constraints field.

There are two mentions of frame_pointer_rtx, and three of hard_frame_pointer_rtx, in the entirety of the md file.
So I don't think having such a predicate - which would have to do
different things before, during, and after reload, and would need to
be grokked before understanding the md file - would make the port
easier to understand.
It could make it easier for some optimizers to manipulate these patterns.
OTOH, judging from past performance, I'm not sure that that would be a
good thing. There are inherent contradictions in the assumptions
different compiler passes make regarding frame pointer based addresses
on RISC machines - e.g. we don't actually know the valid add offset or addressing ranges off the frame / arg pointer until after they have
been eliminated.
A few weeks of performance and regression testing, in a toolchain that has
the currently known correctness issues fixed, could provide some data to
make an informed decision.
So, while your suggestion might have merit, I can't pursue it at the moment.


;; If the frame pointer elimination offset is zero, we'll use this pattern.
(define_insn_and_split "*move_frame_1"
  [(set (match_operand:SI 0 "gpr_operand" "=r")
        (match_operand:SI 1 "register_operand" "r"))
   (clobber (reg:CC CC_REGNUM))]
  "(reload_in_progress || reload_completed)
   && (operands[1] == stack_pointer_rtx
       || operands[1] == hard_frame_pointer_rtx)"
  "#"
  "&& 1"
  [(set (match_dup 0) (match_dup 1))])

Duplicate of the above? I really don't see how they differ...

The first is for a pre-reload frame pointer reference, which is subject to register elimination. The original idea is that it morphs into an add by the effect of register elimination. Two years I later added the ..._and_split part for the case that there is a zero offset. I can't remeber why I did it, or if there was some actual problem I was trying to solve. Looking at it now conjunction with frame_move_1, I think I probably just added the splitter because I thought it should have one, forgetting about frame_move_1.

The second is for a reload_in_progress / reload_completed frame pointer reference. I introduced it as a define_insn_and_split at the same time as
in introduced move_frame as an insn pattern.


move_frame is supposed to have precedence in insn recognition above
*movsi_insn, which in turn is supposed to have precedence above
*move_frame_1

Also, combining these two patterns into a single insn pattern would give
reload more freedom to change operands into each other (when it doesn't
re-recognize insns), which is unwanted in this case.

I suppose I should remove again the _and_split part from frame_move, and
put a comment there that the post-reload/elimination recognition and
splitting is in *frame_move_1

;; reload uses gen_addsi2 because it doesn't understand the need for
;; the clobber.
(define_peephole2
  [(set (match_operand:SI 0 "gpr_operand" "")
        (match_operand:SI 1 "const_int_operand" ""))
   (parallel [(set (match_dup 0)
                   (plus:SI (match_dup 0)
                            (match_operand:SI 2 "gpr_operand")))
              (clobber (reg:CC UNKNOWN_REGNUM))])]

I'd like to understand all of this a little more.


Given that reload *would* generate an add3 (with no clobber), and
add-without-clobber isn't ordinarily a valid insn, why not just go
ahead and use that as your reload pattern instead of this
clobber placeholder?

Having add3 without clobber would mean it's a free-for-all for the optimizers to generate it. Which would be disastrous for code speed and size. Also, it would be harder to make sure the rescanning peephole2 terminates when there is a pattern that looks so simple but is so wrong.

Wouldn't a bare plus pattern with that same reload_in_progress || reload_completed
check work just as well for all the pattern matching you want to do
during peephole analysis?

It would stop the pre-reload optimizers of goofing up, but not the post-reload ones.

  int scratch = (0x17
                 ^ (true_regnum (operands[0]) & 1)
                 ^ (true_regnum (operands[1]) & 2)
                 ^ (true_regnum (operands[2]) & 4));
  asm_fprintf (asm_out_file, \"\tstr r%d,[sp,#0]\n\", scratch);
  asm_fprintf (asm_out_file, \"\tmovfs r%d,status\n\", scratch);
  output_asm_insn (\"add %0,%1,%2\", operands);
  asm_fprintf (asm_out_file, \"\tmovts status,r%d\n\", scratch);
  asm_fprintf (asm_out_file, \"\tldr r%d,[sp,#0]\n\", scratch);
  return \"\";

It does seem like you'd do well to split this pattern. If you do, then you'll
automatically get the right changes to debugging unwind info across this.

The only possible unwind issue I see here is if an interrupt or exception triggers inside this sequence. And that is assuming that we can unwind the interrupt / exception otherwise. I think the impact on debuggability of this pattern is minimal. It is not supposed to be emitted frequently, it is only there as a last resort when all optimization attemts fail. In fact, I have yet to see a testcase where that happens at -O2 or higher. If the integer flags register is free, we split. If the integer flags register is set in the previous insn, and data flow allows to re-order, we split. If we can find any free register to save the flags into, we split.

A test for epilogue_completed in the split condition should be sufficient to
wait until after peephole2 has finished, so that you don't interfere with the
transformations you want there.

Actually, peephole2 runs after thread_prologue_and_epilogue.


If we found / made another suitable variable to make the splitter
conditional on, I suppose it could help debugging at -O0 .  In the above
stated corner case.  There is the drawback that it makes it less easy to
spot these sequences when they are split.  OTOH I could make the splitter
optional, and/or have an optional warning when it triggers.
But I couldn't sufficiently test such a change with mainline
GCC at the moment, because there are too many unresolved other issues.
And testing it in another branch and then re-merging to mainline is
likely to miss the stage1 window.
For now, I can add a comment about the possible benefit of a splitter.

I'm also interested in hearing about how well this whole scheme works in
practice, as opposed to merely waiting until after reload to split and flags
users.

"and flags users"? I think I get the general idea what you are trying to express, but I can't reconstruct what exactly you wanted to say.

 There are certainly lots of other ports that are in the same boat
with respect to only having a flags-clobbering add.

Well, actually, the Epiphany has two different flags registers, and a biased stack pointer and post-modify can be used to do add a constant to the (value of the) stack pointer without clobbering any flags, so the potential for peephole2 patterns to fix things up is probably above average.

(define_insn_and_split "*recipsf2_1"
[(match_parallel 4 "float_operation"
[(set (match_operand:SF 0 "gpr_operand" "=r,r")
(div:SF (match_operand:SF 1 "const_float_1_operand" "")
(match_operand:SF 2 "move_src_operand" "rU16m,rU16mCal")))
(use (match_operand:SI 3 "move_src_operand" "rU16m,rU16mCal"))

How to you prevent a post-reload copy propagation pass from putting things back just the way before you split them? It seems to me that's the primary reason to use specific register constraints here.

They don't know how to put back the hard reg clobbers of (reg:SF 0) and (reg:SI 1) . I suppose they might do that in the future, then I'd have to find a safer way to avoid the recombination. But I can't test that till the copy propagation has learned this new trick, so for the time being I think I better leave the pattern alone.

Well, if it did get recombined, it's would just be split again later.
But I'll change the output pattern to "#" to make sure this will
always happen.

(define_insn "fmadd"
  [(match_parallel 4 "float_operation"
     [(set (match_operand:SF 0 "gpr_operand" "=r")
           (fma:SF (match_operand:SF 2 "gpr_operand" "%r")
                   (match_operand:SF 3 "gpr_operand" "r")
                   (match_operand:SF 1 "gpr_operand" "0")))
      (clobber (reg:CC_FP CCFP_REGNUM))])]

Presumably the strange operand ordering is left over from your port-specific builtins? Also, the % is extraneous since the constraints are identical.

True. And as I have removed the last use of gen_fmadd / gen_fmsub, I can remove the expanders.

; combiner pattern, also used by vector combiner pattern
(define_expand "maddsf"
  [(parallel
     [(set (match_operand:SF 0 "gpr_operand" "=r")
           (plus:SF (mult:SF (match_operand:SF 1 "gpr_operand" "%r")
                             (match_operand:SF 2 "gpr_operand" "r"))
                    (match_operand:SF 3 "gpr_operand" "0")))
      (clobber (reg:CC_FP CCFP_REGNUM))])]
  "TARGET_FUSED_MADD")

I suspect these aren't needed anymore. Anything the combiner could have found should be able to be found by the SSA optimizers.

I'm not sure how well this work with vectorized code. There are splitters from vectorized versions of these patterns. Generating vectorized code is also still pretty hit-and-miss-and-miss, there are lots of conditions where it gives up for modest SIMD mode sizes even though a vectorization would be possible.

I suppose I should put a comment on these patterns to review their usefulness
once we can get consistent vectorization.

(define_expand "one_cmplsi2"

...


Why not combine these?  I'm pretty sure that expand_binop will try the xor
solution all on its own.

Actually, it's expand_unop, and it doesn't. Left to its own devices, it will generate a libcall instead.


Of course, if gpr_operand were to include -1 when m1reg is available, and
you swapped all the constraints to something other than "r", you get even
this for free.

That makes sense. IIRC I did something similar to that that for on the SH64. Except it's a lot of trouble to swap out all the constraints (reload still needs r, so all affected constraint strings become longer), so there was a separate predicate to be used in the insns where this was profitable. AFAICT the opportunities for -1 to be useful are likewise there, but limited - xor, mul, shift. OTOH there might be more constants later than can be available in a register. so I'm thinking of "ggpr_operand" for general gpr operand (well, there's two general in there, but gpr_or_reg_cst_operand seems too long) which is gpr_operand or any constant that's available in a fixed register. Then rRcf (Rcf : non-regclass Register constraint: constant in fixed register) can be the associated constraint.

(define_insn "*mov<mode>cc_insn"
[(set (match_operand:WMODE 0 "gpr_operand" "=r")
(if_then_else:WMODE (match_operator 3 "proper_comparison_operator"
[(match_operand 4 "cc_operand") (const_int 0)])
(match_operand:WMODE 1 "nonmemory_operand" "r")
(match_operand:WMODE 2 "gpr_operand" "0")))]

If there's a good reason for nonmemory_operand and not gpr_operand here, you should add a big comment. It looks like a mistake.

I dimly remember tweaking stuff to make the movcc operands match, but if it was this, or even that architecture, I'm not sure. It might also have to do with the three-insn capacity of combine. At any rate, it looses cse / loop hoisting opportunities. So if there is a good reason to allow immediates, that's best rediscovered so it can be properly documented. Or the optimizers improved to make that unnecessary.


you should add a big comment. It looks like a mistake.

static bool
epiphany_frame_pointer_required (void)
{
  return cfun->calls_alloca;

Isn't this automatic?


epiphany-load-combiner.o : $(srcdir)/config/epiphany/epiphany-load-combiner.c

The pass didn't work, so I removed it before submission. I might add it back later when I can get it to work and do something useful, but it's not ready for gcc 4.7 . I've already removed this vestige in t-epiphany while back.

#define IMM16(X)     ((unsigned)(X) <= 0xFFFF)
#define IMM5(X)      ((unsigned)(X) <= 0x1F)

These need to be unsigned HOST_WIDE_INT, at minimum. Preferably no cast at all and compare vs 0 as well.

Compare vs. 0 gives warnings when the input type is unsigned. There is a reason IN_RANGE uses the cast to unsigned HOST_WIDE_INT.

IN_RANGE might actually be the better choice for these tests.

I'll over your points and prepare a new version now...


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]