This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: RFA: Add Epiphany port

From: Joern Rennecke <amylaar at spamcop dot net>
To: Richard Henderson <rth at redhat dot com>
Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, Rainer Orth <ro at CeBiTec dot Uni-Bielefeld dot DE>, jeremy dot bennett at embecosm dot com, Andrew Pinski <pinskia at gmail dot com>, gcc-patches at gcc dot gnu dot org
Date: Fri, 04 Nov 2011 20:00:00 -0400
Subject: Re: RFA: Add Epiphany port
References: <20111103140816.trwz063nj4ks4gos-nzlynne@webmail.spamcop.net> <CA+=Sn1m0wx7RfS=JaW9=J010SZnf4Ax4mqM--ON3YzUcOfwvTQ@mail.gmail.com> <20111103150455.6mrpu054z8csc00c-nzlynne@webmail.spamcop.net> <4EB2F254.6080802@redhat.com> <20111103182234.22r0qdtuluog4wkc-nzlynne@webmail.spamcop.net> <Pine.LNX.4.64.1111032328590.31547@digraph.polyomino.org.uk> <20111104015458.f7hzapyfqck4gw84-nzlynne@webmail.spamcop.net> <Pine.LNX.4.64.1111041223430.19818@digraph.polyomino.org.uk> <20111104094354.cl551itv9c8c4g0o-nzlynne@webmail.spamcop.net> <4EB42BDB.3060703@redhat.com>

Quoting Richard Henderson <rth@redhat.com>:

(define_special_predicate "any_gpr_operand"
  (match_code "subreg,reg")
{
  return gpr_operand (op, mode);
})


(define_special_predicate "any_gpr_operand"
  (match_operand 0 "gpr_operand"))

though, I'm not sure what this achieves...  while any_gpr_operand
will ignore its mode, passing on that same mode to gpr_operand wont.


It does get rid of the warning in the md file about a 'missing' mode, when
I really want to match registers in various modes.
When you look at the SH port, you'll see that I have used this trick of
having an "any_*" predicate to silence a warning about using VOIDmode in
a couple of places.

(define_insn_and_split "move_frame"
  [(set (match_operand:SI 0 "gpr_operand" "=r")
        (match_operand:SI 1 "register_operand" "r"))
   (clobber (reg:CC CC_REGNUM))]
  "operands[1] == frame_pointer_rtx || operands[1] == arg_pointer_rtx"


It looks to me that there are several places that could be tidied up
if you have a frame_register_operand predicate to use here instead of
testing operands[] in the extra_constraints field.

There are two mentions of frame_pointer_rtx, and three of hard_frame_pointer_rtx, in the entirety of the md file. So I don't think having such a predicate - which would have to do different things before, during, and after reload, and would need to be grokked before understanding the md file - would make the port easier to understand. It could make it easier for some optimizers to manipulate these patterns. OTOH, judging from past performance, I'm not sure that that would be a good thing. There are inherent contradictions in the assumptions different compiler passes make regarding frame pointer based addresses on RISC machines - e.g. we don't actually know the valid add offset or addressing ranges off the frame / arg pointer until after they have been eliminated. A few weeks of performance and regression testing, in a toolchain that has the currently known correctness issues fixed, could provide some data to make an informed decision. So, while your suggestion might have merit, I can't pursue it at the moment.

;; If the frame pointer elimination offset is zero, we'll use this pattern.
(define_insn_and_split "*move_frame_1"
  [(set (match_operand:SI 0 "gpr_operand" "=r")
        (match_operand:SI 1 "register_operand" "r"))
   (clobber (reg:CC CC_REGNUM))]
  "(reload_in_progress || reload_completed)
   && (operands[1] == stack_pointer_rtx
       || operands[1] == hard_frame_pointer_rtx)"
  "#"
  "&& 1"
  [(set (match_dup 0) (match_dup 1))])

Duplicate of the above? I really don't see how they differ...


The first is for a pre-reload frame pointer reference, which is subject to
register elimination.  The original idea is that it morphs into an
add by the effect of register elimination.  Two years I later added the
..._and_split part for the case that there is a zero offset.
I can't remeber why I did it, or if there was some actual problem I was
trying to solve.  Looking at it now conjunction with frame_move_1, I think
I probably just added the splitter because I thought it should have one,
forgetting about frame_move_1.

The second is for a reload_in_progress / reload_completed frame pointer reference. I introduced it as a define_insn_and_split at the same time as in introduced move_frame as an insn pattern.

move_frame is supposed to have precedence in insn recognition above
*movsi_insn, which in turn is supposed to have precedence above
*move_frame_1

Also, combining these two patterns into a single insn pattern would give
reload more freedom to change operands into each other (when it doesn't
re-recognize insns), which is unwanted in this case.

I suppose I should remove again the _and_split part from frame_move, and
put a comment there that the post-reload/elimination recognition and
splitting is in *frame_move_1

;; reload uses gen_addsi2 because it doesn't understand the need for
;; the clobber.
(define_peephole2
  [(set (match_operand:SI 0 "gpr_operand" "")
        (match_operand:SI 1 "const_int_operand" ""))
   (parallel [(set (match_dup 0)
                   (plus:SI (match_dup 0)
                            (match_operand:SI 2 "gpr_operand")))
              (clobber (reg:CC UNKNOWN_REGNUM))])]

I'd like to understand all of this a little more.

Given that reload *would* generate an add3 (with no clobber), and
add-without-clobber isn't ordinarily a valid insn, why not just go
ahead and use that as your reload pattern instead of this
clobber placeholder?


Having add3 without clobber would mean it's a free-for-all for the optimizers
to generate it.  Which would be disastrous for code speed and size.
Also, it would be harder to make sure the rescanning peephole2 terminates
when there is a pattern that looks so simple but is so wrong.

Wouldn't a bare plus pattern with that same reload_in_progress || reload_completed check work just as well for all the pattern matching you want to do during peephole analysis?


It would stop the pre-reload optimizers of goofing up, but not the
post-reload ones.

  int scratch = (0x17
                 ^ (true_regnum (operands[0]) & 1)
                 ^ (true_regnum (operands[1]) & 2)
                 ^ (true_regnum (operands[2]) & 4));
  asm_fprintf (asm_out_file, \"\tstr r%d,[sp,#0]\n\", scratch);
  asm_fprintf (asm_out_file, \"\tmovfs r%d,status\n\", scratch);
  output_asm_insn (\"add %0,%1,%2\", operands);
  asm_fprintf (asm_out_file, \"\tmovts status,r%d\n\", scratch);
  asm_fprintf (asm_out_file, \"\tldr r%d,[sp,#0]\n\", scratch);
  return \"\";

It does seem like you'd do well to split this pattern. If you do, then you'll automatically get the right changes to debugging unwind info across this.


The only possible unwind issue I see here is if an interrupt or exception
triggers inside this sequence.  And that is assuming that we can unwind
the interrupt / exception otherwise.
I think the impact on debuggability of this pattern is minimal.  It is
not supposed to be emitted frequently, it is only there as a last resort
when all optimization attemts fail.  In fact, I have yet to see a testcase
where that happens at -O2 or higher.  If the integer flags register is free,
we split.  If the integer flags register is set in the previous insn, and data
flow allows to re-order, we split.  If we can find any free register
to save the flags into, we split.

A test for epilogue_completed in the split condition should be sufficient to
wait until after peephole2 has finished, so that you don't interfere with the
transformations you want there.

Actually, peephole2 runs after thread_prologue_and_epilogue.

If we found / made another suitable variable to make the splitter
conditional on, I suppose it could help debugging at -O0 .  In the above
stated corner case.  There is the drawback that it makes it less easy to
spot these sequences when they are split.  OTOH I could make the splitter
optional, and/or have an optional warning when it triggers.
But I couldn't sufficiently test such a change with mainline
GCC at the moment, because there are too many unresolved other issues.
And testing it in another branch and then re-merging to mainline is
likely to miss the stage1 window.
For now, I can add a comment about the possible benefit of a splitter.

I'm also interested in hearing about how well this whole scheme works in
practice, as opposed to merely waiting until after reload to split and flags
users.


"and flags users"?   I think I get the general idea what you are trying to
express, but I can't reconstruct what exactly you wanted to say.

 There are certainly lots of other ports that are in the same boat
with respect to only having a flags-clobbering add.


Well, actually, the Epiphany has two different flags registers,
and a biased stack pointer and post-modify can be used to do add
a constant to the (value of the) stack pointer without clobbering
any flags, so the potential for peephole2 patterns to fix things up
is probably above average.

(define_insn_and_split "*recipsf2_1" [(match_parallel 4 "float_operation" [(set (match_operand:SF 0 "gpr_operand" "=r,r") (div:SF (match_operand:SF 1 "const_float_1_operand" "") (match_operand:SF 2 "move_src_operand" "rU16m,rU16mCal"))) (use (match_operand:SI 3 "move_src_operand" "rU16m,rU16mCal"))
How to you prevent a post-reload copy propagation pass from putting things
back just the way before you split them?  It seems to me that's the primary
reason to use specific register constraints here.


They don't know how to put back the hard reg clobbers of
(reg:SF 0) and (reg:SI 1) .
I suppose they might do that in the future, then I'd have to find a safer
way to avoid the recombination.  But I can't test that till the copy
propagation has learned this new trick, so for the time being I think
I better leave the pattern alone.

Well, if it did get recombined, it's would just be split again later.
But I'll change the output pattern to "#" to make sure this will
always happen.

(define_insn "fmadd"
  [(match_parallel 4 "float_operation"
     [(set (match_operand:SF 0 "gpr_operand" "=r")
           (fma:SF (match_operand:SF 2 "gpr_operand" "%r")
                   (match_operand:SF 3 "gpr_operand" "r")
                   (match_operand:SF 1 "gpr_operand" "0")))
      (clobber (reg:CC_FP CCFP_REGNUM))])]


Presumably the strange operand ordering is left over from your port-specific
builtins?  Also, the % is extraneous since the constraints are identical.


True.  And as I have removed the last use of gen_fmadd / gen_fmsub, I can
remove the expanders.

; combiner pattern, also used by vector combiner pattern
(define_expand "maddsf"
  [(parallel
     [(set (match_operand:SF 0 "gpr_operand" "=r")
           (plus:SF (mult:SF (match_operand:SF 1 "gpr_operand" "%r")
                             (match_operand:SF 2 "gpr_operand" "r"))
                    (match_operand:SF 3 "gpr_operand" "0")))
      (clobber (reg:CC_FP CCFP_REGNUM))])]
  "TARGET_FUSED_MADD")


I suspect these aren't needed anymore.  Anything the combiner could have
found should be able to be found by the SSA optimizers.


I'm not sure how well this work with vectorized code.  There are splitters
from vectorized versions of these patterns.
Generating vectorized code is also still pretty hit-and-miss-and-miss,
there are lots of conditions where it gives up for modest SIMD mode sizes
even though a vectorization would be possible.

I suppose I should put a comment on these patterns to review their usefulness
once we can get consistent vectorization.

(define_expand "one_cmplsi2"

...

Why not combine these?  I'm pretty sure that expand_binop will try the xor
solution all on its own.


Actually, it's expand_unop, and it doesn't.  Left to its own devices, it
will generate a libcall instead.

Of course, if gpr_operand were to include -1 when m1reg is available, and
you swapped all the constraints to something other than "r", you get even
this for free.


That makes sense.  IIRC I did something similar to that that for on the SH64.
Except it's a lot of trouble to swap out all the constraints (reload still
needs r, so all affected constraint strings become longer), so there was a
separate predicate to be used in the insns where this was profitable.
AFAICT the opportunities for -1 to be useful are likewise there, but limited -
xor, mul, shift.
OTOH there might be more constants later than can be available in a register.
so I'm thinking of "ggpr_operand" for general gpr operand (well, there's
two general in there, but gpr_or_reg_cst_operand seems too long) which
is gpr_operand or any constant that's available in a fixed register.
Then rRcf (Rcf : non-regclass Register constraint: constant in fixed register)
can be the associated constraint.

(define_insn "*mov<mode>cc_insn" [(set (match_operand:WMODE 0 "gpr_operand" "=r") (if_then_else:WMODE (match_operator 3 "proper_comparison_operator" [(match_operand 4 "cc_operand") (const_int 0)]) (match_operand:WMODE 1 "nonmemory_operand" "r") (match_operand:WMODE 2 "gpr_operand" "0")))]
If there's a good reason for nonmemory_operand and not gpr_operand here,
you should add a big comment.  It looks like a mistake.


I dimly remember tweaking stuff to make the movcc operands match, but if
it was this, or even that architecture, I'm not sure.  It might also have
to do with the three-insn capacity of combine.  At any rate, it looses
cse / loop hoisting opportunities.
So if there is a good reason to allow immediates, that's best rediscovered
so it can be properly documented.  Or the optimizers improved to make that
unnecessary.

you should add a big comment. It looks like a mistake.
static bool
epiphany_frame_pointer_required (void)
{
  return cfun->calls_alloca;
Isn't this automatic?

epiphany-load-combiner.o : $(srcdir)/config/epiphany/epiphany-load-combiner.c


The pass didn't work, so I removed it before submission.
I might add it back later when I can get it to work and do something
useful, but it's not ready for gcc 4.7 .
I've already removed this vestige in t-epiphany while back.

#define IMM16(X)     ((unsigned)(X) <= 0xFFFF)
#define IMM5(X)      ((unsigned)(X) <= 0x1F)


These need to be unsigned HOST_WIDE_INT, at minimum.
Preferably no cast at all and compare vs 0 as well.


Compare vs. 0 gives warnings when the input type is unsigned.
There is a reason IN_RANGE uses the cast to unsigned HOST_WIDE_INT.

IN_RANGE might actually be the better choice for these tests.

I'll over your points and prepare a new version now...

Follow-Ups:
- Re: RFA: Add Epiphany port
  - From: Richard Henderson
- P.S.: Re: RFA: Add Epiphany port
  - From: Joern Rennecke

References:
- RFA: Add Epiphany port
  - From: Joern Rennecke
- Re: RFA: Add Epiphany port
  - From: Andrew Pinski
- Re: RFA: Add Epiphany port
  - From: Joern Rennecke
- Re: RFA: Add Epiphany port
  - From: Richard Henderson
- Re: RFA: Add Epiphany port
  - From: Joern Rennecke
- Re: RFA: Add Epiphany port
  - From: Joseph S. Myers
- Re: RFA: Add Epiphany port
  - From: Joern Rennecke
- Re: RFA: Add Epiphany port
  - From: Joseph S. Myers
- Re: RFA: Add Epiphany port
  - From: Joern Rennecke
- Re: RFA: Add Epiphany port
  - From: Richard Henderson

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]