This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [SH] PR 53976 - Add RTL pass to eliminate clrt, sett insns
- From: Oleg Endo <oleg dot endo at t-online dot de>
- To: Steven Bosscher <stevenb dot gcc at gmail dot com>
- Cc: gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 21 Nov 2013 12:22:10 +0100
- Subject: Re: [SH] PR 53976 - Add RTL pass to eliminate clrt, sett insns
- Authentication-results: sourceware.org; auth=none
- References: <1384975779 dot 2438 dot 119 dot camel at yam-132-YW-E178-FTW> <CABu31nPbcThrjzPN49E=VO3BrZjz5noaE95gjj-vx=0ikPj6+g at mail dot gmail dot com>
Steven,
Thanks for the feedback. I've committed the original patch as-is, but
I'm happy to improve it with follow up patches.
On Thu, 2013-11-21 at 00:04 +0100, Steven Bosscher wrote:
> On Wed, Nov 20, 2013 at 8:29 PM, Oleg Endo wrote:
> >
> > The attached patch adds another SH specific RTL pass which is supposed
> > to optimize clrt and sett instruction usage. As a start, it currently
> > just eliminates redundant clrt and sett instructions in cases where the
> > T bit value is known. However, I'm planning on extending it a little in
> > order to e.g. hoist clrt/sett insns out of loops etc.
>
> +#include <vector>
Is there something wrong with <vector> ?
> > +#define log_msg(...)\
> > + do { if (dump_file != NULL) fprintf (dump_file, __VA_ARGS__); } while (0)
>
> Is that valid C++98, a varags macro?
No it's not. It's C99 / C++11. However, most compilers support
__VA_ARGS__. If it causes too much trouble I'll remove it of course
(sh_treg_combine.cc would also be affected).
Having to write "if (dump_file ...)" stuff over and over again is
annoying and impacts the readability of code in my opinion. So if the
__VA_ARGS__ usage has to go, there should be some more or less
alternative. I think other passes would also benefit from that.
BTW something std::ostream like would be nice for logging, like...
log_msg ("updated ccreg mode: ");
log_rtx (m_ccreg);
log_msg ("\n\n");
... could be something like
log (<< "updated ccreg mode: " << m_ccreg << "\n\n");
where log is:
#define log(expr)\
do { if (dump_file != NULL) <logging ostream ref> expr; } while (0)
Since there are various ways of printing an rtx (print_rtl,
print_rtl_single), this could be done with a wrapper object around the
logged rtx:
log (<< "got insn: \n" << rtx::log_single (my_insn) << "\n");
or iomanip style (requires custom ostream):
log (<< "got insn: \n" << rtx::log_single << my_insn << "\n");
... but I'm not sure how others think about that. If this is of
interest I could do some work in that direction.
> > + // Notice that the CFG might be invalid at late RTL stages and
> > + // BLOCK_FOR_INSN might return null. Thus the basic block are recorded
> > + // here while traversing them.
> > + basic_block bb;
>
> You insert your pass just before sched2. The CFG is perfectly fine at
> that stage. So you shouldn't need this. (And if you would need to
> record bb, then this solution wouldn't be GC-safe).
Why is that? AFAIK GC is done after the pass has finished? The pass
doesn't keep any state beyond the current pass execution.
>
> BLOCK_FOR_INSN will only be NULL for things that are not inside a
> basic block (some notes, deleted labels, etc.).
>
> That all said and done: AFAICT you don't actually use BLOCK_FOR_INSN
> anywhere :-)
Sorry for the confusion. Initially I had it inserted right before
machine dependent reorg, at which point I was getting nullptr from
BLOCK_FOR_INSN all over the place. So not using it was the fix.
However, only after I've noticed that SH's machine dependent reorg
leaves the basic block structure in a 'broken' state (PR 59189) and thus
I moved the pass before sched2.
I'll try using BLOCK_FOR_INSN again.
>
> > + for (edge_iterator ei = ei_start (bb->preds); !ei_end_p (ei); ei_next (&ei))
>
> FOR_EACH_EDGE
>
> Declaring the edge_iterator inside the for() is not a good argument
> against FOR_EACH_EDGE.
But that was the only reason! ;)
(I've done the same in sh_treg_combine.cc)
Can we ...
> Of course, brownie points are up for grabs for
> the brave soul daring enough to make edge iterators be proper C++
> iterators... ;-)
... fix this first and use the SH RTL passes as examples. Then migrate
other existing code? Or do you insist on FOR_EACH_EDGE usage for the
time being?
> > + if (pred_bb->index == ENTRY_BLOCK)
>
> I used to dislike this idiom of checking bb->index against fixed block
> numbers. But now that ENTRY_BLOCK_PTR and EXIT_BLOCK_PTR actually
> require two pointer dereferences (cfun->cfg->...) it's the better
> option.
>
> /me adds to TODO list...
Actually I don't need any of that. I will re-test the following
(although it's quite obvious):
Index: gcc/config/sh/sh_optimize_sett_clrt.cc
===================================================================
--- gcc/config/sh/sh_optimize_sett_clrt.cc (revision 205191)
+++ gcc/config/sh/sh_optimize_sett_clrt.cc (working copy)
@@ -309,9 +309,6 @@
std::vector<ccreg_value>& values_out,
basic_block prev_visited_bb) const
{
- if (start_insn == NULL_RTX)
- return;
-
log_msg ("looking for ccreg values in [bb %d]\n", bb->index);
for (rtx i = start_insn; i != NULL_RTX && i != PREV_INSN (BB_HEAD (bb));
@@ -376,9 +373,6 @@
for (edge_iterator ei = ei_start (bb->preds); !ei_end_p (ei); ei_next (&ei))
{
basic_block pred_bb = ei_edge (ei)->src;
- if (pred_bb->index == ENTRY_BLOCK)
- continue;
-
pred_bb_count += 1;
find_last_ccreg_values (BB_END (pred_bb), pred_bb, values_out, bb);
}
>
> > +::remove_ccreg_dead_unused_notes (std::vector<ccreg_value>& values) const
>
> Maybe put a generic version of this in rtlanal.c, next to
> remove_reg_equal_equiv_notes_for_regno?
Yes. In fact I wanted to propose moving some of the helper functions
from sh_treg_combine.cc and sh_optimize_sett_clrt.cc to rtl.h /
rtlanal.c at some point in time.
> On the whole: The pass will have worst-case run time quadratic in the
> number of basic blocks, due to the recursion in
> find_last_ccreg_values. You'll probably want to limit the depth of the
> iteration somehow (hard-coded depth, common dominator, first
> post-dominated block, whatever...) or sooner-or-later you'll have a PR
> assigned to you for a test case that makes compile time explode in
> this pass...
A PR with a test case would be nice. In the worst case it'll have to be
an iteration limit, but I can also imagine that caching results
(hash_table) for bb subtrees could be useful here, couldn't it?
Cheers,
Oleg