This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [4.5] Find more autoinc addressing for induction variables


Steven Bosscher wrote:
But in any case I think the patch should not go onto the trunk in its
current form for other technical reasons.
And these would be...?

Some examples:

Ok, these were actually useful comments. Thanks, I'll look into those. I'm pretty sure I thought about the reg_next_ thing when I wrote the code, but it's been a while and I can't remember.


(4) You explicitly only allow FORM_POST_INC. Why? (Well, probably
because it is the only interesting case for bfin, but you know what I
mean ;-)

As I said in my previous mail, this is the one most likely to actually occur in a loop - which is the case this code is supposed to optimize, a biv which has memory references based on it, and is incremented at the end of the loop.


(5) Have you looked at compile time of the painful test cases from the
PRs about the cost of use-def and def-use chains? Comparing just the
test suite and compiler itself is apparently not enough, or we would
not have had those PRs in the first place (for fwprop, Paulo and I
looked at the testsuite, bootstrap times, nillrock, spec2k, etc. and
we never saw the problem). See http://gcc.gnu.org/PR33928 as an
example. From the time report:

Which file in that PR? I've been trying with direct.i and direct-fft-recursive.i and I'm having trouble reproducing any kind of bad performance.


 df reaching defs      :  73.40 (21%) usr   6.51 (24%) sys  79.91
(21%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:  37.99 (11%) usr   9.94 (37%) sys  47.92
(12%) wall       0 kB ( 0%) ggc

Okay, so we need to decide if we're going to reject every patch that tries to make use of this infrastructure, and simply delete the code if that's the conclusion. It makes no sense to have it in gcc if we can't actually use it because it's too expensive.


I'll see whether I can achieve a similar effect by modifying tree-ssa-loop-ivopts instead. Maybe I can add new candidates that get incremented in suitable basic blocks other than just the last one. Probably Zdenek will then tell me that computations are quadratic or worse in the number of candidates.

And you have not proven that your proposed patch is correct and
beneficial, and that there is no easier way to implement the checks
you are doing.

I wasn't aware that formal proofs of correctness are now a requirement, and it's often difficult to prove the nonexistance of something.


I am just trying to think about how this can be implemented with
standard cfg analysis instead of your CFG walks.  If you don't want me
to try and help, just say so and I'll just shut up.

If you want to help, that's welcome, but please try to think about the suggestions you make. See below.


OK, so perhaps it is not worth it (would deserve a comment, still).
And adding FORM_POST_ADD should not be very difficult

An induction variable would never have FORM_POST_ADD, would it?


And for the code example:

x = &y[0];
for (i = 0; i < n; i++) {
 for (j = 0; j < m; j++) {
  foo (*x);
  if (bar ())
    break;
  x++;
 }
}

==> something like this:

     A
     |
     D <-----\
     |       |
     B <---\ |
     |     | |
  /--G     | |
  |  |     | |
  |  C ----/ |
  |  |       |
  \->F ------/
     |


The post_inc would not happen because C does not post-dominate B.


Looking forward to your next counter-example. I was never good at
picturing all the possible situations :-)

The fact that I can still use the same counter-example shows that you're not really thinking about it, and that's what I find somewhat unhelpful.


Move the initialization into the outer loop and you'll find that C still does not post-dominate B, but autoincrement is possible.

I briefly thought that if B and C are in the same loop, then B dom C may be a sufficient condition, but that fails, again, with the same counterexample. However, it would work for a solution based on the tree loop optimizer where we know that the biv we're generating will be initialized before the loop.


Bernd -- This footer brought to you by insane German lawmakers. Analog Devices GmbH Wilhelm-Wagenfeld-Str. 6 80807 Muenchen Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368 Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]