This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: RFC: Faster for_each_rtx-like iterators

From: Jeff Law <law at redhat dot com>
To: gcc-patches at gcc dot gnu dot org, rdsandiford at googlemail dot com
Date: Fri, 09 May 2014 00:18:45 -0600
Subject: Re: RFC: Faster for_each_rtx-like iterators
Authentication-results: sourceware.org; auth=none
References: <87k39x48gu dot fsf at talisman dot default>

On 05/07/14 14:52, Richard Sandiford wrote:

I noticed for_each_rtx showing up in profiles and thought I'd have a go
at using worklist-based iterators instead.  So far I have three:

   FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
   FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
   FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *

with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.

Yea, for_each_rtx is a bit of a workhorse, so I'm not surprised it'sshowing up on some profiles.

I've locally replaced all for_each_rtx calls in the generic code with
these iterators and they make things reproducably faster.  The speed-up
on full --enable-checking=release ./cc1 and ./cc1plus times is only about 1%,
but maybe that's enough to justify the churn.

I'd consider that enough to justify the churn, especially if it's animprovement we're seeing consistently rather than just something that'shelping pathological cases.


Implementation-wise, the main observation is that most subrtxes are part
of a single contiguous sequence of "e" fields.  E.g. when compiling an
oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
subrtxes of 7,636,542 rtxes.  Of those:

Yea, makes sense. One could possibly carry this a step further andrealize that we could specialize the walkers. ie, rather than indexingthe rtx_code to get the rtx format, then iterate over that to find the'e' fields, if we have an PLUS, then damn it, we ought to know how towalk pretty directly.


(A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
(B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
                       no "E" fields, and
(C)    43,532 (00.6%) are more complicated.

(A) is really a special case of (B) in which the block has zero length.
Those are the only two cases that really need to be handled inline.
The implementation does this by having a mapping from an rtx code to the
bounds of its "e" sequence, in the form of a start index and count.

Right. So my thought above us just extending this a bit and realizingthat we don't need an index/count for most RTXs. That may (or may not)be worth the extra pain. Your call whether or not to investigate thatfurther.


Does this look like it's worth pursuing?  If so, is it OK to start
submitting now or should it wait until 4.9.1 is out?  If it is OK,
I plan to convert the ports too and get rid of for_each_rtx.

I think it's definitely worth pursuing. I suspect the release managerswould definitely want to wait for 4.9.1 :-)


jeff

Follow-Ups:
- Re: RFC: Faster for_each_rtx-like iterators
  - From: Richard Biener
- Re: RFC: Faster for_each_rtx-like iterators
  - From: Richard Sandiford

References:
- RFC: Faster for_each_rtx-like iterators
  - From: Richard Sandiford

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]