This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Mostly rewrite genrecog

From: Jeff Law <law at redhat dot com>
To: gcc-patches at gcc dot gnu dot org, richard dot sandiford at arm dot com
Date: Tue, 28 Apr 2015 17:12:56 -0600
Subject: Re: Mostly rewrite genrecog
Authentication-results: sourceware.org; auth=none
References: <87egn5yis1 dot fsf at e105548-lin dot cambridge dot arm dot com>

On 04/27/2015 04:20 AM, Richard Sandiford wrote:

I think it's been the case for a while that parallel builds of GCC tend
to serialise around the compilation of insn-recog.c, especially with
higher --enable-checking settings.  This patch tries to speed that
up by replacing most of genrecog with a new algorithm.

I think the main problems with the current code are:

1. Vector architectures have added lots of new instructions that have
    a similar shape and differ only in mode, code or unspec number.
    The current algorithm doesn't have any way of factoring out those
    similarities.

2. When matching a particular instruction, the current code examines
    everything about a SET_DEST before moving on to the SET_SRC.  This has
    two subproblems:

    2a. The destination of a SET isn't very distinctive.  It's usually
        just a register_operand, a memory_operand, a nonimmediate_operand
        or a flags register.  We therefore tend to backtrack to the
        SET_DEST a lot, oscillating between groups of instructions with
        the same kind of destination.

    2b. Backtracking through predicate checks is relatively expensive.
        It would be good to narrow down the "shape" of the instruction
        first and only then check the predicates.  (The backtracking is
        expensive in terms of insn-recog.o compile time too, both because
        we need to copy into argument registers and out of the result
        register, and because it adds more sites where spills are needed.)

3. The code keeps one local variable per rtx depth, so it ends up
    loading the same rtx many times over (mostly when backtracking).
    This is very expensive in rtl-checking builds because each XEXP
    includes a code check and a line-specific failure call.

    In principle the idea of having one local variable per depth
    is good.  But it was originally written that way when all optimisations
    were done at the rtl level and I imagine each local variable mapped
    to one pseudo register.  These days the statements that reload the
    value needed on backtracking lead to many more SSA names and phi
    statements than you'd get with just a single variable per position
    (loaded once, so naturally SSA).  There is still the potential benefit
    of avoiding having sibling rtxes live at once, but fixing (2) above
    reduces that problem.

Also, the code is all goto-based, which makes it rather hard to step through.

The patch deals with these as follows:

1. Detect subpatterns that differ only by mode, code and/or integer
    (e.g. unspec number) and split them out into a common routine.

2. Match the "shape" of the instruction first, in terms of codes,
    integers and vector lengths, and only then check the modes, predicates
    and dups.  When checking the shape, handle SET_SRCs before SET_DESTs.
    In practice this seems to greatly reduce the amount of backtracking.

3. Have one local variable per rtx position.  I tested the patch with
    and without the change and it helped a lot with rtl-checking builds
    without seeming to affect release builds much either way.

As far as debuggability goes, the new code avoids gotos and just
uses "natural" control flow.

The headline stat is that a stage 3 --enable-checking=yes,rtl,df
build of insn-recog.c on my box goes from 7m43s to 2m2s (using the
same stage 2 compiler).  The corresponding --enable-checking=release
change is from 49s to 24s (less impressive, as expected).

The patch seems to speed up recog.  E.g. the time taken to build
fold-const.ii goes from 6.74s before the patch to 6.69s after it;
not a big speed-up, but reproducible.

[ big snip ]

I don't see anything that makes me think we don't want this :-)

It's interesting to note that the regressions in loadable size of arelease-checking insn-recog aren't any mainstream ports. Hell, I'dprobably claim they're all fringe ports (iq2000, m68k, mcore,microblaze, pdp11, rx)

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi.
Also tested by building the testsuite for each of the targets above
and making sure there were no assembly differences.  Made sure that no
objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench
perl.o and cactusADM datestamp.o, where the differences are timestamps).
OK to install?

To be very blunt, I'm probably not capable of reviewing the new code.You're going to know it best and you should probably own it.

Given your history with gcc and attention to detail, I'm comfortablewith approving this knowing you'll deal with any issues as they arise.

The one thing I would ask is that you make sure to include the recentlyadded matching constraint checking bits in genrecog. I'm assuming allthe older sanity checks that have been in genrecog for a longer periodof time have been kept.


Jeff

Follow-Ups:
- Re: Mostly rewrite genrecog
  - From: Richard Sandiford

References:
- Mostly rewrite genrecog
  - From: Richard Sandiford

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]