This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Mostly rewrite genrecog
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: gcc-patches List <gcc-patches at gcc dot gnu dot org>, richard dot sandiford at arm dot com
- Date: Thu, 30 Apr 2015 14:39:52 +0800
- Subject: Re: Mostly rewrite genrecog
- Authentication-results: sourceware.org; auth=none
- References: <87egn5yis1 dot fsf at e105548-lin dot cambridge dot arm dot com>
On Mon, Apr 27, 2015 at 6:20 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> I think it's been the case for a while that parallel builds of GCC tend
> to serialise around the compilation of insn-recog.c, especially with
> higher --enable-checking settings. This patch tries to speed that
> up by replacing most of genrecog with a new algorithm.
>
> I think the main problems with the current code are:
>
> 1. Vector architectures have added lots of new instructions that have
> a similar shape and differ only in mode, code or unspec number.
> The current algorithm doesn't have any way of factoring out those
> similarities.
>
> 2. When matching a particular instruction, the current code examines
> everything about a SET_DEST before moving on to the SET_SRC. This has
> two subproblems:
>
> 2a. The destination of a SET isn't very distinctive. It's usually
> just a register_operand, a memory_operand, a nonimmediate_operand
> or a flags register. We therefore tend to backtrack to the
> SET_DEST a lot, oscillating between groups of instructions with
> the same kind of destination.
>
> 2b. Backtracking through predicate checks is relatively expensive.
> It would be good to narrow down the "shape" of the instruction
> first and only then check the predicates. (The backtracking is
> expensive in terms of insn-recog.o compile time too, both because
> we need to copy into argument registers and out of the result
> register, and because it adds more sites where spills are needed.)
>
> 3. The code keeps one local variable per rtx depth, so it ends up
> loading the same rtx many times over (mostly when backtracking).
> This is very expensive in rtl-checking builds because each XEXP
> includes a code check and a line-specific failure call.
>
> In principle the idea of having one local variable per depth
> is good. But it was originally written that way when all optimisations
> were done at the rtl level and I imagine each local variable mapped
> to one pseudo register. These days the statements that reload the
> value needed on backtracking lead to many more SSA names and phi
> statements than you'd get with just a single variable per position
> (loaded once, so naturally SSA). There is still the potential benefit
> of avoiding having sibling rtxes live at once, but fixing (2) above
> reduces that problem.
>
> Also, the code is all goto-based, which makes it rather hard to step through.
>
> The patch deals with these as follows:
>
> 1. Detect subpatterns that differ only by mode, code and/or integer
> (e.g. unspec number) and split them out into a common routine.
>
> 2. Match the "shape" of the instruction first, in terms of codes,
> integers and vector lengths, and only then check the modes, predicates
> and dups. When checking the shape, handle SET_SRCs before SET_DESTs.
> In practice this seems to greatly reduce the amount of backtracking.
>
> 3. Have one local variable per rtx position. I tested the patch with
> and without the change and it helped a lot with rtl-checking builds
> without seeming to affect release builds much either way.
>
> As far as debuggability goes, the new code avoids gotos and just
> uses "natural" control flow.
>
> The headline stat is that a stage 3 --enable-checking=yes,rtl,df
> build of insn-recog.c on my box goes from 7m43s to 2m2s (using the
> same stage 2 compiler). The corresponding --enable-checking=release
> change is from 49s to 24s (less impressive, as expected).
>
> The patch seems to speed up recog. E.g. the time taken to build
> fold-const.ii goes from 6.74s before the patch to 6.69s after it;
> not a big speed-up, but reproducible.
>
> Here's a comparison of the number of lines of code in insn-recog.c
> before and after the patch on one target per config/ CPU:
>
> aarch64-linux-gnueabi 115526 38169 : 33.04%
> alpha-linux-gnu 24479 10740 : 43.87%
> arm-linux-gnueabi 169208 67759 : 40.04%
> avr-rtems 55647 22127 : 39.76%
> bfin-elf 13928 6498 : 46.65%
> c6x-elf 29928 13324 : 44.52%
> cr16-elf 2650 1419 : 53.55%
> cris-elf 18669 7257 : 38.87%
> epiphany-elf 19308 6131 : 31.75%
> fr30-elf 2204 1112 : 50.45%
> frv-linux-gnu 13541 5950 : 43.94%
> h8300-elf 19584 9327 : 47.63%
> hppa64-hp-hpux11.23 18299 8549 : 46.72%
> ia64-linux-gnu 37629 17101 : 45.45%
> iq2000-elf 2752 1609 : 58.47%
> lm32-elf 1536 872 : 56.77%
> m32c-elf 10040 4145 : 41.28%
> m32r-elf 4436 2307 : 52.01%
> m68k-linux-gnu 15739 8147 : 51.76%
> mcore-elf 4816 2577 : 53.51%
> mep-elf 67335 15929 : 23.66%
> microblaze-elf 2656 1587 : 59.75%
> mips-linux-gnu 54543 24186 : 44.34%
> mmix 2597 1487 : 57.26%
> mn10300-elf 6384 3294 : 51.60%
> moxie-elf 1311 659 : 50.27%
> msp430-elf 6054 2382 : 39.35%
> nds32le-elf 5953 3152 : 52.95%
> nios2-linux-gnu 3735 2143 : 57.38%
> pdp11 2137 1157 : 54.14%
> powerpc-eabispe 109322 40582 : 37.12%
> powerpc-linux-gnu 82976 32192 : 38.80%
> rl78-elf 5321 2432 : 45.71%
> rx-elf 14454 7534 : 52.12%
> s390-linux-gnu 48487 20454 : 42.18%
> sh-linux-gnu 104087 41820 : 40.18%
> sparc-linux-gnu 21912 10509 : 47.96%
> spu-elf 19937 8182 : 41.04%
> tilegx-elf 15412 6970 : 45.22%
> tilepro-elf 11998 5479 : 45.67%
> v850-elf 8725 4438 : 50.87%
> vax-netbsdelf 4537 2410 : 53.12%
> visium-elf 15190 7224 : 47.56%
> x86_64-darwin 346396 119593 : 34.52%
> xstormy16-elf 4660 2229 : 47.83%
> xtensa-elf 2799 1514 : 54.09%
>
> Here's the loadable size of insn-recog.o in an --enable-checking=release
> build on an x86_64-linux-gnu box:
>
> aarch64-linux-gnueabi 443955 298026 : 67.13%
> alpha-linux-gnu 97194 80893 : 83.23%
> arm-linux-gnueabi 782325 632248 : 80.82%
> avr-rtems 226827 159763 : 70.43%
> bfin-elf 52563 42376 : 80.62%
> c6x-elf 112512 90142 : 80.12%
> cr16-elf 10446 10006 : 95.79%
> cris-elf 74771 52634 : 70.39%
> epiphany-elf 87577 52284 : 59.70%
> fr30-elf 8041 7713 : 95.92%
> frv-linux-gnu 53699 47543 : 88.54%
> h8300-elf 70708 66274 : 93.73%
> hppa64-hp-hpux11.23 71597 57484 : 80.29%
> ia64-linux-gnu 147286 130632 : 88.69%
> iq2000-elf 11002 11774 : 107.02%
> lm32-elf 5894 5798 : 98.37%
> m32c-elf 36563 28855 : 78.92%
> m32r-elf 17252 16910 : 98.02%
> m68k-linux-gnu 58248 59781 : 102.63%
> mcore-elf 17708 18948 : 107.00%
> mep-elf 314466 150771 : 47.95%
> microblaze-elf 10257 10534 : 102.70%
> mips-linux-gnu 230991 191155 : 82.75%
> mmix 10782 10678 : 99.04%
> mn10300-elf 24035 22802 : 94.87%
> moxie-elf 4622 4198 : 90.83%
> msp430-elf 21707 16539 : 76.19%
> nds32le-elf 22041 19444 : 88.22%
> nios2-linux-gnu 15595 13238 : 84.89%
> pdp11 7630 8254 : 108.18%
> powerpc-eabispe 430816 308481 : 71.60%
> powerpc-linux-gnu 317738 248534 : 78.22%
> rl78-elf 18904 16329 : 86.38%
> rx-elf 55015 56632 : 102.94%
> s390-linux-gnu 190584 148961 : 78.16%
> sh-linux-gnu 408446 307927 : 75.39%
> sparc-linux-gnu 91016 80640 : 88.60%
> spu-elf 80387 69151 : 86.02%
> tilegx-elf 63815 49977 : 78.32%
> tilepro-elf 51763 39252 : 75.83%
> v850-elf 32812 28462 : 86.74%
> vax-netbsdelf 18350 18259 : 99.50%
> visium-elf 56872 46790 : 82.27%
> x86_64-darwin 1306182 883169 : 67.61%
> xstormy16-elf 17044 14430 : 84.66%
> xtensa-elf 10780 9678 : 89.78%
>
> The same for --enable-checking=yes,rtl:
>
> aarch64-linux-gnueabi 1790272 507488 : 28.35%
> alpha-linux-gnu 440058 193826 : 44.05%
> arm-linux-gnueabi 2845568 1299337 : 45.66%
> avr-rtems 885672 294387 : 33.24%
> bfin-elf 280606 142836 : 50.90%
> c6x-elf 486345 259256 : 53.31%
> cr16-elf 46626 20044 : 42.99%
> cris-elf 426813 144414 : 33.84%
> epiphany-elf 353078 122166 : 34.60%
> fr30-elf 40414 21042 : 52.07%
> frv-linux-gnu 259550 111335 : 42.90%
> h8300-elf 355199 158411 : 44.60%
> hppa64-hp-hpux11.23 340584 149009 : 43.75%
> ia64-linux-gnu 661364 293710 : 44.41%
> iq2000-elf 41123 26709 : 64.95%
> lm32-elf 20370 14781 : 72.56%
> m32c-elf 174344 62000 : 35.56%
> m32r-elf 74357 41773 : 56.18%
> m68k-linux-gnu 275733 117445 : 42.59%
> mcore-elf 85180 48018 : 56.37%
> mep-elf 1450168 376020 : 25.93%
> microblaze-elf 44189 26295 : 59.51%
> mips-linux-gnu 876650 375753 : 42.86%
> mmix 49882 25363 : 50.85%
> mn10300-elf 128148 66768 : 52.10%
> moxie-elf 23388 9011 : 38.53%
> msp430-elf 114200 34426 : 30.15%
> nds32le-elf 101416 73677 : 72.65%
> nios2-linux-gnu 58799 29825 : 50.72%
> pdp11 32836 18557 : 56.51%
> powerpc-eabispe 1976098 626942 : 31.73%
> powerpc-linux-gnu 1510652 526841 : 34.88%
> rl78-elf 93675 40538 : 43.28%
> rx-elf 279748 137284 : 49.07%
> s390-linux-gnu 857009 316494 : 36.93%
> sh-linux-gnu 2154337 806571 : 37.44%
> sparc-linux-gnu 367682 169019 : 45.97%
> spu-elf 341945 135629 : 39.66%
> tilegx-elf 235480 112103 : 47.61%
> tilepro-elf 246231 104137 : 42.29%
> v850-elf 158028 72875 : 46.12%
> vax-netbsdelf 85057 37578 : 44.18%
> visium-elf 257148 103331 : 40.18%
> x86_64-darwin 5514235 1721777 : 31.22%
> xstormy16-elf 83456 46128 : 55.27%
> xtensa-elf 52652 29880 : 56.75%
>
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-none-eabi.
> Also tested by building the testsuite for each of the targets above
> and making sure there were no assembly differences. Made sure that no
> objects in spec2k6 changed for aarch64-linux-gnu (except for perlbench
> perl.o and cactusADM datestamp.o, where the differences are timestamps).
> OK to install?
>
> Thanks,
> Richard
>
> PS. I've attached the new genrecog.c since the diff version is unreadable.
>
>
> gcc/
> * Makefile.in (build/genrecog.o): Depend on inchash.h.
> (build/genrecog$(build_exeext): Depend on build/hash-table.o and
> build/inchash.o
> * genrecog.c: Rewrite most of the code except for the third page.
>
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in 2015-04-27 10:42:57.783191573 +0100
> +++ gcc/Makefile.in 2015-04-27 10:43:42.878643078 +0100
> @@ -2527,7 +2527,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H
> build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \
> coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h $(OBSTACK_H)
> build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \
> - coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h
> + coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h \
> + $(HASH_TABLE_H) inchash.h
> build/genhooks.o : genhooks.c $(TARGET_DEF) $(C_TARGET_DEF) \
> $(COMMON_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h
> build/genmddump.o : genmddump.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \
> @@ -2559,6 +2560,8 @@ genprog = $(genprogerr) check checksum c
> # These programs need libs over and above what they get from the above list.
> build/genautomata$(build_exeext) : BUILD_LIBS += -lm
>
> +build/genrecog$(build_exeext) : build/hash-table.o build/inchash.o
> +
> # For stage1 and when cross-compiling use the build libcpp which is
> # built with NLS disabled. For stage2+ use the host library and
> # its dependencies.
>
Hi Richard,
I noticed that this patch caused ICE for gcc.target/arm/mmx-2.c on
arm-none-linux-gnueabi. Could you please have a look at it?
The log message is as below,
/projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c: In function 'foo':
/projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:166:1:
error: unrecognizable insn:
(insn 541 540 542 2 (set (reg:V4HI 512 [ D.4809 ])
(vec_merge:V4HI (vec_select:V4HI (reg:V4HI 510 [ D.4806 ])
(parallel [
(const_int 2 [0x2])
(const_int 0 [0])
(const_int 3 [0x3])
(const_int 1 [0x1])
]))
(vec_select:V4HI (reg:V4HI 511 [ D.4806 ])
(parallel [
(const_int 0 [0])
(const_int 2 [0x2])
(const_int 1 [0x1])
(const_int 3 [0x3])
]))
(const_int 5 [0x5])))
/projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:159 -1
(nil))
/projects/.../src/gcc/gcc/testsuite/gcc.target/arm/mmx-2.c:166:1:
internal compiler error: in extract_insn, at recog.c:2341
0xa42d2a _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
/projects/.../src/gcc/gcc/rtl-error.c:110
0xa42d59 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/projects/.../src/gcc/gcc/rtl-error.c:118
0xa15ff7 extract_insn(rtx_insn*)
/projects/.../src/gcc/gcc/recog.c:2341
0x7ffb42 instantiate_virtual_regs_in_insn
/projects/.../src/gcc/gcc/function.c:1598
0x7ffb42 instantiate_virtual_regs
/projects/.../src/gcc/gcc/function.c:1966
0x7ffb42 execute
/projects/.../src/gcc/gcc/function.c:2015
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
GCC is configured with
gcc/configure --target=arm-none-linux-gnueabi --prefix=
--with-sysroot=... --enable-shared --disable-libsanitizer
--disable-libssp --disable-libmudflap
--with-plugin-ld=arm-none-linux-gnueabi-ld --enable-checking=yes
--enable-languages=c,c++,fortran --with-gmp=... --with-mpfr=...
--with-mpc=... --with-isl=... --with-cloog=... --with-arch=armv7-a
--with-fpu=vfpv3-d16 --with-float=softfp --with-arch=armv7-a
Thanks,
bin