This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC, ARM] later split of symbol_refs
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Dmitry Melnik <dm at ispras dot ru>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Ramana Radhakrishnan <Ramana dot Radhakrishnan at arm dot com>, "julian at codesourcery dot com" <julian at codesourcery dot com>, "jie at codesourcery dot com" <jie at codesourcery dot com>, "leitz at ispras dot ru" <leitz at ispras dot ru>, "abel at ispras dot ru" <abel at ispras dot ru>
- Date: Wed, 27 Jun 2012 16:53:43 +0100
- Subject: Re: [RFC, ARM] later split of symbol_refs
- References: <4FEB1F9C.40004@ispras.ru>
On 27/06/12 15:58, Dmitry Melnik wrote:
> Hi,
>
> We'd like to draw attention to CodeSourcery's patch for the ARM backend,
> which would give GCC mainline a 4% gain on SPEC2K INT:
> http://cgit.openembedded.org/openembedded/plain/recipes/gcc/gcc-4.5/linaro/gcc-4.5-linaro-r99369.patch
> (the patch is also attached).
>
> Originally, we noticed that GNU Go runs 6% faster on cortex-a8 with
> -fno-gcse. Profiling showed that this is most likely caused by cache
> misses when accessing global variables. GCC generates ldr instructions
> that load their addresses from the literal pool, which could be avoided
> by emitting a movw/movt pair instead. The RTL expressions for these
> instructions are high and lo_sum. Currently, a symbol_ref is expanded
> into high and lo_sum, but cprop1 then decides that the pair is
> redundant and merges it back into a single load insn.
>
> The problem was also found by the Linaro community:
> https://bugs.launchpad.net/gcc-linaro/+bug/886124 .
> There is also a patch from CodeSourcery (attached), which was ported to
> Linaro GCC 4.5 but is missing from later Linaro releases.
> The patch splits symbol_refs at a later stage (after cprop) instead of
> generating movt/movw at expand time.
>
> It fixes our GNU Go test case. We also tested it on SPEC2K INT (ref)
> with a GCC 4.8 snapshot from May 12, 2012 on cortex-a9 with -O2 and -mthumb:
>
>                   Base      Base      Base      Peak      Peak      Peak
> Benchmark         Ref Time  Run Time  Ratio     Ref Time  Run Time  Ratio     Change
> ----------------  --------  --------  --------  --------  --------  --------  -------
> 164.gzip          1400      492       284       1400      497       282       -0.70%
> 175.vpr           1400      433       323       1400      458       306       -5.26%
> 176.gcc           1100      203       542       1100      198       557       2.77%
> 181.mcf           1800      529       340       1800      528       341       0.29%
> 186.crafty        1000      261       383       1000      256       391       2.09%
> 197.parser        1800      709       254       1800      701       257       1.18%
> 252.eon           1300      219       594       1300      202       644       8.42%
> 253.perlbmk       1800      389       463       1800      367       490       5.83%
> 254.gap           1100      259       425       1100      236       467       9.88%
> 255.vortex        1900      498       382       1900      442       430       12.57%
> 256.bzip2         1500      452       332       1500      424       354       6.63%
> 300.twolf         3000      916       328       3000      853       352       7.32%
> SPECint_base2000                      376
> SPECint2000                                                         391       3.99%
>
>
> SPEC2K INT improves by 4% (up to 12.57% on vortex; the vpr slowdown is
> likely due to the large variance on this test).
>
> Similarly, there are gains of 3-4% without -mthumb on cortex-a9, and on
> cortex-a8 in both Thumb-2 and ARM modes.
>
> The patch applies to current trunk and passes the regression testsuite
> on qemu-arm.
> Would it be worth having it in trunk?
> If everybody agrees, we can take care of committing it.
>
> --
> Best regards,
> Dmitry
>
>
> gcc-4.5-linaro-r99369.patch
>
Please update the ChangeLog entry (it's not appropriate to mention
Sourcery G++) and add a comment as Steven has suggested.
Otherwise OK.
R.