[RFC, ARM] later split of symbol_refs

Wed Jun 27 15:08:00 GMT 2012

Hi,

We would like to draw attention to a CodeSourcery patch for the ARM backend 
that gains GCC mainline 4% on SPEC2K INT: 
http://cgit.openembedded.org/openembedded/plain/recipes/gcc/gcc-4.5/linaro/gcc-4.5-linaro-r99369.patch 
(the patch is also attached).

Originally, we noticed that GNU Go runs 6% faster on cortex-a8 with 
-fno-gcse.  After profiling, we found that this is most likely caused by 
cache misses when accessing global variables.  GCC generates ldr 
instructions for them, whereas this can be avoided by emitting a movt/movw 
pair in such cases.  The RTL expressions for these instructions are high and 
lo_sum.  Currently, a symbol_ref is expanded as high and lo_sum, but then 
cprop1 decides that this is redundant and merges them into one load insn.
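
To illustrate the difference, here is a minimal C sketch (not taken from 
the patch).  The assembly in the comment is indicative only and will vary 
with compiler version and options; lower16, upper16 and movw_movt are 
hypothetical helpers that merely model how a movw/movt pair builds a 32-bit 
address from two 16-bit immediates.

```c
#include <stdint.h>

/* A global whose address must be materialized in a register before use. */
int counter;

/*
 * On ARMv7 the address of `counter` can be formed in two ways
 * (illustrative assembly, not actual compiler output):
 *
 *   literal pool:   ldr  r3, .L2                  @ address loaded from memory
 *                   ...
 *            .L2:   .word counter
 *
 *   movw/movt:      movw r3, #:lower16:counter    @ no memory access
 *                   movt r3, #:upper16:counter
 */
int bump(void)
{
    return ++counter;
}

/* Hypothetical helpers: the 16-bit halves encoded as immediates. */
uint16_t lower16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }
uint16_t upper16(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* Model of the pair: movw sets the low half and clears the high half,
   movt then replaces the high half and keeps the low half. */
uint32_t movw_movt(uint32_t addr)
{
    uint32_t reg = lower16(addr);                              /* movw */
    reg = (reg & 0xFFFFu) | ((uint32_t)upper16(addr) << 16);   /* movt */
    return reg;
}
```

With the literal-pool form the address itself comes from memory, so it can 
miss in the data cache; with movw/movt it is encoded in the instruction 
stream.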

The problem was also found by the Linaro community: 
https://bugs.launchpad.net/gcc-linaro/+bug/886124 .
There is also a patch from CodeSourcery (attached), which was ported to 
Linaro GCC 4.5 but is missing from later Linaro releases.
This patch splits symbol_refs at a later stage (after cprop), 
instead of generating movt/movw at expand.

It fixed our test case on GNU Go.  We also tested it on SPEC2K INT (ref) 
with a GCC 4.8 snapshot from May 12, 2012 on cortex-a9 with -O2 and -mthumb:

             Base      Base      Base      Peak      Peak      Peak
Benchmarks   Ref Time  Run Time  Ratio     Ref Time  Run Time  Ratio     Change
-----------  --------  --------  --------  --------  --------  -------  -------
164.gzip     1400      492       284       1400      497       282       -0.70%
175.vpr      1400      433       323       1400      458       306       -5.26%
176.gcc      1100      203       542       1100      198       557        2.77%
181.mcf      1800      529       340       1800      528       341        0.29%
186.crafty   1000      261       383       1000      256       391        2.09%
197.parser   1800      709       254       1800      701       257        1.18%
252.eon      1300      219       594       1300      202       644        8.42%
253.perlbmk  1800      389       463       1800      367       490        5.83%
254.gap      1100      259       425       1100      236       467        9.88%
255.vortex   1900      498       382       1900      442       430       12.57%
256.bzip2    1500      452       332       1500      424       354        6.63%
300.twolf    3000      916       328       3000      853       352        7.32%
SPECint_base2000                 376
SPECint2000                                                    391        3.99%

The SPEC2K INT score improves by 4% (up to 12.57% on vortex; the vpr 
slowdown is most likely due to the high variance of this test).
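
For reference, the last column of the table appears to be the relative 
change of the peak ratio over the base ratio.  A small C sketch of that 
arithmetic follows; ratio_change_percent is a hypothetical helper name, not 
part of any SPEC tool.

```c
/* Relative change of the peak ratio over the base ratio, in percent.
   Assumes base_ratio is non-zero. */
double ratio_change_percent(double base_ratio, double peak_ratio)
{
    return (peak_ratio - base_ratio) / base_ratio * 100.0;
}
```

For example, ratio_change_percent(382, 430) gives the vortex figure and 
ratio_change_percent(376, 391) gives the overall one.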

Similarly, there are gains of 3-4% without -mthumb on cortex-a9 and on 
cortex-a8 (thumb2 and ARM modes).

This patch applies to current trunk and passes regtest 
successfully on qemu-arm.
Would it be worth having in trunk?
If everybody agrees, we can take care of committing it.

--
Best regards,
   Dmitry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-4.5-linaro-r99369.patch
Type: text/x-diff
Size: 1455 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20120627/5795b792/attachment.bin>