This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Enhance reload_cse_move2add


On 07/08/2010 01:02 AM, Jeff Law wrote:
On 07/02/2010 02:26 AM, Jie Zhang wrote:

I have not benchmarked it on x86 using EEMBC. We use SPEC2000 for benchmarking on x86 here. I need to ask whether it's possible to set up EEMBC for x86 in our environment.

It's not easy for me to run EEMBC on x86. We have no dedicated x86
hardware for running EEMBC.
Even if you just run it a handful of times on whatever your development
workstation happens to be and it's only one test that's of particular
interest (idctrn01). Basically this one test is a clear outlier on ARM,
so I'd like to see if that behaviour shows up on x86 as well.

I just tested idctrn01 on my Pentium M laptop. My patch does not change the performance of idctrn01 on i386. I did some investigation: this is because i386 can directly use "symbol_ref + offset" as the address in load instructions, so there is no optimization opportunity for my patch to exploit. That's the difference between RISC and CISC.
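As a rough source-level analogy (not the RTL transformation itself), the sketch below assumes the patch lets reload_cse_move2add replace a second "symbol_ref + offset" load with an add to a register that already holds a nearby "symbol_ref + offset" value. The names table, sum_independent and sum_derived are made up for illustration, and a compiler is free to generate identical code for both functions.

```c
#include <stdint.h>

/* Hypothetical data block; on a RISC target GCC may reach it through a
   section anchor such as .LANCHOR0. */
int32_t table[128];

/* Each element address is formed independently as "symbol + offset".
   On a RISC target each one may need its own address load. */
int32_t sum_independent(void)
{
    const int32_t *p = &table[84];   /* table + 336 bytes */
    const int32_t *q = &table[85];   /* table + 340 bytes */
    return *p + *q;
}

/* The second address is derived from the first with a small constant add,
   which is the form the optimization is assumed to produce. */
int32_t sum_derived(void)
{
    const int32_t *p = &table[84];   /* table + 336 bytes */
    const int32_t *q = p + 1;        /* previous address + 4 bytes */
    return *p + *q;
}
```

On i386 both addresses can be folded straight into the load instructions, which is why neither form is cheaper there.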

I also tried to test my patch on PowerPC. It does not make any difference on PowerPC either. This is because when the reload pass inserts instructions to load "symbol_ref + offset" into a register, it uses two instructions instead of one:

(set (reg:SI 23 23)
     (high:SI (const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0"))
                                 (const_int 336 [0x150])))))

(set (reg:SI 23 23)
     (lo_sum:SI (reg:SI 23 23)
                (const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0"))
                                   (const_int 336 [0x150])))))

Currently my patch cannot optimize this case, but it could be enhanced to handle it with some effort.
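To make the high/lo_sum pair concrete, here is a minimal sketch of how a 32-bit address splits into two halves, assuming the usual 32-bit PowerPC convention (lis/addi with @ha and @l) where the low half is a sign-extended 16-bit immediate and the high half is adjusted to compensate. The function names are hypothetical and the code is not part of the patch.

```c
#include <stdint.h>

/* High-adjusted half: rounds up when the low half will be negative. */
uint32_t high_part(uint32_t addr)
{
    return (addr + 0x8000u) >> 16;
}

/* Low half, interpreted as a signed 16-bit immediate. */
int32_t low_part(uint32_t addr)
{
    int32_t lo = (int32_t)(addr & 0xFFFFu);
    if (lo >= 0x8000)
        lo -= 0x10000;
    return lo;
}

/* What the second instruction (the lo_sum) computes from the first. */
uint32_t recombine(uint32_t hi, int32_t lo)
{
    return (hi << 16) + (uint32_t)lo;
}
```

Because the full constant is only known after both instructions have run, a pass that tracks one "symbol_ref + offset" value per register would need to recognize the pair as a unit before it could replace it with an add.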

If you can get size #s, that would be interesting too -- I'd expect this
to be a small size improvement independent of the processor
architecture. Runtime performance I don't have a good feel for -- I can
easily envision cases where it's going to be better and others where
it's going to be worse.

I use 5 iterations and test -O2 and -O3 this time. It has been running
for 24 hours. It might need one or two more hours to complete, so I will
have to report the results tomorrow.


Here are the new results. 'Before' is before applying my patch, 'After' is after applying my patch.
Thanks. Overall it looks like a wash (with the exception of idctrn01 on
arm).

Did you get the codesize #s? I'd expect them to show a small, but
consistent improvement which ought to be enough to push the patch from a
wash to a clear improvement on both x86 & arm.

I collected the code size data from SPEC2000 on AMD64. My patch does not change the code size of any test at either -O2 or -O3, except for the following two:

Code Size
=========
Test  -O2		Before	After	Change
----------------------------------------------
172.mgrid		14962	14976	    14

Code Size
=========
Test  -O3		Before	After	Change
----------------------------------------------
171.swim		19231	19247	    16


For EEMBC on ARM, my patch saves some bytes of code in the following 4 tests. There are no changes in the other tests, and no changes in data/bss size in any test.


Code Size
=========
Test  -O3		Before	After	Change
----------------------------------------------
automotive/aifftr01	86444	86396	   -48
automotive/bitmnp01	60129	60112	   -17
automotive/idctrn01	84812	84557	  -255
office/text01		72685	72668	   -17


Regards,
-- 
Jie Zhang
CodeSourcery

