[PATCH] Enhance reload_cse_move2add

Thu Jul 1 18:26:00 GMT 2010

On 07/02/2010 02:10 AM, Jeff Law wrote:
> On 06/30/10 11:13, Jie Zhang wrote:
>> On 07/01/2010 12:47 AM, Jeff Law wrote:
>>> On 06/30/10 01:45, Jie Zhang wrote:
>>>> Currently reload_cse_move2add can transform
>>>>
>>>> (set (REGX) (CONST_INT A))
>>>> ...
>>>> (set (REGX) (CONST_INT B))
>>>>
>>>> to
>>>>
>>>> (set (REGX) (CONST_INT A))
>>>> ...
>>>> (set (REGX) (plus (REGX) (CONST_INT B-A)))
>>>>
>>>> This patch enhances it to be able to transform
>>>>
>>>> (set (REGX) (CONST (PLUS (SYMBOL_REF) (CONST_INT A))))
>>>> ...
>>>> (set (REGY) (CONST (PLUS (SYMBOL_REF) (CONST_INT B))))
>>>>
>>>> to
>>>>
>>>> (set (REGX) (CONST (PLUS (SYMBOL_REF) (CONST_INT A))))
>>>> ...
>>>> (set (REGY) (PLUS (REGX) (CONST_INT B-A)))
>>>>
>>>>
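
The rewrite described above can be illustrated with a small toy model. This is
only a sketch under stated assumptions, not the code in the patch: the names
LoadSym, AddImm and move2add_sketch are hypothetical, registers are plain
strings, and the legality and cost checks a real pass would need before
replacing an instruction are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class LoadSym:
    """Toy stand-in for (set (reg dst) (const (plus (symbol_ref sym) (const_int off))))."""
    dst: str
    sym: str
    off: int


@dataclass(frozen=True)
class AddImm:
    """Toy stand-in for (set (reg dst) (plus (reg src) (const_int delta)))."""
    dst: str
    src: str
    delta: int


Insn = Union[LoadSym, AddImm]


def move2add_sketch(insns: List[LoadSym]) -> List[Insn]:
    """Rewrite symbol+offset loads as adds when a register already holds
    the same symbol plus a known offset.

    Assumption: straight-line code containing only LoadSym instructions,
    so a register's recorded value stays valid until it is loaded again.
    """
    known: Dict[str, Tuple[str, int]] = {}  # reg -> (symbol, offset) it holds
    out: List[Insn] = []
    for insn in insns:
        # Find any register known to hold the same symbol (first one recorded).
        base = next((r for r, (s, _) in known.items() if s == insn.sym), None)
        if base is not None:
            # New value = value in base + (B - A).
            out.append(AddImm(insn.dst, base, insn.off - known[base][1]))
        else:
            out.append(insn)
        # Either way, dst now holds sym + off.
        known[insn.dst] = (insn.sym, insn.off)
    return out


if __name__ == "__main__":
    for rewritten in move2add_sketch([
        LoadSym("r0", "buf", 4),
        LoadSym("r1", "buf", 12),
        LoadSym("r2", "other", 0),
    ]):
        print(rewritten)
```

In this example the second load becomes an add of 8 to r0, while the first
load and the load of the unrelated symbol are left unchanged.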
>>>> Benchmarking using EEMBC on ARM Cortex-A8 shows performance
>>>> improvement on one test:
>>>>
>>>> idctrn01: 6%
>>> Was this a size or runtime performance improvement?
>>>
>> This is a runtime performance improvement.
> That's quite a surprise. Just for giggles, does x86 show any change on
> idctrn01? I realize it's an eembc benchmark, but if it's that sensitive
> to this optimization, we ought to see some change in behaviour for x86
> as well.
>
I have not benchmarked it on x86 using EEMBC. We use SPEC2000 for
benchmarking on x86 here. I need to ask whether it's possible to set up
EEMBC for x86 in our environment.

>
>>
>>>>
>>>> Benchmarking using SPEC2000 on AMD Athlon64 X2 3800+ shows 0.4%
>>>> regression on CINT2000 and 0.1% improvement on CFP2000.
>>>>
>>>> Bootstrapped and regression tested on x86_64.
>>> Any thoughts on why spec2k showed a regression and was it a size or
>>> runtime regression?
>>>
>> I'm not sure what caused the regressions. I'm redoing the
>> benchmarking. This time I am running it without X and shutting down
>> as many running servers as I can. I hope this removes the measurement
>> error.
> Strongly advised (shut down X and as many services as possible, turn off
> speed scaling in the processor, etc).
>
Yes, I did all of these this time.

> If you can get size #s, that would be interesting too -- I'd expect this
> to be a small size improvement independent of the processor
> architecture. Runtime performance I don't have a good feel for -- I can
> easily envision cases where it's going to be better and others where
> it's going to be worse.
>
I am using 5 iterations and testing both -O2 and -O3 this time. It has been
running for 24 hours and might need one or two more hours to complete, so I
will have to report the results tomorrow.


Thanks,
-- 
Jie Zhang
CodeSourcery