This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: IVOPT improvement patch


On Tue, May 11, 2010 at 7:27 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, May 11, 2010 at 1:34 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, May 11, 2010 at 8:35 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> Hi, IVOPT has been one of the main areas of complaint from gcc users,
>>> and it is often shut down, or the user is forced to use inline assembly
>>> to write key kernel loops. The following (resulting from the
>>> investigation of many user complaints) summarizes some of the key
>>> problems:
>>>
>>> 1) Too many induction variables are used, and advanced addressing modes
>>> are not fully taken advantage of. On the latest Intel CPUs, the increased
>>> loop size (due to iv updates) can have a very large negative impact on
>>> performance, e.g., when LSD and uop macro fusion get blocked. The root
>>> cause of the problem is not in the cost model used in IVOPT, but in
>>> the algorithm for finding the 'optimal' assignment of iv candidates
>>> to uses.
>>>
>>> 2) Profile information is not used in cost estimation (e.g. computing
>>> cost of loop variants)
>>>
>>> 3) For a replaced (original) IV that is only live out of the loop (i.e.
>>> there are no uses inside the loop), the rewrite of the IV occurs inside
>>> the loop, which usually results in code more expensive than the
>>> original iv update statement -- and it is very difficult for later
>>> phases to sink the computation out of the loop (see PR31792).
>>> The right solution is to materialize/rewrite such ivs directly outside
>>> the loop (also to avoid introducing overlapping live ranges).
>>>
>>> 4) iv update statements sometimes block the forward
>>> propagation/combination of the memory ref operation (depending on the
>>> before-update IV value) with the loop branch compare. Simple-minded
>>> propagation will lead to overlapping live ranges and additional
>>> copy/move instructions being generated.
>>>
>>> 5) In estimating the global cost (register pressure), the registers
>>> resulting from LIM of invariant expressions are not considered
>>>
>>> 6) In MEM_REF creation, loop variants and invariants may be assigned to
>>> the same part -- which is essentially a re-association blocking LIM
>>>
>>> 7) Intrinsic calls that are essentially memory operations are not
>>> recognized as uses.
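
Problems 1 and 3 above can be pictured at the source level. The following is only a minimal sketch in plain C, not code from the patch and not IVOPT output; the function names (`add_ptr_ivs`, `add_index_iv`, `sum_and_end`) are hypothetical and chosen purely for illustration. The first two functions compute the same result with three pointer IVs versus one index IV that can feed base + index*scale addressing; the third has a pointer that is updated in the loop but only used after it.

```c
#include <stddef.h>

/* Problem 1, form A: three pointer induction variables, each of which
   needs its own update on every iteration. */
void add_ptr_ivs(float *dst, const float *a, const float *b, size_t n)
{
    const float *end = a + n;
    while (a != end)
        *dst++ = *a++ + *b++;
}

/* Problem 1, form B: one index induction variable. On targets with
   base + index*scale addressing, all three memory references can share
   the single counter, so only one iv update is needed per iteration. */
void add_index_iv(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Problem 3: p is advanced in the loop but has no use inside it; it is
   only live after the loop. Its final value is simply s + n, which can
   be computed once outside the loop instead of being rewritten in
   terms of another iv on every iteration. */
const char *sum_and_end(const char *s, size_t n, unsigned *sum)
{
    const char *p = s;
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (unsigned char)s[i];
        p++;                /* no use of p inside the loop */
    }
    *sum = acc;
    return p;               /* equals s + n */
}
```

Which form a compiler actually emits depends on the target and on the cost model; the sketch only shows the shapes under discussion.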
>>
>> 8) Replacement pointer induction variables do not inherit alias information,
>> pessimizing MEM_REF memory operations.
>
>
> This is a good one. Is there an existing mechanism for the update?

Yes, there is duplicate_ssa_name_ptr_info, which is used, for example,
by the vectorizer for its induction variables.

>>
>>> The attached patch handles all the problems above except for 7.
>>>
>>>
>>> Bootstrapped and regression tested on linux/x86_64.
>>>
>>> The patch was not tuned for SPEC, but SPEC testing was done.
>>> Observable improvements: gcc 4.85%, vpr 1.53%, bzip2 2.36%, and eon
>>> 2.43% (Machine CPU: Intel Xeon E5345/2.33GHz, m32 mode).
>>
>> Can you split the patch into pieces and check SPEC numbers also
>> for 64bit operation? I assume that maybe powerpc people want to
>> check the performance impact as well.
>
> On the same machine with m64, eon improves 1.8%; the other ups and downs
> are less than 1%.

Thanks for checking.

Richard.

