[google gcc-4_8] Tree Loop Unrolling - Relax code size increase with -O2

Xinliang David Li davidxl@google.com
Tue Jan 28 01:41:00 GMT 2014



On Mon, Jan 27, 2014 at 5:02 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi David,
>    I had to fix a couple of tests. I have attached the patch with the
> fixed tests. The fixes are simple. The tests fail due to two reasons:
> 1) Tests like bmi2-pext32-1a.c fail because the vectorized loop is
> unrolled and the directive { scan-assembler-times "bmi2_pext_si3" 1 }
> fails because bmi2_pext_si3 occurs more than once. This is fixed by
> changing the directive to scan-assembler.
> 2) Tests like bmi2-bzhi64-1a.c fail because the unrolled loop no
> longer needs the bzhi instruction, as it gets folded into a constant
> once the value is known for each iteration. In order for this
> test to make sense, I disabled the unrolling in O2 by setting the code
> size growth to zero via the option "--param
> max-default-completely-peeled-insns=0".
> All the fixes fell into one of the above two patterns with one
> exception, pr53265.c. Loop unrolling exposed the array out of bounds
> access, which is now caught.
> Ok to commit?
> Thanks
> Sri
> On Tue, Jan 21, 2014 at 4:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>> ok.
>> David
>> On Tue, Jan 21, 2014 at 4:46 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Tue, Jan 21, 2014 at 2:49 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> I think it might be better to introduce a new parameter for the max
>>>> peeled insns at O2 (e.g., call it MAX_O2_COMPLETELY_PEEL_INSN or
>>>> MAX_DEFAULT_...), and use the same logic in your patch to override the
>>>> MAX_COMPLETELY_PEELED_INSN parameter at O2.
>>>> By so doing, we don't need to have a hard coded factor of 2.
>>> Patch attached with that change.
>>> Sri
>>>> In the longer run, we really need better cost/benefit analysis, but
>>>> that is independent.
>>>> David
>>>> On Tue, Jan 21, 2014 at 1:49 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi,
>>>>>      Currently, the tree unrolling pass (cunroll) does not allow any
>>>>> code size growth in O2 mode.  Code size growth is permitted only if O3
>>>>> or funroll-loops/fpeel-loops is used. I have created a patch to allow
>>>>> partial code size increase in O2 mode. With funroll-loops the maximum
>>>>> allowed code growth is 400 unrolled insns. I have set it to 200
>>>>> unrolled insns in O2 mode.  This patch improves an image processing
>>>>> benchmark by 20%. It improves most benchmarks by 1-2%. The code size
>>>>> increase is <1% for all the benchmarks except the image processing
>>>>> benchmark which increases by 6% (perf improves by 20%).
>>>>>      I am working on getting this patch reviewed for trunk. Here is
>>>>> the discussion on this:
>>>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02643.html  I have
>>>>> incorporated the comments on making the patch simpler. I will
>>>>> follow-up on that patch to trunk by also getting data on limiting
>>>>> complete peeling with O2.
>>>>> Is this ok for the google branch?
>>>>> Thanks
>>>>> Sri
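
For readers following the second failure pattern described in the thread,
here is a minimal sketch of the kind of loop involved. It is an illustration
only and is not taken from the patch or from the GCC testsuite: the names
zero_high_bits, sum_masked and self_check are hypothetical, and
zero_high_bits is a portable stand-in for a bzhi-style operation rather than
the real intrinsic. The loop has a small constant trip count, so once it is
completely peeled every call sees a constant index and the whole computation
can in principle be folded to a constant, which is why a test that scans the
assembly for a specific instruction may stop finding it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for a bzhi-style operation:
   keep the low n bits of src and clear the rest.  For n >= 64
   the value is returned unchanged.  */
static uint64_t zero_high_bits(uint64_t src, unsigned n)
{
    if (n >= 64)
        return src;
    return src & ((UINT64_C(1) << n) - 1);
}

/* Loop with a small constant trip count.  If the loop is completely
   peeled, i is a known constant in each copy of the body, so each
   call can be evaluated at compile time.  */
static uint64_t sum_masked(void)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < 4; i++)
        total += zero_high_bits(UINT64_C(0xFF), i);
    return total;
}

/* Sanity check of the arithmetic: 0 + 1 + 3 + 7 == 11.  */
static void self_check(void)
{
    assert(sum_masked() == 11);
}
```

Whether the mask operation survives in the generated code depends on the
compiler version and options; the thread's approach of passing
"--param max-default-completely-peeled-insns=0" applies to the google
gcc-4_8 branch discussed here.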
