The performance data for two different implementations of the new security feature -ftrivial-auto-var-init

Qing Zhao QING.ZHAO@ORACLE.COM
Mon Jan 18 16:12:28 GMT 2021



> On Jan 18, 2021, at 7:09 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>>> D will keep all initialized aggregates as aggregates and live which
>>>>> means stack will be allocated for it.  With A the usual optimizations
>>>>> to reduce stack usage can be applied.
>>>> 
>>>> I checked the routine “pov::bump_map” in 511.povray_r since it
>>>> has a large stack increase
>>>> due to implementation D, by examining the IR immediately before the RTL
>>>> expansion phase.
>>>> (image.cpp.244t.optimized), I found that we have the following
>>>> additional statements for the array elements:
>>>> 
>>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>>> * normal)
>>>> {
>>>> double p3[3];
>>>> double p2[3];
>>>> double p1[3];
>>>> float colour3[5];
>>>> float colour2[5];
>>>> float colour1[5];
>>>> # DEBUG BEGIN_STMT
>>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>>> # DEBUG BEGIN_STMT
>>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>>> p1 = .DEFERRED_INIT (p1, 2);
>>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>>>> # DEBUG p1$0 => D#12
>>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>>>> # DEBUG p1$1 => D#11
>>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>>>> # DEBUG p1$2 => D#10
>>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>>> p2 = .DEFERRED_INIT (p2, 2);
>>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>>>> # DEBUG p2$0 => D#9
>>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>>>> # DEBUG p2$1 => D#8
>>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>>>> # DEBUG p2$2 => D#7
>>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>>> p3 = .DEFERRED_INIT (p3, 2);
>>>> ….
>>>> }
>>>> 
>>>> I guess that the above “MEM <double>….. = …” statements are the ones that make the
>>>> difference. Which phase introduced them?
>>> 
>>> Looks like SRA. But you can just dump all and grep for the first occurrence. 
>> 
>> Yes, it looks like SRA is the one:
>> 
>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1] = p1$0_195(D);
>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
>> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
> 
> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
> be extended to handle .DEFERRED_INIT if that's the main source of
> excess stack usage.  A single .DEFERRED_INIT of an aggregate can
> be split into .DEFERRED_INITs of individual components.

Thanks a lot for the suggestion.
I will study the SRA code to see how to do this, and then check whether it resolves the issue.
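The split Richard describes can be pictured at the C level roughly as follows. This is only an analogy (SRA operates on GIMPLE, not C source, and the function names are made up for illustration): a whole-aggregate initialization keeps the aggregate live and forces a stack slot, while per-component initializations leave independent scalars that can live in registers.

```c
/* C-level analogy of scalar replacement of aggregates (SRA).
   Illustrative only; not GCC internals. */

struct point { double x, y, z; };

/* Before: one whole-aggregate initialization.  The struct stays live,
   so a stack slot is allocated for it. */
double len2_before(double a, double b, double c)
{
    struct point p = { a, b, c };
    return p.x * p.x + p.y * p.y + p.z * p.z;
}

/* After: the aggregate is split into independent scalars, each
   initialized separately.  No stack slot for the struct is needed,
   and the scalars can be kept in registers. */
double len2_after(double a, double b, double c)
{
    double p_x = a, p_y = b, p_z = c;
    return p_x * p_x + p_y * p_y + p_z * p_z;
}
```

The suggestion above is the same transformation applied to `.DEFERRED_INIT`: one call covering the whole aggregate becomes one call per component, so the aggregate itself need not stay live.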
> 
> In other words, the investigation you're doing looks like the right way
> of deciding which passes are worth extending to handle .DEFERRED_INIT.
Yes, from the study so far, it looks like the major issue with the .DEFERRED_INIT approach is the stack-size increase.
Hopefully, after resolving this issue, we will be done.

Qing

> 
> Thanks,
> Richard



More information about the Gcc-patches mailing list