How to traverse all the local variables that are declared in the current routine?

Qing Zhao QING.ZHAO@ORACLE.COM
Wed Dec 9 16:18:30 GMT 2020



> On Dec 9, 2020, at 9:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Wed, Dec 9, 2020 at 4:04 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> On Dec 9, 2020, at 2:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Tue, Dec 8, 2020 at 8:54 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 8, 2020, at 1:40 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Mon, Dec 7, 2020 at 5:20 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 7, 2020, at 1:12 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Fri, Dec 4, 2020 at 5:19 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> 
>> On Dec 4, 2020, at 2:50 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>> 
>> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>> 
>> 
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> 
>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> Another issue is, in order to check whether an auto-variable has an initializer, I plan to add a new bit in “decl_common” as:
>> /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>> unsigned decl_is_initialized :1;
>> 
>> /* In a VAR_DECL, set when the decl is initialized at the declaration.  */
>> #define DECL_IS_INITIALIZED(NODE) \
>> (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>> 
>> Set this bit when setting DECL_INITIAL for the variables in the FE, then keep it
>> even though DECL_INITIAL might later be NULLed.
>> 
>> 
>> For locals it would be more reliable to set this flag during gimplification.
>> 
>> Do you have any comment and suggestions?
>> 
>> 
>> As said above - do you want to cover registers as well as locals?  I'd do
>> the actual zeroing during RTL expansion instead since otherwise you
>> have to figure out yourself whether a local is actually used (see expand_stack_vars)
>> 
>> Note that optimization will already have made use of the "uninitialized" state
>> of locals so depending on what the actual goal is here "late" may be too late.
>> 
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>> X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>> X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>> X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>> variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>> if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>> of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
>> 
>> 
>> The question is whether it's in line with people's expectations that
>> explicitly zero-initialized code behaves differently from
>> implicitly zero-initialized code with respect to optimization
>> and secondary side-effects (late diagnostics, latent bugs, etc.).
>> 
>> Introducing a new concept like .DEFERRED_INIT is much more
>> heavy-weight than an explicit zero initializer.
>> 
>> 
>> What exactly do you mean by “heavy-weight”? More difficult to implement, more run-time overhead, or both? Or something else?
>> 
>> The major benefit of the “.DEFERRED_INIT” approach is to enable us to keep the current -Wuninitialized analysis untouched and also to pass
>> the “uninitialized” info from the source code level to “pass_expand”.
>> 
>> 
>> Well, "untouched" is a bit oversimplified.  You do need to handle
>> .DEFERRED_INIT as not
>> being an initialization which will definitely get interesting.
>> 
>> 
>> Yes, during the uninitialized variable analysis pass, we should handle the defs from “.DEFERRED_INIT” specially, treating them as uninitialized.
>> 
>> If we want to keep the current -Wuninitialized analysis untouched, this is a quite reasonable approach.
>> 
>> However, if it’s not required to keep the current -Wuninitialized analysis untouched, adding a zero-initializer directly during gimplification should
>> be much easier and simpler, and should also have smaller run-time overhead.
>> 
>> 
>> As for optimization I fear you'll get a load of redundant zero-init
>> actually emitted if you can just rely on RTL DSE/DCE to remove it.
>> 
>> 
>> Runtime overhead for -fauto-init=zero is one important consideration for the whole feature; we should minimize the runtime overhead for zero
>> initialization since it will be used in production builds.
>> We can do some run-time performance evaluation when we have an implementation ready.
>> 
>> 
>> Note there will be other passes "confused" by .DEFERRED_INIT.  Note
>> that there's going to be other
>> considerations - namely where to emit the .DEFERRED_INIT - when
>> emitting it during gimplification
>> you can emit it at the start of the block of block-scope variables.
>> When emitting after gimplification
>> you have to emit at function start which will probably make stack slot
>> sharing inefficient because
>> the deferred init will cause overlapping lifetimes.  With emitting at
>> block boundary the .DEFERRED_INIT
>> will act as code-motion barrier (and it itself likely cannot be moved)
>> so for example invariant motion
>> will no longer happen.  Likewise optimizations like SRA will be
>> confused by .DEFERRED_INIT which
>> again will lead to bigger stack usage (and less optimization).
>> 
>> 
>> Yes, it looks like the inserted “.DEFERRED_INIT” function calls will negatively impact tree optimizations.
>> 
>> 
>> But sure, you can try to implement a few variants but definitely
>> .DEFERRED_INIT will be the most
>> work.
>> 
>> 
>> How about implementing the following two approaches and comparing the run-time cost:
>> 
>> A.  Insert the real initialization during the gimplification phase.
>> B.  Insert the .DEFERRED_INIT during the gimplification phase, and then expand this call to the real initialization during the expand phase.
>> 
>> Approach A will have less run-time overhead, but will mess up the current uninitialized variable analysis in GCC.
>> Approach B will have more run-time overhead, but will keep the current uninitialized variable analysis in GCC.
>> 
>> And then decide which approach we will go with?
>> 
>> What’s your opinion on this?
>> 
>> 
>> Well, in the end you have to try.  Note for the purpose of stack slot
>> sharing you do want the
>> instrumentation to happen during gimplification.
>> 
>> Another possibility is to materialize .DEFERRED_INIT earlier than
>> expand, for example shortly
>> after IPA optimizations to avoid pessimizing loop transforms and allow
>> SRA.  At the point you
>> materialize the inits you could run the late uninit warning pass
>> (which would then be earlier
>> than regular but would still see the .DEFERRED_INIT).
>> 
>> 
>> If we put the “materializing .DEFERRED_INIT” phase earlier as you suggested above,
>> the late uninitialized warning pass has to be moved earlier in order to utilize the “.DEFERRED_INIT”.
>> Then we might miss some opportunities for the late uninitialized warning. I think that this is not what we really
>> want.
>> 
>> 
>> While users may be happy to pay some performance, stack usage is
>> probably more critical
>> 
>> 
>> So, which pass computes the stack usage?
>> 
>> 
>> There is no pass doing that; stack slot assignment and sharing (when
>> lifetimes do
>> not overlap) is done by RTL expansion.
>> 
>> 
>> Okay. I see.
>> 
>> 
>> (just thinking of the kernel) so not regressing there should be as
>> important as preserving
>> uninit warnings (which I, for practical purposes, see as not important at
>> all - people can do
>> "debug" builds without -fzero-init).
>> 
>> 
>> Looks like the major issue with the “.DEFERRED_INIT” approach is:  the newly inserted calls to the internal const function
>> might inhibit some important tree optimizations.
>> 
>> So, I am thinking again about the following approach, which I raised at the very beginning:
>> 
>> During the gimplification phase, mark the DECL for an auto variable without initialization as “no_explicit_init”, then maintain this
>> “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand, and then add zero-initialization for all DECLs that are
>> marked with “no_explicit_init”.
>> 
>> This approach will not have the issue of interrupting tree optimizations; however, I guess that maintaining this “no_explicit_init” bit
>> might be very difficult?
>> 
>> Do you have any comments on this approach?
>> 
>> 
>> As said earlier you'll still get optimistic propagation bypassing the
>> still missing
>> implicit zero init.  Maybe that's OK - you don't get "garbage" but you'll get
>> some other defined value.
>> 
>> 
>> There is another approach:
>> 
>> During the gimplification phase, add the real initialization for the uninitialized variables, but mark these initializations as “artificial_init”.
>> Then update the uninitialized analysis phase to handle the initializations marked with “artificial_init” specially, as non-initializations, to
>> keep the uninitialized warnings.
>> 
>> Then we should be able to get the maximum optimization and also keep the uninitialized warning at the same time.
>> 
>> The Microsoft compiler seems to have used this approach: (https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/)
>> 
>> "
>> Does InitAll Break Static Analysis?
>> 
>> Static analysis is incredibly useful at letting developers know they forgot to initialize before use.
>> 
>> The InitAll feature indicates if a variable assignment was caused by InitAll to both PREfast and the compiler backend (both of which have uninitialized warnings). This allows the analysis tools to ignore InitAll variable assignments for the purposes of these warnings. With InitAll enabled, a developer will still receive static analysis warnings if they forget to initialize a variable even if InitAll forcibly initializes it for them.
>> 
>> "
>> Any comment on this?
> 
> You have to try.  Bits to implement are adjusting the uninit pass and
> maintaining the annotation,
> as well as making sure to not elide the real init because there's a
> 'fake' init (we have redundant
> store elimination which works in this direction, just to name one example).

Okay, I see. 
> 
> Richard.
> 
>> As said, you have to implement a few options and compare.
>> 
>> 
>> Yes, I will do that; I just want to make sure first which approaches we should implement and compare.

Our final goal is to keep the uninitialized warnings and minimize the run-time performance cost.

The following are the approaches I will implement and compare:

A. Adding real initialization during gimplification; this does not maintain the uninitialized warnings.
B. Adding real initialization during gimplification, marking it with “artificial_init”;
     adjusting the uninitialized pass, maintaining the annotation, and making sure the real init is
     not deleted because of the fake init.
C. Marking the DECL for an uninitialized auto variable as “no_explicit_init” during gimplification,
     maintaining this “no_explicit_init” bit till after pass_late_warn_uninitialized, or till pass_expand,
     and then adding real initialization for all DECLs that are marked with “no_explicit_init”.
D. Adding .DEFERRED_INIT during gimplification, and expanding the .DEFERRED_INIT to real
     initialization during the expand phase; adjusting the uninitialized pass to handle the new defs
     from “.DEFERRED_INIT” (a rough sketch follows below).
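
To make D concrete, here is a rough sketch of what the gimplified IL for a small function might look like under -fauto-init=zero.  This is only for illustration: the function, the variable names and the exact encoding of the INIT argument are made up, nothing here is final.

  /* Source:  */
  int foo (int cond)
  {
    int x;                      /* no explicit initializer            */
    if (cond)
      x = 1;
    return x;                   /* maybe-uninitialized use of x       */
  }

  /* After gimplification with approach D (sketch):  */
  x = .DEFERRED_INIT (x, 0);    /* 0 stands for the zero-init pattern */
  if (cond != 0)
    x = 1;
  return x;                     /* the uninit pass still warns here: the def
                                   reaching the path where cond is false is
                                   only a .DEFERRED_INIT, not a real store  */

  /* During the expand phase the call would be materialized as a real store,
     i.e. x = 0, so the runtime behavior matches approach A.  */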


Of the above, approach A will be the one with the minimum run-time cost and will be the baseline for the performance
comparison.
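
For the same made-up example, approach A would simply emit the explicit store during gimplification, e.g. (again only a sketch):

  int x;
  x = 0;                        /* store inserted by -fauto-init=zero        */
  if (cond)
    x = 1;
  return x;                     /* no -Wuninitialized diagnostic anymore:
                                   x now has a dominating real def, which is
                                   exactly why approach A loses the warnings */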

I will implement approach D first; it is expected to have the most run-time overhead among the above list, but its
implementation should be the cleanest among B, C, and D. Let’s see how much more performance overhead this approach
adds. If the data are good, maybe we can avoid the effort of implementing B and C.

If the performance of D is not good, I will implement B or C at that time.
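
For reference, approach B would emit the same real store as A but attach an annotation that both the uninit analysis and passes like redundant store elimination have to honor.  Conceptually (the marking below is purely hypothetical, there is no such flag in GCC today):

  x = 0;                        /* real store emitted during gimplification  */
  /* mark the statement, e.g. ARTIFICIAL_INIT_P (stmt) = 1   (hypothetical)  */

The uninit pass would then skip defs carrying the mark, so the maybe-uninitialized warning is preserved, while redundant store elimination must be taught not to delete a later real initialization in favor of this fake one (Richard’s point above).  Approach C differs mainly in that the bit would live on the DECL (“no_explicit_init”) and the real store would only be emitted after pass_late_warn_uninitialized or at expand time.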

Let me know if you have any comments or suggestions.

Thanks.

Qing



