This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: -Wuninitialized issues
- From: Diego Novillo <dnovillo at redhat dot com>
- To: gcc at gcc dot gnu dot org
- Cc: law at redhat dot com, Mark Mitchell <mark at codesourcery dot com>
- Date: Tue, 1 Nov 2005 11:06:49 -0500
- Subject: Re: -Wuninitialized issues
- References: <4365CA5E.3070401@codesourcery.com> <1130802583.19967.122.camel@localhost.localdomain>
On Monday 31 October 2005 18:49, Jeffrey A Law wrote:
> Thoughts?
>
I'm not sure this would buy you much better precision. I was tinkering
with PR 18501 a few days ago. This is one of those cases where the
optimizers (CCP in this case) remove the code that we were supposed to
warn about. It goes something like this (all variables are locals):
bitmap_print_value_set ()
{
# first_1 = PHI <first_2(0), first_4(3)>;
<L3>:;
D.1286_3 = bmp_iter_set ();
if (D.1286_3 != 0) goto <L0>; else goto <L4>;
<L0>:;
if (first_1 == 0) goto <L1>; else goto <L2>;
<L1>:;
something ();
<L2>:;
first_4 = 0;
goto <L3>;
<L4>:;
return;
}
To prevent losing location information for the warning, I had modified the
propagation engine to warn as it folded the expression away. In this
case, we were missing the warning because constant propagation was merging
an UNDEFINED value (first_2) with a CONSTANT (first_4 == 0), which yields
the constant because we explicitly ignore undefined values on locals.
So, as we fold 'if (first_1 == 0)' into 'if (1)', the patch emits the
warning about the possibly uninitialized value of 'first'. My approach
improved precision of the warning because we are able to indicate exactly
what statement is doing the uninitialized reference.
However, there are two huge problems with the approach: (1) it is
intrusive. Passes are required to keep track of their actions, in this
case, CCP would inform the propagation engine that it had called the merge
operator with an uninitialized value.
(2) it introduces false positives:
sub ()
{
i_3 = 0;
j_4 = 0;
# k_2 = PHI <k_5(0), k_9(1)>;
# i_1 = PHI <i_3(0), i_10(1)>;
<L1>:;
D.1285_6 = i_1 | j_4;
if (D.1285_6 == 0) goto <L0>; else goto <L2>;
<L0>:;
k_9 = 10;
i_10 = sub ();
goto <L1>;
<L2>:;
return k_2;
}
Again, when CCP evaluates k_2 = PHI <k_5, k_9>, it will merge the undefined
name k_5 with the constant value k_9 == 10. So, when we replace 'return
k_2' with 'return 10' in <L2>, we will warn that 'k' may be used
uninitialized. However, in this case the loop is guaranteed to execue at
least once, so the undefined value k_5 is never actually used by the
program.
Unless I'm misunderstanding your approach, it will have the same problem.
The early pass will have marked the PHI node 'k_2 = PHI <k_5, k_9>' as
problematic. Depending on when the second pass runs, it will not have a
chance of marking it as problematic, and we will emit the warning.
The thing here is that both test cases are structurally similar. In both
cases we are disregarding an undefined value during optimization. We
don't actually know whether that undefined value will be executed or not.
In the first test case, it will be. In the second test case, it will not
(k_2 is guaranteed to get its value from k_9 the first time).
A more robust solution would have to involve some kind of symbolic
processing on our part. Mechanically noticing whether we see undefined
names in PHI nodes will always have this problem.
We could implement some variant of Gated SSA to record predicates in PHI
nodes. Uses of potentially uninitialized names would have to be analyzed
for two things:
(1) if the use is protected by a predicate with the same value number as
the predicate protecting a real definition, then you don't need to warn.
Assume x_0 is undefined and that pred_5 and pred_9 have the same value
number.
if (pred_5)
x_5 = 3;
x_6 = PHI <x_0, x_5>
...
if (pred_9)
... = x_6;
(2) if we can guarantee that at least one of the PHI arguments with real
definitions will execute at least once, then we don't need to warn. This
is the case in the second test case I described above.
Note, however, that this approach is not foolproof either and I suspect it
would not be trivial to implement. I think it would allow us to not be at
the mercy of the order in which optimizations are executed, but it can be
easily confused in the presence of aliasing and other nasties that prevent
you from computing good value numbers or execution paths. I suspect that
in the extreme, this problem can be reduced to the stopping problem.