Bug 89501

Summary: Odd lack of warning about missing initialization
Product: gcc
Component: middle-end
Status: RESOLVED DUPLICATE
Severity: normal
Priority: P3
Version: 8.2.1
Target Milestone: ---
Reporter: Linus Torvalds <torvalds>
Assignee: Not yet assigned to anyone <unassigned>
CC: jeffreyalaw, ncm
Keywords: diagnostic
Bug Blocks: 24639
Attachments: You can compile this to see a lack of warning.

Description Linus Torvalds 2019-02-25 22:33:26 UTC
Created attachment 45820 [details]
You can compile this to see a lack of warning.

We had a simple kernel patch that introduced a stupid bug due to an uninitialized variable, and while we got it all sorted out and the fix was trivial, people were surprised by the lack of warning for the uninitialized case.

I'm adding a test-case as an attachment that largely matches the kernel code that didn't warn. But it boils down to a pattern of

     int ret;  /* UNINITIALIZED */

     if (somecondition) {
          ret = functioncall(x);
          if (ret)
               return ret;
     }
     .. some more work ..
     return ret;   /* Possibly uninitialized return value! */

What I *suspect* happens is

 (a) gcc sees that there is only one assignment to "ret"

 (b) in the same basic block as the assignment, there is a test against "ret" being nonzero that goes out.

and what I think happens is that (a) causes gcc to consider that assignment to be the defining assignment (which makes all kinds of sense in an SSA world), and then (b) means that gcc decides that clearly "ret" has to be zero in any case that doesn't go out due to the if-test.

So it turns out that gcc almost by mistake generates code that works (and doesn't warn about it, exactly because it works), even though the source code was clearly buggy. 

The attached test-case is stupid but intentionally made to be as close to the kernel source case as possible. With it, I can do:

Look, ma: no warning:

   gcc -O2 -S -Wall test.c

but this gives the expected warning:

   gcc -O2 -S -Wall -DHIDE_PROBLEM test.c

Regardless, this is not a huge problem for us, but I decided to file a report since we have a test case, and maybe somebody gets excited about it.

Thanks,

            Linus
Comment 1 Andrew Pinski 2019-02-25 23:06:20 UTC
I think it comes down to the same issue as PR 18501.
Comment 2 Linus Torvalds 2019-02-25 23:32:26 UTC
(In reply to Andrew Pinski from comment #1)
> I think it comes down to the same issue as PR 18501.

Very possibly the same issue in just a different guise.

NOTE! I have in the meantime verified that yes, it does seem to be about the pattern

   int x;

   if (somecondition) {
      x = something();
      if (x != XYZ)
         return x;
   }

   return x;

where gcc seems to turn the "if (x != XYZ) return x" to mean that "x" clearly _has_ to be XYZ elsewhere.

If I change my kernel-based test-case to do

    if (ret != 1)
        return ret;

instead of the original

    if (ret)
        return ret;

then gcc will actually generate code that ends with

	movl	$1, %eax
	popq	%rbp
	popq	%r12
	ret

i.e. it will basically consider "ret" to be initialized to the value "1", even if the basic block that assigned it was never actually executed.

Knowing how SSA works, I'm not entirely surprised, but obviously if you'd like to see the warning about buggy source code, it's less than optimal.

Anyway, this shouldn't be a high priority, but it does strike me as a potentially fairly common pattern that people might be missing warnings for.

              Linus
Comment 3 Jeffrey A. Law 2019-02-25 23:35:36 UTC
Yup.  It's the same as 18501.  We meet UNDEFINED and [0,0] resulting in [0,0] and nothing ever causes reevaluation of the PHI.  Things are working as "expected".    

My approach from 2005 would almost certainly address this instance since we'd see the uninitialized use in the early uninit pass, and later see that it went away.  It'd ultimately generate a diagnostic indicating there was an uninitialized use of "ret" that was optimized away.

But I'm not inclined to rip the scabs off those old wounds and reopen that discussion (for the umpteenth time).
Comment 4 Jeffrey A. Law 2019-02-25 23:36:41 UTC
Dup.  18501 is the canonical bug for this issue.

*** This bug has been marked as a duplicate of bug 18501 ***
Comment 5 Richard Biener 2019-02-26 09:39:05 UTC
(In reply to Jeffrey A. Law from comment #3)
> Yup.  It's the same as 18501.  We meet UNDEFINED and [0,0] resulting in
> [0,0] and nothing ever causes reevaluation of the PHI.  Things are working
> as "expected".    

And this meeting helps us avoid bogus warnings for cases where GCC has
difficulties proving dead code paths are actually dead...  In fact,
to preserve uninit warnings we avoid doing the same thing for copies,
thus meeting [i, i] with UNDEFINED, which in fact causes some
false positives to appear...  (and with my GCC-is-an-optimizing-compiler-not-a-static-analysis-tool hat on I've wanted to change that more than once,
but testsuite regressions prevented me from doing so)

Uninit warnings are hard ;)
Comment 6 Linus Torvalds 2019-02-26 16:27:39 UTC
(In reply to Richard Biener from comment #5)
> 
> And this meeting helps us avoid bogus warnings for cases where GCC has
> difficulties proving dead code paths are actually dead ... 

Ack. I do see the difficulty. We already disable some of the "may-be-uninitialized" warnings in the kernel because they end up being too noisy for some configurations.

NP. If you guys figure something out, it will be good. If not, we'll survive.
Comment 7 Jeffrey A. Law 2019-02-26 16:30:25 UTC
It's reliably the case that a false positive uninit warning also represents a failure to optimize something, so we've got significant incentives to analyze them deeply and look for fixes.  Feel free to pass examples of those along.
Comment 8 Linus Torvalds 2019-02-26 16:50:37 UTC
(In reply to Jeffrey A. Law from comment #7)
> It's reliably the case that a false positive uninit warning also represents
> a failure to optimize something.  So we've got significant incentives to
> deeply analyze and look for fixes.  So feel free to pass examples of those
> along.

Well, most of it is due to interactions with *other* issues entirely.

For example, when we enable GCOV for coverage checking, we have to disable tree-loop-im, because of excessive stack usage due to gcc bugzilla 69702.

And disabling that optimization then leads to bogus "might be used uninitialized" warnings.

We have a few other cases like that. E.g. we can't use -Os without also disabling -Wmaybe-uninitialized etc.

Some of the cases may be historical (i.e. maybe you've fixed the issues that cause us to do that in newer versions), but for various distro reasons we end up supporting old compilers for a _looong_ time.

Not that long ago we ended up increasing the minimum required gcc version for the kernel to gcc-4.6.

So we have this odd "we love using new features" fight with "oops, but we end up having people using relatively ancient compiler versions".
Comment 9 ncm 2019-03-04 19:03:22 UTC
What I don't understand is why it doesn't optimize away the check on (somecondition), since it is assuming the code in the dependent block always runs.
Comment 10 Linus Torvalds 2019-03-04 19:11:27 UTC
(In reply to ncm from comment #9)
> What I don't understand is why it doesn't optimize away the check on
> (somecondition), since it is assuming the code in the dependent block always
> runs.

No, it very much doesn't assume that. The 'somecondition' really is dynamic.

What happens is simply that because gcc sees only a single assignment to the variable, and that assignment is then limited by the subsequent value test to a single value, gcc will just say "ok, any other place where that variable is used, just use the known single value".

And it does that whether the 'if (somecondition)' path was taken or not.

It's a perfectly valid optimization. In fact, it's a lovely optimization, I'm not at all complaining about the code generation.

It's just that as part of that (quite reasonable) optimization it also optimized away the whole "oops, there wasn't really a valid initialization in case the if-statement wasn't done".

Obviously that's undefined behavior, and the optimization is valid regardless, but the lack of warning means that we didn't see that we had technically undefined behavior that the compiler has silently just fixed up for us.

I think the cause of this all is quite reasonable and understandable, and I also see why gcc really does want to throw away the undefined case entirely (because otherwise you can get into the reverse situation where you warn unnecessarily, because gcc isn't smart enough to see that some undefined case will never ever actually happen). 

Plus I assume it simplifies things a lot to just not even have to track the undefined case at all. You can just track "ok, down this path we have a known value for this SSA, and we don't need to keep track of any inconvenient phi nodes etc".
Comment 11 Jeffrey A. Law 2019-03-04 23:19:41 UTC
WRT c#9: Linus is right.  The condition is dynamic and we don't want to remove it in this circumstance.

More generally we  have considered whether or not we could eliminate the control dependent path which leads to undefined behavior.  But you have to be careful  because statements on the path prior to the actual undefined behavior may have observable side effects.

So what we do in those cases is isolate the path  via block copying and CFG manipulations (usually it's just a subset of paths to a particular statement which result in undefined behavior).  Once the path is isolated we can replace the statement which triggers the undefined behavior with a trap.  That in turn allows some CFG simplifications, const/copy propagation and usually further dead code elimination.

Right now we do this just for NULLs that are dereferenced and division by zero.  But I can see doing it for out of bounds array indexing, bogus mem* calls and perhaps other cases of undefined behavior.  ie, if you have something like

char foo[10];

if (shouldn't happen)
  indx = -1;
else
  indx = whatever ();
something = foo[indx];
... more work ...

This kind of idiom occurs fairly often when -1 is used to represent a can't happen case and you've got abstraction layers that get eliminated via inlining.  Path isolation would turn that into something like this:

if (shouldn't happen) {
  indx = -1;
  trap;
} else {
  indx = whatever ();
  something = foo[indx];
}
... more work ...

Which will simplify in various useful ways.



--

If we were to apply those concepts to this example we'd conceptually turn the code into something like this:

if (somecondition) {
    ret = functioncall(x);
    if (ret)
      return ret;
}
... some more work ...
trap
Comment 12 Linus Torvalds 2019-03-04 23:48:56 UTC
(In reply to Jeffrey A. Law from comment #11)
> 
> More generally we  have considered whether or not we could eliminate the
> control dependent path which leads to undefined behavior.  But you have to
> be careful  because statements on the path prior to the actual undefined
> behavior may have observable side effects.

Note that for the kernel, we consider those kinds of "optimizations" completely and utterly wrong-headed, and will absolutely refuse to use a compiler that does things like that.

It's basically the compiler saying "I don't care what you meant, I can do anything I want, and that means I will screw the code up on purpose".

I will personally switch the kernel immediately to clang the moment we cannot turn off idiotic broken behavior like that.

We currently already disable 'strict-aliasing', 'strict-overflow' and 'delete-null-pointer-checks'. Because compilers that intentionally create garbage code are evil and wrong.

Compiler standards bodies that add even more of them (the C aliasing rules being the prime example) should be ashamed of themselves.

And compiler people that think it's ok to say "it's undefined, so we do anything we damn well like" shouldn't be working with compilers IMNSHO.

If you know something is undefined,  you warn about it. You don't silently generate random code that doesn't match the source code just because some paper standard says you "can".
Comment 13 ncm 2019-03-05 00:54:23 UTC
What I am getting is that the compiler is leaving that permitted optimization -- eliminating the inode check -- on the table. It is doing that not so much because it would make Linus angry, but as an artifact of the particular optimization processes used in Gcc at the moment. Clang, or some later release of Gcc or Clang, or even this Gcc under different circumstances, might choose differently.

But maybe there are some flavors of UB, among which returning uninitialized variables might be the poster child, that you don't ever want to use to drive some kinds of optimizations. Maybe Gcc's process has that baked in.