Bug 78685 - -Og generates too many "<optimized out>"s
Summary: -Og generates too many "<optimized out>"s
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: debug (show other bugs)
Version: 6.2.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: https://gcc.gnu.org/ml/gcc-patches/20...
Keywords:
: 68836 85059 86582 (view as bug list)
Depends on:
Blocks: ohgee
  Show dependency treegraph
 
Reported: 2016-12-05 17:04 UTC by Paul Eggert
Modified: 2019-11-04 07:23 UTC (History)
6 users (show)

See Also:
Host: x86-64
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-12-06 00:00:00


Attachments
preprocessed C program illustrating -Og problem (175 bytes, text/plain)
2016-12-05 17:04 UTC, Paul Eggert
Details
proof of concept patch (1.61 KB, patch)
2018-06-28 12:05 UTC, Tom de Vries
Details | Diff
[debug] Add -fkeep-vars-live (4.26 KB, patch)
2018-07-02 10:35 UTC, Tom de Vries
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Eggert 2016-12-05 17:04:17 UTC
Created attachment 40253 [details]
preprocessed C program illustrating -Og problem

Emacs developers are having trouble using -Og to debug Emacs, and so are still using -O0; see, for example:

http://lists.gnu.org/archive/html/emacs-devel/2016-12/msg00199.html

I reproduced the problem and came up with a small program (attached) that illustrates it. Here's a sample transcript on Fedora 24 x86-64, which uses gcc (GCC) 6.2.1 20160916 (Red Hat 6.2.1-2):

$ gcc -Og -g3 Ogbug.i
$ gdb a.out
GNU gdb (GDB) Fedora 7.11.1-86.fc24
...
Reading symbols from a.out...done.
(gdb) b call_debugger
Breakpoint 1 at 0x4004d6: file Ogbug.i, line 6.
(gdb) r
Starting program: /home/eggert/junk/a.out 

Breakpoint 1, call_debugger (x=3) at Ogbug.i:6
6	  v = x;
(gdb) bt
#0  call_debugger (x=3) at Ogbug.i:6
#1  0x0000000000400519 in apply_lambda (fun=1, args=2, count=<optimized out>)
    at Ogbug.i:14
#2  0x0000000000400547 in main (argc=<optimized out>, argv=<optimized out>)
    at Ogbug.i:22


It's hard to debug a program with all those "<optimized out>"s getting in the way.
Comment 1 Richard Biener 2016-12-06 11:00:35 UTC
I think it shows that the call-site-parameter stuff doesn't work across one
indirection.  In call_debugger we want count at the call site but that's
only available at its call site it seems.

For main() we're unlucky with the register allocation and as the original
args are not spilled we're lost (and libc or the crtstuff doesn't have any
call-site param stuff - not sure if that would help in this case).

So yes, -Og does optimize.

Confirmed (for the testcase).
Comment 2 Jakub Jelinek 2016-12-06 11:30:31 UTC
If -Og does not force user variables/parameters to stack nor forces an artificial use of the var at the end of their scope, then it will always happen, the rest is purely best effort, see if the var value can be computed from something else and punt if it can't.  The problem is mainly register allocation, if the var is seen by RA as dead after certain point, then the RA doesn't try to keep it alive in some register or memory.
In particular on this testcase, DW_OP_GNU_entry_value is emitted (as last resort) for the count variable, and similarly for argc in main (again, it isn't used after the function call).
So, in this particular case it might help if glibc tried to provide better debug info for __libc_start_main and/or _start (if the argc and/or argv values can be easily computed; though, e.g. computing argc from argv array might not be correct, because main could have changed it).
For the potential -Og change to be more useful, it would mean adding some artificial stmts at the end of scope of user vars (e.g. where we poison vars for -fsanitize-user-after-scope) that would not expand into a real assembly insn(s), but something like (use (var)) and would convince the RA to keep them alive somewhere.
Comment 3 rguenther@suse.de 2016-12-06 11:38:52 UTC
On Tue, 6 Dec 2016, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78685
> 
> --- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> If -Og does not force user variables/parameters to stack nor forces an
> artificial use of the var at the end of their scope, then it will always
> happen, the rest is purely best effort, see if the var value can be computed
> from something else and punt if it can't.  The problem is mainly register
> allocation, if the var is seen by RA as dead after certain point, then the RA
> doesn't try to keep it alive in some register or memory.
> In particular on this testcase, DW_OP_GNU_entry_value is emitted (as last
> resort) for the count variable, and similarly for argc in main (again, it isn't
> used after the function call).
> So, in this particular case it might help if glibc tried to provide better
> debug info for __libc_start_main and/or _start (if the argc and/or argv values
> can be easily computed; though, e.g. computing argc from argv array might not
> be correct, because main could have changed it).
> For the potential -Og change to be more useful, it would mean adding some
> artificial stmts at the end of scope of user vars (e.g. where we poison vars
> for -fsanitize-user-after-scope) that would not expand into a real assembly
> insn(s), but something like (use (var)) and would convince the RA to keep them
> alive somewhere.

I guess that wouldn't stop at calls but if we want to avoid <optimized 
out> for locals we could add artificial uses for all vars where they
go out of scope...

Sth orthogonal to -Og, -fkeep-vars-live=N with some level, default
to N > 0 for -Og maybe.

Of course it will likely pessimize code as I don't see how we can
easily compute whether var-tracking might reverse compute a vars
value from sth else.
Comment 4 Jakub Jelinek 2016-12-06 11:44:52 UTC
(In reply to rguenther@suse.de from comment #3)
> Sth orthogonal to -Og, -fkeep-vars-live=N with some level, default
> to N > 0 for -Og maybe.
> 
> Of course it will likely pessimize code as I don't see how we can
> easily compute whether var-tracking might reverse compute a vars
> value from sth else.

Yes, trying to guess whether var-tracking will be able to figure something out is pretty much impossible, perhaps except for easiest cases e.g. where one var is some other user var + const or something similar.  And sure, it would pessimize code a lot.
Comment 5 Paul Eggert 2016-12-21 18:25:34 UTC
Just to clarify: 'main' (in the sample program) is just an example. The problems developers are seeing when debugging Emacs almost all involve functions other than 'main'.

It should be OK for -Og to optimize significantly less than it does now, so long as -Og remains better than -O0. As things stand, -Og is pretty much useless for its stated purpose because GDB so often cannot display values of locals, and I expect this partly explains why -Og is so rarely used in practice.

In Emacs, developers use -O0 for debugging, but this can be reeeaally slow because -O0 does not inline and Emacs relies heavily on small inlined functions. Although Emacs works around this problem by using macros instead of functions, such workarounds have obvious drawbacks. For Emacs, it would be nice if -Og did not discard locals, but continued to inline.
Comment 6 rguenther@suse.de 2017-01-04 09:15:20 UTC
On Wed, 21 Dec 2016, eggert at gnu dot org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78685
> 
> --- Comment #5 from Paul Eggert <eggert at gnu dot org> ---
> Just to clarify: 'main' (in the sample program) is just an example. The
> problems developers are seeing when debugging Emacs almost all involve
> functions other than 'main'.
> 
> It should be OK for -Og to optimize significantly less than it does now, so
> long as -Og remains better than -O0. As things stand, -Og is pretty much
> useless for its stated purpose because GDB so often cannot display values of
> locals, and I expect this partly explains why -Og is so rarely used in
> practice.
> 
> In Emacs, developers use -O0 for debugging, but this can be reeeaally slow
> because -O0 does not inline and Emacs relies heavily on small inlined
> functions. Although Emacs works around this problem by using macros instead of
> functions, such workarounds have obvious drawbacks. For Emacs, it would be nice
> if -Og did not discard locals, but continued to inline.

It doesn't really discard them, it just doesn't preserve their values 
at points they are no longer necessary...

I presume you really want not only the inlining to happen but also
at least _some_ of the optimization triggered by it as well (for Emacs).
This is somewhat in conflict with the desire to preserve every value
of every variable at any point in time during their lifetime ...

But yes, for the testcase I can see what you expect and the expectation
is of course reasonable, it's just somewhat hard to implement.  I guess
making every user variable set an association barrier might be a start.
Comment 7 Jakub Jelinek 2018-03-24 11:35:49 UTC
*** Bug 85059 has been marked as a duplicate of this bug. ***
Comment 8 Tom de Vries 2018-06-28 12:05:25 UTC
Created attachment 44333 [details]
proof of concept patch

I ran into the same problem with guality test-case pr54200.c, which fails for Og.

The relevant part of the test-case is:
...
int __attribute__((noinline,noclone))
foo (int z, int x, int b)
{
  if (x == 1)
    {
      bar ();
      return z;
    }
  else
    {
      int a = (x + z) + b;
      return a; /* { dg-final { gdb-test 20 "z" "3" } } */
    }
}
...

The problem is that the '(x + z) + b' calculation has a temporary register which  gets allocated the register that holds 'z', so when we get to the gdb-test line, z is no longer available.

Using this patch I managed to print the correct value of z at the gdb-test line.

The patch uses clobbers in gimple to mark the out-of-scope point, purely because that's similar to what was already done for local array variables, and I thought that was the fastest path to getting a proof of concept working. It's more accurate to model this as some sort of use in gimple, and doing so may prevent gimple optimizations which wreck debug info, but perhaps that's not necessary, I suppose that depends on which optimizations are enabled in Og.

Anyway, at expand we emit a use for the clobber which seems to do the trick.
Comment 9 Eric Gallager 2018-06-28 14:46:36 UTC
(In reply to Tom de Vries from comment #8)
> Created attachment 44333 [details]
> proof of concept patch
> 
> I ran into the same problem with guality test-case pr54200.c, which fails
> for Og.
> 
> The relevant part of the test-case is:
> ...
> int __attribute__((noinline,noclone))
> foo (int z, int x, int b)
> {
>   if (x == 1)
>     {
>       bar ();
>       return z;
>     }
>   else
>     {
>       int a = (x + z) + b;
>       return a; /* { dg-final { gdb-test 20 "z" "3" } } */
>     }
> }
> ...
> 
> The problem is that the '(x + z) + b' calculation has a temporary register
> which  gets allocated the register that holds 'z', so when we get to the
> gdb-test line, z is no longer available.
> 
> Using this patch I managed to print the correct value of z at the gdb-test
> line.
> 
> The patch uses clobbers in gimple to mark the out-of-scope point, purely
> because that's similar to what was already done for local array variables,
> and I thought that was the fastest path to getting a proof of concept
> working. It's more accurate to model this as some sort of use in gimple, and
> doing so may prevent gimple optimizations which wreck debug info, but
> perhaps that's not necessary, I suppose that depends on which optimizations
> are enabled in Og.
> 
> Anyway, at expand we emit a use for the clobber which seems to do the trick.

Please submit the patch to the gcc-patches mailing list for review! Thanks.
Comment 10 Richard Biener 2018-06-29 11:10:44 UTC
(In reply to Tom de Vries from comment #8)
> Created attachment 44333 [details]
> proof of concept patch
> 
> I ran into the same problem with guality test-case pr54200.c, which fails
> for Og.
> 
> The relevant part of the test-case is:
> ...
> int __attribute__((noinline,noclone))
> foo (int z, int x, int b)
> {
>   if (x == 1)
>     {
>       bar ();
>       return z;
>     }
>   else
>     {
>       int a = (x + z) + b;
>       return a; /* { dg-final { gdb-test 20 "z" "3" } } */
>     }
> }
> ...
> 
> The problem is that the '(x + z) + b' calculation has a temporary register
> which  gets allocated the register that holds 'z', so when we get to the
> gdb-test line, z is no longer available.
> 
> Using this patch I managed to print the correct value of z at the gdb-test
> line.
> 
> The patch uses clobbers in gimple to mark the out-of-scope point, purely
> because that's similar to what was already done for local array variables,
> and I thought that was the fastest path to getting a proof of concept
> working. It's more accurate to model this as some sort of use in gimple, and
> doing so may prevent gimple optimizations which wreck debug info, but
> perhaps that's not necessary, I suppose that depends on which optimizations
> are enabled in Og.
> 
> Anyway, at expand we emit a use for the clobber which seems to do the trick.

An interesting idea but I think this will cause excessive spilling and
come at a compile-time cost.  The idea of -Og was to have performance
closer to -O1 here and with this we approach -O0 in assigning stack slots
to all user variables?

This is a general conflict of interest of course.

Your patch shouldn't prevent any optimization at the GIMPLE level
(like completely eliding local variables) but definitely they'll keep
things live at RTL level.  Given for the testcase at hand the issue
is "suboptimal" choice of register allocation I wonder if we can
adjust the RA / DF to consider liveness to extend always up to the
next sequence point.

That is, it is reasonable to lose track of z in

  int a = (x + z) + b;

but only after x + z is computed.  So a different approach would be
to see this as an issue with the way debug consumers "step" now that
we emit column debug information?

That said, starting with GCC 8 we now have # DEBUG BEGIN_STMT
markers:

  <bb 4> [local count: 856416479]:
  # DEBUG BEGIN_STMT
  _1 = x_4(D) + z_7(D);
  a_9 = _1 + b_8(D);
  # DEBUG a => a_9
  # DEBUG BEGIN_STMT

they are also present on RTL as

(debug_insn 21 20 0 (debug_marker) "t.c":11 -1
     (nil))

which means we _could_ somehow use those for code-generation (in DF analysis
to extend lifetimes somehow).  Of course if we do we have to emit those
always to not get code-gen differences -g vs. -g0.

Maybe the RA could also be generally changed to use a not-LRU style
preferencing of available registers to choose from.

In the end nothing saves us from "spilling" things if we really want to
be able to access all final values of registers up to the point they
go out of scope...

So another idea would be instead of like -O0 assigning the main location
to the stack we spill no longer used vars during LRA?  We'd still have to
somehow know until when that spill slot needs to live (and afterwards can
be re-used).
Comment 11 Fredrik Tolf 2018-06-29 14:14:49 UTC
(In reply to Richard Biener from comment #10)
> That is, it is reasonable to lose track of z in
> 
>   int a = (x + z) + b;
> 
> but only after x + z is computed.

Just for the record, I disagree that it's okay to lose z after that. As I explained in #85059, not having access to variable values after they've been consumed is one of the absolute main reasons I never use -Og.
Comment 12 Tom de Vries 2018-07-02 10:35:47 UTC
Created attachment 44343 [details]
[debug] Add -fkeep-vars-live

(In reply to Richard Biener from comment #10)
> (In reply to Tom de Vries from comment #8)
> > Created attachment 44333 [details]
> > proof of concept patch
> > 
> > I ran into the same problem with guality test-case pr54200.c, which fails
> > for Og.
> > 
> > The relevant part of the test-case is:
> > ...
> > int __attribute__((noinline,noclone))
> > foo (int z, int x, int b)
> > {
> >   if (x == 1)
> >     {
> >       bar ();
> >       return z;
> >     }
> >   else
> >     {
> >       int a = (x + z) + b;
> >       return a; /* { dg-final { gdb-test 20 "z" "3" } } */
> >     }
> > }
> > ...
> > 
> > The problem is that the '(x + z) + b' calculation has a temporary register
> > which  gets allocated the register that holds 'z', so when we get to the
> > gdb-test line, z is no longer available.
> > 
> > Using this patch I managed to print the correct value of z at the gdb-test
> > line.
> > 
> > The patch uses clobbers in gimple to mark the out-of-scope point, purely
> > because that's similar to what was already done for local array variables,
> > and I thought that was the fastest path to getting a proof of concept
> > working. It's more accurate to model this as some sort of use in gimple, and
> > doing so may prevent gimple optimizations which wreck debug info, but
> > perhaps that's not necessary, I suppose that depends on which optimizations
> > are enabled in Og.
> > 
> > Anyway, at expand we emit a use for the clobber which seems to do the trick.
> 
> An interesting idea but I think this will cause excessive spilling and
> come at a compile-time cost.  The idea of -Og was to have performance
> closer to -O1 here and with this we approach -O0 in assigning stack slots
> to all user variables?
> 

Agreed, there will be a compile time cost. I've moved the functionality to an orthogonal option -fkeep-vars-live (as suggested in comment 3), to keep Og as is.  Also, I agree there will be more spilling, but I'd hope that memory usage still would be less than for O0.

> This is a general conflict of interest of course.
> 
> Your patch shouldn't prevent any optimization at the GIMPLE level
> (like completely eliding local variables) 

I've fixed that now, in the sense that it generates uses rather than clobbers at gimple level.

> but definitely they'll keep
> things live at RTL level.  Given for the testcase at hand the issue
> is "suboptimal" choice of register allocation I wonder if we can
> adjust the RA / DF to consider liveness to extend always up to the
> next sequence point.
> 
> That is, it is reasonable to lose track of z in
> 
>   int a = (x + z) + b;
> 
> but only after x + z is computed.

Right, that could maybe be a way to improve on Og.

But the trade-off I'm going for currently with "Og -fkeep-vars-live" is: O0-like debug experience, slower compilation than Og, faster execution than -O0.

> So a different approach would be
> to see this as an issue with the way debug consumers "step" now that
> we emit column debug information?
> 
> That said, starting with GCC 8 we now have # DEBUG BEGIN_STMT
> markers:
> 
>   <bb 4> [local count: 856416479]:
>   # DEBUG BEGIN_STMT
>   _1 = x_4(D) + z_7(D);
>   a_9 = _1 + b_8(D);
>   # DEBUG a => a_9
>   # DEBUG BEGIN_STMT
> 
> they are also present on RTL as
> 
> (debug_insn 21 20 0 (debug_marker) "t.c":11 -1
>      (nil))
> 
> which means we _could_ somehow use those for code-generation (in DF analysis
> to extend lifetimes somehow).  Of course if we do we have to emit those
> always to not get code-gen differences -g vs. -g0.
> 
> Maybe the RA could also be generally changed to use a not-LRU style
> preferencing of available registers to choose from.
> 
> In the end nothing saves us from "spilling" things if we really want to
> be able to access all final values of registers up to the point they
> go out of scope...
> 
> So another idea would be instead of like -O0 assigning the main location
> to the stack we spill no longer used vars during LRA?  We'd still have to
> somehow know until when that spill slot needs to live (and afterwards can
> be re-used).

To stay in terms of the current patch, that could mean we mark artificial uses as such, and then make the compiler handle them smarter than regular uses.

[ Also, this version of the patch adds insertion of nops to prevent empty debug ranges, which allows me to get the value of 'a' at the 'return a'  mentioned above. ]
Comment 13 Tom de Vries 2018-07-10 12:23:08 UTC
(In reply to Tom de Vries from comment #12)
> Created attachment 44343 [details]

> [debug] Add fkeep-vars-live

> Guality testing status: Og -fkeep-vars-live gives better results than Og for
> pr54200.c and pr54970.c, but worse results here:
> ...
> FAIL: gcc.dg/guality/csttest.c  -Og -fkeep-vars-live -DPREVENT_OPTIMIZATION  \
>       line 29 n == -(0x17ULL << 50)
> FAIL: gcc.dg/guality/csttest.c  -Og -fkeep-vars-live -DPREVENT_OPTIMIZATION  \
>       line 55 j == -(0x17ULL << 31)
> ...

Looks like an independent problem, filed as PR86455 - "var-tracking mishandles pre_dec".
Comment 14 Tom de Vries 2018-07-19 08:08:06 UTC
*** Bug 86582 has been marked as a duplicate of this bug. ***
Comment 15 Eric Gallager 2019-01-20 01:12:53 UTC
(In reply to Tom de Vries from comment #13)
> (In reply to Tom de Vries from comment #12)
> > Created attachment 44343 [details]
> 
> > [debug] Add fkeep-vars-live
> 
> > Guality testing status: Og -fkeep-vars-live gives better results than Og for
> > pr54200.c and pr54970.c, but worse results here:
> > ...
> > FAIL: gcc.dg/guality/csttest.c  -Og -fkeep-vars-live -DPREVENT_OPTIMIZATION  \
> >       line 29 n == -(0x17ULL << 50)
> > FAIL: gcc.dg/guality/csttest.c  -Og -fkeep-vars-live -DPREVENT_OPTIMIZATION  \
> >       line 55 j == -(0x17ULL << 31)
> > ...
> 
> Looks like an independent problem, filed as PR86455 - "var-tracking
> mishandles pre_dec".

That's been fixed now... time to give this one another try?
Comment 16 Eric Gallager 2019-03-13 05:52:33 UTC
*** Bug 68836 has been marked as a duplicate of this bug. ***
Comment 17 Eric Gallager 2019-07-04 04:36:33 UTC
Richard Sandiford had a series of patches radically overhauling how -Og works in general that might affect this bug:
https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01392.html
cc-ing him so he can comment on if it does in fact affect this bug.
Comment 18 rsandifo@gcc.gnu.org 2019-07-06 10:04:57 UTC
(In reply to Eric Gallager from comment #17)
> Richard Sandiford had a series of patches radically overhauling how -Og
> works in general that might affect this bug:
> https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01392.html
> cc-ing him so he can comment on if it does in fact affect this bug.

Yeah, part 2 of the series fixes this PR:
https://gcc.gnu.org/ml/gcc-patches/2019-06/msg01394.html