Bug 54971 - SRA pessimizes debug info by not creating debug stmts for fields without replacements
Summary: SRA pessimizes debug info by not creating debug stmts for fields without repl...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: debug (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: wrong-debug
Depends on:
Blocks:
 
Reported: 2012-10-18 09:48 UTC by Jakub Jelinek
Modified: 2021-08-16 05:25 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-10-27 00:00:00


Attachments
Untested patch (2.80 KB, patch)
2012-10-19 16:01 UTC, Martin Jambor
Details | Diff
gcc48-pr54971-incremental.patch (2.02 KB, patch)
2012-10-22 16:19 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2012-10-18 09:48:05 UTC
Consider the testcase in PR54970, or the following one.

int
main ()
{
  int a[] = { 1, 2, 3 };
  int *p = a + 2;
  int *q = a + 1;
  asm volatile ("nop");
  *p += 10;
  asm volatile ("nop");
  *q += 10;
  asm volatile ("nop");
  __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
  asm volatile ("nop");
  *p += 10;
  asm volatile ("nop");
  *q += 10;
  asm volatile ("nop");
  return 0;
}

For a[1] and a[2] replacements are created and we eventually get corret debug info, tracking the value changes of those array elements.
But for a[0] SRA decides not to create a replacement, as it seems to be only assigned and never read (in PR54970 testcase just once, in the above testcase twice).
Would it be possible for MAY_HAVE_DEBUG_STMTS to create also replacements for those fields/elements that are only ever assigned, and at the places they used to be assigned instead of setting the replacement to the value it should have
create a debug bind stmt?  In the above testcase that would be at the place
of former a[0] = 1; add DEBUG a$0 => 1 stmt (where a$0's DECL_DEBUG_EXPR would be a[0]), and at the point of the MEM[&a, 0] = MEM[something, 0] assignment
DEBUG a$0 => 4 ?
Comment 1 Jakub Jelinek 2012-10-18 13:35:05 UTC
From quick skimming of tree-sra.c, I'd say we could add another bool flag like grp_to_be_replaced (say grp_to_be_debug_replaced), and in the else block of
  if (allow_replacements && scalar && !root->first_child
      && (root->grp_hint
          || ((root->grp_scalar_read || root->grp_assignment_read)
              && (root->grp_scalar_write || root->grp_assignment_write))))
do something like
    if (MAY_HAVE_DEBUG_STMTS && allow_replacements
        && scalar && !root->first_child
        && (root->grp_scalar_write || root->grp_assignment_write))
      root->grp_to_be_debug_replaced = 1;
(in addition to what the else block already does) and then in the actual modification of stmts for grp_to_be_debug_replaced prepend a debug stmt before the statement (or in the middle of statements).

For the two stmts that modify such grp_to_be_debug_replaced = 1; access,
  a[0] = 1;
and
  MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&D.1719];
we would have:
  DEBUG a$0 => 1         // Newly added stmt
  a[0] = 1;
and
  SR.2_14 = 4;
  # DEBUG SR.2 => SR.2_14
  SR.3_13 = 5;
  # DEBUG SR.3 => SR.3_13
  SR.4_12 = 6;
  # DEBUG SR.4 => SR.4_12
  # DEBUG a$0 => SR.2_14           // Newly added stmt
  a$1_9 = SR.3_13;
  # DEBUG a$1 => a$1_9
  a$2_4 = SR.4_12;
  # DEBUG a$2 => a$2_4

Martin, is that possible, does it make sense and where are all the places in the sra_modify_* etc. code that would need to be tweaked?
Comment 2 Martin Jambor 2012-10-18 17:37:25 UTC
I already have a work-in-progress patch based on your suggestions that
works for the testcase but need to think a bit more about less obvious
cases that might happen.  However, I have to leave the office now and
so will continue tomorrow.  I will probably have some questions about
what I can put to the debug statements.

To answer your question, apart from the change in
analyze_access_subtree you described, all code checking the
grp_to_be_replaced flag that might be looking at a LHS has to be
extended to look at this flag too.  But as I said, I already have a
WIP patch.

Thanks.
Comment 3 Martin Jambor 2012-10-19 16:01:20 UTC
Created attachment 28493 [details]
Untested patch

I'm currently bootstrapping and testing this patch.
Comment 4 Martin Jambor 2012-10-22 14:55:35 UTC
Unfortunately, the patch causes -fcompare-debug issues.  The problem
is that with it we create some declarations only when producing debug
info which can affect UIDs which then changes other stuff.

I tried producing DEBUG_EXPR_DECLs instead of regular decls but
SET_DECL_DEBUG_EXPR does not accept DEBUG_EXPR_DECLs as an argument,
it insists on VAR_DECLs.  I tried removing the checking restriction
but it turned out that using DEBUG_EXPR_DECLs does not fix the
testcase.  Jakub, do you think that can be fixed or are
DEBUG_EXPR_DECLs a completely different thing that principally canno
be used for this purpose?

Another alternative is to build full-fledged replacement decls even
when not producing debug info, even though we'd never use it.  That
seems slightly wasteful but can be done.  (Some reworking of
replacement building would be probably required so that we get rid of
lazy replacement building which is no longer necessary).
Comment 5 Jakub Jelinek 2012-10-22 15:09:05 UTC
Can you say what -fcompare-debug failures you saw (or was it a bootstrap problem already)?
Generally, differences in DECL_UIDs between -g and -g0 should be ok as long as the decls corresponding to decls built at -g0 sort by DECL_UID the same with -g0 and -g, and there should be no differences in between SSA_NAME_VERSION values between -g0 and -g.
Comment 6 Martin Jambor 2012-10-22 15:14:13 UTC
(In reply to comment #5)
> Can you say what -fcompare-debug failures you saw (or was it a bootstrap
> problem already)?

Bootstrap actually passes. I's gcc.dg/pr46571.c that fails. If I just
emit the decl without using it, it passes too.
Comment 7 Jakub Jelinek 2012-10-22 16:19:18 UTC
Created attachment 28510 [details]
gcc48-pr54971-incremental.patch

Incremental patch that makes the pr46571.c testcase pass.
The primary problem was passing non-NULL prefix to create_tmp_var* for something
that is called solely if MAY_HAVE_DEBUG_STMTS, because create_tmp_var_name
uses a global counter for all temp variables, thus if it is incremented with -g
and not with -g0, it resulted e.g. in ivopts creating different var names,
which, unlike DECL_UIDs, should be the same.

I guess the SR.<NUM> names are still useful for debugging of SRA and later passes (in the unlikely case where a fancy name isn't assigned, if there is fancy name, there is no point in creating a SR.* name), but if we do that, it is better to create it only for replacements used in non-debug code (which is why I've moved the assignment of DECL_NAME if it doesn't have fancy name).
And I've moved also the dump_file printout, because otherwise it would be confusing if we printed we've created a D.12345 replacement and subsequently immediately used SR.123 for it's name instead.

I think we shouldn't be using create_tmp_var at all for the debug only
replacements, as it calls gimple_add_tmp_var and preferrably the local decls
should be the same between -g and -g0 too.
Comment 8 Jakub Jelinek 2012-10-22 19:59:44 UTC
With your patch and my incremental patch on top of it bootstrap/regtest
passed on both x86_64-linux and i686-linux btw.
Comment 9 Martin Jambor 2012-10-26 09:32:51 UTC
Thanks a lot for looking into the miscompare, I simplified your fix a
little, though.  I'd prefer to have get_access_replacement small and
avoid the extra parameter, we can use the grp_to_be_debug_replaced
instead.  I also omitted the DECL_SEEN_IN_BIND_EXPR_P test because I
though it just might never be set on a newly created replacement decl.

The patch is at
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02396.html
Comment 10 Martin Jambor 2012-10-26 16:13:08 UTC
Author: jamborm
Date: Fri Oct 26 16:13:00 2012
New Revision: 192848

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192848
Log:
2012-10-26  Martin Jambor  <mjambor@suse.cz>

	PR debug/54971
	* tree-sra.c (struct access): New flag grp_to_be_debug_replaced.
	(dump_access): Dump the new flag.
	(analyze_access_subtree): Set the new flag when appropriate.
	(create_access_replacement): Handle debug replacements differently.
	(generate_subtree_copies): Handle the grp_to_be_debug_replaced flag.
	(init_subtree_with_zero): Likewise.
	(sra_modify_expr): Likewise.
	(load_assign_lhs_subreplacements): Likewise.
	(sra_modify_assign): Likewise.



Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-sra.c
Comment 11 Jakub Jelinek 2012-10-26 19:19:30 UTC
Author: jakub
Date: Fri Oct 26 19:19:25 2012
New Revision: 192860

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192860
Log:
	PR debug/54970
	* cfgexpand.c (expand_debug_expr): Expand &MEM_REF[&var, n]
	as DEBUG_IMPLICIT_PTR + n if &var expands to DEBUG_IMPLICIT_PTR.
	* tree-sra.c (create_access_replacement): Allow also MEM_REFs
	with ADDR_EXPR first operand in DECL_DEBUG_EXPR expressions.
	* var-tracking.c (track_expr_p): Handle MEM_REFs in DECL_DEBUG_EXPR
	expressions.
	* dwarf2out.c (add_var_loc_to_decl): Likewise.

	PR debug/54971
	* gcc.dg/guality/pr54970.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/guality/pr54970.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/dwarf2out.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-sra.c
    trunk/gcc/var-tracking.c
Comment 12 Jakub Jelinek 2012-10-27 17:30:26 UTC
I've checked in a testcase for this, unfortunately it fails on i686-linux
with -Os -m32.
The problem is that for a[0] there is no struct access created at all, as before esra we have a accesses solely of the form:
  a = *.LC0;
...
  _5 = MEM[(int *)&a + 8B];
...
  MEM[(int *)&a + 8B] = _6;
...
  _8 = MEM[(int *)&a + 4B];
...
  MEM[(int *)&a + 4B] = _9;
...
  D.1370 = *.LC1;
  MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&D.1370];
...
  _13 = MEM[(int *)&a + 8B];
...
  MEM[(int *)&a + 8B] = _14;
...
  _16 = MEM[(int *)&a + 4B];
...
  MEM[(int *)&a + 4B] = _17;
...
  a ={v} {CLOBBER};

i.e. there is no a[0] or MEM[(int *)&a] access, only two whole aggregate assignments.

I wonder if SRA could create the the extra hole accesses if there are any whole aggregate writes, at least if there aren't too many (preferrably with sizes based on the underlying type fields/elements, and not for any padding).
BTW, even for a[1] and a[2] at -Os -m32 there is a problem, on the memcpy we end up with:
  D.1370 = *.LC1;
  MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&D.1370];
  a$4_3 = MEM[(int[3] *)&D.1370 + 4B];
  # DEBUG a$4 => a$4_3
  a$8_4 = MEM[(int[3] *)&D.1370 + 8B];
  # DEBUG a$8 => a$8_4
Comment 13 Jakub Jelinek 2012-10-29 08:36:03 UTC
So, beyond the creation of new debug only accesses for whole struct writes into hole if there aren't too many holes, I wonder if SRA doesn't have infrastructure to do aggregate assignment propagation (which could help with the rest of the -Os -m32 issues on the committed testcase, but even for code generation on say):
struct A { int a, b, c, d, e, f, g, h; } z;
struct B { struct A a, b, c, d, e, f, g, h; } x, y;

void
foo (void)
{
  struct A a = { 1, 2, 3, 4, 5, 6, 7, 8 };
  struct A b = a;
  struct A c = b;
  struct A d = c;
  struct A e = d;
  z = e;
}

void
bar (void)
{
  struct B a = x;
  struct B b = a;
  struct B c = b;
  struct B d = c;
  struct B e = d;
  y = e;
}

Here, with -Os both routines result in terrible inefficient code, as total scalarization is not performed and even for these simple cases where there is one whole aggregate store and one whole aggregate read that is dominated by the store SRA (nor any other optimization pass, but IMHO SRA has best infrastructure for that) doesn't attempt to optimize by doing just y = x; (and b = x; c = x; d = x; e = x; that would be DCEd away).  With -O2 only the second routine generates terrible code.  While this testcase is artificial, the checked in testcase shows at least one level of extra aggregate assignment happens e.g. with compound literals.
Comment 14 Richard Biener 2012-10-29 15:19:53 UTC
I think it has something like that, but it only is effective if there is any
scalarization and the intermediate copies are turned into dead code this way.

I still think we should write aggregates only accessed whole (and
not address taken) into SSA form ;)
Comment 15 Jakub Jelinek 2012-10-29 15:25:36 UTC
The intermediate copies should be DSE-able, as shown when the #c14 testcases are changed to have a resp. x on all RHSs.
Comment 16 Jakub Jelinek 2012-11-05 14:36:52 UTC
Author: jakub
Date: Mon Nov  5 14:36:47 2012
New Revision: 193162

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193162
Log:
	PR debug/54970
	PR debug/54971
	* gcc.dg/guality/pr54970.c: Use NOP instead of "NOP" in inline-asm.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/guality/pr54970.c
Comment 17 Martin Jambor 2013-01-04 14:41:08 UTC
(In reply to comment #13)
> So, beyond the creation of new debug only accesses for whole struct writes into
> hole if there aren't too many holes, I wonder if SRA doesn't have
> infrastructure to do aggregate assignment propagation (which could help with
> the rest of the -Os -m32 issues on the committed testcase, but even for code
> generation on say):

No, it does not have any infrastructure for that, it looks at each
statement in isolation and e.g. at the moment it has no way of knowing
that the value of b is the same as value of a.  The propagation-like
effect it can achieve is only done by always splitting small non
bit-field structures into pieces and let the SSA propagation work on
them.  One issue can be that we do not even attempt that with arrays
(but we are unlikely to scalarize them away because they are usually
indexed by a variable).

We need an aggregate copy propagation or extend SRA significantly to
become one (or SSA-ize some aggregates as Richi suggested but that
would not work on partially accessed structures).  I'll revisit my
thoughts about this.

I'll try to come up with some solution for the -Os problem though I'm
afraid it will be a bit limited (I don't think I will even attempt
unions or bit-fields, for example).