Consider the testcase in PR54970, or the following one. int main () { int a[] = { 1, 2, 3 }; int *p = a + 2; int *q = a + 1; asm volatile ("nop"); *p += 10; asm volatile ("nop"); *q += 10; asm volatile ("nop"); __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a)); asm volatile ("nop"); *p += 10; asm volatile ("nop"); *q += 10; asm volatile ("nop"); return 0; } For a[1] and a[2] replacements are created and we eventually get corret debug info, tracking the value changes of those array elements. But for a[0] SRA decides not to create a replacement, as it seems to be only assigned and never read (in PR54970 testcase just once, in the above testcase twice). Would it be possible for MAY_HAVE_DEBUG_STMTS to create also replacements for those fields/elements that are only ever assigned, and at the places they used to be assigned instead of setting the replacement to the value it should have create a debug bind stmt? In the above testcase that would be at the place of former a[0] = 1; add DEBUG a$0 => 1 stmt (where a$0's DECL_DEBUG_EXPR would be a[0]), and at the point of the MEM[&a, 0] = MEM[something, 0] assignment DEBUG a$0 => 4 ?
From quick skimming of tree-sra.c, I'd say we could add another bool flag like grp_to_be_replaced (say grp_to_be_debug_replaced), and in the else block of if (allow_replacements && scalar && !root->first_child && (root->grp_hint || ((root->grp_scalar_read || root->grp_assignment_read) && (root->grp_scalar_write || root->grp_assignment_write)))) do something like if (MAY_HAVE_DEBUG_STMTS && allow_replacements && scalar && !root->first_child && (root->grp_scalar_write || root->grp_assignment_write)) root->grp_to_be_debug_replaced = 1; (in addition to what the else block already does) and then in the actual modification of stmts for grp_to_be_debug_replaced prepend a debug stmt before the statement (or in the middle of statements). For the two stmts that modify such grp_to_be_debug_replaced = 1; access, a[0] = 1; and MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&D.1719]; we would have: DEBUG a$0 => 1 // Newly added stmt a[0] = 1; and SR.2_14 = 4; # DEBUG SR.2 => SR.2_14 SR.3_13 = 5; # DEBUG SR.3 => SR.3_13 SR.4_12 = 6; # DEBUG SR.4 => SR.4_12 # DEBUG a$0 => SR.2_14 // Newly added stmt a$1_9 = SR.3_13; # DEBUG a$1 => a$1_9 a$2_4 = SR.4_12; # DEBUG a$2 => a$2_4 Martin, is that possible, does it make sense and where are all the places in the sra_modify_* etc. code that would need to be tweaked?
I already have a work-in-progress patch based on your suggestions that works for the testcase but need to think a bit more about less obvious cases that might happen. However, I have to leave the office now and so will continue tomorrow. I will probably have some questions about what I can put to the debug statements. To answer your question, apart from the change in analyze_access_subtree you described, all code checking the grp_to_be_replaced flag that might be looking at a LHS has to be extended to look at this flag too. But as I said, I already have a WIP patch. Thanks.
Created attachment 28493 [details] Untested patch I'm currently bootstrapping and testing this patch.
Unfortunately, the patch causes -fcompare-debug issues. The problem is that with it we create some declarations only when producing debug info which can affect UIDs which then changes other stuff. I tried producing DEBUG_EXPR_DECLs instead of regular decls but SET_DECL_DEBUG_EXPR does not accept DEBUG_EXPR_DECLs as an argument, it insists on VAR_DECLs. I tried removing the checking restriction but it turned out that using DEBUG_EXPR_DECLs does not fix the testcase. Jakub, do you think that can be fixed or are DEBUG_EXPR_DECLs a completely different thing that principally canno be used for this purpose? Another alternative is to build full-fledged replacement decls even when not producing debug info, even though we'd never use it. That seems slightly wasteful but can be done. (Some reworking of replacement building would be probably required so that we get rid of lazy replacement building which is no longer necessary).
Can you say what -fcompare-debug failures you saw (or was it a bootstrap problem already)? Generally, differences in DECL_UIDs between -g and -g0 should be ok as long as the decls corresponding to decls built at -g0 sort by DECL_UID the same with -g0 and -g, and there should be no differences in between SSA_NAME_VERSION values between -g0 and -g.
(In reply to comment #5) > Can you say what -fcompare-debug failures you saw (or was it a bootstrap > problem already)? Bootstrap actually passes. I's gcc.dg/pr46571.c that fails. If I just emit the decl without using it, it passes too.
Created attachment 28510 [details] gcc48-pr54971-incremental.patch Incremental patch that makes the pr46571.c testcase pass. The primary problem was passing non-NULL prefix to create_tmp_var* for something that is called solely if MAY_HAVE_DEBUG_STMTS, because create_tmp_var_name uses a global counter for all temp variables, thus if it is incremented with -g and not with -g0, it resulted e.g. in ivopts creating different var names, which, unlike DECL_UIDs, should be the same. I guess the SR.<NUM> names are still useful for debugging of SRA and later passes (in the unlikely case where a fancy name isn't assigned, if there is fancy name, there is no point in creating a SR.* name), but if we do that, it is better to create it only for replacements used in non-debug code (which is why I've moved the assignment of DECL_NAME if it doesn't have fancy name). And I've moved also the dump_file printout, because otherwise it would be confusing if we printed we've created a D.12345 replacement and subsequently immediately used SR.123 for it's name instead. I think we shouldn't be using create_tmp_var at all for the debug only replacements, as it calls gimple_add_tmp_var and preferrably the local decls should be the same between -g and -g0 too.
With your patch and my incremental patch on top of it bootstrap/regtest passed on both x86_64-linux and i686-linux btw.
Thanks a lot for looking into the miscompare, I simplified your fix a little, though. I'd prefer to have get_access_replacement small and avoid the extra parameter, we can use the grp_to_be_debug_replaced instead. I also omitted the DECL_SEEN_IN_BIND_EXPR_P test because I though it just might never be set on a newly created replacement decl. The patch is at http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02396.html
Author: jamborm Date: Fri Oct 26 16:13:00 2012 New Revision: 192848 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192848 Log: 2012-10-26 Martin Jambor <mjambor@suse.cz> PR debug/54971 * tree-sra.c (struct access): New flag grp_to_be_debug_replaced. (dump_access): Dump the new flag. (analyze_access_subtree): Set the new flag when appropriate. (create_access_replacement): Handle debug replacements differently. (generate_subtree_copies): Handle the grp_to_be_debug_replaced flag. (init_subtree_with_zero): Likewise. (sra_modify_expr): Likewise. (load_assign_lhs_subreplacements): Likewise. (sra_modify_assign): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-sra.c
Author: jakub Date: Fri Oct 26 19:19:25 2012 New Revision: 192860 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192860 Log: PR debug/54970 * cfgexpand.c (expand_debug_expr): Expand &MEM_REF[&var, n] as DEBUG_IMPLICIT_PTR + n if &var expands to DEBUG_IMPLICIT_PTR. * tree-sra.c (create_access_replacement): Allow also MEM_REFs with ADDR_EXPR first operand in DECL_DEBUG_EXPR expressions. * var-tracking.c (track_expr_p): Handle MEM_REFs in DECL_DEBUG_EXPR expressions. * dwarf2out.c (add_var_loc_to_decl): Likewise. PR debug/54971 * gcc.dg/guality/pr54970.c: New test. Added: trunk/gcc/testsuite/gcc.dg/guality/pr54970.c Modified: trunk/gcc/ChangeLog trunk/gcc/cfgexpand.c trunk/gcc/dwarf2out.c trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-sra.c trunk/gcc/var-tracking.c
I've checked in a testcase for this, unfortunately it fails on i686-linux with -Os -m32. The problem is that for a[0] there is no struct access created at all, as before esra we have a accesses solely of the form: a = *.LC0; ... _5 = MEM[(int *)&a + 8B]; ... MEM[(int *)&a + 8B] = _6; ... _8 = MEM[(int *)&a + 4B]; ... MEM[(int *)&a + 4B] = _9; ... D.1370 = *.LC1; MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&D.1370]; ... _13 = MEM[(int *)&a + 8B]; ... MEM[(int *)&a + 8B] = _14; ... _16 = MEM[(int *)&a + 4B]; ... MEM[(int *)&a + 4B] = _17; ... a ={v} {CLOBBER}; i.e. there is no a[0] or MEM[(int *)&a] access, only two whole aggregate assignments. I wonder if SRA could create the the extra hole accesses if there are any whole aggregate writes, at least if there aren't too many (preferrably with sizes based on the underlying type fields/elements, and not for any padding). BTW, even for a[1] and a[2] at -Os -m32 there is a problem, on the memcpy we end up with: D.1370 = *.LC1; MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&D.1370]; a$4_3 = MEM[(int[3] *)&D.1370 + 4B]; # DEBUG a$4 => a$4_3 a$8_4 = MEM[(int[3] *)&D.1370 + 8B]; # DEBUG a$8 => a$8_4
So, beyond the creation of new debug only accesses for whole struct writes into hole if there aren't too many holes, I wonder if SRA doesn't have infrastructure to do aggregate assignment propagation (which could help with the rest of the -Os -m32 issues on the committed testcase, but even for code generation on say): struct A { int a, b, c, d, e, f, g, h; } z; struct B { struct A a, b, c, d, e, f, g, h; } x, y; void foo (void) { struct A a = { 1, 2, 3, 4, 5, 6, 7, 8 }; struct A b = a; struct A c = b; struct A d = c; struct A e = d; z = e; } void bar (void) { struct B a = x; struct B b = a; struct B c = b; struct B d = c; struct B e = d; y = e; } Here, with -Os both routines result in terrible inefficient code, as total scalarization is not performed and even for these simple cases where there is one whole aggregate store and one whole aggregate read that is dominated by the store SRA (nor any other optimization pass, but IMHO SRA has best infrastructure for that) doesn't attempt to optimize by doing just y = x; (and b = x; c = x; d = x; e = x; that would be DCEd away). With -O2 only the second routine generates terrible code. While this testcase is artificial, the checked in testcase shows at least one level of extra aggregate assignment happens e.g. with compound literals.
I think it has something like that, but it only is effective if there is any scalarization and the intermediate copies are turned into dead code this way. I still think we should write aggregates only accessed whole (and not address taken) into SSA form ;)
The intermediate copies should be DSE-able, as shown when the #c14 testcases are changed to have a resp. x on all RHSs.
Author: jakub Date: Mon Nov 5 14:36:47 2012 New Revision: 193162 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193162 Log: PR debug/54970 PR debug/54971 * gcc.dg/guality/pr54970.c: Use NOP instead of "NOP" in inline-asm. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/guality/pr54970.c
(In reply to comment #13) > So, beyond the creation of new debug only accesses for whole struct writes into > hole if there aren't too many holes, I wonder if SRA doesn't have > infrastructure to do aggregate assignment propagation (which could help with > the rest of the -Os -m32 issues on the committed testcase, but even for code > generation on say): No, it does not have any infrastructure for that, it looks at each statement in isolation and e.g. at the moment it has no way of knowing that the value of b is the same as value of a. The propagation-like effect it can achieve is only done by always splitting small non bit-field structures into pieces and let the SSA propagation work on them. One issue can be that we do not even attempt that with arrays (but we are unlikely to scalarize them away because they are usually indexed by a variable). We need an aggregate copy propagation or extend SRA significantly to become one (or SSA-ize some aggregates as Richi suggested but that would not work on partially accessed structures). I'll revisit my thoughts about this. I'll try to come up with some solution for the -Os problem though I'm afraid it will be a bit limited (I don't think I will even attempt unions or bit-fields, for example).