Bug 57878 - Incorrect code: live register clobbered in split2
Summary: Incorrect code: live register clobbered in split2
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.8.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-10 18:23 UTC by Easwaran Raman
Modified: 2017-09-19 10:23 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Preprocessed source of the test case (1.78 KB, text/plain)
2013-07-10 18:23 UTC, Easwaran Raman
Details
Disassembly of the compiled r.ii (1.28 KB, text/plain)
2013-07-10 18:24 UTC, Easwaran Raman
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Easwaran Raman 2013-07-10 18:23:07 UTC
Created attachment 30493 [details]
Preprocessed source of the test case

$ ./g++_4_8 --version
g++_4_8 (GCC) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.

(Compiling the attached preprocessed source)
$ ./g++_4_8 -m32 -O2 -fno-omit-frame-pointer -fPIC -std=gnu++11 -c r.ii
$ objdump -d --no-show-raw-insn r.o  > r.s

In the attached r.s, in function _ZNSt6vectorIN3FDB9ChunkDataESaIS1_EE19_M_emplace_back_auxIIRKS1_EEEvDpOT_, the sub instruction at address 0xfd writes to edx, which is subsequently stored at [rbp-0x24]. edx is immediately clobbered. At 0x119, rbp-0x24 is clobbered and hence the load at 0x168 loads an incorrect value into ecx. 

After reload, we see the following:

(insn 117 241 196 7 (parallel [
            (set (reg/f:SI 1 dx [orig:138 D.3282 ] [138])
                (minus:SI (reg/f:SI 1 dx [orig:138 D.3282 ] [138])
                    (reg/v/f:SI 5 di [orig:103 __cur ] [103])))
            (clobber (reg:CC 17 flags))
        ]) 309 {*subsi_1}
     (nil))
(insn 196 117 268 7 (set (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -36 [0xffffffffffffffdc])) [21 %sfp+-36 S4 A32])
        (reg/f:SI 1 dx [orig:138 D.3282 ] [138])) 89 {*movsi_internal}
     (expr_list:REG_DEAD (reg/f:SI 1 dx [orig:138 D.3282 ] [138])
        (nil)))
...
(insn 119 274 237 7 (set (reg:DI 0 ax [193])
        (mem:DI (plus:SI (reg/v/f:SI 0 ax [orig:109 __first ] [109])
                (const_int 4 [0x4])) [10 MEM[base: _1, index: _28, offset: 0]+0 S8 A64])) r.ii:197 88 {*movdi_internal}
     (expr_list:REG_DEAD (reg/v/f:SI 0 ax [orig:109 __first ] [109])
        (nil)))
...
(insn 234 120 200 7 (set (reg/f:SI 1 dx [orig:138 D.3282 ] [138])
        (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -36 [0xffffffffffffffdc])) [21 %sfp+-36 S4 A32])) r.ii:197 89 {*movsi_internal}
     (expr_list:REG_DEAD (reg/f:SI 138 [ D.3282 ])
        (nil)))
...
(insn 232 122 265 7 (set (mem/c:SI (plus:SI (reg/f:SI 6 bp)
                (const_int -36 [0xffffffffffffffdc])) [21 %sfp+-36 S4 A32])
        (reg/f:SI 1 dx [orig:138 D.3282 ] [138])) r.ii:197 89 {*movsi_internal}
     (nil))

The fill in 234 and the second spill in 232 are redundant. Insn 234 gets removed by ce3 later, but 234 remains till the end. Meanwhile, 119 writes to a DI mode register and gets split by split2 into eax and edx and hence the store in 232 ends up clobbering the right value. 


When compiled with a trunk built 2 weeks ago, the compiler ICEs with the following stacck trace:

r.ii: In member function ‘void std::vector<_Tp, _Alloc>::_M_emplace_back_aux(_Args&& ...) [with _Args = {const FDB::ChunkData&}; _Tp = FDB::ChunkData; _Alloc = std::allocator<FDB::ChunkData>]’:
r.ii:192:3: internal compiler error: in assign_by_spills, at lra-assigns.c:1266
   }
   ^
0x9ba360 assign_by_spills
	/usr/local/google/home/eraman/gcc/trunk/gcc/lra-assigns.c:1266
0x9bb013 lra_assign()
	/usr/local/google/home/eraman/gcc/trunk/gcc/lra-assigns.c:1423
0x9b71b9 lra(_IO_FILE*)
	/usr/local/google/home/eraman/gcc/trunk/gcc/lra.c:2342
0x97d7b8 do_reload
	/usr/local/google/home/eraman/gcc/trunk/gcc/ira.c:4689
0x97d7b8 rest_of_handle_reload
	/usr/local/google/home/eraman/gcc/trunk/gcc/ira.c:4801
Comment 1 Easwaran Raman 2013-07-10 18:24:00 UTC
Created attachment 30494 [details]
Disassembly of the compiled r.ii
Comment 2 Easwaran Raman 2013-07-12 19:56:48 UTC
After IRA, we have:

(insn 116 115 117 6 (set (reg:DI 130 [ D.3288 ])
        (mem:DI (plus:SI (reg/v/f:SI 172 [orig:109 __first ] [109])
                (const_int 4 [0x4])) [10 MEM[base: _1, index: _44, offset: 0]+0 S8 A64])) r.ii:197 88 {*movdi_internal}
     (expr_list:REG_EQUIV (mem:DI (plus:SI (reg/v/f:SI 172 [orig:109 __first ] [109])
                (const_int 4 [0x4])) [10 MEM[base: _1, index: _44, offset: 0]+0 S8 A64])
        (nil)))
(insn 117 116 118 6 (parallel [
            (set (reg/f:SI 138 [ D.3281 ])
                (minus:SI (reg/v/f:SI 173 [orig:110 __cur ] [110])
                    (reg/v/f:SI 103 [ __cur ])))
            (clobber (reg:CC 17 flags))
        ]) 309 {*subsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 118 117 119 6 (parallel [
            (set (reg/f:SI 140 [ D.3282 ])
                (plus:SI (reg/v/f:SI 103 [ __cur ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) 273 {*addsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 119 118 120 6 (set (mem:DI (plus:SI (reg/v/f:SI 173 [orig:110 __cur ] [110])
                (const_int 4 [0x4])) [10 MEM[base: _75, index: _77, offset: 0B]+0 S8 A64])
        (reg:DI 130 [ D.3288 ])) r.ii:197 88 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 130 [ D.3288 ])
        (nil)))
(insn 120 119 121 6 (set (reg:DI 131 [ D.3287 ])
        (mem:DI (plus:SI (plus:SI (reg/f:SI 99 [ D.3281 ])
                    (reg/f:SI 126 [ D.3282 ]))
                (const_int 8 [0x8])) [10 MEM[base: _1, index: _44, offset: 8]+0 S8 A64])) r.ii:197 88 {*movdi_internal}
     (expr_list:REG_EQUIV (mem:DI (plus:SI (plus:SI (reg/f:SI 99 [ D.3281 ])
                    (reg/f:SI 126 [ D.3282 ]))
                (const_int 8 [0x8])) [10 MEM[base: _1, index: _44, offset: 8]+0 S8 A64])
        (nil)))
(insn 121 120 122 6 (set (mem:DI (plus:SI (plus:SI (reg/f:SI 138 [ D.3281 ])
                    (reg/f:SI 140 [ D.3282 ]))
                (const_int 8 [0x8])) [10 MEM[base: _75, index: _77, offset: 8B]+0 S8 A64])
        (reg:DI 131 [ D.3287 ])) r.ii:197 88 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 131 [ D.3287 ])
        (nil)))


After reload,
 1. insn 116 is deleted
 2. In insn 117, the pseudo 138 is replaced with dx
 3. dx is spilled into stack at offset -0x36 from bp.
 4. For insn 119, first a new pseudo 193 is created which is equivalent to 130 and is loaded from memory. This 193 is in DI mode and is replaced by ax. This would clobber edx, but that is ok since dx is now stored into stack at offset -0x36.
 5. This is followed by the the store of 193 (ax) into memory location.
 6. Then dx is loaded from bp-0x36.
 7. insn 120 is deleted
 8. For insn 121, the pseudo 131 is replaced by 196 which is assigned the hard reg ax. 

In the sequence above, shouldn't the fill of dx from bp-0x36 at step 6 above happen after step 8?
Comment 3 wmi 2013-07-14 15:34:01 UTC
Seems problem is at deciding the priority of assign hardreg for reload pseudos .i.e the func reload_pseudo_compare_func.

This is the trace of 2nd iteration of reload pseudo assignments in r.ii.209r.reload:

  2nd iter for reload pseudo assignments:
         Reload r196 assignment failure
         Reload r199 assignment failure
         Reload r204 assignment failure
         Reload r204 assignment failure
          Spill reload r194(hr=1, freq=1426)
          Spill reload r195(hr=5, freq=1426)
          Spill reload r197(hr=1, freq=1426)
          Spill reload r198(hr=5, freq=1426)
          Spill reload r202(hr=1, freq=1426)
          Spill reload r203(hr=5, freq=1426)
         Assigning to 194 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190, tfreq=4991)...
           Assign 1 to reload r194 (freq=1426)
         Assigning to 197 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190, tfreq=4991)...
           Assign 1 to reload r197 (freq=1426)
        Hard reg 1 is preferable by r222 with profit 3029
        Hard reg 2 is preferable by r222 with profit 1425
         Assigning to 202 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190, tfreq=4991)...
           Assign 1 to reload r202 (freq=1426)
         Assigning to 195 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191, tfreq=4278)...
           Assign 5 to reload r195 (freq=1426)
         Assigning to 198 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191, tfreq=4278)...
           Assign 5 to reload r198 (freq=1426)
         Assigning to 203 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191, tfreq=4278)...
           Assign 5 to reload r203 (freq=1426)
         Assigning to 196 (cl=GENERAL_REGS, orig=196, freq=1426, tfirst=196, tfreq=1426)...
         Trying 2: spill 225(freq=1426)
         Assigning to 199 (cl=GENERAL_REGS, orig=199, freq=1426, tfirst=199, tfreq=1426)...
         Trying 2: spill 225(freq=1426)
         Assigning to 204 (cl=GENERAL_REGS, orig=204, freq=1426, tfirst=204, tfreq=1426)...
         Trying 2: spill 216(freq=2139)
           Assign 0 to reload r196 (freq=1426)
           Assign 0 to reload r199 (freq=1426)
           Assign 0 to reload r204 (freq=1426)
  Reassigning non-reload pseudos

Here is the dump after lra_constraints. These are the insns related with r194, r195, r196:

(insn 200 120 201 6 (set (reg/f:SI 194 [orig:138 D.3281 ] [138])
        (reg/f:SI 138 [ D.3281 ])) 1.ii:197 89 {*movsi_internal}
     (nil))
(insn 201 200 121 6 (set (reg/f:SI 195 [orig:140 D.3282 ] [140])
        (reg/f:SI 140 [ D.3282 ])) 1.ii:197 89 {*movsi_internal}
     (nil))
(insn 121 201 202 6 (set (reg:DI 196)
        (mem:DI (plus:SI (plus:SI (reg/f:SI 99 [ D.3281 ])
                    (reg/f:SI 126 [ D.3282 ]))
                (const_int 8 [0x8])) [10 MEM[base: _1, index: _44, offset: 8]+0 S8 A64])) 1.ii:197 88 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 131 [ D.3287 ])
        (nil)))
(insn 202 121 122 6 (set (mem:DI (plus:SI (plus:SI (reg/f:SI 194 [orig:138 D.3281 ] [138])
                    (reg/f:SI 195 [orig:140 D.3282 ] [140]))
                (const_int 8 [0x8])) [10 MEM[base: _75, index: _77, offset: 8B]+0 S8 A64])
        (reg:DI 196)) 1.ii:197 88 {*movdi_internal}
     (nil))

From trace, r194 r195 are assigned hardreg before r196. Usually reload pseudos will not conflict with each other except a special case: they are in the same insn. r194,r195 and r196 just belong to such case. They are all in the insn 202. 

In addition, r194, r195 and r196 are all reload pseudos, so once r194 and r195 are allocated, they will not be spilled for assigning hardreg for r196. In this case, r194 and r195 get hardreg assigned before r196. So after r194 and r195 are assigned hardreg, r196 cannot find available hardreg because it has bigger mode and require a consecutive hardreg pair. All pseudos which cannot find hardreg after two iterations will be given ax simply, and report error. Trunk report error but 4.8.1 doesn't report it because lra_assert is only enabled in trunk but not in 4.8.1.

A possible fix is to give bigger mode pseudos higher priority in lra assignment.
Index: lra-assigns.c
===================================================================
--- lra-assigns.c	(revision 200944)
+++ lra-assigns.c	(working copy)
@@ -194,15 +194,15 @@ reload_pseudo_compare_func (const void *
   if ((diff = (ira_class_hard_regs_num[cl1]
 	       - ira_class_hard_regs_num[cl2])) != 0)
     return diff;
-  if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
-	       - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
-    return diff;
   /* Allocate bigger pseudos first to avoid register file
      fragmentation.  */
   if ((diff
        = (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode]
 	  - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0)
     return diff;
+  if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
+	       - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
+    return diff;
   /* Put pseudos from the thread nearby.  */
   if ((diff = regno_assign_info[r1].first - regno_assign_info[r2].first) != 0)
     return diff;

The failure test is ok with this fix. bootstrap and regression test is ok.
Comment 4 Alexander Monakov 2017-09-19 10:16:51 UTC
Author: amonakov
Date: Tue Sep 19 10:16:20 2017
New Revision: 252972

URL: https://gcc.gnu.org/viewcvs?rev=252972&root=gcc&view=rev
Log:
lra: make reload_pseudo_compare_func a proper comparator

	PR rtl-optimization/57878
	PR rtl-optimization/68988
	* lra-assigns.c (reload_pseudo_compare_func): Remove fragmentation
	avoidance test involving non_reload_pseudos.  Move frequency test
	below the general fragmentation avoidance test.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-assigns.c
Comment 5 Alexander Monakov 2017-09-19 10:23:48 UTC
This bug was originally fixed in July 2013 by r201036: https://gcc.gnu.org/ml/gcc-patches/2013-07/msg00732.html , but that patch changed the comparator in a way that made it non-transitive (PR 68988). In 2014, fixes for PR 60650 introduced a more robust fix for the original spill failure.

This patch reverts the non_reload_pseudos check from the comparator to fix PR 68988 and also moves the fragmentation avoidance check earlier as suggested in comment #3 since it was found that it leads to smaller cc1 size without regressions on SPEC CPU 2000: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01054.html