[Bug rtl-optimization/64164] [4.9/5 Regression] one more stack slot used due to one less inlining level

Wed Dec 3 10:03:00 GMT 2014

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|2014-12-03 00:00:00         |
          Component|middle-end                  |rtl-optimization
   Target Milestone|---                         |4.9.3

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The difference is whether extra user-named variables are left in the end,
which leads to different SSA coalescing decisions:

 stm_load (volatile stm_word_t * addr)
 {
-  stm_word_t l;
-  stm_word_t value;
   stm_word_t version;
   stm_word_t l;
   struct r_entry_t * r;
-  stm_word_t now;
...
+  size_t _32;
+  size_t _33;
+  size_t _34;

...

 Conflict graph:
+1: 3
+3: 1

 After sorting:
 Sorted Coalesce list:
+(16610) _30 <-> _33
 (651) _10 <-> _30

...

-Coalesce list: (10)_10 & (30)_30 [map: 1, 2] : Success -> 1
+Coalesce list: (30)_30 & (33)_33 [map: 2, 3] : Success -> 2
+Coalesce list: (10)_10 & (30)_30 [map: 1, 2] : Fail due to conflict

So it turns out the different coalescing ends up generating worse code.
It would be interesting to see why we decide that coalescing _30 and _33
is so much more beneficial than coalescing _10 and _30.

Ah, it simply uses EDGE_FREQUENCY...  and for some reason we predicted
that _33 & 1 != 0 is taken only 10% of the time.
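
To make the ordering effect concrete, here is a toy model of cost-sorted
greedy coalescing. It is only a sketch and not the code in GCC: the function
name coalesce and its data layout are made up for illustration, and the
conflict between _10 and _33 is my reading of the "1: 3" / "3: 1" conflict
graph lines together with the [map: ...] numbers in the dump above.

```python
def coalesce(candidates, conflicts):
    """Greedily coalesce SSA names, most expensive copy first.

    candidates: list of (cost, a, b) copy-related pairs.
    conflicts:  iterable of (a, b) pairs that must not share a partition.
    Returns (rep, log): rep maps each name to its partition representative,
    log lists (a, b, succeeded) in the order the pairs were tried.
    """
    names = {n for _, a, b in candidates for n in (a, b)}
    names |= {n for pair in conflicts for n in pair}
    rep = {n: n for n in names}        # partition representative per name
    clash = {n: set() for n in names}  # conflicts between representatives
    for a, b in conflicts:
        clash[a].add(b)
        clash[b].add(a)

    log = []
    # Highest cost first, mirroring the "Sorted Coalesce list" in the dump.
    for _cost, a, b in sorted(candidates, key=lambda c: -c[0]):
        ra, rb = rep[a], rep[b]
        if ra == rb:                   # already in the same partition
            log.append((a, b, True))
            continue
        if rb in clash[ra]:            # "Fail due to conflict"
            log.append((a, b, False))
            continue
        # Merge partition rb into ra; ra inherits all of rb's conflicts.
        for n in names:
            if rep[n] == rb:
                rep[n] = ra
        for other in clash.pop(rb):
            clash[other].discard(rb)
            clash[other].add(ra)
            clash[ra].add(other)
        log.append((a, b, True))
    return rep, log


# Before: only _10 <-> _30 is a candidate, and it coalesces.
_, log_before = coalesce([(651, "_10", "_30")], [])

# After: _30 <-> _33 has the higher cost and goes first, so the merged
# partition inherits the _10/_33 conflict and _10 <-> _30 is rejected.
_, log_after = coalesce(
    [(651, "_10", "_30"), (16610, "_30", "_33")],
    [("_10", "_33")],
)
```

With these inputs log_before holds one successful coalesce, while log_after
shows _30/_33 succeeding and _10/_30 failing, matching the two "Coalesce
list" outcomes in the dump. The costs are the ones printed in the dump; how
they are derived from EDGE_FREQUENCY is not modelled here.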

So ... the theory is that the version is faster on the important path?