[Bug tree-optimization/96966] [8/9/10/11 Regression] redundant memcpy not eliminated after pointer subtraction

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Tue Sep 8 07:00:49 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96966

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-09-08
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Keywords|                            |alias
   Target Milestone|---                         |8.5

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Sebor from comment #1)
> According to Godbolt, GCC 8.1 and 8.2 emit optimal code for both functions
> but GCC 8.3 emits the less optimal code for f and has g jump to it. 
> Starting with 10.1, GCC emits the same suboptimal code for both functions.

This is likely "caused" by 08dfb1d682a707f7319aafec28edda424395dae5, i.e.
the fix for PR91108, which was also backported.  In the IL

  <bb 2> :
  _5 = MEM <__int128 unsigned> [(char * {ref-all})s_4(D)];
  MEM <__int128 unsigned> [(char * {ref-all})&a] = _5;
  _8 = MEM <__int128 unsigned> [(char * {ref-all})s_4(D)];
  MEM <__int128 unsigned> [(char * {ref-all})&a] = _8;
  return;

we lost the information that MEM <__int128 unsigned> [(char * {ref-all})s_4(D)]
and MEM <__int128 unsigned> [(char * {ref-all})&a] do not partially overlap.
The memcpy call guaranteed that.  The aliasing code rules out partial
overlap by using alignment, which doesn't help us here.
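
To illustrate why possible partial overlap blocks the optimization, here is
a minimal standalone sketch (not the testcase from this PR; the helper names
copy16 and second_copy_changes_result and the 32-byte buffer are made up for
illustration).  It models each load + store pair from the IL and checks
whether running the pair a second time changes memory:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One load + store pair as in the IL: read all 16 bytes, then write them.  */
void copy16(unsigned char *dst, const unsigned char *src)
{
  unsigned char tmp[16];
  memcpy(tmp, src, 16);  /* the load */
  memcpy(dst, tmp, 16);  /* the store */
}

/* Returns 1 if doing the pair twice leaves different memory contents than
   doing it once, with the source at buf + off and the destination at buf.  */
int second_copy_changes_result(size_t off)
{
  unsigned char once[32], twice[32];
  assert(off <= 16);  /* keep the 16-byte read inside the buffer */
  for (int i = 0; i < 32; i++)
    once[i] = twice[i] = (unsigned char) i;
  copy16(once, once + off);
  copy16(twice, twice + off);
  copy16(twice, twice + off);
  return memcmp(once, twice, sizeof once) != 0;
}
```

With off == 4 (partial overlap) the second pair changes the result, so it
is not redundant; with off == 0 (exact overlap) or off == 16 (disjoint) it
changes nothing.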

That it worked in GCC 8.[12] was due to bad code in VN that ignored
the possibility of a partial overlap here.

We could eventually lower memcpy (a, s, 16) to a load + store pair and
note that they are independent using MR_DEPENDENCE_CLIQUE/BASE, but this
may quickly deplete the clique resource on artificial testcases
(we only have 16 bits for clique).  Shifting bit allocation between
clique and base might be a possibility there, but at least clique
overflow mitigation would need to be put in place.
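
A rough model of what such overflow mitigation could look like, purely as a
sketch: none of this is GCC code, and the types and functions (clique_alloc,
mem_tag, alloc_clique, tag_lowered_copy, known_independent) are hypothetical.
It assumes clique 0 means "no dependence info" and that two references with
the same nonzero clique but different bases are independent:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-function allocator for 16-bit clique ids.  */
struct clique_alloc { uint16_t last; };

/* Hypothetical clique/base tag attached to a memory reference.  */
struct mem_tag { uint16_t clique, base; };

/* Hand out a fresh clique, or 0 once the 16-bit space is exhausted.  */
uint16_t alloc_clique(struct clique_alloc *a)
{
  if (a->last == UINT16_MAX)
    return 0;  /* overflow: fall back to "no dependence info" */
  return ++a->last;
}

/* Tag the load and store of a lowered memcpy as independent.
   Returns 1 if a clique was available, 0 if we had to give up.  */
int tag_lowered_copy(struct clique_alloc *a,
                     struct mem_tag *load, struct mem_tag *store)
{
  assert(a && load && store);
  uint16_t c = alloc_clique(a);
  load->clique = store->clique = c;
  load->base = 1;   /* distinct bases within the same clique */
  store->base = 2;
  return c != 0;
}

/* Same nonzero clique and different base: known not to alias.  */
int known_independent(struct mem_tag x, struct mem_tag y)
{
  return x.clique != 0 && x.clique == y.clique && x.base != y.base;
}
```

The point of the fallback is that running out of cliques only loses the
optimization for later copies; it never claims independence wrongly.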

