Bug 55547 - [4.8 Regression] Alias analysis does not handle AND addresses correctly
Summary: [4.8 Regression] Alias analysis does not handle AND addresses correctly
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.8.0
: P1 normal
Target Milestone: 4.8.0
Assignee: Alexandre Oliva
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2012-11-30 13:12 UTC by Uroš Bizjak
Modified: 2013-01-18 11:05 UTC (History)
2 users (show)

See Also:
Host:
Target: alphaev68-unknown-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-12-01 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Uroš Bizjak 2012-11-30 13:12:47 UTC
The patch that fixed alias analysis [1] triggered following testsuite failure [2] on alphaev68-linux-gnu:

FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2 
FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2 -fomit-frame-pointer -finline-functions 
FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops 
FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2 -fbounds-check 

The testcase fails in the line where i is checked for 26. The value is 0:

      subroutine foo (b)
        logical b
        integer i, j
        character*24 s
        save
        if (b) then
          i = 26
          j = 131
          s = 'This is a test string'
        else
          if (i .ne. 26 .or. j .ne. 131) call abort
          if (s .ne. 'This is a test string') call abort
        end if
      end subroutine foo

This is a runtime failure, but the problem can be seen with a crosscompiler, configured with --target=alpha-linux-gnu.

The testcase should be compiled with "-O2 -mcpu=ev67 -mexplicit-relocs".

Before _208r.sched1, we have:

   10: r77:DI=high(`i.843')
   11: r78:SI=0x1a
   12: [r77:DI+low(`i.843')]=r78:SI
      REG_DEADr78:SI
      REG_DEADr77:DI
...
   42: r102:DI=r81:DI+low(`s.845')
      REG_EQUAL`s.845'
   43: r106:DI=[r102:DI+0xf&0xfffffffffffffff8]
   44: r105:DI=[r81:DI+low(`s.845')&0xfffffffffffffff8]
      REG_EQUAL[`s.845'&0xfffffffffffffff8]
 ...
   57: [r81:DI+low(`s.845')&0xfffffffffffffff8]=r105:DI
      REG_DEADr105:DI
      REG_DEADr81:DI

(The last sequence is part of alpha_expand_block_move, specifically an unaligned access to the head of string "s.845"). In the above sequence, load and store to i.843 and s.845 are lined one after another.

The problematic sequence is shown in function "foo" in _.208r.sched1 RTL dump, after scheduler reordered instructions:

   17: r82:DI=high(`*$LC0')
   16: r81:DI=high(`s.845')
   70: r115:QI=0x20
   10: r77:DI=high(`i.843')
   18: r86:DI=r82:DI+low(`*$LC0')
      REG_EQUAL`*$LC0'
   42: r102:DI=r81:DI+low(`s.845')
      REG_EQUAL`s.845'
   19: r83:DI=[r82:DI+low(`*$LC0')&0xfffffffffffffff8]
      REG_DEADr82:DI
      REG_EQUAL[`*$LC0'&0xfffffffffffffff8]
   44: r105:DI=[r81:DI+low(`s.845')&0xfffffffffffffff8]
      REG_EQUAL[`s.845'&0xfffffffffffffff8]
   21: r89:DI=[r86:DI+0xf&0xfffffffffffffff8]
   20: r84:DI=[r86:DI+0x8&0xfffffffffffffff8]
   23: r91:DI=r86:DI&0x7
   11: r78:SI=0x1a
   43: r106:DI=[r102:DI+0xf&0xfffffffffffffff8]
...
   12: [r77:DI+low(`i.843')]=r78:SI
      REG_DEADr78:SI
      REG_DEADr77:DI
...
   57: [r81:DI+low(`s.845')&0xfffffffffffffff8]=r105:DI
      REG_DEADr105:DI
      REG_DEADr81:DI

As it happens, i lives inside "aligned" s address range:

(gdb) p &i
$3 = (PTR TO -> ( integer(kind=4) )) 0x120012090 <i.853>
(gdb) p &s
$4 = (PTR TO -> ( character*24 )) 0x120012094 <s.855>

Due to the way unaligned stores are implemented on alpha, where an aligned address is read first, the value is inserted and the combination is stored back to aligned address, we rewrite location of "i" with previous value, 0 in this case.

To illustrate this problem, let's put breakpoints at the store of i (b1) and store of s (b2):

   0x0000000120000a8c <+204>:   cmoveq  t10,0,t9
   0x0000000120000a90 <+208>:   ldah    t10,0(gp)
b1 0x0000000120000a94 <+212>:   stl     a3,-32640(t10)
   0x0000000120000a98 <+216>:   lda     a3,131
   0x0000000120000a9c <+220>:   or      t2,t8,t2
   0x0000000120000aa0 <+224>:   or      t3,t9,t3
   0x0000000120000aa4 <+228>:   ldah    t10,0(gp)
   0x0000000120000aa8 <+232>:   stb     t6,23(t0)
   0x0000000120000aac <+236>:   insqh   t2,t0,t8
   0x0000000120000ab0 <+240>:   mskqh   t5,t0,t5
   0x0000000120000ab4 <+244>:   stl     a3,-32644(t10)
   0x0000000120000ab8 <+248>:   insqh   t3,t0,t1
   0x0000000120000abc <+252>:   insql   t2,t0,t2
   0x0000000120000ac0 <+256>:   or      t5,t8,t5
   0x0000000120000ac4 <+260>:   stq_u   t5,15(t0)
   0x0000000120000ac8 <+264>:   insql   t3,t0,t3
   0x0000000120000acc <+268>:   or      t1,t2,t1
   0x0000000120000ad0 <+272>:   stq_u   t1,8(t0)
   0x0000000120000ad4 <+276>:   stb     a4,16(t0)
   0x0000000120000ad8 <+280>:   or      t4,t3,t4
b2 0x0000000120000adc <+284>:   stq_u   t4,-32636(t7)
   0x0000000120000ae0 <+288>:   stb     a5,17(t0)

Breakpoint 1, 0x0000000120000a94 in foo (b=.TRUE.) at save_1.f90:7
7                 i = 26
(gdb) p i
$5 = 0
(gdb) s
8                 j = 131
(gdb) p i
$6 = 26
(gdb) c
Continuing.

Breakpoint 2, 0x0000000120000adc in foo (b=.TRUE.) at save_1.f90:9
9                 s = 'This is a test string'
(gdb) p i
$7 = 26
(gdb) s
14            end subroutine foo
(gdb) p i
$8 = 0
(gdb) 

When compiled with additional -fno-schedule-insns, the test runs OK.

[1] http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02225.html
[2] http://gcc.gnu.org/ml/gcc-testresults/2012-11/msg02492.html
Comment 1 Uroš Bizjak 2012-12-01 09:21:49 UTC
Reverting r188868 as suggested by Steven [1] solves the problem, too.

CC author of the patch.

[1] http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02549.html
Comment 2 Uroš Bizjak 2012-12-01 09:23:04 UTC
Confirmed as 4.8 regression.
Comment 3 Alexandre Oliva 2012-12-07 21:44:56 UTC
Mine
Comment 4 Alexandre Oliva 2012-12-07 23:29:28 UTC
I don't understand how this sort of unaligned access that modifies unrelated objects can fit in with any reasonable threaded memory model, but I guess that's beyond the scope of this bug report ;-)

Here's a patch that restores negative sizes after alignment adjustments, so that the tests for negative sizes elsewhere kick in.  We might want to use absolute values elsewhere to get stricter results, but this should be at least conservatively better.

Uros, I'm regstrapping this on x86_64; would you please give it a spin on alpha?  TIA,

--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -2100,14 +2100,20 @@ memrefs_conflict_p (int xsize, rtx x, int ysize, rtx y, HOST_WIDE_INT c)
 
   /* Deal with alignment ANDs by adjusting offset and size so as to
      cover the maximum range, without taking any previously known
-     alignment into account.  */
+     alignment into account.  Make a size negative after such an
+     adjustments, so that, if we end up with e.g. two SYMBOL_REFs, we
+     assume a potential overlap, because they may end up in contiguous
+     memory locations and the stricter-alignment access may span over
+     part of both.  */
   if (GET_CODE (x) == AND && CONST_INT_P (XEXP (x, 1)))
     {
       HOST_WIDE_INT sc = INTVAL (XEXP (x, 1));
       unsigned HOST_WIDE_INT uc = sc;
-      if (xsize > 0 && sc < 0 && -uc == (uc & -uc))
+      if (sc < 0 && -uc == (uc & -uc))
 	{
-	  xsize -= sc + 1;
+	  if (xsize > 0)
+	    xsize = -xsize;
+	  xsize += sc + 1;
 	  c -= sc + 1;
 	  return memrefs_conflict_p (xsize, canon_rtx (XEXP (x, 0)),
 				     ysize, y, c);
@@ -2117,9 +2123,11 @@ memrefs_conflict_p (int xsize, rtx x, int ysize, rtx y, HOST_WIDE_INT c)
     {
       HOST_WIDE_INT sc = INTVAL (XEXP (y, 1));
       unsigned HOST_WIDE_INT uc = sc;
-      if (ysize > 0 && sc < 0 && -uc == (uc & -uc))
+      if (sc < 0 && -uc == (uc & -uc))
 	{
-	  ysize -= sc + 1;
+	  if (ysize > 0)
+	    ysize = -ysize;
+	  ysize += sc + 1;
 	  c += sc + 1;
 	  return memrefs_conflict_p (xsize, x,
 				     ysize, canon_rtx (XEXP (y, 0)), c);
Comment 5 Uroš Bizjak 2012-12-09 13:52:06 UTC
(In reply to comment #4)

> Uros, I'm regstrapping this on x86_64; would you please give it a spin on
> alpha?  TIA,

Thanks, I have started the bootstrap, please expect results in ~10h.
Comment 6 Uroš Bizjak 2012-12-09 22:44:02 UTC
(In reply to comment #4)

> Uros, I'm regstrapping this on x86_64; would you please give it a spin on
> alpha?  TIA,

The patch bootstraps and regtests OK on alpha [1]. The save_1.f90 failures are all gone now.

[1] http://gcc.gnu.org/ml/gcc-testresults/2012-12/msg00803.html
Comment 7 Uroš Bizjak 2013-01-02 08:30:26 UTC
Alexandre, do you plan to proceed with the patch in Comment #4?
Comment 8 Richard Henderson 2013-01-15 16:47:29 UTC
The patch in #c4 is ok.
Comment 9 Alexandre Oliva 2013-01-16 04:31:38 UTC
Author: aoliva
Date: Wed Jan 16 04:31:30 2013
New Revision: 195227

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195227
Log:
PR rtl-optimization/55547
PR rtl-optimization/53827
PR debug/53671
PR debug/49888
* alias.c (memrefs_conflict_p): Set sizes to negative after
AND adjustments.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/alias.c
Comment 10 Alexandre Oliva 2013-01-16 04:37:12 UTC
Fixed, although there's a second patch (now pending review) that makes the handling of negative sizes for aligned addresses stricter and more correct in http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00822.html
Comment 11 Jakub Jelinek 2013-01-16 15:19:47 UTC
Unfortunately the fix caused some guality regressions, on x86_64 -m32
+FAIL: gcc.dg/guality/drap.c  -O1  line 21 a == 5
+FAIL: gcc.dg/guality/drap.c  -O1  line 22 b == 6
up to -Os, and
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 16 arg1 == 1
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 16 arg2 == 2
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 18 arg1 == 1
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 18 arg2 == 2
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 18 arg7 == 30
(up to -Os),
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 16 arg1 == 1
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 16 arg2 == 2
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 16 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 18 arg1 == 1
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 18 arg2 == 2
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 18 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 18 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 18 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 18 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-2.c  -O1  line 18 arg7 == 30
(up to -Os),
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 14 arg1 == 1
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 14 arg2 == 2
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 16 arg1 == 1
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 16 arg2 == 2
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-3.c  -O1  line 16 arg7 == 30
(up to -Os) and finally
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 14 arg1 == 1
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 14 arg2 == 2
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 14 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 14 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 14 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 14 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 14 arg7 == 30
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 16 arg1 == 1
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 16 arg2 == 2
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 16 arg3 == 3
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 16 arg4 == 4
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 16 arg5 == 5
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 16 arg6 == 6
+FAIL: gcc.dg/guality/pr36728-4.c  -O1  line 16 arg7 == 30
(up to -Os), and similarly for x86_64 -m64 (fewer 36728-*.c regressions, but still some and all drap.c regressions).
Comment 12 Jakub Jelinek 2013-01-16 16:31:10 UTC
Ah, hjl opened PR56006 to track #c11.
Comment 13 Alexandre Oliva 2013-01-18 10:57:50 UTC
Author: aoliva
Date: Fri Jan 18 10:57:36 2013
New Revision: 195289

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195289
Log:
PR rtl-optimization/55547
PR rtl-optimization/53827
PR debug/53671
PR debug/49888
* alias.c (offset_overlap_p): New, factored out of...
(memrefs_conflict_p): ... this.  Use absolute sizes.  Retain
the conservative special case for symbolic constants.  Don't
adjust zero sizes on alignment.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/alias.c
Comment 14 Alexandre Oliva 2013-01-18 11:05:12 UTC
Author: aoliva
Date: Fri Jan 18 11:05:04 2013
New Revision: 195292

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195292
Log:
Mention PR debug/56006 in earlier commit for PR rtl-optimization/55547.

Modified:
    trunk/gcc/ChangeLog