Bug 112447 - risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3
Summary: risc-v regression: FAIL: gcc.c-torture/execute/memset-3.c -O3
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 13.0
: P3 normal
Target Milestone: ---
Assignee: Vineet Gupta
URL:
Keywords: testsuite-fail
Depends on:
Blocks:
 
Reported: 2023-11-08 20:11 UTC by Vineet Gupta
Modified: 2023-11-15 17:53 UTC (History)
6 users (show)

See Also:
Host:
Target: riscv
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-11-08 00:00:00


Attachments
manually reduced src (927 bytes, text/x-csrc)
2023-11-08 22:33 UTC, Vineet Gupta
Details
asm output ok (3.18 KB, text/plain)
2023-11-08 22:34 UTC, Vineet Gupta
Details
asm output nok (3.15 KB, text/plain)
2023-11-08 22:35 UTC, Vineet Gupta
Details
bug fix patch (808 bytes, patch)
2023-11-15 02:49 UTC, JuzheZhong
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Vineet Gupta 2023-11-08 20:11:04 UTC
As reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311#c8

we have following execute failures on trunk.

        === gcc: Unexpected fails for rv64gcv lp64d medlow ===
FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test

The issue is an extraneous VSETVLI instruction (with wrong SEW) being generated which creates wrong fill pattern for memset.

```
main:

[...]
        
.L36:  ; 2. loop start for @off 0 
	vse8.v		v1,0(t3)
	vse8.v		v1,0(t6)
	vse8.v		v1,0(s1)
	vse8.v	        v3,0(a5)
...
	; loop epilogue
	li	a7,15
	beq	a4,a7,.L171
	vsetvli	zero,zero,e32,m2,ta,ma   <--- wrong
	j	.L36
```

vsetvli pass dumps:

```
Phase 3: Reduce global vsetvl infos. 

  Compute LCM insert and delete data:

      Expr[2]: VALID (insn 2847, bb 3)
        Demand fields: demand_sew_lmul demand_avl
        SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(const_int 8 [0x8])
        VL=(nil)

VSETVL infos after phase 3

  bb 3:
    probability: always (guessed)
    Header vsetvl info:VALID (insn 2847, bb 3) (deleted)  <---
      Demand fields: demand_sew_lmul demand_avl
      SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
      TAIL_POLICY=agnostic, MASK_POLICY=agnostic
      AVL=(const_int 8 [0x8])
      VL=(nil)
```

So it seems LCM is deleting the valid VSETVLI insn which later causes Phase 4 to insert a different/incorrect one.

I revert the following commit and the issue goes away. 

 2023-10-18 f0e28d8c1371 RISC-V: Fix failed hoist in LICM of vmv.v.x instruction  

This at least tells us the cause of issue, next step is to fix the issue.
Comment 1 JuzheZhong 2023-11-08 22:05:44 UTC
Could you share more assembly ?
Comment 2 Vineet Gupta 2023-11-08 22:33:59 UTC
Created attachment 56539 [details]
manually reduced src
Comment 3 Vineet Gupta 2023-11-08 22:34:43 UTC
Created attachment 56540 [details]
asm output ok
Comment 4 Vineet Gupta 2023-11-08 22:35:00 UTC
Created attachment 56541 [details]
asm output nok
Comment 5 JuzheZhong 2023-11-08 22:40:10 UTC
I don't think VSETVL is wrong.


        vsetivli	zero,8,e8,mf2,ta,ma
	sd	ra,120(sp)
	vmv.x.s	a1,v1
        ...
.L36:
        vse8.v
        ...
        vsetivli	zero,8,e32,m2,ta,ma
        j    .L36

Both e8mf2 and e32m2 are valid for vse8.v since they have same ratio = 16
Comment 6 Vineet Gupta 2023-11-08 22:46:28 UTC
I have debugged this by single stepped in qemu 

when the test fails (first loop for offset 0, iteration 8)

The last VSETVLI is this one, 

   0x10a3e   0d107057  vsetvli  zero,zero,e32,m2,ta,ma
   0x10a42   j  0x10666

We eventually hit a VMV.v.x. which creates invalid pattern due to e32.

   (gdb) info reg vtype
   vtype          0xd1	209	# SEW = 010’b / e32, LMUL = 001’b / m2
   (gdb) info reg vl
   vl             0x8	8
   (gdb) info reg a0
   a0             0x41	65

   vmv.v.x	v2,a0

  (gdb) info reg v2
  v2             {q = {0x41000000410000004100000041}	 
  (gdb) info reg v3
  v2             {q = {0x41000000410000004100000041}
Comment 7 JuzheZhong 2023-11-08 22:49:20 UTC
Oh. I missed it:

vmv.v.x	v2,s0
	vse8.v	v2,0(a5)

Leave it to me today. It should be simple fix.

Thanks for report it.
Comment 8 JuzheZhong 2023-11-08 22:52:17 UTC
Could you continue debug more cases ?


FAIL: gcc.c-torture/execute/pr89369.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr89369.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr89369.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/pr89369.c   -O3 -g  execution test
FAIL: gcc.dg/pr96239.c execution test
FAIL: gcc.dg/vshift-5.c execution test
FAIL: gcc.dg/torture/pr61346.c   -O2  execution test
FAIL: gcc.dg/torture/pr61346.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/pr61346.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.dg/torture/pr61346.c   -O3 -g  execution test

They are RV32 system. memset issue I will fix it soon today.
Comment 9 Vineet Gupta 2023-11-08 23:00:43 UTC
(In reply to JuzheZhong from comment #7)
> Oh. I missed it:
> 
>       vmv.v.x	v2,s0
> 	vse8.v	v2,0(a5)
> 
> Leave it to me today. It should be simple fix.
> 
> Thanks for report it.

Can I request you to let me continue to debug and fix it. I want to familiarize myself with the vsetv pass and this seems like a good opportunity to do so considering you think the fix is not hard.
Comment 10 JuzheZhong 2023-11-08 23:01:27 UTC
(In reply to Vineet Gupta from comment #9)
> (In reply to JuzheZhong from comment #7)
> > Oh. I missed it:
> > 
> >       vmv.v.x	v2,s0
> > 	vse8.v	v2,0(a5)
> > 
> > Leave it to me today. It should be simple fix.
> > 
> > Thanks for report it.
> 
> Can I request you to let me continue to debug and fix it. I want to
> familiarize myself with the vsetv pass and this seems like a good
> opportunity to do so considering you think the fix is not hard.

OK. Thanks.
Comment 11 Vineet Gupta 2023-11-14 20:02:51 UTC
As a hack I commented out set_delete() to see what the extraneous vsetvli
would have been.

```
  .L36:
       # bb 3: start of outer loop: off 0

	vsetvli	zero,zero,e8,mf2,ta,ma     # insn 2915
	vse8.v	v1,0(t3)                   # insn 2874, bb 3
	vse8.v	v1,0(t6)
	vse8.v	v1,0(s0)
  ...

	# bb 181

	addi	a4,a4,1
	li	a7,15
	bne	a4,a7,.L36                 # insn 1082

       # bb 182: start of outer loop: off 1

        vsetvli	zero,zero,e32,mf2,ta,ma    # insn 2919
	vmv.x.s	a3,v1                      # insn 1858
	vsetvli	zero,zero,e16,mf2,ta,ma
	sw	a3,8(sp)
	vmv.x.s	a3,v1
```

Essentially phase 2 is fusing vsetvl info for insn 2874 and insn 1858
But the fused info doesn't seem right. 

	vsetvli	zero,zero,e32,m2,ta,ma
	j	.L36

Manually modifying it to orig value fixes the test.

	vsetvli	zero,zero,e8,mf2,ta,ma
	j	.L36

Phase 2 logs

```
  Try lift up 1.

      earliest:
        Edge(bb 0 -> bb 2): n_bits = 13, set = {0 }
        Edge(bb 62 -> bb 63): n_bits = 13, set = {4 }
        Edge(bb 180 -> bb 181): n_bits = 13, set = {8 }
        Edge(bb 181 -> bb 3): n_bits = 13, set = {2 }

    Fuse curr info since prev info compatible with it:
      prev_info: VALID (insn 1858, bb 181)	 <-- due to Edge(bb 181 -> bb 3)
        Demand fields: demand_sew_only
        SEW=32, VLMUL=mf2, RATIO=64, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(nil)
        VL=(nil)
      curr_info: VALID (insn 2874, bb 3)
        Demand fields: demand_ratio_only demand_avl
        SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(const_int 8 [0x8])
        VL=(nil)

      prev_info after fused: VALID (insn 1858, bb 181)
        Demand fields: demand_sew_lmul demand_avl
        SEW=32, VLMUL=m2, RATIO=16, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(const_int 8 [0x8])
        VL=(nil)
```

This fuse in turn is happening from 

DEF_SEW_LMUL_RULE (sew_only, ratio_only, sew_lmul,
		   next_ratio_valid_for_prev_sew_p, always_false,
		   modify_lmul_with_next_ratio)

I'm not really sure if the merge callback here is correct.
Comment 12 JuzheZhong 2023-11-14 22:29:26 UTC
The merge is correct here.

vmv.x.s demand SEW = 32 wheras vle/vse demand RATIO = 16 (e8mf2)

So, to make vsetvl valid for both of them, the vtype should be sew = 32 and lmul = M2 (32 / 16 = 2).
Comment 13 Vineet Gupta 2023-11-14 23:18:12 UTC
Then I don't know where the problem actually is ?
Comment 14 JuzheZhong 2023-11-14 23:23:16 UTC
Let me give you some guide which helps you to dig into the problem.

First, reduce the case as follows:

#include <string.h>

void abort (void);
void exit (int);

#ifndef MAX_OFFSET
#define MAX_OFFSET (sizeof (long long))
#endif

#ifndef MAX_COPY
#define MAX_COPY 15
#endif

#ifndef MAX_EXTRA
#define MAX_EXTRA (sizeof (long long))
#endif

#define MAX_LENGTH (MAX_OFFSET + MAX_COPY + MAX_EXTRA)

static union {
  char buf[MAX_LENGTH];
  long long align_int;
  long double align_fp;
} u;

char A = 'A';

void reset ()
{
  int i;

  for (i = 0; i < MAX_LENGTH; i++)
    u.buf[i] = 'a';
}

void check (int off, int len, int ch)
{
  char *q;
  int i;

  q = u.buf;
  for (i = 0; i < off; i++, q++)
    if (*q != 'a')
      abort ();

  for (i = 0; i < len; i++, q++)
    if (*q != ch)
      abort ();

  for (i = 0; i < MAX_EXTRA; i++, q++)
    if (*q != 'a')
      abort ();
}

int main ()
{
  int len;
  char *p;

  /* off == 0 */
  for (len = 0; len < MAX_COPY; len++)
    {
      reset ();

      p = memset (u.buf, '\0', len);
      if (p != u.buf) abort ();
      check (0, len, '\0');

      p = memset (u.buf, A, len);
      if (p != u.buf) abort ();
      check (0, len, 'A');

      p = memset (u.buf, 'B', len);
      if (p != u.buf) abort ();
      check (0, len, 'B');
    }

  /* off == 1 */
  for (len = 0; len < MAX_COPY; len++)
    {
      reset ();

      p = memset (u.buf+1, '\0', len);
      if (p != u.buf+1) abort ();
      check (1, len, '\0');

      p = memset (u.buf+1, A, len);
      if (p != u.buf+1) abort ();
      check (1, len, 'A');

      p = memset (u.buf+1, 'B', len);
      if (p != u.buf+1) abort ();
      check (1, len, 'B');
    }

  exit (0);
}
Comment 15 Vineet Gupta 2023-11-14 23:34:13 UTC
(In reply to JuzheZhong from comment #14)
> Let me give you some guide which helps you to dig into the problem.
> 
> First, reduce the case as follows:

Did your msg get truncated or pressed send too soon ?

Because the reduced test you pasted is exactly what I uploaded to the bug and I can't reduce it any further.
Comment 16 JuzheZhong 2023-11-15 01:04:09 UTC
The victim should be these 2 pieces of codes:

.L20:
	lbu	a1,0(a3)
	li	t1,97
	bne	a1,t1,.L21
	lbu	t1,1(a3)
	bne	t1,a1,.L21
	lbu	a1,2(a3)
	bne	a1,t1,.L21
	lbu	t1,3(a3)
	bne	t1,a1,.L21
	lbu	a1,4(a3)
	bne	a1,t1,.L21
	lbu	t1,5(a3)
	bne	t1,a1,.L21
	lbu	a1,6(a3)
	bne	a1,t1,.L21
	lbu	a3,7(a3)
	bne	a3,a1,.L21
	lui	a3,%hi(A)
	lbu	a3,%lo(A)(a3)
	mv	t1,a5
	mv	a1,a4
	bltu	a4,t3,.L24
	mv	t1,t4
	addi	a1,a4,-8
	vmv.v.x	v2,a3
	vse8.v	v2,0(a2)
.L24:

.L29:
	lbu	t1,0(a1)
	li	t6,97
	bne	t1,t6,.L21
	lbu	t6,1(a1)
	bne	t6,t1,.L21
	lbu	t1,2(a1)
	bne	t1,t6,.L21
	lbu	t6,3(a1)
	bne	t6,t1,.L21
	lbu	t1,4(a1)
	bne	t1,t6,.L21
	lbu	t6,5(a1)
	bne	t6,t1,.L21
	lbu	t1,6(a1)
	bne	t1,t6,.L21
	lbu	a1,7(a1)
	bne	a1,t1,.L21
	mv	t1,a5
	mv	a1,a4
	bltu	a4,t3,.L31
	li	t6,66
	mv	t1,t4
	addi	a1,a4,-8
	vmv.v.x	v2,t6
	vse8.v	v2,0(a2)
.L31:

They are located on BB 54 and BB 113. Their VSETVLs should not be eliminiated.
Comment 17 JuzheZhong 2023-11-15 01:07:02 UTC
The incorrect elimination happens on pre_global_vsetvl_info

You can try simple hack like this:

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 8466b5d019e..65dcf931808 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3135,6 +3135,8 @@ pre_vsetvl::pre_global_vsetvl_info ()
   for (const bb_info *bb : crtl->ssa->bbs ())
     {
       sbitmap d = m_del[bb->index ()];
+      if (bb->index () == 113 || bb->index () == 54)
+        continue;
       if (bitmap_count_bits (d) == 0)
        continue;


FAIL will be fixed.

So, the idea is that we should investigate why LCM calculation return

m_del to be true on BB 54 and BB 113.

The calculation is done by pre_edge_lcm_avs
Comment 18 JuzheZhong 2023-11-15 02:03:04 UTC
I know how to fix it.

Testing a candidate patch.
Comment 19 JuzheZhong 2023-11-15 02:49:09 UTC
Created attachment 56589 [details]
bug fix patch

Hi,Vineet.

The attachment is the bug fix patch.
I have tested on both RV32 and RV64 C/C++.
No regression with reducing memset-3.c FAIL.

You can verify and send the patch.

Thanks.
Comment 20 JuzheZhong 2023-11-15 02:57:44 UTC
This is the reason of this patch:

The whole analysis is correct.But we made a mistake on inserting vsetvls.

This is the story.

We have 2 types of global vsetvls insertion.
One is earliest fusion of each end of the block.
The other is LCM suggested edge vsetvls.

So before this patch, insertion as follows:

(insn 2817 2820 2818 361 (set (reg:SI 67 vtype)
        (unspec:SI [
                (const_int 8 [0x8])
                (const_int 7 [0x7])
                (const_int 1 [0x1]) repeated x2
            ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
     (nil))
(insn 2818 2817 999 361 (set (reg:SI 67 vtype)
        (unspec:SI [
                (const_int 32 [0x20])
                (const_int 1 [0x1]) repeated x3
            ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
     (nil))

After this patch:

(insn 2817 2820 2819 361 (set (reg:SI 67 vtype)
        (unspec:SI [
                (const_int 32 [0x20])
                (const_int 1 [0x1]) repeated x3
            ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
     (nil))
(insn 2819 2817 999 361 (set (reg:SI 67 vtype)
        (unspec:SI [
                (const_int 8 [0x8])
                (const_int 7 [0x7])
                (const_int 1 [0x1]) repeated x2
            ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
     (nil))

The original insertion order is incorrect.

We should first insert earliest fusion since it is the vsetvls information already there which was seen by later LCM. We just delay the insertion.
So it should be come before the LCM suggested insertion.

Feel free to investigate it with comparing before and after the patch.
Then feel free to send patch with this fix after you understand the reasons.

Thanks.
Comment 21 GCC Commits 2023-11-15 17:35:36 UTC
The master branch has been updated by Vineet Gupta <vineetg@gcc.gnu.org>:

https://gcc.gnu.org/g:d1189ceefc1da1515e039c9cfd2f5a67b5820207

commit r14-5507-gd1189ceefc1da1515e039c9cfd2f5a67b5820207
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Tue Nov 14 19:23:19 2023 -0800

    RISC-V: fix vsetvli pass testsuite failure [PR/112447]
    
    Fixes: f0e28d8c1371 ("RISC-V: Fix failed hoist in LICM of vmv.v.x instruction")
    
    Since above commit, we have following failure:
    
      FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
      FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
    
    The issue was not the commit but rather it unravelled an issue in the
    vsetvli pass.
    
    Here's Juzhe's analysis:
    
    We have 2 types of global vsetvls insertion.
    One is earliest fusion of each end of the block.
    The other is LCM suggested edge vsetvls.
    
    So before this patch, insertion as follows:
    
    |  (insn 2817 2820 2818 361 (set (reg:SI 67 vtype)
    |        (unspec:SI [
    |                (const_int 8 [0x8])
    |                (const_int 7 [0x7])
    |                (const_int 1 [0x1]) repeated x2
    |            ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
    |     (nil))
    |  (insn 2818 2817 999 361 (set (reg:SI 67 vtype)
    |        (unspec:SI [
    |                (const_int 32 [0x20])
    |                (const_int 1 [0x1]) repeated x3
    |            ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
    |     (nil))
    
    After this patch:
    
    |  (insn 2817 2820 2819 361 (set (reg:SI 67 vtype)
    |        (unspec:SI [
    |                (const_int 32 [0x20])
    |                (const_int 1 [0x1]) repeated x3
    |            ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
    |     (nil))
    |  (insn 2819 2817 999 361 (set (reg:SI 67 vtype)
    |        (unspec:SI [
    |                (const_int 8 [0x8])
    |                (const_int 7 [0x7])
    |                (const_int 1 [0x1]) repeated x2
    |            ] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
    |     (nil))
    
    The original insertion order is incorrect.
    
    We should first insert earliest fusion since it is the vsetvls information
    already there which was seen by later LCM. We just delay the insertion.
    So it should be come before the LCM suggested insertion.
    
            PR target/112447
    
    gcc/ChangeLog:
            * config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Insert
            local vsetvl info before LCM suggested one.
    
            Tested-by: Patrick O'Neill <patrick@rivosinc.com> # pre-commit-CI #679
            Co-developed-by: Vineet Gupta <vineetg@rivosinc.com>
Comment 22 Vineet Gupta 2023-11-15 17:53:53 UTC
Fixed for gcc-14.