[Bug target/81833] New: [7 Regression] PowerPC: VSX: Miscompiles ffmpeg's scalarproduct_int16_vsx at -O1

Sat Aug 12 14:42:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81833

            Bug ID: 81833
           Summary: [7 Regression] PowerPC: VSX: Miscompiles ffmpeg's
                    scalarproduct_int16_vsx at -O1
           Product: gcc
           Version: 7.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: james410 at cowgill dot org.uk
  Target Milestone: ---

Created attachment 41984
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41984&action=edit
vsx-ffmpeg-test.c

Originally from this Debian bug:
https://bugs.debian.org/871565

GCC 7 miscompiles the scalarproduct_int16_vsx function from FFmpeg's source.
With the attached testcase, the result is always 0, but it should be 8.
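
For comparison, here is a rough scalar sketch of what I would expect the
function to compute. This is not FFmpeg's code: the name
scalarproduct_int16_ref is made up for illustration, and it accumulates in
plain 32-bit arithmetic rather than using the saturating vector sum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference, not FFmpeg's code: a plain dot product of
 * two int16_t arrays, accumulated in 32 bits without saturation. */
int32_t scalarproduct_int16_ref(const int16_t *v1, const int16_t *v2,
                                int order)
{
    /* The vector loop in the disassembly consumes 8 elements (16 bytes)
     * per iteration, so this sketch assumes order is a multiple of 8. */
    assert(order >= 0 && order % 8 == 0);

    int32_t res = 0;
    for (int i = 0; i < order; i++)
        res += (int32_t)v1[i] * v2[i];
    return res;
}
```

Called on two eight-element arrays of all ones (inputs picked for
illustration, not necessarily those in the attachment), this returns 8.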

Compile with:
> gcc -c -O1 vsx-ffmpeg-test.c

This is the miscompiled assembly:
> 0000000000000000 <scalarproduct_int16_vsx>:
>    0:   0c 03 80 11     vspltisb v12,0
>    4:   8c 03 20 10     vspltisw v1,0
>    8:   00 00 85 2f     cmpwi   cr7,r5,0
>    c:   3c 00 9d 40     ble     cr7,48 <scalarproduct_int16_vsx+0x48>
>   10:   00 00 40 39     li      r10,0
>   14:   99 1e 00 7c     lxvd2x  vs32,0,r3
>   18:   57 02 00 f0     xxswapd vs32,vs32
>   1c:   ce 20 a0 7d     lvx     v13,0,r4
>   20:   28 6b 00 10     vmsumshm v0,v0,v13,v12
>   24:   8c 0a 00 10     vspltw  v0,v1,0
>   28:   88 07 00 10     vsumsws v0,v0,v0
>   2c:   2c 03 20 10     vsldoi  v1,v0,v0,12
>   30:   10 00 63 38     addi    r3,r3,16
>   34:   10 00 84 38     addi    r4,r4,16
>   38:   08 00 2a 39     addi    r9,r10,8
>   3c:   b4 07 2a 7d     extsw   r10,r9
>   40:   00 48 85 7f     cmpw    cr7,r5,r9
>   44:   d0 ff 9d 41     bgt     cr7,14 <scalarproduct_int16_vsx+0x14>
>   48:   93 0a 20 f0     xxspltw vs33,vs33,0
>   4c:   f0 ff 21 39     addi    r9,r1,-16
>   50:   8e 49 20 7c     stvewx  v1,0,r9
>   54:   f2 ff 61 e8     lwa     r3,-16(r1)
>   58:   20 00 80 4e     blr

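The instructions at 24 and 28 look wrong to me. The vmsumshm at 20 writes its
result to v0, but the vspltw at 24 immediately overwrites v0 with a splat of
v1, so the product is lost before the vsumsws at 28 can read it. It looks to me
like the t and res variables have been assigned the same register, and the
value in it gets clobbered while it is still live.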
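
To illustrate why that would always give 0, here is a toy scalar model of the
loop's data flow. It is only a sketch: each "register" is a single int32_t
rather than a four-word vector, the function names are made up, and the
arithmetic is a simplification of what vsumsws really does.

```c
#include <stdint.h>

/* Toy model only: one int32_t per "register", not a 4-lane vector. */

/* Intended data flow: t feeds the accumulation into res. */
int32_t loop_intended(const int32_t *partial, int n)
{
    int32_t res = 0;
    for (int i = 0; i < n; i++) {
        int32_t t = partial[i];  /* stands in for the vmsumshm result */
        res = t + res;           /* stands in for the vsumsws accumulation */
    }
    return res;
}

/* Data flow as it appears in the disassembly: v0 is overwritten before use. */
int32_t loop_as_compiled(const int32_t *partial, int n)
{
    int32_t v0 = 0;
    int32_t v1 = 0;              /* v1 holds res, initialised to 0 */
    for (int i = 0; i < n; i++) {
        v0 = partial[i];         /* offset 20: vmsumshm writes v0 */
        v0 = v1;                 /* offset 24: vspltw clobbers v0 with v1 */
        v0 = v0 + v0;            /* offset 28: vsumsws v0,v0,v0 (simplified) */
        v1 = v0;                 /* offset 2c: vsldoi copies v0 into v1 */
    }
    return v1;
}
```

Because v1 starts at 0 and the product never reaches it, the second function
returns 0 for any input, which matches what the testcase shows.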

The __attribute__((noinline)) in the testcase is required to reproduce this.
Removing it makes the code work correctly.