This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug target/80862] New: [x86] Wrong rounding results for some test cases


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80862

            Bug ID: 80862
           Summary: [x86] Wrong rounding results for some test cases
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sebastian.peryt at intel dot com
                CC: julia.koval at intel dot com, ubizjak at gmail dot com
  Target Milestone: ---
            Target: X86

Created attachment 41408
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41408&action=edit
Patch to reproduce described error.

I recently found that the rounding intrinsics produce wrong results in some
particular cases. Three specific conditions have to be met to trigger the
problem:
- the test is compiled with -O1 or -O2 (it does not appear at -O0),
- the test case contains only two intrinsics: a regular one (e.g.
_mm512_cvtps_epi32) and a rounding one (e.g. _mm512_cvt_roundps_epi32),
- both intrinsics use the same input argument.

As a result, the value from the first (regular) intrinsic is copied into the
result of the second (round) intrinsic. The asm output shows that the same
register (%zmm1) is stored for both results, even though the rounding
conversion writes its result to %zmm0:

vcvtps2dq %zmm0, %zmm1
vmovdqa64 %zmm1, -368(%rbp)
pushq -312(%rbp)
pushq -320(%rbp)
pushq -328(%rbp)
vcvtps2dq {rz-sae}, %zmm0, %zmm0
pushq -336(%rbp)
vmovdqa64 %zmm1, -304(%rbp)
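
To make the expected difference concrete, here is a minimal scalar sketch of
the two conversions. It is not the attached test and does not use the
intrinsics. It assumes the regular vcvtps2dq runs with the default MXCSR
rounding mode (round to nearest, ties to even), while the {rz-sae} form rounds
toward zero. The function names are hypothetical and exist only for this
illustration.

```c
#include <assert.h>

/* Scalar model of vcvtps2dq {rz-sae}: round toward zero.
   A C float-to-int cast truncates, which matches that mode. */
static int convert_toward_zero(float x)
{
    return (int)x;
}

/* Scalar model of plain vcvtps2dq, ASSUMING the default MXCSR mode
   (round to nearest, ties to even). Only meant for small magnitudes. */
static int convert_nearest_even(float x)
{
    assert(x > -1000000.0f && x < 1000000.0f);

    int truncated = (int)x;              /* integer part, toward zero */
    float frac = x - (float)truncated;   /* exact for small |x| */

    if (frac > 0.5f)
        return truncated + 1;
    if (frac < -0.5f)
        return truncated - 1;
    if (frac == 0.5f)                    /* tie: pick the even neighbour */
        return truncated + (truncated & 1);
    if (frac == -0.5f)
        return truncated - (truncated & 1);
    return truncated;
}

/* Returns 1 when the two rounding modes give different integers for x. */
static int conversions_differ(float x)
{
    return convert_nearest_even(x) != convert_toward_zero(x);
}
```

For an input such as 2.7f the first model gives 3 and the second gives 2, so a
correct build must not store the same value for both results.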

From what I have gathered so far, this happens because the rounding md
template in i386/subst.md uses a parallel side effect. Because parallel
executes each side effect individually at first, the cse1 pass optimizes the
part that is the same for both intrinsics. After that, the same register is
assigned to the move operation for both results, so the regular and the round
intrinsic effectively produce the same result.

Probably some other side effect has to be used to set the rounding flags in
order to fix this issue, but I am not sure which one it should be.
Alternatively, cse.c may have to be modified to handle this use of parallel
properly.
