[Bug tree-optimization/85730] New: complex code for modifying lowest byte in a 4-byte vector

zsojka at seznam dot cz gcc-bugzilla@gcc.gnu.org
Thu May 10 11:39:00 GMT 2018


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730

            Bug ID: 85730
           Summary: complex code for modifying lowest byte in a 4-byte
                    vector
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zsojka at seznam dot cz
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: x86_64-pc-linux-gnu

Created attachment 44109
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44109&action=edit
reduced testcase

The attached testcase contains three implementations of the same function, yet
the code generated at -O3 differs:

foo:
        movsx   edx, dil
        mov     eax, edi
        add     edx, edx
        mov     al, dl
        ret

bar:
        mov     eax, edi
        add     al, al
        ret

baz:
        movsx   edx, dil
        mov     eax, edi
        add     edx, edx
        mov     al, dl
        ret

bar() has the shortest code and also uses fewer registers. I tried
benchmarking all three functions on a Skylake CPU, but could not determine
which one is fastest (the jitter was too high).

The difference between foo() and bar() is that bar() is compiled with
-fno-tree-ccp -fno-tree-fre. baz() has one extra constant in the code, which
needs to be propagated in foo() and bar().
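
The attachment itself is not reproduced in this message. As a rough
illustration only, the kind of operation the assembly above performs (doubling
the lowest byte of a 4-byte vector and leaving the other three bytes alone)
could be written with the GCC vector extension as below. The type name v4qi
and both function names are hypothetical, and this sketch is not the
reporter's testcase; it is not claimed to reproduce the exact code generation
shown above.

```c
#include <string.h>

/* GCC/Clang vector extension: four signed chars packed into 4 bytes.
   The name v4qi is an assumption made for this sketch. */
typedef signed char v4qi __attribute__((vector_size(4)));

/* Hypothetical variant A: update element 0 through vector subscripting. */
v4qi double_low_subscript(v4qi v)
{
    v[0] = (signed char)(v[0] + v[0]);  /* only the lowest byte changes */
    return v;
}

/* Hypothetical variant B: the same update, done on a byte copy. */
v4qi double_low_memcpy(v4qi v)
{
    signed char b[4];
    memcpy(b, &v, sizeof b);            /* unpack the vector into bytes */
    b[0] = (signed char)(b[0] + b[0]);
    memcpy(&v, b, sizeof v);            /* pack the bytes back */
    return v;
}
```

Both variants compute the same result, so any difference in their generated
code would be a matter of optimization rather than semantics.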

