[Bug tree-optimization/85730] New: complex code for modifying lowest byte in a 4-byte vector
zsojka at seznam dot cz
gcc-bugzilla@gcc.gnu.org
Thu May 10 11:39:00 GMT 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85730
Bug ID: 85730
Summary: complex code for modifying lowest byte in a 4-byte vector
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zsojka at seznam dot cz
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
Created attachment 44109
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44109&action=edit
reduced testcase
The attached testcase has 3 implementations of the same function, yet the
compiled code differs (at -O3):
foo:
        movsx   edx, dil
        mov     eax, edi
        add     edx, edx
        mov     al, dl
        ret
bar:
        mov     eax, edi
        add     al, al
        ret
baz:
        movsx   edx, dil
        mov     eax, edi
        add     edx, edx
        mov     al, dl
        ret
bar() has the shortest code and also uses fewer registers (only eax, versus
eax and edx in foo() and baz()). I tried benchmarking all 3 functions on a
Skylake CPU, but I could not determine which one is fastest (the measurement
jitter was too high).
The difference between foo() and bar() is that bar() is compiled with
-fno-tree-ccp -fno-tree-fre. baz() has one extra constant in the code, which
needs to be propagated (compared with foo() and bar()).