[Bug middle-end/90056] 548.exchange2_r regressions on AMD Zen
marxin at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Apr 15 11:51:00 GMT 2019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90056
--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Created attachment 46169
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46169&action=edit
perf annotate - Ofast native vs. Ofast native PGO
I'm attaching HTML and txt perf annotate output for the Ofast native and Ofast
native PGO builds. As seen, it's still the same story: high register pressure
leads to spilling of some of the induction variables.
For these builds, the most significant difference is:
GOOD:
: if(block(row, 4, i4) <= 0) cycle
0.00 : 41c660: mov (%r9),%r12d
1.99 : 41c663: mov %r11d,0x80(%rsp)
0.11 : 41c66b: mov %r11d,%edx
0.02 : 41c66e: test %r12d,%r12d
0.15 : 41c671: jg 41c7b0 <__brute_force_MOD_digits_2+0xe00>
0.01 : 41c677: inc %r11
0.64 : 41c67a: add $0x144,%r9
0.13 : 41c681: add $0x144,%r8
0.05 : 41c688: add $0x144,%r10
: do i4 = l(4), u(4)
0.15 : 41c68f: cmp %r11d,0x6c(%rsp)
2.39 : 41c694: jge 41c660 <__brute_force_MOD_digits_2+0xcb0>
0.00 : 41c696: mov 0x168(%rsp),%r10
0.55 : 41c69e: mov 0x170(%rsp),%r9
0.08 : 41c6a6: mov 0x178(%rsp),%r11
0.05 : 41c6ae: mov 0x180(%rsp),%r8
: block(row, 4:9, i3) = block(row, 4:9, i3) + 10
BAD:
: if(block(row, 4, i4) <= 0) cycle
0.05 : 41a8b0: mov (%r11),%edi
0.78 : 41a8b3: mov %r10d,0x84(%rsp)
0.04 : 41a8bb: mov %r10d,%r13d
0.01 : 41a8be: test %edi,%edi
0.26 : 41a8c0: jg 41aa10 <__brute_force_MOD_digits_2+0x1210>
0.44 : 41a8c6: addq $0x144,0x48(%rsp)
4.04 : 41a8cf: addq $0x144,0x58(%rsp)
1.31 : 41a8d8: inc %r10
0.02 : 41a8db: add $0x144,%r11
: do i4 = l(4), u(4)
0.01 : 41a8e2: cmp %r10d,0x88(%rsp)
0.25 : 41a8ea: jge 41a8b0 <__brute_force_MOD_digits_2+0x10b0>
: block(row, 4:9, i3) = block(row, 4:9, i3) + 10
0.03 : 41a8ec: mov 0xd0(%rsp),%r15
0.27 : 41a8f4: addl $0xa,-0xdc(%r15)
0.20 : 41a8fc: addl $0xa,-0xb8(%r15)
0.01 : 41a904: addl $0xa,-0x94(%r15)
0.07 : 41a90c: addl $0xa,-0x70(%r15)
0.05 : 41a911: addl $0xa,-0x4c(%r15)
0.06 : 41a916: addl $0xa,-0x28(%r15)
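As a rough cross-check of the two listings, the sample percentages of the inner
i4 loop can be summed per build. This is only a sketch: the numbers are copied
from the annotate output above, the variable names are made up for
illustration, and the two builds are different binaries, so the percentages are
not directly comparable as absolute time. Sample attribution in perf can also
skid to a neighbouring instruction.

```python
# Sample percentages copied from the perf annotate listings above
# (inner "do i4" loop only, from the first mov up to the closing jge).
good_loop = [0.00, 1.99, 0.11, 0.02, 0.15, 0.01, 0.64, 0.13, 0.05, 0.15, 2.39]
bad_loop = [0.05, 0.78, 0.04, 0.01, 0.26, 0.44, 4.04, 1.31, 0.02, 0.01, 0.25]

# The three stride updates by 0x144 in each loop.
# GOOD keeps all three in registers (%r9, %r8, %r10).
good_stride_updates = [0.64, 0.13, 0.05]
# BAD has two of them spilled (addq to 0x48(%rsp) and 0x58(%rsp)),
# and only one in a register (%r11).
bad_stride_updates = [0.44, 4.04, 0.02]

good_total = round(sum(good_loop), 2)
bad_total = round(sum(bad_loop), 2)
good_stride = round(sum(good_stride_updates), 2)
bad_stride = round(sum(bad_stride_updates), 2)

print(f"GOOD loop: {good_total}%  stride updates: {good_stride}%")
print(f"BAD  loop: {bad_total}%  stride updates: {bad_stride}%")
```

With these inputs the stride updates account for 0.82 in GOOD and 4.50 in BAD,
out of loop totals of 5.64 and 7.21, which matches the observation that the
spilled induction variables are where the two builds differ most.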
The benchmark is quite unpredictable, so I'm leaving that for now.