On AMD64 the following piece of code still triggers one jump threading opportunity in the RTL threader that we miss in the tree threader on the tree-cleanup-branch:

=====================================================
extern int x;
extern int y;

void foo (void)
{
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 0;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 1;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 2;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 3;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 4;
}
=====================================================

This test case was reduced from insn-opinit.c, where the RTL threader still threads 405 jumps, most of them of the kind shown in the test case. We do catch this on mainline. The assembly for the tree-cleanup-branch is worse than mainline:

MAINLINE: -O2                   TCB: -O2
    .file "t.c"                     .file "t.c"
    .text                           .text
    .p2align 4,,15                  .p2align 4,,15
    .globl foo                      .globl foo
    .type foo, @function            .type foo, @function
foo:                            foo:
.LFB2:                          .LFB2:
    movl x(%rip), %eax              movl x(%rip), %eax
    movl %eax, %edx                 movl %eax, %edx
    andl $1, %edx                   andl $1, %edx
    jne .L2                         jne .L2
    testb $64, %ah                  testb $64, %ah
    je .L19                   |     je .L4
.L2:                            .L2:
    testb %dl, %dl                  testb %dl, %dl
    movl $0, y(%rip)                movl $0, y(%rip)
    je .L23                   |     jne .L5
                              >     testb $64, %ah
                              >     je .L4
                              > .L5:
    testb %dl, %dl                  testb %dl, %dl
    movl $1, y(%rip)                movl $1, y(%rip)
    je .L24                   |     je .L4
.L10:                         <
    testb %dl, %dl                  testb %dl, %dl
    movl $2, y(%rip)                movl $2, y(%rip)
    je .L25                   |     je .L16
.L14:                         | .L9:
    testb %dl, %dl                  testb %dl, %dl
    movl $3, y(%rip)                movl $3, y(%rip)
    je .L26                   |     je .L17
.L17:                         | .L12:
    movl $4, y(%rip)                movl $4, y(%rip)
.L19:                         | .L14:
    rep ; ret                       rep ; ret
    .p2align 4,,7                   .p2align 4,,7
.L23:                         | .L4:
    testb $64, %ah                  testb $64, %ah
    je .L19                   |     je .L14
    testb %dl, %dl            <
    movl $1, y(%rip)          <
    jne .L10                  <
.L24:                         <
    testb $64, %ah            <
    je .L19                   <
    testb %dl, %dl                  testb %dl, %dl
    movl $2, y(%rip)                movl $2, y(%rip)
    jne .L14                  |     jne .L9
.L25:                         |     jmp .L16
                              >     .p2align 4,,7
                              > .L17:
    testb $64, %ah                  testb $64, %ah
    je .L19                   |     .p2align 4,,2
                              >     je .L14
                              >     movl $4, y(%rip)
                              >     .p2align 4,,4
                              >     jmp .L14
                              >     .p2align 4,,7
                              > .L16:
                              >     testb $64, %ah
                              >     .p2align 4,,2
                              >     je .L14
    testb %dl, %dl                  testb %dl, %dl
    movl $3, y(%rip)                movl $3, y(%rip)
    jne .L17                  |     jne .L12
.L26:                         <
    testb $64, %ah            <
    jne .L17                  <
    .p2align 4,,2                   .p2align 4,,2
    ret                       |     jmp .L17
.LFE2:                          .LFE2:
    .size foo, .-foo                .size foo, .-foo

I guess this could come from no longer iterating DOM?
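To make the missed opportunity concrete, here is a sketch of what complete threading amounts to at the C level. It assumes, as the threader itself has to prove, that nothing between the tests modifies x, so the outcome of the first test decides all the others. The names foo_original, foo_threaded and run_both are hypothetical and used only for this illustration, and x and y are defined here rather than declared extern so the file stands on its own.

```c
#include <assert.h>

/* Defined (not extern) so this sketch links on its own. */
int x;
int y;

/* The test case as written: the same condition is evaluated five times. */
void foo_original (void)
{
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 0;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 1;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 2;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 3;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 4;
}

/* Hypothetical hand-threaded version: nothing in the function writes x,
   so once the first test is known every later test has the same outcome
   and the taken path can run all five stores without re-testing. */
void foo_threaded (void)
{
  if ((x & 0x00000001) || (x & 0x00004000))
    {
      y = 0;
      y = 1;
      y = 2;
      y = 3;
      y = 4;
    }
}

/* Run both versions for one value of x, check that they leave the same
   value in y, and return that value. */
int run_both (int xval)
{
  x = xval;
  y = -1;
  foo_original ();
  int after_original = y;

  y = -1;
  foo_threaded ();
  assert (y == after_original);
  return y;
}
```

The run of back-to-back stores left in foo_threaded is the dead store problem that comes up later in this report.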
If I make x a parameter and y a local variable (returning y so that it is still used), then it works at the tree level, so this is a missed optimization caused by aliasing. That is, this works:

int foo (int x)
{
  int y = -1;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 0;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 1;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 2;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 3;
  if ((x & 0x00000001) || (x & 0x00004000))
    y = 4;
  return y;
}
I'll note the updated jump threading selection code will catch all these threading opportunities. I get something like this:

foo:
    pushl %ebp
    movl %esp, %ebp
    movl x, %eax
    testb $1, %al
    jne .L2
    testb $64, %ah
    je .L7
.L2:
    movl $3, y
    movl $4, y
.L7:
    leave
    ret

Note that we still have dead stores. :( They are missed both by the tree-ssa optimizers, because we don't handle V_MUST_DEF, and by the RTL optimizers, for reasons unknown.
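The store of 3 to y in the output above is dead: the next instruction overwrites the same location and nothing reads y in between. As an illustration of the idea only, here is a minimal sketch of local (single basic block) dead store elimination over a hypothetical toy IR in which each instruction loads or stores one named global and distinct names never alias. The types and the function eliminate_dead_stores are invented for this sketch; GCC's tree DSE works on SSA form with virtual operands (V_MAY_DEF/V_MUST_DEF), which this does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy IR: each instruction loads or stores one named global. */
enum insn_kind { LOAD, STORE };

struct insn
{
  enum insn_kind kind;
  const char *var;   /* name of the global accessed */
  int value;         /* constant stored (unused for LOAD) */
  bool dead;         /* set by eliminate_dead_stores */
};

#define MAX_VARS 16

/* Walk the block backwards, tracking variables that are stored later in
   the block with no intervening load.  A store to such a variable is dead.
   Stores that reach the end of the block stay live, since globals may be
   read after it.  Returns the number of stores marked dead. */
int eliminate_dead_stores (struct insn *block, int n)
{
  const char *overwritten[MAX_VARS];
  int n_over = 0;
  int removed = 0;

  for (int i = n - 1; i >= 0; i--)
    {
      struct insn *in = &block[i];
      int found = -1;

      in->dead = false;
      for (int j = 0; j < n_over; j++)
        if (strcmp (overwritten[j], in->var) == 0)
          {
            found = j;
            break;
          }

      if (in->kind == LOAD)
        {
          /* A read makes earlier stores to this variable live again. */
          if (found >= 0)
            overwritten[found] = overwritten[--n_over];
        }
      else if (found >= 0)
        {
          /* Overwritten later without being read: dead. */
          in->dead = true;
          removed++;
        }
      else
        {
          assert (n_over < MAX_VARS);
          overwritten[n_over++] = in->var;
        }
    }
  return removed;
}
```

On the two stores left in the output above ({STORE y 3, STORE y 4}) this marks the first as dead; an intervening LOAD of y keeps it.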
The threading part of this has been fixed. Now we just need to fix DSE to finish cleaning things up.
Nice. And indeed surprising that the RTL DSE doesn't catch that trivially dead store. Should I open a separate bug report for that?
Subject: Re: Missed jump threading optimization

On Sat, 2005-04-23 at 16:54 +0000, steven at gcc dot gnu dot org wrote:
> ------- Additional Comments From steven at gcc dot gnu dot org 2005-04-23 16:54 -------
> Nice. And indeed surprising that the RTL DSE doesn't catch that trivially
> dead store. Should I open a separate bug report for that?

Your call. Or we could just link this bug into the existing DSE bug.

I think we'd be better off improving the tree DSE rather than the RTL stuff. This one is something we really should be catching before we hand the code off to the RTL expanders.

jeff
Fixed both the DSE and the threading issues.