I found this bug while working on a direct-threaded interpreter for PHP. Moving from GCC 4.3 to GCC 4.4 caused a significant performance degradation. Looking at the generated assembly, I realized that GCC 4.4 no longer replaces every jmp to an indirect jmp with a copy of the indirect jmp itself. The reason is the following new condition in function duplicate_computed_gotos() in bb-reorder.c:

    if (!optimize_bb_for_size_p (bb))
      continue;

I thought I would be able to fix the problem using the "hot" attribute, but because of this condition, duplication doesn't happen when I mark the function with __attribute__((hot)), and it starts working when I mark it with __attribute__((cold)). As a result, the "hot" function runs slower than the "cold" one.

You can use the simplified code below to verify it. I ran it with 'gcc -O2 -S direct.c'.

direct.c
--------
#define NEXT goto **ip++
#define guard(n) asm("#" #n)

__attribute__((cold)) void *emu (void **prog)
{
  static void *labels[] = {&&next1,&&next2,&&next3,&&next4,&&next5,
                           &&next6,&&next7,&&next8,&&next9,&&loop};
  void **ip;
  int count;

  if (!prog) {
    return labels;
  }

  ip=prog;
  count = 10000000;
  NEXT;
next1:
  guard(1);
  NEXT;
next2:
  guard(2);
  NEXT;
next3:
  guard(3);
  NEXT;
next4:
  guard(4);
  NEXT;
next5:
  guard(5);
  NEXT;
next6:
  guard(6);
  NEXT;
next7:
  guard(7);
  NEXT;
next8:
  guard(8);
  NEXT;
next9:
  guard(9);
  NEXT;
loop:
  if (count>0) {
    count--;
    ip=prog;
    NEXT;
  }
  return 0;
}

int main()
{
  void *prog[] = {(void*)0,(void*)1,
                  (void*)0,(void*)2,
                  (void*)0,(void*)3,
                  (void*)0,(void*)4,
                  (void*)0,(void*)9};
  void **labels = emu(0);
  int i;

  for (i=0; i < sizeof(prog)/sizeof(prog[0]); i++) {
    prog[i] = labels[(int)prog[i]];
  }
  emu(prog);
  return 0;
}

I saw that the check causing the slowdown was removed in trunk; however, I can't verify that it was done properly.
Duplicate of PR42621?
Yes, it's definitely the same issue. The only additional note is that __attribute__((hot)) doesn't fix the problem (which is what I would expect from tracing through optimize_bb_for_size_p()) but causes an additional slowdown. Conversely, __attribute__((cold)) solves the issue. It looks very strange. I suppose some condition has to be inverted :)
(In reply to comment #1)
> Duplicate of PR42621?

And probably a duplicate of bug 39284 also.

(In reply to comment #2)
> yes. It's definitely the same issue.
>
> The only additional note that __attribute__((hot)) doesn't fix the problem (as
> I would expect tracing down optimize_bb_for_size_p()), but makes an additional
> slowdown. In opposite, the __attribute__((cold)) solves the issue. It looks
> very strange.
>
> I suppose some condition has to be inverted :)

Both __attribute__((cold)) and __attribute__((hot)) have this issue with GCC 4.5.2 on my Gentoo Linux box. Neither of them solves it.
Fixed in GCC 4.5. I see the duplicated indirect jumps for the no-attribute case and the hot-attribute case; for the cold case, there is a direct jump to a bb containing the indirect jump. Also fully fixed for GCC 5 by r5-1621.