Bug 43686

Summary: GCC doesn't duplicate computed gotos for functions marked as "hot"
Product: gcc Reporter: Dmitry Stogov <dmitry>
Component: middle-end Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: dmitry, gcc-bugs, jaak, ktietz
Priority: P3 Keywords: missed-optimization
Version: 4.4.3   
Target Milestone: 5.0   
Host: i686-redhat-linux Target: i686-redhat-linux
Build: i686-redhat-linux Known to work:
Known to fail: Last reconfirmed:

Description Dmitry Stogov 2010-04-08 12:46:22 UTC
I found this bug while working on a direct-threaded interpreter for PHP. Moving from GCC 4.3 to GCC 4.4 caused a significant performance degradation. Looking at the generated assembly, I realized that GCC 4.4 doesn't replace every jmp to the indirect jmp with the indirect jmp itself. The cause is the following new condition in duplicate_computed_gotos() in bb-reorder.c:

if (!optimize_bb_for_size_p (bb))
  continue;

I thought I would be able to fix the problem using the "hot" attribute, but because of this condition, duplication doesn't happen when I mark the function with __attribute__((hot)), and it starts working when I mark it with __attribute__((cold)). As a result, the "hot" function runs slower than the "cold" one.

You can use the following simplified code to verify it. I compiled it with 'gcc -O2 -S direct.c'.

direct.c
--------
#define NEXT goto **ip++
#define guard(n) asm("#" #n)

__attribute__((cold)) void *emu (void **prog)
{
  static void  *labels[] = {&&next1,&&next2,&&next3,&&next4,&&next5,&&next6,&&next7,&&next8,&&next9,&&loop};
  void **ip;
  int    count;

  if (!prog) {
	  return labels;
  }  

  ip=prog;
  count = 10000000;

  
  NEXT;
 next1:
  guard(1);
  NEXT;
 next2:
  guard(2);
  NEXT;
 next3:
  guard(3);
  NEXT;
 next4:
  guard(4);
  NEXT;
 next5:
  guard(5);
  NEXT;
 next6:
  guard(6);
  NEXT;
 next7:
  guard(7);
  NEXT;
 next8:
  guard(8);
  NEXT;
 next9:
  guard(9);
  NEXT;
 loop:
  if (count>0) {
    count--;
    ip=prog;
    NEXT;
  }
  return 0;
}


int main() {
	void *prog[]   = {(void*)0,(void*)1,
	                  (void*)0,(void*)2,
	                  (void*)0,(void*)3,
	                  (void*)0,(void*)4,
	                  (void*)0,(void*)9};
	void **labels = emu(0);
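	unsigned i;
	for (i=0; i < sizeof(prog)/sizeof(prog[0]); i++) {
		/* cast through long (pointer-sized on typical Linux targets) to avoid a size-mismatch warning */
		prog[i] = labels[(long)prog[i]];
	}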
	emu(prog);
	return 0;
}

I saw that the check causing the slowdown has been removed in trunk, but I can't verify that this was done properly.
Comment 1 Mikael Pettersson 2010-04-08 13:32:15 UTC
Duplicate of PR42621?
Comment 2 Dmitry Stogov 2010-04-08 13:54:42 UTC
Yes, it's definitely the same issue.

The only additional note is that __attribute__((hot)) doesn't fix the problem (as I would have expected from tracing through optimize_bb_for_size_p()), but causes an additional slowdown. In contrast, __attribute__((cold)) solves the issue. It looks very strange.

I suppose some condition has to be inverted :)
Comment 3 Jaak Ristioja 2011-06-10 09:19:49 UTC
(In reply to comment #1)
> Duplicate of PR42621?

And probably a duplicate of bug 39284 also.

(In reply to comment #2)
> yes. It's definitely the same issue.
> 
> The only additional note that __attribute__((hot)) doesn't fix the problem (as
> I would expect tracing down optimize_bb_for_size_p()), but makes an additional
> slowdown. In opposite, the __attribute__((cold)) solves the issue. It looks
> very strange.
> 
> I suppose some condition has to be inverted :)

Both __attribute__((cold)) and __attribute__((hot)) have this issue with GCC 4.5.2 on my Gentoo Linux box. Neither of them solves it.
Comment 4 Andrew Pinski 2021-07-24 06:03:27 UTC
Fixed in GCC 4.5.
I see the duplicated indirect jumps for both the no-attribute case and the hot-attribute case.
For the cold case, there is a direct jump to a bb containing the indirect jump.

Also fully fixed for GCC 5 by r5-1621.