In the function below, IRA/reload generates a frame/frame pointer and spills one pointer variable to the frame. This is bad code because there are many hard registers free and there is no need for a frame and taking away the frame pointer register Y. Notice that on the hardware, it is not possible to address the frame via the stack pointer. == Background == The issue is that AVR has only 3 pointer registers X, Y, and Z with the following addressing capabilities: *X, *X++, *--X (R27:R26, call-clobbered) *Y, *Y++, *--Y, *(Y+const) (R28:R29, call-saved, frame pointer) *Z, *Z++, *--Z, *(Z+const) (R30:R31, call-clobbered) Older version of the compiler prior to 4.7 trunk r179993 allowed an addressing mode *(X+const) and emulated it by emitting appropriate instructions sequence as X = X + const r = *X X = X - const which was only a rare corner case in the old register allocator, but in the new allocator this sequence is seen very often leading to code bloat of +50% for some real-world functions. This is the reason why -mstrict-X has been added to the AVR backend, see PR46278. This option denies fake *(X+const) addressing but leads to the mentioned spills from register allocator and to code even worse as compared to without setting -mstrict-X. == Source == typedef struct { unsigned char a, b, c, d; } s_t; unsigned char func1 (s_t *x, s_t *y, s_t *z) { unsigned char s = 0; s += x->a; s += y->a; s += z->a; s += x->b; s += y->b; s += z->b; s += x->c; s += y->c; s += z->c; return s; } == Assembler output == Besides the bad code of upappropriate setting up a frame, note that the frame is 4 bytes but only 2 bytes (Y+1,Y+2) are actually used. func1: push r28 ; 45 pushqi1/1 [length = 1] push r29 ; 46 pushqi1/1 [length = 1] ; SP -= 4 ; 50 *addhi3_sp_R [length = 2] rcall . rcall . in r28,__SP_L__ ; 51 *movhi/8 [length = 2] in r29,__SP_H__ /* prologue: function */ /* frame size = 4 */ /* stack size = 6 */ .L__stack_usage = 6 std Y+2,r25 ; 2 *movhi/4 [length = 2] std Y+1,r24 movw r26,r22 ; 37 *movhi/1 [length = 1] ld r25,X ; 8 movqi_insn/4 [length = 1] ldd r30,Y+1 ; 38 *movhi/3 [length = 2] ldd r31,Y+2 ld r24,Z ; 9 movqi_insn/4 [length = 1] add r25,r24 ; 10 addqi3/1 [length = 1] movw r26,r20 ; 39 *movhi/1 [length = 1] ld r24,X ; 11 movqi_insn/4 [length = 1] add r25,r24 ; 12 addqi3/1 [length = 1] ldd r24,Z+1 ; 13 movqi_insn/4 [length = 1] add r25,r24 ; 14 addqi3/1 [length = 1] movw r30,r22 ; 40 *movhi/1 [length = 1] ldd r24,Z+1 ; 15 movqi_insn/4 [length = 1] add r25,r24 ; 16 addqi3/1 [length = 1] movw r30,r20 ; 41 *movhi/1 [length = 1] ldd r24,Z+1 ; 17 movqi_insn/4 [length = 1] add r25,r24 ; 18 addqi3/1 [length = 1] ldd r30,Y+1 ; 42 *movhi/3 [length = 2] ldd r31,Y+2 ldd r24,Z+2 ; 19 movqi_insn/4 [length = 1] add r25,r24 ; 20 addqi3/1 [length = 1] movw r30,r22 ; 43 *movhi/1 [length = 1] ldd r24,Z+2 ; 21 movqi_insn/4 [length = 1] add r25,r24 ; 22 addqi3/1 [length = 1] movw r30,r20 ; 44 *movhi/1 [length = 1] ldd r24,Z+2 ; 23 movqi_insn/4 [length = 1] add r24,r25 ; 29 addqi3/1 [length = 1] /* epilogue start */ ; SP += 4 ; 56 *addhi3_sp_R [length = 4] pop __tmp_reg__ pop __tmp_reg__ pop __tmp_reg__ pop __tmp_reg__ pop r29 ; 57 popqi [length = 1] pop r28 ; 58 popqi [length = 1] ret ; 59 return_from_epilogue [length = 1] .size func1, .-func1 .ident "GCC: (GNU) 4.7.0 20111017 (experimental)" == Command line == $ avr-gcc in.c -c -save-temps -dp -Os -mmcu=avr4 -mstrict-X -v Same happens with -O2, -O3 or with -fira-algorithm=priority. Target: avr Configured with: ../../gcc.gnu.org/trunk/configure --target=avr --prefix=/local/gnu/install/gcc-4.7 --disable-nls --disable-shared --enable-languages=c,c++ --with-dwarf2 --disable-lto --enable-checking=yes,rtl Thread model: single gcc version 4.7.0 20111017 (experimental) (GCC) GNU C (GCC) version 4.7.0 20111017 (experimental) (avr) compiled by GNU C version 4.3.2 [gcc-4_3-branch revision 141291], GMP version 5.0.1, MPFR version 3.0.0-p8, MPC version 0.8.2
Created attachment 25542 [details] in.c Source code from comment #c0 as attachment for conveniance.
Just a copy of my analysis from mailing list: > Can anyone give me some advice how to proceed with this issue? > > Can be said if this is a target issue or IRA/reload flaw? It's not a costs related problem. I think that I can explain a problem. I think that it's an IRA bug. > Spilling for insn 11. > Using reg 26 for reload 0 > Spilling for insn 17. > Using reg 30 for reload 0 > Spilling for insn 23. > Using reg 30 for reload 0 > Try Assign 60(a6), cost=16000 Wrong thing starts here... ira-color.c:4120 allocno_reload_assign (a, forbidden_regs); > changing reg in insn 2 > changing reg in insn 9 > changing reg in insn 13 > changing reg in insn 19 > Assigning 60(freq=4000) a new slot 0 > Register 60 now on stack. Call trace: allocno_reload_assign() -> assign_hard_reg() -> get_conflict_profitable_regs() The `get_conflict_profitable_regs' calculates wrong `profitable_regs[1]' (Special for Vladimir) AVR is an 8 bits microcontroller. The AVR has only 3 pointer registers X, Y, and Z with the following addressing capabilities: *X, *X++, *--X (R27:R26, call-clobbered) *Y, *Y++, *--Y, *(Y+const) (R28:R29, call-saved, frame pointer) *Z, *Z++, *--Z, *(Z+const) (R30:R31, call-clobbered) Also, all modes larger than 8 bits should start in an even register. So, `get_conflict_profitable_regs' trying to calculate two arrays: - profitable_regs[0] for first word of register 60(a6) - profitable_regs[1] for second word of register 60(a6) Values of `profitable_regs': (gdb) p print_hard_reg_set (stderr,profitable_regs[0] , 01) 0-2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 $63 = void (gdb) p print_hard_reg_set (stderr,profitable_regs[1] , 01) 0-2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 They are equal ! It's wrong because second word of register 60(a6) must be allocated to odd register. This is a wrong place in `get_conflict_profitable_regs': ... nwords = ALLOCNO_NUM_OBJECTS (a); for (i = 0; i < nwords; i++) { obj = ALLOCNO_OBJECT (a, i); COPY_HARD_REG_SET (conflict_regs[i], OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)); if (retry_p) { COPY_HARD_REG_SET (profitable_regs[i], reg_class_contents[ALLOCNO_CLASS (a)]); AND_COMPL_HARD_REG_SET (profitable_regs[i], ira_prohibited_class_mode_regs [ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]); -------------------------------------------------------------^^^^^^^^^^^^^^^^^^^^^^^^^ } ALLOCNO_MODE (a) is a right mode for first word (word = 8bits register) But it's wrong mode for second word of allocno. Even more, ALLOCNO_MODE (a) is a right mode only for whole allocno. If we want to spill/load/store separate parts(IRA objects) of allocno we must use mode of each part(object). `ira_prohibited_class_mode_regs' derived only from HARD_REGNO_MODE_OK. So, the second word of 60(a6) permitted to any register after first word of 60(a6). For AVR: profitable_regs[1] = profitable_regs[0] << 1 Also, I have a question about the following fields of `ira_allocno': /* The number of objects tracked in the following array. */ int num_objects; /* An array of structures describing conflict information and live ranges for each object associated with the allocno. There may be more than one such object in cases where the allocno represents a multi-word register. */ ira_object_t objects[2]; --------------------------^^^^^ The SImode for AVR consists of 4 words, but only 2 objects in allocno structure. Is this right ? Denis.
Created attachment 25829 [details] RA dump
(In reply to comment #2) > > Also, I have a question about the following fields of `ira_allocno': > /* The number of objects tracked in the following array. */ > int num_objects; > /* An array of structures describing conflict information and live > ranges for each object associated with the allocno. There may be > more than one such object in cases where the allocno represents a > multi-word register. */ > ira_object_t objects[2]; > --------------------------^^^^^ > The SImode for AVR consists of 4 words, but only 2 objects in allocno > structure. > Is this right ? > > Yes, that is right. IRA objects were introduced by By Bernd Schmidt. Unfortunately, I did not review his patch. Probably, Bernd decided that 2 hard regs allocno covers most cases (and may be he is right). Other multi regs allocno is processed as one object (it means that all one register parts conflict with all another one register parts even if in reality one part does not conflict with another allocno part). Wrong profitable hard regs calculation for register files requiring aligned start register was a merging problem with a patch for allocation without cover classes. I'll try make a patch this week to solve the problem. Dennis, thanks for detail analysis of the problem. It saved my time.
(In reply to comment #4) > Wrong profitable hard regs calculation for register files requiring aligned > start register was a merging problem with a patch for allocation without cover > classes. > > I'll try make a patch this week to solve the problem. Thanks you are taking care of this. Will it also improve the situation for 3-byte types as introduced in PR50931? 3-byte types also start in even registers.
(In reply to comment #5) > (In reply to comment #4) > > > Wrong profitable hard regs calculation for register files requiring aligned > > start register was a merging problem with a patch for allocation without cover > > classes. > > > > I'll try make a patch this week to solve the problem. > > Thanks you are taking care of this. Will it also improve the situation for > 3-byte types as introduced in PR50931? 3-byte types also start in even > registers. I think it will improve. Sorry for the delay with the patch. The changes are big (the patch is about 1700 lines long) so I need a thorough testing.
Author: vmakarov Date: Mon Dec 5 17:02:54 2011 New Revision: 182015 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182015 Log: 2011-12-05 Vladimir Makarov <vmakarov@redhat.com> PR other/50775 * ira-int.h (struct ira_object): Remove add_data. (OBJECT_ADD_DATA): Remove. * ira-build.c (ira_create_object): Remove OBJECT_ADD_DATA initialization. * ira-color.c (object_hard_regs_t, object_hard_regs): Rename to allocno_hard_regs_t, allocno_hard_regs. (object_hard_regs_node_t, object_hard_regs_node): Rename to allocno_hard_regs_node_t and allocno_hard_regs_node. (struct allocno_color_data): Add new member last_process. Move profitable_hard_regs, hard_regs_node, and hard_regs_subnodes_start from object_color_data. (object_color_data_t, object_color_data, OBJECT_COLOR_DATA): Remove. (curr_allocno_process): New static variable. (object_hard_regs_eq, object_hard_regs_htab): Rename to allocno_hard_regs_eq and allocno_hard_regs_htab. (init_object_hard_regs, finish_object_hard_regs): Rename to init_allocno_hard_regs and finish_allocno_hard_regs. (object_hard_regs_compare, object_hard_regs_node_t): Rename to allocno_hard_regs_compare and allocno_hard_regs_node_t. (create_new_object_hard_regs_node): Rename to create_new_allocno_hard_regs_node. (add_new_object_hard_regs_node_to_forest): Rename to add_new_allocno_hard_regs_node_to_forest. (add_object_hard_regs_to_forest, collect_object_hard_regs_cover): Rename to add_allocno_hard_regs_to_forest and collect_allocno_hard_regs_cover. (setup_object_hard_regs_nodes_parent): Rename to setup_allocno_hard_regs_nodes_parent. (remove_unused_object_hard_regs_nodes): Rename to remove_unused_allocno_hard_regs_nodes. (enumerate_object_hard_regs_nodes, object_hard_regs_nodes_num): Rename to enumerate_allocno_hard_regs_nodes and allocno_hard_regs_nodes_num. (object_hard_regs_nodes, object_hard_regs_subnode_t): Rename to allocno_hard_regs_nodes and allocno_hard_regs_subnode_t. (object_hard_regs_subnode, object_hard_regs_subnodes): Rename to allocno_hard_regs_subnode and allocno_hard_regs_subnodes. (object_hard_regs_subnode_index): Rename to allocno_hard_regs_subnode_index. (setup_object_hard_regs_subnode_index): Rename to setup_allocno_hard_regs_subnode_index. (get_object_hard_regs_subnodes_num): Rename to get_allocno_hard_regs_subnodes_num. (form_object_hard_regs_nodes_forest): Rename to form_allocno_hard_regs_nodes_forest. (finish_object_hard_regs_nodes_tree): Rename to form_allocno_hard_regs_nodes_forest (finish_object_hard_regs_nodes_forest): Rename to finish_allocno_hard_regs_nodes_forest. (setup_left_conflict_sizes_p): Use allocno data instead of object ones. Process conflict allocnos once. (update_left_conflict_sizes_p): Use allocno data instead of object ones. Change prototype signature. (empty_profitable_hard_regs): Use allocno data instead of object ones. (setup_profitable_hard_regs): Ditto. (get_conflict_profitable_regs): Rename to get_conflict_and_start_profitable_regs. Use allocno data for profitable regs calculation. (check_hard_reg_p): Change prototype signature. Check profitable regs for allocno not the objects. (assign_hard_reg): Process conflict allocnos only once for updating conflict costs. (setup_allocno_available_regs_num): Use allocno data instead of object ones. Modify debug output. (color_pass): Remove initialization and finalization of object color data. Modified: trunk/gcc/ChangeLog trunk/gcc/ira-build.c trunk/gcc/ira-color.c trunk/gcc/ira-int.h
Just an FYI; Bernd's work to introduce IRA objects was designed to his the most common case for multi-word objects, which is by far double-word objects.
Code looks much better now. Great! With options $ avr-gcc in.c -c -save-temps -dp -Os -mmcu=avr4 -mstrict-X The code is: func1: push r28 ; 43 pushqi1/1 [length = 1] push r29 ; 44 pushqi1/1 [length = 1] /* prologue: function */ /* frame size = 0 */ /* stack size = 2 */ .L__stack_usage = 2 movw r18,r24 ; 2 *movhi/1 [length = 1] movw r28,r22 ; 3 *movhi/1 [length = 1] ld r24,Y ; 8 movqi_insn/4 [length = 1] movw r26,r18 ; 37 *movhi/1 [length = 1] ld r25,X ; 9 movqi_insn/4 [length = 1] add r24,r25 ; 10 addqi3/1 [length = 1] movw r30,r20 ; 38 *movhi/1 [length = 1] ld r25,Z ; 11 movqi_insn/4 [length = 1] add r24,r25 ; 12 addqi3/1 [length = 1] movw r30,r18 ; 39 *movhi/1 [length = 1] ldd r25,Z+1 ; 13 movqi_insn/4 [length = 1] add r24,r25 ; 14 addqi3/1 [length = 1] ldd r25,Y+1 ; 15 movqi_insn/4 [length = 1] add r24,r25 ; 16 addqi3/1 [length = 1] movw r30,r20 ; 40 *movhi/1 [length = 1] ldd r25,Z+1 ; 17 movqi_insn/4 [length = 1] add r24,r25 ; 18 addqi3/1 [length = 1] movw r30,r18 ; 41 *movhi/1 [length = 1] ldd r25,Z+2 ; 19 movqi_insn/4 [length = 1] add r24,r25 ; 20 addqi3/1 [length = 1] ldd r25,Y+2 ; 21 movqi_insn/4 [length = 1] add r24,r25 ; 22 addqi3/1 [length = 1] movw r30,r20 ; 42 *movhi/1 [length = 1] ldd r25,Z+2 ; 23 movqi_insn/4 [length = 1] add r24,r25 ; 29 addqi3/1 [length = 1] /* epilogue start */ pop r29 ; 47 popqi [length = 1] pop r28 ; 48 popqi [length = 1] ret ; 49 return_from_epilogue [length = 1]