Compiling:

typedef unsigned char Char;
typedef Char *ScrnPtr;
typedef ScrnPtr *ScrnBuf;

typedef struct {
  int foo[1000];
  int savelines;
  ScrnBuf visbuf;
  ScrnBuf allbuf;
} TScreen;

typedef struct _XtermWidgetRec {
  TScreen screen;
  int num_ptrs;
} *XtermWidget;

extern XtermWidget term;

void VTallocbuf(void)
{
  TScreen *screen = &term->screen;
  screen->visbuf = &screen->allbuf[term->num_ptrs * screen->savelines];
  return;
}

with 4.0 and 4.1, -march=i686 -O2 generates (only the significant part of the sdiff is shown):

        movl 4012(%eax), %ecx     |         movl 4000(%edx), %ecx
        movl 4000(%eax), %edx     |         movl 4012(%edx), %eax
        imull %ecx, %edx          |         imull %ecx, %eax
        movl 4008(%eax), %ecx     |         movl 4008(%edx), %ecx
        leal (%ecx,%edx,4), %edx  |         sall $2, %eax
        movl %edx, 4004(%eax)     |         addl %ecx, %eax
                                  >         movl %eax, 4004(%edx)

Note that 4.1 generates sall + addl instead of a single leal instruction. This might be one of the reasons for the 4.1 code size regression in PR23153.
I think this was caused by:

2005-07-30  Jan Hubicka  <jh@suse.cz>

        * expr.c (expand_expr_real_1): Do not load mem targets into register.
        * i386.c (ix86_fixup_binary_operands): Likewise.
        (ix86_expand_unary_operator): Likewise.
        (ix86_expand_fp_absneg_operator): Likewise.
        * optabs.c (expand_vec_cond_expr): Validate dest.
Smaller test case:
=====================================================
typedef struct {
  char **visbuf;
  char **allbuf;
} TScreen;

void VTallocbuf(TScreen *screen, unsigned long savelines)
{
  screen->visbuf = &screen->allbuf[savelines];
}
=====================================================

On AMD64 this gives me:

        salq $3, %rsi
        addq 8(%rdi), %rsi
        movq %rsi, (%rdi)
        ret

instead of:

        movq 8(%rdi), %rax
        leaq (%rax,%rsi,8), %rsi
        movq %rsi, (%rdi)
        ret
I'm not confident that using salq in Steven's test case is really a pessimization. I'll consider a 32-bit target instead, and this test case:

char **VTallocbuf(char **allbuf, unsigned long savelines)
{
  return &allbuf[savelines];
}

For i686 we have:

        movl 8(%esp), %eax
        movl 4(%esp), %edx
        sall $2, %eax
        addl %edx, %eax

instead of:

        movl 8(%esp), %eax
        movl 4(%esp), %edx
        leal (%edx,%eax,4), %eax

However, even in this case lea-free code would be feasible:

        movl 8(%esp), %eax
        sall $2, %eax
        addl 4(%esp), %eax

And this is exactly the RTL we have until peephole2, where this peephole splits the instruction:

;; Don't do logical operations with memory inputs.
(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
                   (match_operator:SI 3 "arith_or_logical_operator"
                     [(match_dup 0)
                      (match_operand:SI 1 "memory_operand" "")]))
              (clobber (reg:CC FLAGS_REG))])]
  "! optimize_size && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1))
   (parallel [(set (match_dup 0)
                   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
              (clobber (reg:CC FLAGS_REG))])]
  "")

I think that Jan's patch should be conditionalized on !optimize_size && !TARGET_READ_MODIFY: otherwise the transformation he removed will be done anyway, only too late in the game.

Paolo
Jan, what's your analysis on this PR?
Subject: Re: [4.1 Regression] 4.1 generates sall + addl instead of leal

> ------- Comment #4 from mmitchel at gcc dot gnu dot org 2005-10-31 04:45 -------
> Jan, what's your analysis on this PR?

Hmm, I am still not sure if it matters too much, but since there are actually dupes of this problem, I think we can simply add a peep2 fixing this particular common case. I am testing the attached patch.

/* { dg-do assemble { target i?86-*-* x86_64-*-* } } */
/* { dg-require-effective-target ilp32 } */
/* { dg-options "-O2 -march=i686" } */
/* { dg-final { scan-assembler "leal" } } */
typedef struct {
  char **visbuf;
  char **allbuf;
} TScreen;

void VTallocbuf(TScreen *screen, unsigned long savelines)
{
  screen->visbuf = &screen->allbuf[savelines];
}

2005-10-31  Jan Hubicka  <jh@suse.cz>

	PR target/23303
	* i386.md: Add peep2 for simplyfing array accesses.

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 106275)
+++ config/i386/i386.md	(working copy)
@@ -19785,6 +19785,58 @@
   if (!rtx_equal_p (operands[0], operands[1]))
     emit_move_insn (operands[0], operands[1]);
 })
+
+;; After splitting up read-modify operations, array accesses with memory
+;; operands might end up in form:
+;;   sall $2, %eax
+;;   movl 4(%esp), %edx
+;;   addl %edx, %eax
+;; instead of pre-splitting:
+;;   sall $2, %eax
+;;   addl 4(%esp), %eax
+;; Turn it into:
+;;   movl 4(%esp), %edx
+;;   leal (%edx,%eax,4), %eax
+
+(define_peephole2
+  [(parallel [(set (match_operand 0 "register_operand" "")
+		   (ashift (match_operand 1 "register_operand" "")
+			   (match_operand 2 "const_int_operand" "")))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand 3 "register_operand")
+	(match_operand 4 "x86_64_general_operand" ""))
+   (parallel [(set (match_operand 5 "register_operand" "")
+		   (plus (match_operand 6 "register_operand" "")
+			 (match_operand 7 "register_operand" "")))
+	      (clobber (reg:CC FLAGS_REG))])]
+  "INTVAL (operands[2]) >= 0 && INTVAL (operands[2]) <= 3
+   /* Validate MODE for lea.  */
+   && ((!TARGET_PARTIAL_REG_STALL
+	&& (GET_MODE (operands[0]) == QImode
+	    || GET_MODE (operands[0]) == HImode))
+       || GET_MODE (operands[0]) == SImode
+       || (TARGET_64BIT && GET_MODE (operands[0]) == DImode))
+   /* We reorder load and the shift.  */
+   && !rtx_equal_p (operands[1], operands[3])
+   /* Last PLUS must consist of operand 0 and 3.  */
+   && !rtx_equal_p (operands[0], operands[3])
+   && (rtx_equal_p (operands[3], operands[6])
+       || rtx_equal_p (operands[3], operands[7]))
+   && (rtx_equal_p (operands[0], operands[6])
+       || rtx_equal_p (operands[0], operands[7]))
+   /* The intermediate operand 0 must die or be same as output.  */
+   && (rtx_equal_p (operands[0], operands[5])
+       || peep2_reg_dead_p (3, operands[0]))"
+  [(set (match_dup 3) (match_dup 4))
+   (set (match_dup 0) (match_dup 1))]
+{
+  int scale = 1 << INTVAL (operands[2]);
+  rtx index = gen_lowpart (Pmode, operands[1]);
+  rtx base = gen_lowpart (Pmode, operands[3]);
+  rtx dest = gen_lowpart (Pmode, operands[5]);
+
+  operands[1] = gen_rtx_PLUS (Pmode, base,
+			      gen_rtx_MULT (Pmode, index, GEN_INT (scale)));
+  operands[0] = dest;
+})

 ;; Call-value patterns last so that the wildcard operand does not
 ;; disrupt insn-recog's switch tables.
Patch posted: http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00011.html
(In reply to comment #5)
> Hmm,
> I am still not sure if it matters too much, but since there are actually
> dupes of this problem, I think we can simply add peep2 fixing this
> particular common case.
>
> I am testing attached patch.

Could you please try to measure the code size impact of this patch? (like the examples in PR23153: xterm, PR8361, or the kernel)
Subject: Bug 23303

Author: hubicka
Date: Wed Nov  2 23:21:22 2005
New Revision: 106406

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106406

Log:
	PR target/23303
	* i386.md: Add peep2 for simplyfing array accesses.
	* gcc.dg/i386-lea.c: New test

Added:
    trunk/gcc/testsuite/gcc.dg/i386-lea.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
Fixed in mainline now.