Bug 42258 - [4.5 Regression] redundant register move around mul instruction
Summary: [4.5 Regression] redundant register move around mul instruction
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.5.0
: P2 normal
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: missed-optimization, ra
Depends on: 22072
Blocks: 41653
  Show dependency treegraph
 
Reported: 2009-12-03 03:27 UTC by Carrot
Modified: 2024-02-22 22:30 UTC (History)
8 users (show)

See Also:
Host: i686-linux
Target: arm-eabi
Build: i686-linux
Known to work:
Known to fail:
Last reconfirmed: 2009-12-09 16:12:19


Attachments
A patch to fix it. (308 bytes, patch)
2010-03-17 11:44 UTC, Bernd Schmidt
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carrot 2009-12-03 03:27:19 UTC
Compile following function with options -march=armv5te -mthumb -Os

int mulx(int x)
{
  return x*42;
}

Gcc generates:

mulx:
        mov     r2, #42
        mov     r3, r2    //A
        mul     r3, r0
        @ sp needed for prologue
        mov     r0, r3    //B
        bx      lr

There are two redundant mov instructions marked as A and B. The ideal code should be:

    mov r2, #42
    mul r0, r2
    bx  lr

The root causes of the two movs are different.

A. 
In arm.md there is an insn pattern which can support 3-register multiplication.

(define_insn "*thumb_mulsi3"
  [(set (match_operand:SI          0 "register_operand" "=&l,&l,&l")
        (mult:SI (match_operand:SI 1 "register_operand" "%l,*h,0")
                 (match_operand:SI 2 "register_operand" "l,l,l")))]
  "TARGET_THUMB1 && !arm_arch6"
  "*
  if (which_alternative < 2)
    return \"mov\\t%0, %1\;mul\\t%0, %2\";
  else
    return \"mul\\t%0, %2\";
  "
  [(set_attr "length" "4,4,2")
   (set_attr "insn" "mul")]
)

So after expand we got the following rtx insn:

(insn 7 6 8 3 testD.i:2 (set (reg:SI 136)
        (mult:SI (reg:SI 137)
            (reg/v:SI 135 [ x ]))) -1 (nil))

For register allocator there is no difference between (set r3 mult(r0, r2))  and   (set r2 mult(r0, r2))   if the original r2 is dead after this instruction. Unfortunately the first form is chosen, so an extra mov is generated.

B.
After rtl expansion I got:

(insn 2 4 3 2 testD.i:2 (set (reg/v:SI 135 [ x ])
        (reg:SI 0 r0 [ x ])) -1 (nil))

(note 3 2 5 2 NOTE_INSN_FUNCTION_BEG)

(note 5 3 6 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 6 5 7 3 testD.i:2 (set (reg:SI 137)
        (const_int 42 [0x2a])) -1 (nil))

(insn 7 6 8 3 testD.i:2 (set (reg:SI 136)
        (mult:SI (reg:SI 137)
            (reg/v:SI 135 [ x ]))) -1 (nil))

(insn 8 7 9 3 testD.i:2 (set (reg:SI 134 [ <retval> ])
        (reg:SI 136)) -1 (nil))

(jump_insn 9 8 10 3 testD.i:2 (set (pc)
        (label_ref 11)) -1 (nil)
 -> 11)

(barrier 10 9 16)

(note 16 10 13 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(insn 13 16 14 4 testD.i:4 (clobber (reg/i:SI 0 r0)) -1 (nil))

(insn 14 13 11 4 testD.i:4 (clobber (reg:SI 134 [ <retval> ])) -1 (nil))

(code_label 11 14 17 5 1 "" [1 uses])

(note 17 11 12 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(insn 12 17 15 5 testD.i:4 (set (reg/i:SI 0 r0)
        (reg:SI 134 [ <retval> ])) -1 (nil))

(insn 15 12 0 5 testD.i:4 (use (reg/i:SI 0 r0)) -1 (nil))

The extra mov comes from insn 12 and no later pass can eliminate it. Should we propagate expression into return register?
Comment 1 Carrot 2009-12-09 02:27:31 UTC
Gcc 4.4 doesn't have this problem. It is a new regression caused by patch 152533.
Comment 3 Richard Biener 2009-12-31 11:11:39 UTC
But the patch was backported.
Comment 4 Bernd Schmidt 2010-03-17 11:05:03 UTC
It's not immediately obvious to me why the ARM mulsi3 patterns are written the way they are - what are the earlyclobber tricks supposed to be good for?  Richard E., any clues?
Comment 5 Bernd Schmidt 2010-03-17 11:44:12 UTC
Created attachment 20123 [details]
A patch to fix it.

Okay, so the pattern is written strangely because it's a two-operand mul where the input and output may not be the same register.
It may be better to write this using an earlyclobbered in-out operand, but you can't show anymore that the operation is commutative.

It can be fixed with a simple peephole optimization, at least for this testcase.
Comment 6 Steven Bosscher 2010-03-17 11:59:49 UTC
Perhaps add a comment why the peephole is needed? That information tend to get lost rather quickly.
Comment 7 Bernd Schmidt 2010-03-19 18:19:07 UTC
Subject: Bug 42258

Author: bernds
Date: Fri Mar 19 18:18:54 2010
New Revision: 157581

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157581
Log:
gcc/
	PR rtl-optimization/42258
	* ira-lives.c (check_and_make_def_conflict): Ignore conflict for a
	use that may match DEF.

testsuite/
	PR rtl-optimization/42258
	* gcc.target/arm/thumb1-mul-moves.c: New test.



Added:
    trunk/gcc/testsuite/gcc.target/arm/thumb1-mul-moves.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-lives.c
    trunk/gcc/testsuite/ChangeLog

Comment 8 Steven Bosscher 2010-03-20 12:58:17 UTC
Shouldeth be fixedeth by aforementionedeth patcheth (comment #7). Yay!