m68k/regclass/regmove fun
Jeffrey A Law
law@upchuck.cygnus.com
Mon Mar 29 08:19:00 GMT 1999
Consider this code compiled for an m68k:
subroutine daxpy(n,da,dx,dy)
double precision dx(n),dy(n),da
integer i,n
do i = 1, n, 4
dy(i ) = dy(i ) + da*dx(i )
dy(i+1) = dy(i+1) + da*dx(i+1)
dy(i+2) = dy(i+2) + da*dx(i+2)
dy(i+3) = dy(i+3) + da*dx(i+3)
enddo
end
We had the following loop after instruction combination (useless notes
deleted, which is why the uids don't match up correctly):
(insn 80 79 81 (set (reg:DF 66)
(mult:DF (reg:DF 64)
(mem/s:DF (plus:SI (reg:SI 136)
(const_int -24)) 0))) 193 {mulsf3-1} (nil)
(nil))
(insn 83 82 98 (set (mem/s:DF (plus:SI (reg:SI 135)
(const_int -24)) 0)
(plus:DF (mem/s:DF (plus:SI (reg:SI 135)
(const_int -24)) 0)
(reg:DF 66))) 140 {addsf3-1} (insn_list 80 (nil))
(expr_list:REG_DEAD (reg:DF 66)
(nil)))
(insn 99 98 100 (set (reg:DF 80)
(mult:DF (reg:DF 64)
(mem/s:DF (plus:SI (reg:SI 136)
(const_int -16)) 0))) 193 {mulsf3-1} (nil)
(nil))
(insn 102 101 120 (set (mem/s:DF (plus:SI (reg:SI 135)
(const_int -16)) 0)
(plus:DF (mem/s:DF (plus:SI (reg:SI 135)
(const_int -16)) 0)
(reg:DF 80))) 140 {addsf3-1} (insn_list 99 (nil))
(expr_list:REG_DEAD (reg:DF 80)
(nil)))
(insn 121 120 122 (set (reg:DF 97)
(mult:DF (reg:DF 64)
(mem/s:DF (plus:SI (reg:SI 136)
(const_int -8)) 0))) 193 {mulsf3-1} (nil)
(nil))
(insn 124 123 142 (set (mem/s:DF (plus:SI (reg:SI 135)
(const_int -8)) 0)
(plus:DF (mem/s:DF (plus:SI (reg:SI 135)
(const_int -8)) 0)
(reg:DF 97))) 140 {addsf3-1} (insn_list 121 (nil))
(expr_list:REG_DEAD (reg:DF 97)
(nil)))
(insn 146 145 148 (set (mem/s:DF (reg:SI 135) 0)
(plus:DF (mem/s:DF (reg:SI 135) 0)
(reg:DF 114))) 140 {addsf3-1} (insn_list 143 (nil))
(expr_list:REG_DEAD (reg:DF 114)
(nil)))
(note 148 146 271 "" NOTE_INSN_LOOP_CONT)
Note how each multiply uses (reg:DF 64) and they're all 3-operand multiplies.
The problem (of course) is we do not have a 3-operand multiply on the m68k.
regmove, bless its terrible bits, actually did the right thing and made the
precise transformations that we need:
(insn 287 79 80 (set (reg:DF 66)
(reg:DF 64)) 60 {movdf+1} (nil)
(nil))
(insn 80 287 81 (set (reg:DF 66)
(mult:DF (reg:DF 66)
(mem/s:DF (plus:SI (reg:SI 136)
(const_int -24)) 0))) 193 {mulsf3-1} (nil)
(nil))
(insn 83 82 98 (set (mem/s:DF (plus:SI (reg:SI 135)
(const_int -24)) 0)
(plus:DF (mem/s:DF (plus:SI (reg:SI 135)
(const_int -24)) 0)
(reg:DF 66))) 140 {addsf3-1} (insn_list 80 (nil))
(expr_list:REG_DEAD (reg:DF 66)
(nil)))
(insn 285 98 99 (set (reg:DF 80)
(reg:DF 64)) 60 {movdf+1} (nil)
(nil))
(insn 99 285 100 (set (reg:DF 80)
(mult:DF (reg:DF 80)
(mem/s:DF (plus:SI (reg:SI 136)
(const_int -16)) 0))) 193 {mulsf3-1} (nil)
(nil))
(insn 102 101 120 (set (mem/s:DF (plus:SI (reg:SI 135)
(const_int -16)) 0)
(plus:DF (mem/s:DF (plus:SI (reg:SI 135)
(const_int -16)) 0)
(reg:DF 80))) 140 {addsf3-1} (insn_list 99 (nil))
(expr_list:REG_DEAD (reg:DF 80)
(nil)))
(insn 283 120 121 (set (reg:DF 97)
(reg:DF 64)) 60 {movdf+1} (nil)
(nil))
(insn 121 283 122 (set (reg:DF 97)
(mult:DF (reg:DF 97)
(mem/s:DF (plus:SI (reg:SI 136)
(const_int -8)) 0))) 193 {mulsf3-1} (nil)
(nil))
(insn 124 123 142 (set (mem/s:DF (plus:SI (reg:SI 135)
(const_int -8)) 0)
(plus:DF (mem/s:DF (plus:SI (reg:SI 135)
(const_int -8)) 0)
(reg:DF 97))) 140 {addsf3-1} (insn_list 121 (nil))
(expr_list:REG_DEAD (reg:DF 97)
(nil)))
(insn 281 142 143 (set (reg:DF 114)
(reg:DF 64)) 60 {movdf+1} (nil)
(nil))
(insn 143 281 144 (set (reg:DF 114)
(mult:DF (reg:DF 114)
(mem/s:DF (reg:SI 136) 0))) 193 {mulsf3-1} (nil)
(nil))
(insn 146 145 148 (set (mem/s:DF (reg:SI 135) 0)
(plus:DF (mem/s:DF (reg:SI 135) 0)
(reg:DF 114))) 140 {addsf3-1} (insn_list 143 (nil))
(expr_list:REG_DEAD (reg:DF 114)
(nil)))
(note 148 146 271 "" NOTE_INSN_LOOP_CONT)
Note how we copy (reg:DF 64) into the output of the multiply which immediately
follows. So, now we've got optimal looking rtl to pass off to the register
allocators, right? Well, sorta.
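
The rewrite regmove performed above can be sketched in a few lines of Python.
This is only a toy model of the idea (turn "d = a op b" into "d = a; d = d op b"
for a target whose arithmetic insns need the destination to match the first
source); the Insn type and make_two_operand helper are made up for this
illustration and are not how regmove.c is written.

```python
from typing import List, NamedTuple


class Insn(NamedTuple):
    """Hypothetical, simplified stand-in for an RTL insn."""
    dest: str        # register written
    op: str          # "move", "mult", "plus", ...
    src1: str        # first source operand
    src2: str = ""   # second source operand (unused for moves)


def make_two_operand(insns: List[Insn]) -> List[Insn]:
    """Rewrite d = a op b as d = a; d = d op b when d differs from a."""
    out = []
    for insn in insns:
        # Skip moves, insns that already match, and the case where the
        # destination is also the second source (the copy would clobber it).
        if insn.op != "move" and insn.dest not in (insn.src1, insn.src2):
            out.append(Insn(insn.dest, "move", insn.src1))
            insn = insn._replace(src1=insn.dest)
        out.append(insn)
    return out


loop = [
    Insn("r66", "mult", "r64", "mem[r136-24]"),
    Insn("r80", "mult", "r64", "mem[r136-16]"),
]
for insn in make_two_operand(loop):
    print(insn)
```

Each multiply gains a preceding copy from r64, which matches the shape of
insns 287/80 and 285/99 in the dump.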
The problem is that (reg:DF 64) is now only used as the target of a memory load
(not shown) and the source for a bunch of copies -- i.e., (reg:DF 64) only
appears in movdf insns.
The m68k movdf patterns expose the ability to load from memory into both
GPRs and FPRs; movdf also exposes the ability to copy between the two register
sets.
The net result is (reg:DF 64) doesn't prefer any particular register class.
All the other DFmode pseudos get allocated to FPRs. But our friend (reg:DF 64)
gets allocated into a GPR. Moving a DFmode value from a GPR to an FPR requires
pushing both 32-bit words onto the stack, then popping them off via an fpload.
OUCH!
move.l %d1,-(%sp)
move.l %d0,-(%sp)
fmove.d (%sp)+,%fp0
fmul.d -24(%a0),%fp0
fadd.d -24(%a1),%fp0
fmove.d %fp0,-24(%a1)
move.l %d1,-(%sp)
move.l %d0,-(%sp)
fmove.d (%sp)+,%fp0
fmul.d -16(%a0),%fp0
fadd.d -16(%a1),%fp0
fmove.d %fp0,-16(%a1)
move.l %d1,-(%sp)
move.l %d0,-(%sp)
fmove.d (%sp)+,%fp0
fmul.d -8(%a0),%fp0
fadd.d -8(%a1),%fp0
fmove.d %fp0,-8(%a1)
move.l %d1,-(%sp)
move.l %d0,-(%sp)
fmove.d (%sp)+,%fp0
fmul.d (%a0),%fp0
fadd.d (%a1),%fp0
fmove.d %fp0,(%a1)
lea (32,%a1),%a1
lea (32,%a0),%a0
dbra %d2,.L7
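
Why (reg:DF 64) ends up with no preference can be illustrated with a toy
tally. This is a deliberately crude model, not the actual regclass cost
computation: each use of a pseudo charges one unit to every class the pattern
does not accept, and a tie means "no preference". The ALLOWED table and the
pattern labels "movdf", "mul" and "add" are assumptions made for this sketch.

```python
# Hypothetical summary of which register classes each pattern accepts
# for a DFmode register operand (simplified from the discussion above).
ALLOWED = {
    "movdf": {"GPR", "FPR"},   # moves work with either register set
    "mul": {"FPR"},            # floating-point arithmetic needs an FPR
    "add": {"FPR"},
}


def class_costs(uses):
    """Charge one unit to a class for every use that does not accept it."""
    costs = {"GPR": 0, "FPR": 0}
    for pattern in uses:
        for cls in costs:
            if cls not in ALLOWED[pattern]:
                costs[cls] += 1
    return costs


def preferred(uses):
    """Return the cheapest class, or None when the classes tie."""
    costs = class_costs(uses)
    best = min(costs.values())
    winners = [cls for cls, cost in costs.items() if cost == best]
    return winners[0] if len(winners) == 1 else None


# (reg:DF 64): one load plus four copies, all through movdf.
print(preferred(["movdf"] * 5))
# (reg:DF 66): set by a copy, then used by the multiply and the add.
print(preferred(["movdf", "mul", "add"]))
```

In this model the pseudo that only appears in moves ties between the two
classes, while the pseudos that feed arithmetic clearly favor FPRs.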
Toon -- if you could give this patch a benchmark spin it would be greatly
appreciated:
* m68k.md (movdf): Hide GPR sources & destinations from regclass.
Index: m68k.md
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/config/m68k/m68k.md,v
retrieving revision 1.26
diff -c -3 -p -r1.26 m68k.md
*** m68k.md 1999/01/27 01:43:08 1.26
--- m68k.md 1999/03/29 10:17:04
***************
*** 1147,1154 ****
"")
(define_insn ""
! [(set (match_operand:DF 0 "general_operand" "=rm,rf,rf,&rof<>,y,rm,x,!x,!rm")
! (match_operand:DF 1 "general_operand" "rf,m,0,rofE<>,rmE,y,xH,rm,x"))]
; [(set (match_operand:DF 0 "general_operand" "=rm,&rf,&rof<>")
; (match_operand:DF 1 "general_operand" "rf,m,rofF<>"))]
"!TARGET_5200"
--- 1147,1156 ----
"")
(define_insn ""
! [(set (match_operand:DF 0 "general_operand"
! "=*rm,*rf,*rf,&*rof<>,y,*rm,x,!x,!*rm")
! (match_operand:DF 1 "general_operand"
! "*rf,m,0,*rofE<>,*rmE,y,xH,*rm,x"))]
; [(set (match_operand:DF 0 "general_operand" "=rm,&rf,&rof<>")
; (match_operand:DF 1 "general_operand" "rf,m,rofF<>"))]
"!TARGET_5200"
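
For readers unfamiliar with the "*" constraint modifier: it tells the
register preference code to ignore the character that follows it, without
changing what the constraint accepts during reload. The sketch below shows
what the patched constraint strings look like once the starred characters are
dropped. The helper names are made up for this illustration; this is not code
from GCC.

```python
def visible_to_regclass(alternative: str) -> str:
    """Drop each '*' and the single character that follows it."""
    out = []
    skip = False
    for ch in alternative:
        if skip:
            skip = False      # this character was hidden by the '*'
        elif ch == "*":
            skip = True       # hide the next character
        else:
            out.append(ch)
    return "".join(out)


def strip_alternatives(constraints: str) -> str:
    """Apply visible_to_regclass to every comma-separated alternative."""
    return ",".join(visible_to_regclass(alt) for alt in constraints.split(","))


# Constraint strings taken from the patched movdf pattern.
print(strip_alternatives("=*rm,*rf,*rf,&*rof<>,y,*rm,x,!x,!*rm"))
print(strip_alternatives("*rf,m,0,*rofE<>,*rmE,y,xH,*rm,x"))
```

With every "r" hidden, the only register letters left for preference
purposes are the floating-point ones, which is the intent of the ChangeLog
entry.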