x86 ashlsi3 improvements

Jan Hubicka hubicka@horac.ta.jcu.cz
Sun Feb 28 18:15:00 GMT 1999

>  Thoughts from the x86 gurus?
Well, I am not an x86 guru... but here are my 2 cents.
I made very similar changes about half a year ago and also did some
benchmarking on Pentium and K6, so here are a few notes...
>  --- 4745,4877 ----
>  return AS2 (sal%L0,%2,%0);
>  }")
>  + ;; For Pentium/Pentium MMX:
>  + ;;
>  + ;;   We want to optimize for pairability, which means generally preferring
>  + ;;   lea (which can execute in either the U or V pipe) over sal which
>  + ;;   can only execute in the U pipe.

I've done some benchmarking here and the win of lea over sal is not clear.
You need to take into account an important property of lea: it is executed in
the operand-fetch stage, so its operands behave like memory-address operands -
they need to be ready one cycle before lea executes, otherwise an AGI stall
happens.

With the current code, gcc does not recognize this AGI stall (nor those caused
by other leas generated by the addsi and similar patterns), so AGI stalls are
quite common in these cases and sal seems to win. I added code to i386.c to
detect such AGI stalls and set the correct conflicts for the scheduler. The
result was better, but sometimes sal was a win and sometimes lea. Maybe lea
more often.

With my pairing MD_SCHED macros, sal is a win again, since the macros
understand how to pair sals, so they are almost always paired.
>  + ;;
>  + ;; For PPro/PII
>  + ;;
>  + ;;   There's more than one approach to optimizing for this family; it is
>  + ;;   unclear which approach is best.  For now, we will try to minimize
>  + ;;   uops.  Note that sal and lea have the same characteristics, so we
>  + ;;   prefer sal as it takes less space.
This is probably correct. I don't have a PPro to try, but my MD_SCHED macros
simulating decoder behaviour showed that the decoder on PPro is not an issue
for most gcc code (the opposite is true for K6).
>  + ;;
>  + ;; I do not know what is most appropriate for the AMD or Cyrix chips.
I am not sure about Cyrix, but I *think* that leas are executed in the decode
stage as on the Pentium; if so, lea is the worst choice, because an AGI stall
takes two cycles there.

As far as I can remember, for AMD all the choices are roughly the same.
>  + ;;
>  + ;;   srcreg == dstreg, constant shift count:
>  + ;;
>  + ;;     For a shift count of one, use "add".
>  + ;;     For a shift count of two or three, use "lea" for Pentium/Pentium MMX.
>  + ;;     All others use "sar".
As I wrote earlier, using "sar" instead of lea should be worth a try. If you
are interested, I can do some benchmarking again.
>  + ;;
>  + ;;   srcreg != dstreg, constant shift count:
>  + ;;
>  + ;;     For shift counts of one to three, use "lea".
>  + ;;     All others use "lea" for the first shift into the destination reg,
>  + ;;     then fall back on the srcreg == dstreg for the residual shifts.
This is the same in my implementation. I don't see any reason to change it.
>  + ;;
>  + ;;   memory destinations or nonconstant shift count:
>  + ;;
>  + ;;     Use "sal".
>  + ;;
>  + (define_insn ""
>  +   [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r")
>  + 	(ashift:SI (match_operand:SI 1 "nonimmediate_operand" "0,r")
>  + 		   (match_operand:SI 2 "nonmemory_operand" "cI,I")))]

My implementation uses:
...to point out the extra lea instruction generated.
>  +   /* This should be extremely rare (impossible?).  We can not encode a shift
>  +      of the stack pointer using an lea instruction.  So copy the stack pointer
>  +      into the destination register and fall into the srcreg == dstreg shifting
>  +      support.  */
To my knowledge, it is never generated. GCC never uses the stack register for
"normal" things, and stack operations are never optimized into shifts. I've
used abort() in this place for a long time without problems.

Hope this helps