size optimization for x86

Jeffrey A Law law@hurl.cygnus.com
Sun Feb 28 18:15:00 GMT 1999


  In message <19990202170438.B30861@cygnus.com> you write:
  > On Tue, Feb 02, 1999 at 10:41:43AM -0700, Jeffrey A Law wrote:
  > > Basically when optimizing for size on the x86, using lea to implement
  > > shifts loses.
  > 
  > Yep.  I'd make one other change as well --
  > 
  > > ! (define_insn ""
  > >     [(set (match_operand:SI 0 "nonimmediate_operand" "=r,rm")
  > >   	(ashift:SI (match_operand:SI 1 "nonimmediate_operand" "r,0")
  > >   		   (match_operand:SI 2 "nonmemory_operand" "M,cI")))]
  > > !   "! optimize_size"
  > 
  > Swap these alternatives so that we prefer sal when src & dest do match.
Whacking on this was the next project.  There are a variety of things we can
look into for this one.

lea + add generates 2 uops
lea + sal generates 2 uops
mov + sal generates 3 uops

So, if we want to minimize uops for PPro, for shifts > 3 and src != dst, we
use an lea to perform the initial shift and get the result into the destination
reg, then sal to handle remaining shifts.  This also gives maximal freedom at
the initial decode stage.
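
A rough sketch of that selection rule in C, purely for illustration.  None of
these names exist in GCC; plan_shift, struct shift_plan and the SEQ_* values
are made up here.  It assumes a 32-bit operand and a constant shift count, and
it assumes the lea always takes the largest shift it can (3 bits, since lea
scale factors stop at 8), leaving the rest to sal.  The src == dst and
count <= 3 cases are my reading of the pattern's constraints, not something
spelled out above.

```c
#include <assert.h>

/* Hypothetical names throughout: an illustration, not GCC code. */
enum shift_seq {
    SEQ_SAL,      /* sal $n, dst                  (src == dst)          */
    SEQ_LEA,      /* lea with scale 2^n           (src != dst, n <= 3)  */
    SEQ_LEA_SAL   /* lea with scale 8, then sal   (src != dst, n > 3)   */
};

struct shift_plan {
    enum shift_seq seq;
    int lea_shift;  /* bits covered by the lea scale factor, 0 if no lea */
    int sal_shift;  /* bits covered by the sal, 0 if no sal              */
};

/* Plan dst = src << count with the uop-minimizing rule described above. */
static struct shift_plan plan_shift(int count, int src_is_dst)
{
    struct shift_plan p = { SEQ_SAL, 0, 0 };

    /* Assumption: constant count, 32-bit operand. */
    assert(count >= 1 && count <= 31);

    if (src_is_dst) {
        /* Result already lives in dst: shift it in place. */
        p.sal_shift = count;
    } else if (count <= 3) {
        /* One lea both moves src to dst and shifts it. */
        p.seq = SEQ_LEA;
        p.lea_shift = count;
    } else {
        /* lea does the move plus the first 3 bits, sal does the rest. */
        p.seq = SEQ_LEA_SAL;
        p.lea_shift = 3;
        p.sal_shift = count - 3;
    }
    return p;
}
```

For example, plan_shift(5, 0) picks lea + sal with a 3-bit lea and a 2-bit sal,
which is the 2 uop sequence in the list above rather than the 3 uop mov + sal.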

If we optimize for issue to the function units instead, we'd want a different
set of criteria: probably selecting mov + sal for this case, and possibly a
rethink of other cases.

I'm not sure which we should bias towards right now (opinions welcome).


And (of course) the generation strategy for Pentium/MMX may be totally
different too.  I haven't spent much time thinking about it yet.  This was
really a side issue for something else I was looking at :-)

jeff


