[PATCH] AVR: implement HI/SI logic operations and sign/zero extension by define_insn_and_split ...

Sun Apr 3 23:12:00 GMT 2005

> Björn Haase <bjoern.m.haase@web.de> writes:
> Here is the first result of my work on the "splitting" issue for AVR.
> Find enclosed a patch that passed my testsuite run without any regression.

- very nice.

> I have been experimenting for quite a while with doing the splitting at
> expand time and after reload.
> 
> In theory, I came to the conclusion that it would be best to expose
> the complexity to the compiler as soon as possible, i.e. to do the
> splitting as early as possible, mainly in order to give the register
> allocator full knowledge of which registers are actually needed. However,
> I now think that Denis' initial suggestion is best for most patterns:
> "Do it after reload :-)".
> In my opinion, the problem with doing the splitting at *expand* time
> is twofold:
> 
> 1.) It will probably be almost impossible to reuse the condition codes
> that are already generated by the arithmetic instructions (e.g. sub, add)
> so as to avoid spurious cp/cpc instructions.

- with the hopeful exception of sub, which it would be nice to use as the
  basis of optimization for simple loops that don't themselves use the
  loop count within the body of the loop, and so need no explicit compare:

     for (i = 0; i < count; i++) {}
  => i = count; goto <while>; do {} while (i-- != 0) // cond not-zero branch
  or i = count-1; goto <while>; do {} while (i-- >= 0) // cond not-neg branch
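
The two rewrites above can be checked on a host with a small C sketch (function names are hypothetical) that jumps into the loop test first, as the goto <while> in the pseudocode does, and counts how often the body runs. Note that i-- tests the value before the decrement; the sketch only confirms that the iteration count matches the original for loop for count >= 0, not which branch instruction a compiler would pick.

```c
#include <assert.h>

/* Reference: the original counted loop. */
static unsigned iters_for(int count)
{
    unsigned n = 0;
    for (int i = 0; i < count; i++)
        n++;
    return n;
}

/* First rewrite: i = count; goto <while>; do {} while (i-- != 0).
   i is unsigned so the final post-decrement of 0 wraps instead of
   overflowing. */
static unsigned iters_down_ne(int count)
{
    unsigned n = 0;
    assert(count >= 0);
    unsigned i = (unsigned)count;
    goto test;
    do {
        n++;        /* loop body */
    test:;
    } while (i-- != 0);
    return n;
}

/* Second rewrite: i = count-1; goto <while>; do {} while (i-- >= 0). */
static unsigned iters_down_ge(int count)
{
    unsigned n = 0;
    int i = count - 1;
    goto test;
    do {
        n++;        /* loop body */
    test:;
    } while (i-- >= 0);
    return n;
}
```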

>                                In the case of sign tests, I do not think
> that this is a severe problem, since we have efficient patterns for them
> that do not need lots of compares. But all of the instructions that leave
> useful information besides the sign I'd suggest splitting after reload.
> The optimizer will also have difficulty optimizing branch conditions
> if one expands the patterns for AND and OR at RTL generation. E.g. it
> would probably be extremely difficult to re-implement the branching
> patterns for the single-bit tests and the sign test. In order to make
> combine use the more efficient combined patterns of type
> "sign_extend_value_and_substract", one would also need to leave
> zero_extend and sign_extend intact until after reload.
> 
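
A few hypothetical C test functions for the shapes named above: a single-bit test, a sign test, and a sign-extend-then-subtract. Compiling them with avr-gcc -Os -S with and without a splitting patch is one way to see whether those specialised patterns still match; the comments describe what one would hope for, not verified compiler output.

```c
#include <stdint.h>

/* Single-bit test on a 16-bit value: only bit 0 of the high byte
   matters, so ideally one byte is examined. */
static int bit8_set(uint16_t x)
{
    return (x & 0x0100u) != 0;
}

/* Sign test on a 32-bit value: only bit 7 of the most significant byte
   matters, so in principle no multi-byte compare is needed. */
static int is_negative(int32_t x)
{
    return x < 0;
}

/* Sign-extend an 8-bit operand and subtract it from a 16-bit value:
   the combined "sign extend and subtract" shape mentioned above. */
static int16_t sub_sext(int16_t a, int8_t b)
{
    return (int16_t)(a - b);
}
```
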
> 2.) The second difficulty is moves.
> When using subreg expressions for expanding moves, one will have
> difficulty telling the compiler that a (set (reg:SI 24) (reg:SI 22)) is
> indeed possible without an intermediate, non-overlapping set of registers.
> Also, it would be hard to make the compiler use the "movw" instructions
> once everything has been atomized into QImode objects. For this reason I
> think that one should split moves only after expand.
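
The ordering problem behind moving an SImode value from r22..r25 to r24..r27 can be illustrated with a toy register-file model in C (all names are hypothetical, and the model ignores everything except byte order). Split naively into four low-byte-first QImode moves, the copy overwrites r24 and r25 before they are read; copying the high bytes first, or the high register pair first in movw-sized units, gives the right result.

```c
#include <stdint.h>

static uint8_t regs[32]; /* toy model of the AVR register file r0..r31 */

/* Store a 32-bit value little-endian in regs[r..r+3]. */
static void load_si(int r, uint32_t v)
{
    for (int k = 0; k < 4; k++)
        regs[r + k] = (uint8_t)(v >> (8 * k));
}

static uint32_t read_si(int r)
{
    uint32_t v = 0;
    for (int k = 3; k >= 0; k--)
        v = (v << 8) | regs[r + k];
    return v;
}

/* Naive split into four QImode moves, always low byte first. */
static void move_si_naive(int dst, int src)
{
    for (int k = 0; k < 4; k++)
        regs[dst + k] = regs[src + k];
}

/* Overlap-aware split: if dst overlaps src from above, copy high bytes
   first (the same rule memmove follows). */
static void move_si_ordered(int dst, int src)
{
    if (dst > src)
        for (int k = 3; k >= 0; k--)
            regs[dst + k] = regs[src + k];
    else
        for (int k = 0; k < 4; k++)
            regs[dst + k] = regs[src + k];
}

/* Same move as two movw-style register-pair copies, high pair first when
   dst > src. Assumes even register numbers, as movw requires. */
static void move_si_movw(int dst, int src)
{
    int order[2] = { dst > src ? 2 : 0, dst > src ? 0 : 2 };
    for (int j = 0; j < 2; j++) {
        int k = order[j];
        regs[dst + k] = regs[src + k];
        regs[dst + k + 1] = regs[src + k + 1];
    }
}

/* Load v at src, perform the move, and return what ends up at dst. */
static uint32_t run_move(void (*move)(int, int), int dst, int src, uint32_t v)
{
    load_si(src, v);
    move(dst, src);
    return read_si(dst);
}
```

For example, run_move(move_si_naive, 24, 22, 0x04030201) yields 0x02010201 because r24 and r25 are clobbered before being copied, while the ordered and movw variants return the original value.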

- it would seem the same problem could exist for leveraging adiw/sbiw
  immediates? (as it's not clear whether it would be advantageous to split
  them into QI mode operations without insight into the availability of
  the limited register operands these instructions may operate on?)
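
For reference, the operand restrictions in question: ADIW and SBIW only work on the register pairs starting at r24, r26, r28 and r30, with an unsigned 6-bit immediate (0..63). The helper below is a hypothetical sketch of that check, the kind of register-dependent decision that is hard to make before register allocation.

```c
#include <stdbool.h>

/* Hypothetical helper: can "pair += k" on the 16-bit register pair whose
   low register is reg_lo be done with one ADIW (k > 0) or SBIW (k < 0)?
   Both instructions accept only the pairs starting at r24, r26 (X),
   r28 (Y) and r30 (Z), and an immediate magnitude of 0..63. */
static bool fits_adiw_sbiw(int reg_lo, int k)
{
    bool pair_ok = reg_lo == 24 || reg_lo == 26 ||
                   reg_lo == 28 || reg_lo == 30;
    bool imm_ok = k >= -63 && k <= 63;
    return pair_ok && imm_ok;
}
```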

> However, for the remaining patterns (e.g. shifts, multiplies, xor) that do
> not leave the condition code in a useful state, my feeling is that it
> would be best to try to expand them as early as possible.
> This way, one might also avoid unnecessary clearing of the zero register,
> and in the case of multiplication one might need fewer registers if the
> product is only an intermediate result or if the target for the product
> is memory.
> On the other hand, one would probably need to add quite a number of
> (use zero_reg) and (clobber temp_reg) expressions to the instruction
> patterns.
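
Part of why the logic operations in this patch split so cleanly is that AND, IOR and XOR have no carry between bit positions, so an SImode operation is just four independent QImode ones; shifts and multiplies do not decompose this way. A minimal C sketch of the byte-wise split (names are hypothetical):

```c
#include <stdint.h>

enum logic_op { OP_AND, OP_IOR, OP_XOR };

/* One QImode logic operation. */
static uint8_t qi_op(enum logic_op op, uint8_t a, uint8_t b)
{
    switch (op) {
    case OP_AND: return (uint8_t)(a & b);
    case OP_IOR: return (uint8_t)(a | b);
    default:     return (uint8_t)(a ^ b);
    }
}

/* SImode logic operation done as four independent byte operations.
   No information flows between bytes, so the result equals the
   full-width operation. */
static uint32_t si_op_split(enum logic_op op, uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int k = 0; k < 4; k++) {
        uint8_t ab = (uint8_t)(a >> (8 * k));
        uint8_t bb = (uint8_t)(b >> (8 * k));
        r |= (uint32_t)qi_op(op, ab, bb) << (8 * k);
    }
    return r;
}
```

One payoff is that bytes of a constant operand that are 0x00 or 0xFF become trivial after the split: x & 0x0000FFFF leaves the low two bytes unchanged and simply clears the upper two.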

- given that it's not unusual for avr-class applications to deal with
  limited-precision data, many apps could likely benefit from simple 16=8x8
  or 16=8x8+16 fixed-point operations, and would correspondingly benefit
  from some other allocation strategy for the zero constant, rather than
  keeping it in r1, half of the multiplier's result register pair, which
  unfortunately cripples the avr's otherwise potentially efficient low-end
  signal-processing mul/mac.
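
The 16=8x8 and 16=8x8+16 operations can be written in plain C as below (function names are hypothetical). On cores with MUL the product lands in r1:r0, while avr-gcc keeps its constant zero (__zero_reg__) in r1, which is why a multiply is typically followed by code that clears r1 again. The Q0.8 helper is only one illustrative fixed-point use.

```c
#include <stdint.h>

/* 16 = 8 x 8: unsigned 8-bit by 8-bit multiply with a 16-bit result. */
static uint16_t mul_8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}

/* 16 = 8 x 8 + 16: multiply-accumulate, wrapping modulo 2^16. */
static uint16_t mac_8x8_16(uint16_t acc, uint8_t a, uint8_t b)
{
    return (uint16_t)(acc + mul_8x8(a, b));
}

/* Illustrative fixed point: multiply two unsigned Q0.8 fractions
   (value / 256) and keep a Q0.8 result by dropping the low byte. */
static uint8_t mul_q8(uint8_t a, uint8_t b)
{
    return (uint8_t)(mul_8x8(a, b) >> 8);
}
```

For instance, mul_q8(128, 128) computes 0.5 * 0.5 and returns 64, i.e. 0.25 in Q0.8.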

> My tests have shown that code gets considerably better with the
> splitters for complex logic operations. For my 16k real-world
> application, the code came out about 50 bytes smaller.

- Do you suspect that most of these benefits would be preserved if, for
  example, both QI and HI mode operations were optimally coded, thereby
  preserving the visibility of the few word (HI) mode operations the avr
  is capable of natively, and SI/DI mode operations were then split into
  them as required (as opposed to going all the way down to QI mode
  initially), given that C by default promotes everything to int
  operations, which corresponds to HI mode on the avr?
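
To make the HI-mode alternative concrete, here is a minimal C sketch (hypothetical names) of an SImode addition split into two HImode additions, with the carry propagated from the low word into the high word, rather than into four QImode additions. On the avr each HImode add is itself normally an add/adc byte pair unless adiw applies, so this only shows the word-level decomposition.

```c
#include <stdint.h>

/* SImode add split into two HImode adds with carry propagation. */
static uint32_t si_add_via_hi(uint32_t a, uint32_t b)
{
    uint16_t alo = (uint16_t)a, ahi = (uint16_t)(a >> 16);
    uint16_t blo = (uint16_t)b, bhi = (uint16_t)(b >> 16);

    uint16_t lo = (uint16_t)(alo + blo);
    unsigned carry = lo < alo;      /* the low word wrapped: carry out */
    uint16_t hi = (uint16_t)(ahi + bhi + carry);

    return ((uint32_t)hi << 16) | lo;
}
```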