This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [rfc] multi-word subreg lowering pass
- From: Björn Haase <bjoern dot m dot haase at web dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Martin Koegler <mkoegler at auto dot tuwien dot ac dot at>,rth at redhat dot com
- Date: Tue, 28 Jun 2005 00:48:40 +0200
- Subject: Re: [rfc] multi-word subreg lowering pass
- References: <20050627073335.GA19227@ahab.auto.tuwien.ac.at>
Martin Koegler wrote on Monday, 27 June 2005 09:33:
> On Sat, 7 May 2005 18:35:51 -0700, Richard Henderson wrote:
> > For AND, IOR, XOR, you should be able to delete the multi-word patterns
> > entirely, and leave those operations to be generated by the middle-end.
>
> This will not work for the current CVS version, which requires that the
> operation be present for the mode_for_size(sizeof(int),MODE_INT,0) mode
> (at least for xor).
>
> e.g.:
> long long foo (double x, double y)
> {
>   return !__builtin_isunordered (x, y);
> }
>
>
> If it is compiled (without any optimizations), the GIMPLE form is:
>
> foo (x, y)
> {
>   long long int D.1390;
>   _Bool D.1391;
>   int D.1392;
>
>   D.1391 = x unord y;
>   D.1392 = !D.1391;
>   D.1390 = (long long int) D.1392;
>   return D.1390;
> }
>
> sizeof(_Bool) is 1 for most architectures. expand_binop for D.1392 =
> !D.1391 will be called with a QI Register and (CONST_INT 1) as operands and
> a result register with mode_for_size(sizeof(int),MODE_INT,0) mode.
>
> On i386, the result will be a SI register. As xor for SI mode is available,
> the following case will be used:
>
>   if (methods != OPTAB_MUST_WIDEN
>       && binoptab->handlers[(int) mode].insn_code != CODE_FOR_nothing)
> In this case, the QI operands are converted to the right mode.
>
> On AVR, word_mode is one byte wide (QI), while sizeof(int) is 2 (unless
> an option is specified). Therefore the result will be a HI register. If
> xor were only available for QI mode, the following case would be used:
>
>   /* These can be done a word at a time. */
>   if ((binoptab == and_optab || binoptab == ior_optab
>        || binoptab == xor_optab)
>       && class == MODE_INT
>       && GET_MODE_SIZE (mode) > UNITS_PER_WORD
>       && binoptab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing)
>
> Here operand_subword_force will be called for a QI operand (with
> mode=HI as parameter), which will cause an internal compiler error.
>
> Best regards, Martin Kögler
Thanks for reviewing this. IIUC, I had also observed a couple of regressions
when removing the patterns completely. In my local experimental working
version of the AVR back-end that makes use of Richard's patch, I have used
explicit expanders for lowering xor:HI and xor:SI to a sequence of xor:QI
operations.
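
To illustrate what lowering xor:HI and xor:SI to a sequence of xor:QI
operations means, here is a minimal C sketch. It is only a model of the idea
on plain integers, not the AVR back-end expander itself; the function names
xor_bytewise and xor_si_lowered are hypothetical and do not come from GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR two multi-byte values one byte at a time.
   Each loop iteration stands for one xor:QI operation; the bytes are
   independent of each other, so no carry links them.  */
void xor_bytewise(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                  size_t nbytes)
{
    assert(nbytes >= 1);
    for (size_t i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(a[i] ^ b[i]);
}

/* Model of xor:SI lowered to four xor:QI operations.  The value is
   split into bytes (least significant first), XORed bytewise, and
   reassembled.  */
uint32_t xor_si_lowered(uint32_t x, uint32_t y)
{
    uint8_t a[4], b[4], r[4];
    for (int i = 0; i < 4; i++) {
        a[i] = (uint8_t)(x >> (8 * i));
        b[i] = (uint8_t)(y >> (8 * i));
    }
    xor_bytewise(r, a, b, 4);

    uint32_t out = 0;
    for (int i = 0; i < 4; i++)
        out |= (uint32_t)r[i] << (8 * i);
    return out;
}
```

Because each byte is computed independently, every lowered byte can live in
its own register and has its own lifetime, which is what makes AND, IOR and
XOR the easy cases for this kind of lowering.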
BTW: My present judgement concerning subreg-lowering before reload is:
1.) It is very helpful to expose the complexity to the register allocator, and
in many cases the resulting code is much more efficient. Expressions using
sign/zero-extension or shifts larger than one architecture word benefit most.
Also, when operating frequently on variables held in memory or initialized
with immediates, early subreg-lowering can help considerably by reducing
register pressure, since some of the subregs can die earlier than others.
2.) The difficulty is that in a couple of situations one would like to keep
track of what the zoo of individual lowered subregs corresponds to.
The key issue, IMO, is condition code re-use. It would be, e.g., extremely
cumbersome to teach the mid-end passes to understand which condition code
is calculated by the expanded sequences. E.g. a cp:SI (reg:SI xx) (const_int
a) on AVR would be expanded into one compare-QI-with-immediate, three
load-QI-with-immediate and three compare-register-with-register-with-carry
instructions. The generated sequences would be highly target specific, so
that a generic approach does not seem easy.
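
The shape of such an expanded compare can be sketched in C. This is a
simplified model under stated assumptions, not AVR or GCC code: only a carry
(borrow) flag and a zero flag are modelled, the zero flag of the
compare-with-carry step is assumed to stay set only if all earlier bytes
compared equal, and the names struct flags, cmp_qi, cmp_qi_carry and
cmp_si_lowered are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified condition-code model: only carry (borrow) and zero.  */
struct flags {
    bool carry;
    bool zero;
};

/* Model of the first step: compare one byte with an immediate.  */
struct flags cmp_qi(uint8_t a, uint8_t b)
{
    struct flags f;
    f.carry = a < b;                    /* borrow out of a - b */
    f.zero = (uint8_t)(a - b) == 0;
    return f;
}

/* Model of a compare-with-carry step: computes a - b - carry_in.
   The zero flag is sticky: it stays set only if every earlier byte
   also compared equal (an assumption of this model).  */
struct flags cmp_qi_carry(uint8_t a, uint8_t b, struct flags in)
{
    unsigned rhs = (unsigned)b + (in.carry ? 1u : 0u);
    struct flags f;
    f.carry = (unsigned)a < rhs;
    f.zero = in.zero && (uint8_t)(a - rhs) == 0;
    return f;
}

/* A 32-bit compare lowered to one plain byte compare followed by
   three compare-with-carry steps, least significant byte first.  */
struct flags cmp_si_lowered(uint32_t x, uint32_t k)
{
    struct flags f = cmp_qi((uint8_t)x, (uint8_t)k);
    for (int i = 1; i < 4; i++)
        f = cmp_qi_carry((uint8_t)(x >> (8 * i)),
                         (uint8_t)(k >> (8 * i)), f);
    return f;
}
```

Only the flags after the last step describe the 32-bit comparison (carry set
means x < k unsigned, zero set means x == k). A pass that sees the four
steps in isolation has no direct way to know that, which is the condition
code re-use problem described above.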
For the real-world test cases I have studied so far, issue 2.) results in an
overall reduction in efficiency when lowering before reload. IMO, to overcome
this difficulty, one cannot avoid placing some additional knowledge in the RTL
sequences: while the actual RTL that generates code would always refer to the
lowered subregs, I think it would be necessary to add instructions whose sole
purpose is to tell which kind of value is located in which registers, so that
the different CSE passes have a chance to find out where an existing
condition-code value is recalculated.
One possible option, IMO, is set-myself-to-myself instructions carrying
REG_EQUAL notes such as the optabs expanders generate. The other option
that I think would work is dedicated "marker" instructions that never
generate text and only serve as hooks for CSE and combine. The REG_EQUAL
notes method, however, would IMO require changing the mid-end since
presently, if I see correctly, they don't even survive the first jump
optimization pass. The second (marker instruction) approach would need an
additional pass for removing the markers prior to register allocation.
Yours,
Björn