This is the mail archive of the
mailing list for the GCC project.
Re: peephole2: dead regs not marked as dead
- From: Georg Lay <avr at gjlay dot de>
- To: Paolo Bonzini <bonzini at gnu dot org>
- Cc: Michael Meissner <meissner at linux dot vnet dot ibm dot com>, GCC Development <gcc at gcc dot gnu dot org>
- Date: Wed, 10 Nov 2010 17:03:36 +0100
- Subject: Re: peephole2: dead regs not marked as dead
- References: <4CC1729C.email@example.com> <4CC93592.firstname.lastname@example.org> <4CC96A31.email@example.com> <4CC97656.firstname.lastname@example.org> <4CC9C7A0.email@example.com> <4CCAE383.firstname.lastname@example.org> <4CCAE571.email@example.com> <4CCAF3BF.firstname.lastname@example.org> <4CCAF81D.email@example.com> <4CCFDCDD.firstname.lastname@example.org> <20101108202705.GB14062@hungry-tiger.westford.ibm.com> <4CDA7AF0.email@example.com> <4CDA855A.firstname.lastname@example.org>
Paolo Bonzini schrieb:
> On 11/10/2010 11:58 AM, Georg Lay wrote:
>> In the old 3.4.x (private port) I introduced a target hook in combine,
>> just prior to where recog_for_combine gets called. The hook did some
>> canonicalization of rtx and thereby considerably reduced the number of
>> patterns that would have been necessary without that hook. It
>> transformed some unspecs.
> How is that related to delegitimize_address?
It's nothing to do with delegitimize address.
Ok, let me tell the storyan go into the mess. Some users wanted to
have a bit data type in C and to write down variables that represent a
bit in memory or in auto or as parameter. (I strongly recommended not
to do such hacking in a compiler because it is not within C and for
reasons you all know. For reasons you also know, the project was
There was already some hacking in gcc when I started and users
complained about superfluous zero-extract here and there. No bloat,
just a trade-off compiler vs. asm. tricore is capable of loading a
byte from absolute address space (LD.BU) and has instructions to
operate on any bit (and, or, xor, jmp, compare-accumulate-bitop,
extract, insert, etc.). New type _bit was mapped to BImode. To see
where the inconveniences are, let's have a look at a comment
;; The TriCore instruction set has many instructions that deal with
;; bit operands. These bits are then given in two operands: one operand
;; is the register that contains the bit, the other operand specifies
;; the bit's position within that register as an immediate, i.e. the
;; bit's position must be a runtime constant.
;; A new relocation type `bpos' for the bit's position within a byte had
;; been introduced, enabling to pack bits in memory without the need to
;; pack them in bitfields.
;; Such an instruction sequence may look like
;; ld.bu %d0, the_bit # insn A
;; jz.t %d0, bpos:the_bit, .some_label # insn A
;; The compiler could, of course, emit such instruction seqences.
;; However, emitting such sequences "en bloc" has several drawbacks:
;; -- every time we need a bit the bit must be read
;; -- emitting instruction en bloc knocks out the scheduler
;; -- when writing all the cases as combiner patterns the number of
;; insn patterns would explode
;; Therefore, a bit could have been described as BImode, which would
;; lead to the following instruction sequence
;; ld.bu %d0, the_bit # insn A
;; extr.u %d0, %d0,bpos:the_bit, 1 # insn A
;; jz.t %d0, 0, .some_label # insn B
;; I.e. a bit is represented as BImode and therefore always resides
;; in the LSB. The advantage is that the bit now lives in a register
;; and can be reused. The disadvantage is that
;; -- often, there is a superfluous instruction (extr.u)
;; -- the scheduler cannot schedule ld.bu away from extr.u
;; -- for combiner patterns the same as above applies
;; BPOS representation
;; Thus, we need a way to tag a register with the bit's position.
;; I use the new LOADBI and BPOS rtx_code, here in an assembler dump
;; which shows the 1-to-1 correspondence between insn and machine
;; # (set (reg:SI 0)
;; # (loadbi:SI (mem:BI (symbol_ref:SI ("?the_bit")))
;; # (symbol_ref:SI ("?the_bit"))))
;; ld.bu %d0, the_bit # insn A
;; # (set (pc)
;; # (if_then_else (eq (bpos:BI (reg:SI 0)
;; # (symbol_ref:SI ("?the_bit")))
;; # (const_int 0))
;; # (label_ref 42)
;; # (pc)))
;; jz.t %d0, bpos:the_bit, .some_label # insn B
;; The sole SYMBOL_REFs are the tags.
;; The reason to use own RTX code and not UNSPEC is because the combiner
;; does not like UNSPEC and the code would not be as dense as could be.
;; Using ZERO_EXTRACT for LOADBI does not work either because a
;; may not be used as lvalue which is needed when loading a bit.
;; The ZERO_EXTRACT on the left side would assume that the register that
;; is to be loaded already lives, because the ZERO_EXTRACT affects
;; just some bits, not all.
;; Note that a LOADBI in this context is a load together with a
;; and a BPOS is a shift right i.e. equivalent to some ZERO_EXTRACT.
;; The tag in a BPOS may also be a CONST_INT
;; in which case the bit's position is a compile time constant.
;; The output mode of a BPOS (M1) needs not to be BI, it may also be
;; some wider mode like QI or SI. A wider mode represents a BPOS that is
;; zero extended to that mode. The input mode's bit size must be at most
;; one plus the bit's position (8 for a SYMBOL_REF).
;; BPOS Operands
;; Next, I introduced BPOS operands, i.e. a bpos_operand predicate and
;; BPOS constraints. Thereby, I can use `bpos_operand' in insn
;; predicates and there is no need to write down the very BPOS
;; representation. That technique hides the internal BPOS representation
;; from the machine description.
;; A BPOS operand is one of these:
;; (SUBREG ((REG:BI) 0))
;; (BPOS ((REG) (CONST_INT)))
;; (BPOS ((REG) (SYMBOL_REF)))
;; (SUBREG (BPOS ((REG) (CONST_INT)) 0))
;; (SUBREG (BPOS ((REG) (SYMBOL_REF)) 0))
;; with the bit's position in REG and SUBREG is 0.
;; RELOAD pass
;; Introducing bpos_operand implies the need to reload them.
;; Reloading a BPOS should not reload the BPOS as a whole but only the
;; BPOS's register. The patch is done by MAP_RELOADS in `find_reloads'
;; which ends up in `tric_map_reloads'.
;; BPOS pass
;; A complete new pass had been introduced. The load of a bit is
;; emitted a usual
;; # (set (reg:BI 100)
;; # (mem:BI (symbol_ref:SI ("?the_bit"))))
;; These insns get splitted into the loadbi and bpos insns mentioned
;; The bpos pass had been placed before combine, i.e. somewhere
;; analysis but after the loop optimizer.
;; After all loadbi insns are split, the insns are traveled again to
;; the newly generated BPOSs into insns that support bpos_operand.
;; Amongst them are jumps, some shifts, extracts, bit operations, etc.
;; In order to reduce the number of patterns, operands that encode
;; operations that are equivalent to some BPOS are canonicalized to
;; bpos_operands, e.g. shifting a register right by 4 and then and-ing
;; with 1 and finally zero extending to SI will become
;; (bpos:SI (reg) (4)). Canonicalizing includes ZERO_EXTRACT rtx which
;; are mapped to BPOS, too.
;; The hook is placed in toplev.c, the worker function is
;; COMBINE pass
;; The combiner puts insns together. The result is stuff like
;; (bpos:SI (not:SI (reg)) (pos)) which must be canonicalized to
;; (subreg:SI (not:BI (bpos:BI (reg) (pos))) 0) so that the resulting
;; rtx will fit into bpos_operand insns. The hook is in combine.c
;; and the worker function is `tric_replace_for_combine'.
;; Own enum rtx_code for BPOS/LOADBI
;; Introducing new rtx_code requires minor and straight forward
;; rtl.def: DEF_RTL_EXPR for BPOS and LOADBI
;; DEF_RTL_EXPR(BPOS, "bpos", "ee", '2')
;; DEF_RTL_EXPR(LOADBI, "loadbi", "ee", '2')
;; return 0 for BPOS and LOADBI
and one of the insn is
[(set (match_operand:SI 0 "data_register_operand" "=d")
(ior:SI (and:SI (not:SI (subreg:SI (match_operator:BI 1
[(not:BI (match_operand:BI 2 "bpos_operand" "xd"))
(match_operand:BI 3 "bpos_operand" "xd")]) 0))
(match_operand:SI 4 "data_register_operand" "0")))]
"XOR != GET_CODE (operands)"
"or.%D1.t\t%0, %B2, %B3"
[(set_attr "pipe" "ip")])
The second not might come from the combine hook that transformed
things like (bpos:BI (xor (reg 4) 2) to (not:BI (bpos (reg 2))).
(bpos (reg 2)) is matched by "bpos_operand" and reload reloads just
the reg if it must spill and not the bpos as a whole.
>> A second use case of such a hook could look as follows: Imagine a
>> combined pattern that does not match but would be legal if the backend
>> knew that some register contains some specific value like, e.g.,
>> non-negative, zero-or-storeflagvalue, combiner has proved that some
>> bits will always be set or always be cleared;
> You can use nonzero_bits or num_signbit_copies in define_splits. In
> _this_ case, a define_insn_or_split doesn't really help, I agree with
> Joern on this. :)
Yes. It's all inside of combine. May the second or first insn
generated by split be no-op insn? What, if the split is in need of a
new pseudo? There may be left-over regs from combine as is tries to
find split points, but that means when combine inserts an SI and you
need a DI you are stuck. I hat that case and couldn't find a solution.
Note that this was back in 3.4 and I don't intend to break out of the
backend's sandbox if not absolutely necessary (in the case above I
tried to keep the changes minimal, like a hook in reload).
>> Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2.
>> Combine combines A+B and sees costs f(A+B) = 5. This makes combine
>> reject the pattern, not looking deeper.
> Does it really do that? I see nothing in combine_instructions which
> would prevent going deeper.
The costs. I see that some passes treat cost=0 special. But gccint
doesn't say a word about it so from the internals you get
interpretation like: "0 will cost nothing". Again, I don't like to use
features that are not documented on gccint level as they may change
over time like "there is this a global, non-gccint-documented variable
that in a specific pass contains a value that I can make use of". As
backend programmer I must have an interface (description) to rely on
and prefer stability over hacking.