This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Exploiting dual mode operation, implementation.

From: Richard Henderson <rth at redhat dot com>
To: Leehod Baruch <LEEHOD at il dot ibm dot com>
Cc: gcc-patches at gcc dot gnu dot org, Mircea Namolaru <NAMOLARU at il dot ibm dot com>, stevenb at suse dot de, Mostafa Hagog <MUSTAFA at il dot ibm dot com>, sleehodb at t2 dot technion dot ac dot il
Date: Mon, 27 Jun 2005 16:47:22 -0700
Subject: Re: [PATCH] Exploiting dual mode operation, implementation.
References: <OF197D0F0C.9E8FDD2C-ONC2257028.0005772E-C2257028.00504BF1@il.ibm.com>

First, I'm unhappy to see that when you say "sign-extend", you really
do mean SIGN_EXTEND and not sign extension in general, also covering
the ZERO_EXTEND case.  I'm not sure I want to accept this pass with
such a glaring omission.

Second, I'm unhappy to see that you're hard-coding SI and DImode as
your source and destination modes.  This absolutely blocks this
pass being accepted.  You should be able to handle any pair of modes,
and preferably many destination modes simultaneously.  There should
be no additional cost for multiple destination modes when they're all
no larger than word_mode; I can see it being harder and giving up if
you've multiple destination modes larger than word_mode.

> In these testcases, you can see that the redundant sign-extension
> instructions are eliminated, and the non redundant ones are moved to their
> optimal placement.

Preferably you'd provide test cases that can be integrated into the
testsuite, that show that your pass is performing as intended.
Pattern matching on a debug file, as is done in gcc.dg/tree-ssa
is very likely the easiest way.

I won't be firm on this because it's harder to reliably perform such
pattern matching across multiple targets at the rtl level.  But if
you provide something that works for ppc, other port maintainers can
add their own markup for their own targets to be sure that things are
working as expected.

> def1 + se:
> set ((reg:SI 10) (..def1rhs..))
> set ((reg:DI 100) (sign_extend:DI (reg:SI 10)))
...
> set ((subreg:SI (reg:DI 200)) (..def1rhs..))
> set ((reg:DI 100) (sign_extend:DI ((subreg:SI (reg:DI 200)))))
> set ((reg:SI 10) (subreg:SI (reg:DI 200)))

I'm a bit confused by this choice of replacement.  Why not just

  (set (reg:SI 10) (def1rhs))
  (set (reg:DI 100) (sign_extend:DI (reg:SI 10)))

what's the point of reg 200?

Conversely, this is usually a trivial transformation, and I wonder
why you don't simply generate the code you want directly and try to 
recognize it.  E.g.

  (set (reg:DI 100) (sign_extend:DI (def1rhs)))
  (set (reg:SI 10) (subreg:SI (reg:DI 100)))

In this way you don't have to worry about manipulating combine.
Perhaps you have test cases for which simple combines like the
preceeding don't work?

> /**************************************************************************

We generally don't use big divider bars like this.

> extern void union_defs (struct df *, struct ref *, struct web_entry *,
>                         struct web_entry *);

Never declare functions from another c file locally.
Always use a header.

> 	  if (validate_change (se_insn, &SET_DEST (se_set),
> 			       SET_DEST (SET_DEST (se_rhs)), 0))

You can't possibly have nested sets.  The outer of these should be
some other macro, either XEXP or SUBREG_REG.

Why didn't this ICE for you?  Either your test coverage is weak,
or you're bootstrapping with --disable-checking.

>       slot_rtx = (rtx *) htab_find_slot (insn_hash, insn, 1);

Final argument is INSERT, not 1.

>       /* We need to merge two hash tables.  */
>       use_s = xmalloc (sizeof (*use_s));
>       use_s->use_insn = new_insn;
>       use_s->hash = htab_create (10, hash_descriptor_se_insn,
> 				 eq_descriptor_se_insn, NULL);
>       htab_traverse (((struct see_use_s *) (stn_new->value))->hash,
> 		     copy_to_htab_se_insn, (PTR) use_s->hash);
>       htab_traverse (((struct see_use_s *) (stn_old->value))->hash,
> 		     copy_to_htab_se_insn, (PTR) use_s->hash);
> 
>       splay_tree_remove (use_ds , INSN_UID (old_insn));
>       splay_tree_remove (use_ds , INSN_UID (new_insn));
>       splay_tree_insert (use_ds , INSN_UID (new_insn),
> 			 (splay_tree_value) use_s);

Why are you not simply insertting into the stn_new table?

>   use_s = xmalloc (sizeof (*use_s));
>   use_s->use_insn = new_insn;
>   use_s->hash = htab_create (10, hash_descriptor_se_insn,
> 			     eq_descriptor_se_insn, NULL);
>   htab_traverse (((struct see_use_s *) (stn->value))->hash,
> 		 copy_to_htab_se_insn, (PTR) use_s->hash);
> 
>   splay_tree_remove (use_ds , INSN_UID (old_insn));
>   splay_tree_insert (use_ds , INSN_UID (new_insn), (splay_tree_value) use_s);

Why are you copying the hash table at all?  Why not just a move
to the new INSN_UID?

If you do actually need to copy, you know exactly how many entries
are required, and should avoid the resizing caused by the hard 
coding of the value 10.

>       use_s = ((struct see_use_s *) (stn->value));
>       hashed_insn = htab_find (use_s->hash, se_insn);
>       if (hashed_insn)
> 	{
> 	  /* Just to make sure ...  */
> 	  rtx se_set = single_set (se_insn);
> 	  rtx se_rhs = SET_SRC (se_set);
> 	  rtx se_reg = SET_DEST (SET_DEST (se_rhs));
> 	  rtx hashed_set = single_set (hashed_insn);
> 	  rtx hashed_rhs = SET_SRC (hashed_set);
> 	  rtx hashed_reg = SET_DEST (SET_DEST (hashed_rhs));
> 	  gcc_assert (REGNO (hashed_reg) == REGNO (se_reg));
> 	}
>       else
> 	{
> 	  slot = (rtx *) htab_find_slot (use_s->hash, se_insn, 1);
> 	  *slot = se_insn;

Use htab_find_slot once, always.  You then test *slot for non-null
and then perform your sanity check.

>       gcc_assert (lhs_reg = SET_DEST (hashed_set));
>       gcc_assert (rhs_reg = SET_SRC (hashed_set));

s/=/==/

>   if (!*first || !*second)
>     *first = NOT_RELEVANT;
>   else if ((*first == SIGN_EXTENSION_DEF)
> 	   || (*second == SIGN_EXTENSION_DEF))
>     *first = SIGN_EXTENSION_DEF;
>   else
>     *first = CONST_DEF;

Why is (0 union 0) NOT_RELEVANT?  Because you don't know how to 
manipulate NOT_RELEVANT, or because it's actually not relevant?
I might think in the later case you'd want EXTENSION_DEF.  Many
of the extension cases will be free; we should take advantage
of that in the case the extension is partially unused.

>   tmp_reg = SET_DEST (SET_DEST (stored_se_rhs));

Again with the double SET_DEST.

> 	      && (GET_CODE (PATTERN (trivial_insn)) != PARALLEL)
> 	      && single_set (trivial_insn));

The first test should be avoided; only the second is relevant.

> see_try_combine (rtx I1, rtx I2, rtx I3, int pattern_type)

This hackery with combine is atrocious.

For each attempt you are generating a brand new trivial CFG, adding
your three instructions, getting your result and restoring the old CFG?
Frankly I can't imagine a less efficient way of doing this.

Like I say above, virtually all of the time you can generate the new
pattern by hand -- at least on the def side.  I personally would be
happy if we went just that far and stopped.  If we *really* find there
are other cases we need to handle, we should hook into simplify-rtx
and combine at the appropriate places.  Which I guarantee you will be
farther down the call chain than combine_instructions.

>   /* Generate the new pattern.  */
>   start_sequence ();
>   tmp_subreg = gen_lowpart_SUBREG (SImode, tmp_reg);
>   emit_move_insn (tmp_subreg, prev_rhs);
>   tmp_reg_extended = gen_rtx_SIGN_EXTEND (GET_MODE (lhs), tmp_subreg);
>   emit_move_insn (lhs, tmp_reg_extended);

This usage of emit_move_insn is invalid.  It may only be used with
objects.  If you want expressions, you have to generate them some
other way -- directly+recognition or optabs.

If you use optabs, you have to be careful of clobbering a live flags
register.  This may well happen on i386; see zero_extendqisi2.  In
this case you'll have to have a notion of what hard registers are 
live at the spots you want to insert extensions, and notice when new
patterns kill them, and abort the insertion.

If you use direct pattern generation + recognition, you have to be
prepared for a pattern to not be recognizable, and abort the insertion.

>       rt = (def_entry + i)->extra_flag;

"def_entry[i].extra_flag" is really more readable.

>       /* Eliminate global redundant se (using lcm).  */
>       want_to_gcse_p_set (want_to_gcse_p_se);
>       gcse_main (f, dump_file, 1);
>       want_to_gcse_p_reset ();

What real benefit are you getting from gcse_main that you want to hack
it up like this?  It's really not much code to invoke LCM directly.
You've already collected all of the information on your defs and uses
via the web.  I see no reason you couldn't plug that into LCM and read
back the results.

Indeed, it seems to me that you could do this without having modified
the instruction stream at all yet, avoiding the work you do to 
artificially add instructions, and then remove them after the fact.


r~

References:
- [PATCH] Exploiting dual mode operation, implementation.
  - From: Leehod Baruch

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]