This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: i386 RTX_COST tweeks take 2

To: Richard Henderson <rth at redhat dot com>, Jan Hubicka <jh at suse dot cz>, Andi Kleen <ak at suse dot de>, gcc-patches at gcc dot gnu dot org, gcc at gcc dot gnu dot org
Subject: Re: i386 RTX_COST tweeks take 2
From: Jan Hubicka <jh at suse dot cz>
Date: Thu, 8 Nov 2001 14:13:10 +0100
References: <20011103174136.A23260@atrey.karlin.mff.cuni.cz.suse.lists.egcs> <Pine.LNX.4.33L2.0111050649160.30025-100000@kevin.inet.suse.lists.egcs> <20011105154541.A9715@atrey.karlin.mff.cuni.cz.suse.lists.egcs> <p73pu6wyb1w.fsf@amdsim2.suse.de> <20011106175612.B8917@atrey.karlin.mff.cuni.cz> <20011106235244.I29844@redhat.com> <20011107133657.B20978@atrey.karlin.mff.cuni.cz> <20011107115047.C30193@redhat.com>

> On Wed, Nov 07, 2001 at 01:36:57PM +0100, Jan Hubicka wrote:
> > http://gcc.gnu.org/ml/gcc-patches/2001-08/msg01491.html
> > 
> > It saves one extra copy after expanded multiplies.
> 
> Why can't we repair the damage with globals?  Why wouldn't 
> fixing this be any different than if the code used a local
> variable as an intermediate?
Using a local variable as an intermediate has the same effect.
Of course, at first I wondered why the extra move doesn't get removed, but
the reason is a bit complex.  CSE has logic for "smart" choice of
which equivalence to use:

  /* Prefer fixed hard registers to anything.  Prefer pseudo regs to other
     hard regs.  Among pseudos, if NEW will live longer than any other reg
     of the same qty, and that is beyond the current basic block,
     make it the new canonical replacement for this qty.  */
  if (! (firstr < FIRST_PSEUDO_REGISTER && FIXED_REGNO_P (firstr))
      /* Certain fixed registers might be of the class NO_REGS.  This means
	 that not only can they not be allocated by the compiler, but
	 they cannot be used in substitutions or canonicalizations
	 either.  */
      && (new >= FIRST_PSEUDO_REGISTER || REGNO_REG_CLASS (new) != NO_REGS)
      && ((new < FIRST_PSEUDO_REGISTER && FIXED_REGNO_P (new))
	  || (new >= FIRST_PSEUDO_REGISTER
	      && (firstr < FIRST_PSEUDO_REGISTER
		  || ((uid_cuid[REGNO_LAST_UID (new)] > cse_basic_block_end
		       || (uid_cuid[REGNO_FIRST_UID (new)]
			   < cse_basic_block_start))
		      && (uid_cuid[REGNO_LAST_UID (new)]
			  > uid_cuid[REGNO_LAST_UID (firstr)]))))))

This heuristic does not work well in many cases and produces code with
unneeded copies when it decides that the new pseudo is preferable to
the old pseudo.
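
To make the nesting easier to follow, here is the same condition restated as
a standalone predicate.  This is only a sketch: struct reg_model and
prefer_new_as_canonical are hypothetical names that do not exist in GCC, and
the cuid lookups are collapsed into plain integer fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of what the condition consults.  */
struct reg_model
{
  bool is_pseudo;      /* regno >= FIRST_PSEUDO_REGISTER */
  bool is_fixed;       /* FIXED_REGNO_P, meaningful for hard regs only */
  bool class_no_regs;  /* REGNO_REG_CLASS == NO_REGS, hard regs only */
  int first_use;       /* cuid of the first reference */
  int last_use;        /* cuid of the last reference */
};

/* Return true if NEW_REG should replace FIRSTR as the canonical
   register of the quantity.  */
bool
prefer_new_as_canonical (const struct reg_model *firstr,
                         const struct reg_model *new_reg,
                         int bb_start, int bb_end)
{
  assert (firstr != NULL && new_reg != NULL);

  /* A fixed hard register already heading the quantity is never displaced.  */
  if (!firstr->is_pseudo && firstr->is_fixed)
    return false;

  /* Hard registers of class NO_REGS cannot be used in substitutions.  */
  if (!new_reg->is_pseudo && new_reg->class_no_regs)
    return false;

  /* A hard register is preferred only if it is fixed.  */
  if (!new_reg->is_pseudo)
    return new_reg->is_fixed;

  /* A pseudo is preferred to a non-fixed hard register.  */
  if (!firstr->is_pseudo)
    return true;

  /* Among pseudos: the new one must live beyond the current basic block
     and must outlive the current head of the quantity.  */
  bool lives_outside = (new_reg->last_use > bb_end
                        || new_reg->first_use < bb_start);
  return lives_outside && new_reg->last_use > firstr->last_use;
}
```

The last test is the one discussed here: a newer pseudo that outlives the
older one and escapes the block takes over as the canonical register.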

It also interferes badly with GCSE CPROP, which always uses the first
available occurrence.  Sometimes CSE decides to undo CPROP, but as
CPROP is global while CSE is just "pseudoglobal", it does not undo it
completely.

I've tried removing this check and always using the older value, but that
produces a larger cc1 than before.  I've tracked it down to cases where a
single pseudo is copied and later reused for something else.  CPROP then uses
the original pseudo but later switches to the copy, so the copy stays
required, while the unmodified code produces two pseudos (still with a copy)
with non-overlapping live ranges that the register allocator can coalesce.

I can eliminate such cases with the webizer pass.  With webizing done, the
patch to disable the heuristic improves the cc1 binary somewhat, but the
webizing pass does not appear to be on track to get in.  (I believe it makes
sense even with an SSA path that does webizing early in compilation, as the
register allocator should do it anyway, we can't do SSA at that stage, and it
is nice to have it as a separate pass.)
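
As a rough picture of what webizing buys here, the sketch below splits the
references of one pseudo into webs for straight-line code only.  It is a
simplification under stated assumptions: enum ref_kind and split_into_webs
are hypothetical names, and a real pass would follow def-use chains across
the whole flow graph rather than a single linear sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference kinds for one pseudo register.  */
enum ref_kind { REF_DEF, REF_USE };

/* Assign a web number to each of the N references in REFS, in program
   order, and store it in WEB_OUT.  Return the number of webs found.
   Straight-line code only: every full redefinition starts a new web and
   each use belongs to the web of the most recent definition.  */
int
split_into_webs (const enum ref_kind *refs, size_t n, int *web_out)
{
  int current = -1;

  assert (refs != NULL && web_out != NULL);

  for (size_t i = 0; i < n; i++)
    {
      if (refs[i] == REF_DEF)
        current++;
      /* A value of -1 marks a use with no preceding definition.  */
      web_out[i] = current;
    }

  return current + 1;
}
```

A pseudo that is set, used, then set again for an unrelated purpose comes
out as two webs, each of which could be renamed to its own pseudo with a
shorter live range.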

Other cases could also be caught by reverse copy propagation, as done by
IMPACT, IMO...

Honza

> 
> While preserve_subexpressions_p may not be the best heuristic,
> not using it consistently is confusing.
I am not killing preserve_subexpressions_p.  What the code does is emit a
dummy move, as in the following example, when asked to put the value into
register 58 (it expands a*3):

(insn 23 21 25 (parallel[ 
            (set (reg:SI 64)
                (ashift:SI (reg:SI 63)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ] ) -1 (nil)
    (nil))

(insn 25 23 27 (parallel[ 
            (set (reg:SI 62)
                (plus:SI (reg:SI 64)
                    (reg/v:SI 59)))
            (clobber (reg:CC 17 flags))
        ] ) -1 (nil)
    (expr_list:REG_EQUAL (mult:SI (reg/v:SI 59)
            (const_int 3 [0x3]))
        (nil)))

(insn 27 25 28 (set (reg:SI 58)
        (reg:SI 62)) -1 (nil)
    (nil))

Even with preserve_subexpressions_p, other expanders store the result directly
to the destination and don't do the extra move, so I believe that after my
patch the resulting sequence is still correct in the preserve_subexpressions_p
sense, as all the temporary computations (such as a*2 in insn 23) are still
stored to separate pseudos.
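
In C terms, the two shapes look roughly like this.  It is a sketch only: the
function names are hypothetical, the variables are named after the pseudos in
the dump, registers 63 and 59 are assumed to both hold a, and the "direct"
form is my reading of what the sequence looks like without insn 27, not
actual compiler output.

```c
#include <assert.h>

/* Shape of the dump above: shift, add, then a copy into the destination.  */
unsigned
times3_with_copy (unsigned a)
{
  unsigned r64 = a << 1;   /* insn 23: temporary a*2 */
  unsigned r62 = r64 + a;  /* insn 25: a*3, carries the REG_EQUAL note */
  unsigned r58 = r62;      /* insn 27: the extra move */
  return r58;
}

/* Same computation with the add storing straight into the destination.
   The temporary a*2 still gets its own pseudo.  */
unsigned
times3_direct (unsigned a)
{
  unsigned r64 = a << 1;
  unsigned r58 = r64 + a;
  return r58;
}

/* Both shapes compute the same value; only the number of moves differs.  */
void
check_equivalent (unsigned a)
{
  assert (times3_with_copy (a) == times3_direct (a));
}
```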

Also, what about killing the heuristic completely?  I think its time has
passed, as the only purpose of having two schemes of code generation is that
the second did badly on the interval-based liveness analysis done by the
stupid regalloc.

Also, a webizer-like pass should make it unnecessary to care about the issue
at all, but for the moment, would it be OK to remove all occurrences of
preserve_subexpressions_p and expect it to be constant 1?

Honza
> 
> 
> r~

