[PATCH] Fix *_BY_PIECES_P

Mon Aug 9 16:48:00 GMT 2004

On Mon, Aug 09, 2004 at 08:46:04AM -0600, Roger Sayle wrote:
> I think this is a red herring.  Yes, GCC doesn't un-CSE the zero constant
> into its own register, but the CLEAR_RATIO threshold is between using the
> inefficient inline sequence vs. a clrmem or libcall.  If the top sequence
> of eight instructions is faster than a call to memset or a stos* sequence,
> then CLEAR_RATIO should be higher than six.

It is faster in microbenchmarks, but so is e.g. a sequence of 12 movl/movqs,
yet MOVE_RATIO is 9 on that target.  But microbenchmarks don't necessarily
tell everything, as a (much) bigger I-cache footprint can degrade the
performance of real-world applications a lot.  Both MOVE_RATIO and
CLEAR_RATIO are a compromise between the faster inline move/clear
instructions and the I-cache friendlier movmem/clrmem/call.
I don't know how exactly the current MOVE_RATIO values were computed
(guesses, a lot of benchmarking, something else), but just as a guess,
if we don't un-CSE, then movl $0, OFF(reg) instructions are on average
3 bytes longer than movl instructions used by move_by_pieces, so the
I-cache factor is bigger there.
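
To make the trade-off concrete, here is a rough sketch in C of the kind of
threshold test being discussed.  It is a simplified model, not GCC's code:
the helper names (pieces, by_pieces_p, extra_icache_bytes) are made up for
illustration, the piece count is a plain round-up that ignores alignment
and narrower modes for the tail, and the 3 bytes per instruction is just
the guess above taken as a parameter.

```c
#include <assert.h>

/* Hypothetical helper: number of word-sized instructions needed to
   cover SIZE bytes.  Simplification: rounds up, ignores alignment and
   the narrower modes a real compiler would use for the tail.  */
static unsigned
pieces (unsigned size, unsigned word)
{
  assert (word > 0);
  return (size + word - 1) / word;
}

/* Hypothetical model of a *_BY_PIECES_P style test: expand inline
   only when the instruction count is below RATIO (e.g. MOVE_RATIO
   or CLEAR_RATIO); otherwise fall back to movmem/clrmem or a call.  */
static int
by_pieces_p (unsigned size, unsigned word, unsigned ratio)
{
  return pieces (size, word) < ratio;
}

/* Extra code bytes for N_INSNS inline stores when each one carries
   EXTRA_PER_INSN more bytes (e.g. an immediate zero that was not
   hoisted into a register).  */
static unsigned
extra_icache_bytes (unsigned n_insns, unsigned extra_per_insn)
{
  return n_insns * extra_per_insn;
}
```

With 4-byte words, a 32-byte block takes 8 pieces, so it is expanded inline
under a ratio of 9 but not under a ratio of 6, and at 3 extra bytes per
instruction the inline clear is about 24 bytes bigger than the same number
of plain register moves.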

	Jakub