[PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov

Jan Hubicka hubicka@ucw.cz
Sun Oct 20 10:26:00 GMT 2013


> That's a good point.  I added a check for this case - so if CONST_INT is passed
> we assume that mode is QI.  But usually promoted value resides in a register, so
> we don't go over-conservative here.
Hmm, so if we use broadcast constant, then we won't end up having CONST_INT here?
It is OK then.
> 
> > > +  if (size_to_move < GET_MODE_SIZE (move_mode))
> > > +    {
> > > +      move_mode = mode_for_size (size_to_move * BITS_PER_UNIT, MODE_INT, 0);
> > > +      promoted_val = gen_lowpart (move_mode, promoted_val);
> > > +    }
> > > +  piece_size = GET_MODE_SIZE (move_mode);
> > > +  code = optab_handler (mov_optab, move_mode);
> > > +  gcc_assert (code != CODE_FOR_nothing && promoted_val != NULL_RTX);
> > > +
> > > +  dst = adjust_automodify_address_nv (dst, move_mode, destptr, 0);
> > > +
> > > +  /* Emit moves.  We'll need SIZE_TO_MOVE/PIECE_SIZES moves.  */
> > > +  gcc_assert (size_to_move % piece_size == 0);
> > > +  adjust = GEN_INT (piece_size);
> > > +  for (i = 0; i < size_to_move; i += piece_size)
> > > +    {
> > > +      if (piece_size <= GET_MODE_SIZE (word_mode))
> > > +	{
> > > +	  emit_insn (gen_strset (destptr, dst, promoted_val));
> > > +	  continue;
> > > +	}
> > Don't we want to go for strmov here, too?
> I don't quite get it.  How could strmov help us here?

strset/strmov expanders results in stos/movs instructions on some targets.
The code is supposed to use them when profitable - i.e. for constant sized
prologues and known offset, you can set up registers for rep; movsq
and then use use movb/movsw/movsl to get the required alignment (or to copy
the tail).
It seems to me that whenever we can use strset we can also use strmov, so the
code should be symmetric.
It is currently enabled only for P4, but perhaps it makes sense elsewhere, too
(and to expand compact move by pieces/store by pieces).
I do not know.  Those single stringop insns are always fishy.
> > How do we ensure that value is sufficiently promoted? (for example for
> > pentiumpro that requires bigger alignment than promotion)?
> > Also don't we need to handle vec_value for AVX?
> I don't see a problem here actually.  I tried to keep logic for promoted_val
> unchanged, and just add VEC_PROMOTED_VAL, which doesn't use alignment info at
> all.  VEC_PROMOTED_VAL is promoted to move_mode, which is the biggest mode we
> could use for storing, so we always have enough-promoted value.

OK.
> > > @@ -23015,7 +23074,8 @@ ix86_expand_movmem (rtx dst, rtx src, rtx count_exp, rtx align_exp,
> > >      align = INTVAL (expected_align_exp);
> > >    /* ALIGN is the minimum of destination and source alignment, but we care here
> > >       just about destination alignment.  */
> > > -  else if (MEM_ALIGN (dst) > (unsigned HOST_WIDE_INT) align * BITS_PER_UNIT)
> > > +  else if (!issetmem
> > > +	   && MEM_ALIGN (dst) > (unsigned HOST_WIDE_INT) align * BITS_PER_UNIT)
> > >      align = MEM_ALIGN (dst) / BITS_PER_UNIT;
> > 
> > Hmm, this really should be pushed up into expansion and the alignment lookup should be improved...
> Well, for now I just to keep some things, like this, unchanged.  We could
> continue cleaning this all up.

Yep, lets handle other things incrementally.
> 
> > > +  /* For now vector-version of memset is generated only for memory zeroing, as
> > > +     creating of promoted vector value is very cheap in this case.  */
> > > +  if (issetmem && alg == vector_loop && val_exp != const0_rtx)
> > > +    alg = unrolled_loop;
> > 
> > I believe we had vector version of promote_reg once. Load of constant is easy, too.
> > Broadcasting a value is bit different story.
> Most of the cases where we need vector memset is zeroing a memory, that's why I
> focused on this case.  But you are right, promoting constant value would be
> simple as well.
> 
> > Thanks, the patch is OK with those changes.  It really seems like an
> > improvement in maintainability.  Thanks for working on it!
> Could you look at this version of the patch?  Didn't I forget about anything?
> 
> The patch was bootstrapped, make-checked and tested on specs 2k/2k6 (stability
> only).  The same testing was performed with -minline-all-stringops.

The patch is OK.
Honza



More information about the Gcc-patches mailing list