[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32

hubicka at ucw dot cz gcc-bugzilla@gcc.gnu.org
Fri Jun 22 22:46:00 GMT 2012


--- Comment #22 from Jan Hubicka <hubicka at ucw dot cz> 2012-06-22 22:45:35 UTC ---
> Yes.  The question is what is "very small" and how can we possibly

As what is very small is defined in the i386.c in the cost tables.
I simply run a small benchmark testing library&GCC implementations to
fill it in.  With new glibcs these tables may need upating.  I updated them
on some to make glibc in SUSE 11.x.

PR  43052 is about memcmp. Memcpy/memset should behave more or less sanely.
(that also reminds me that I should look again at the SSE memcpy/memset
implementation for 4.8)

> detect "very small".  For this testcase we can derive an upper bound
> of the size, which is 8, but the size is not constant.  I think unless
> we know we can expand the variable-size memcpy with, say, three
> CPU instructions inline there is no reason to not call memcpy.
> Thus if the CPU could do
>   tem = unaligned-load-8-bytes-from-src-and-ignore-faults;
>   mask = generate mask from size
>   store-unaligned-8-bytes-with-maxk
> then expanding the memcpy call inline would be a win I suppose.
> AVX has VMASKMOV, but I'm not sure using that for sizes <= 16
> bytes is profitable?  Note that from the specs
> of VMASKMOV it seems the memory operands need to be aligned and
> the mask does not support byte-granularity.
> Which would leave us to inline expanding the case of at most 2 byte
> memcpy.  Of course currently there is no way to record an upper
> bound for the size (we do not retain value-range information - but
> we of course should).

My secret plan was to make VRP produce value profiling histogram
when value is known to be with small range.  Should be quite easy
to implement.


More information about the Gcc-bugs mailing list