This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
- From: "hubicka at ucw dot cz" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 22 Jun 2012 22:45:35 +0000
- Subject: [Bug tree-optimization/53726] [4.8 Regression] aes test performance drop for eembc_2_0_peak_32
- Auto-submitted: auto-generated
- References: <bug-53726-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53726
--- Comment #22 from Jan Hubicka <hubicka at ucw dot cz> 2012-06-22 22:45:35 UTC ---
> Yes. The question is what is "very small" and how can we possibly
What counts as "very small" is defined in the cost tables in i386.c.
I simply ran a small benchmark comparing the library and GCC
implementations to fill them in. With newer glibc versions these tables
may need updating; I updated some of them to match the glibc in SUSE 11.x.
PR 43052 is about memcmp; memcpy/memset should behave more or less sanely.
(That also reminds me that I should look again at the SSE memcpy/memset
implementation for 4.8.)
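To make "defined in the cost tables" more concrete, here is a minimal C sketch of a size-threshold table of that general kind: each row names the expansion to use up to some size, and anything above the last threshold, or of unknown size, becomes a library call. All names, thresholds, and the strategy enum are hypothetical illustrations of the idea, not GCC's actual i386.c data structures or tuned values.

```c
#include <stddef.h>

/* Hypothetical expansion strategies for a block copy. */
enum copy_strategy { INLINE_LOOP, INLINE_REP_MOVS, LIBCALL };

/* One table row: use `strategy` for sizes up to and including `max_size`.
   A max_size of 0 marks the catch-all final row (hypothetical convention). */
struct size_rule {
  size_t max_size;
  enum copy_strategy strategy;
};

/* Hypothetical per-CPU tuning; real thresholds would come from benchmarking
   the library implementation against the inline expansions on that CPU. */
static const struct size_rule example_cpu_rules[] = {
  { 32,   INLINE_LOOP },
  { 8192, INLINE_REP_MOVS },
  { 0,    LIBCALL },
};

/* Pick a strategy for a copy. In this sketch an unknown size always
   becomes a library call; otherwise the first matching row wins. */
static enum copy_strategy
choose_strategy(const struct size_rule *rules, int size_known, size_t size)
{
  if (!size_known)
    return LIBCALL;
  for (;; rules++)
    if (rules->max_size == 0 || size <= rules->max_size)
      return rules->strategy;
}
```

In a scheme like this, "very small" is simply the range covered by the first row for a given CPU, which is why the thresholds need re-benchmarking when the library implementation changes.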
> detect "very small". For this testcase we can derive an upper bound
> of the size, which is 8, but the size is not constant. I think unless
> we know we can expand the variable-size memcpy with, say, three
> CPU instructions inline there is no reason to not call memcpy.
>
> Thus if the CPU could do
>
> tem = unaligned-load-8-bytes-from-src-and-ignore-faults;
> mask = generate mask from size
> store-unaligned-8-bytes-with-mask
>
> then expanding the memcpy call inline would be a win I suppose.
> AVX has VMASKMOV, but I'm not sure using it for sizes <= 16
> bytes would be profitable. Note that from the specs
> of VMASKMOV it seems the memory operands need to be aligned and
> the mask does not support byte-granularity.
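The load/mask/store sequence quoted above can be approximated in portable C, which helps show what "generate mask from size" means. This is a sketch under stated assumptions: the helper names are hypothetical, portable C cannot ignore faults, so both buffers are assumed to have 8 accessible bytes, and the store is emulated with a read-modify-write of the destination rather than a true masked store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: a 64-bit mask whose first n bytes in memory order
   are 0xFF and whose remaining bytes are 0x00 (valid for n <= 8). */
static uint64_t byte_mask_from_size(size_t n)
{
  const uint16_t probe = 1;
  int little_endian = *(const unsigned char *)&probe == 1;
  if (n == 0)
    return 0;
  if (n >= 8)
    return ~UINT64_C(0);
  return little_endian ? (UINT64_C(1) << (8 * n)) - 1   /* low-order bytes */
                       : ~UINT64_C(0) << (8 * (8 - n)); /* high-order bytes */
}

/* Copy n <= 8 bytes with one 8-byte load of each buffer, a mask, and one
   8-byte store. ASSUMPTION: both buffers have 8 readable/writable bytes.
   Unlike a real masked store, this rewrites the bytes past n with their old
   values, so it is unsafe if another thread writes them concurrently. */
static void copy_upto8_masked(void *dst, const void *src, size_t n)
{
  uint64_t s, d, mask = byte_mask_from_size(n);
  memcpy(&s, src, 8);  /* unaligned 8-byte load from src */
  memcpy(&d, dst, 8);  /* unaligned 8-byte load from dst */
  d = (d & ~mask) | (s & mask);
  memcpy(dst, &d, 8);  /* unaligned 8-byte store to dst */
}
```

Optimizing compilers typically turn the fixed-size 8-byte memcpy calls into single load and store instructions, so the emulation costs roughly the handful of instructions the quoted text has in mind, plus the mask computation.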
>
> That would leave us inline-expanding only memcpy calls of at most
> 2 bytes. Of course, currently there is no way to record an upper
> bound for the size (we do not retain value-range information - but
> we of course should).
My secret plan was to make VRP produce a value-profiling histogram
when a value is known to lie within a small range. That should be quite
easy to implement.
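If an upper bound of 2 on the size were available (from VRP or a histogram as described above), the at-most-2-byte case could be expanded inline along the lines of this C sketch. The function name is hypothetical, and this is a source-level equivalent of such an expansion, not what GCC actually emits.

```c
#include <stddef.h>

/* Inline copy for a size known to be at most 2. Copying the first and the
   last byte covers n == 1 (the same byte twice) and n == 2 without a loop:
   one branch, two byte loads and two byte stores. */
static inline void copy_upto2(unsigned char *dst, const unsigned char *src,
                              size_t n)
{
  if (n != 0) {
    unsigned char first = src[0];
    unsigned char last = src[n - 1];
    dst[0] = first;
    dst[n - 1] = last;
  }
}
```

The first/last-byte trick avoids a separate branch for n == 1 versus n == 2, which keeps the expansion close to the "few instructions" budget discussed in the quoted text.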
Honza