[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison
wschmidt at linux dot ibm.com
gcc-bugzilla@gcc.gnu.org
Sun May 31 20:31:59 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
--- Comment #35 from wschmidt at linux dot ibm.com ---
Hi Jeff,
Just a quick comment. We should never discuss raw runtimes of SPEC
benchmarks on Power hardware in public. It's okay to talk about
improvements (>12% in this case), but not wall clock time. Not a big
deal, but there are some legal reasons regarding SPEC that cause us to
be a little careful.
Thanks!
Bill
On 5/21/20 12:29 AM, guojiufu at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
>
> --- Comment #26 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
> I ran a test on SPEC2017 xz_r on ppc64le, manually changing the loop in
> question.
>
> Original loop (this loop occurs three times in the code):
>   while (++len != len_limit)
>     if (pb[len] != cur[len])
>       break;
> Changed loop:
>   typedef long long __attribute__((may_alias)) TYPEE;
>
>   for (++len; len + sizeof(TYPEE) <= len_limit; len += sizeof(TYPEE)) {
>     long long a = *((TYPEE*)(cur+len));
>     long long b = *((TYPEE*)(pb+len));
>     if (a != b) {
>       break; // as a further optimization, len can be moved forward here.
>     }
>   }
>   for (; len != len_limit; ++len)
>     if (pb[len] != cur[len])
>       break;
>
> We can see that xz_r runtime improved from 433s to 382s (>12%).
> It would be very valuable to do this kind of widened reading/checking.
>
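For anyone who wants to try the widened comparison quoted above outside of
xz, here is a minimal self-contained sketch. The function name
match_len_wide is hypothetical (it is not from xz or GCC), and the sketch
uses memcpy for the 8-byte loads in place of the may_alias cast, which is a
swapped-in technique rather than what was measured. It assumes
len < len_limit on entry, as the original loop does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the first index greater than len at which
   pb and cur differ, or len_limit if they match up to the limit.
   Assumes len < len_limit on entry. */
static size_t match_len_wide(const unsigned char *pb, const unsigned char *cur,
                             size_t len, size_t len_limit)
{
    /* Compare eight bytes per iteration.  memcpy performs the unaligned
       loads here instead of the may_alias cast (an assumption of this
       sketch, not the code that was benchmarked). */
    for (++len; len + sizeof(uint64_t) <= len_limit; len += sizeof(uint64_t)) {
        uint64_t a, b;
        memcpy(&a, cur + len, sizeof a);
        memcpy(&b, pb + len, sizeof b);
        if (a != b)
            break;
    }

    /* Byte loop: pins down the exact mismatch inside a differing 8-byte
       chunk, or finishes the tail shorter than eight bytes. */
    for (; len != len_limit; ++len)
        if (pb[len] != cur[len])
            break;

    return len;
}
```

The result is intended to match what the original byte-at-a-time loop
leaves in len, so the two can be compared directly on the same buffers.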