[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

wschmidt at linux dot ibm.com gcc-bugzilla@gcc.gnu.org
Sun May 31 20:31:59 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398

--- Comment #35 from wschmidt at linux dot ibm.com ---
Hi Jeff,

Just a quick comment.  We should never discuss raw runtimes of SPEC 
benchmarks on Power hardware in public.  It's okay to talk about 
improvements (>12% in this case), but not wall clock time.  Not a big 
deal, but there are some legal reasons regarding SPEC that cause us to 
be a little careful.

Thanks!
Bill

On 5/21/20 12:29 AM, guojiufu at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
>
> --- Comment #26 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
> I ran a test on SPEC2017 xz_r on ppc64le, changing the loop in question
> by hand.
>
> original loop (this loop occurs three times in the code):
>                          while (++len != len_limit)
>                                  if (pb[len] != cur[len])
>                                          break;
> changed to loop:
> typedef long long __attribute__((may_alias)) TYPEE;
>
>    for (++len; len + sizeof(TYPEE) <= len_limit; len += sizeof(TYPEE)) {
>      long long a = *((TYPEE*)(cur+len));
>      long long b = *((TYPEE*)(pb+len));
>      if (a != b) {
>        break; // to optimize, len can be moved forward here
>      }
>    }
>    for (;len != len_limit; ++len)
>      if (pb[len] != cur[len])
>        break;
>
> We can see the xz_r runtime improve from 433s to 382s (>12%).
> Widening the reads and comparisons like this would be very valuable.
>
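
For reference, the widened comparison quoted above can be written as a
self-contained function. This is only a sketch: the name match_len and
its signature are hypothetical and come neither from xz nor from GCC,
and it swaps the may_alias typedef for memcpy loads, which are another
common way to do unaligned, aliasing-safe reads. It assumes
start <= len_limit and that both buffers hold at least len_limit bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the first index in [start, len_limit) at
 * which pb and cur differ, or len_limit if they match all the way. */
static size_t match_len(const unsigned char *pb, const unsigned char *cur,
                        size_t start, size_t len_limit)
{
    size_t len = start;

    assert(start <= len_limit);  /* assumed precondition */

    /* Compare eight bytes at a time while a full word still fits. */
    while (len + sizeof(uint64_t) <= len_limit) {
        uint64_t a, b;
        memcpy(&a, cur + len, sizeof a);  /* unaligned-safe load */
        memcpy(&b, pb + len, sizeof b);
        if (a != b)
            break;  /* the byte loop below finds the exact position */
        len += sizeof(uint64_t);
    }

    /* Handle the tail and any mismatch inside the last word read. */
    while (len != len_limit && pb[len] == cur[len])
        ++len;

    return len;
}
```

With this helper, the original loop would correspond to
len = match_len(pb, cur, len + 1, len_limit).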