This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/65709] [5 Regression] Bad code for LZ4 decompression with -O3 on x86_64


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709

Jeffrey Walton <noloader at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |noloader at gmail dot com

--- Comment #14 from Jeffrey Walton <noloader at gmail dot com> ---
(In reply to Jakub Jelinek from comment #10)
> (In reply to Yann Collet from comment #9)
> > Looking at the assembler generated, we see that GCC generates a MOVDQA
> > instruction for it.
> > > movdqa (%rdi,%rax,1),%xmm0
> > > $rdi=0x7fffea4b53e6
> > > $rax=0x0
> > 
> > This seems wrong on 2 levels :
> > 
> > - The function only wants to copy 8 bytes. MOVDQA works on a full SSE
> > register, which is 16 bytes. This spell troubles, if only for buffer
> > boundaries checks : the algorithm uses 8 bytes because it knows it can
> > safely read/write that size without crossing buffer limits. With 16 bytes,
> > no such guarantee.
> 
> The function has been inlined into the callers, like:
>       do { LZ4_copy8(d,s); d+=8; s+=8; } while (d<e);
> and this loop is then vectorized.  The vectorization prologue of course has
> to adjust if s is not 16 byte aligned, but it can assume that both s and d
> are 8 byte aligned (otherwise it is undefined behavior)...
Forgive my barging in, Jakub. I was referred to this issue and comment from
another issue.

It's not clear to me where the leap is made that it's OK to use vmovdqa. Are you
stating (unequivocally, for folks like me) that it does *not* matter what the
alignment is in the C sources, and that the prologue ensures both 's' and 'd'
are 16-byte aligned by the time vmovdqa is invoked? That is, when we see
vmovdqa used, can we take it that the alignment is correct (and at 16 bytes)?

Sorry to have to ask.
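
To make the question concrete, here is a minimal sketch of the loop from
comment #10 written so that the C source itself promises nothing about
alignment. The names copy8_unaligned, wild_copy8 and is_aligned are
hypothetical and are not taken from LZ4. The sketch assumes that the 8-byte
alignment mentioned in comment #10 comes from dereferencing a pointer to an
8-byte type, and it swaps that for memcpy, which carries no alignment
requirement. It does not show what any particular GCC version generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy exactly 8 bytes. memcpy places no alignment
 * requirement on d or s, so the source gives the compiler no alignment
 * guarantee to build on. */
static void copy8_unaligned(void *d, const void *s)
{
    memcpy(d, s, 8);
}

/* Hypothetical helper: the loop shape quoted in comment #10. It copies in
 * 8-byte steps and may write up to 7 bytes past e, so the caller must
 * leave that much slack in both buffers. */
static void wild_copy8(uint8_t *d, const uint8_t *s, const uint8_t *e)
{
    assert(d < e);
    do {
        copy8_unaligned(d, s);
        d += 8;
        s += 8;
    } while (d < e);
}

/* Hypothetical helper: returns nonzero if p is a multiple of n bytes. */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```

Calling wild_copy8(dst + 1, src + 3, dst + 21) is valid with these helpers
even though neither pointer is 8-byte aligned, and is_aligned(p, 16) can be
used in a debugger or a test to check the address an aligned load is given.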

