This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: bswap PRs 69714, 67781


Hi Bernd,

First of all, my apologize for the late reply. I was in holidays the past week 
to celebrate Chinese new year.

On Friday, February 12, 2016 05:28:43 PM Bernd Schmidt wrote:
> PR69714 is an issue where the bswap pass makes an incorrect
> transformation on big-endian targets. The source has a 32-bit bswap, but
> PA doesn't have a pattern for that. Still, we recognize that there is a
> 16-bit bswap involved, and generate code for that - loading the halfword
> at offset 2 from the original memory, as per the proper big-endian
> correction.
> 
> The problem is that we recognized the rotation of the _high_ part, which
> is at offset 0 on big-endian. The symbolic number is 0x0304, rather than
> 0x0102 as it should be. Only the latter form should ever be matched.

Which is exactly what the patch for PR67781 was set out to do (see the if 
(BYTES_BIG_ENDIAN) block in find_bswap_or_nop.

The reason why the offset is wrong is due to another if (BYTES_BIG_ENDIAN) 
block in bswap_replace. I will check the testcase added with that latter 
block, my guess is that the change was trying to fix a similar issue to 
PR67781 and PR69714. When removing it the load in avcrc is done without an 
offset. I should have run the full testsuite also on a big endian system 
instead of a few selected testcases and a bootstrap in addition to the little 
endian bootstrap+testsuite. Lesson learned.

> The
> problem is caused by the patch for PR67781, which was intended to solve
> a different big-endian problem. Unfortunately, I think it is based on an
> incorrect analysis.
> 
> The real issue with the PR67781 testcase is in fact the masking loop,
> identified by Thomas in comment #7 for 67781.
> 
> for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++)
>    ;
> n->range = rsize;
> 
> If we have a value of 0x01020304, but a range of 5, it means that
> there's an "invisible" high-order byte that we don't care about. On
> little-endian, we can just ignore it. On big-endian, this implies that
> the data we're interested in is located at an offset. The code that does
> the replacements does not use the offset or bytepos fields, it assumes
> that the bytepos always matches that of the load instruction.

Yes, but the change in find_bswap_or_nop aims at checking that we have
0x05040302 or 0x02030405 for big endian targets and 0x04030201 or 0x01020304 
for little endian targets.

Before the "if (rsize < n->range)" block, cmpnop and cmpxchg are respectively 
0x0504030201 and 0x0102030405. Then for big endian it will only keep the 4 
least significant symbolic bytes of cmpxchg (if performs a bitwise and) and 
the 4 most significant symbolic bytes of cmpnop (it performs a right shift) so 
you'd get 0x05040302 for cmpnop and 0x02030405 for cmpxchg. Both would 
translate to a load at offset 0, and then a byteswap for the latter.

As said earlier, the problem is in bswap_replace which tries to adjust the 
address of the load for big endian targets by adding a load offset. With the 
change in find_bswap_or_nop, an offset is never needed because only pattern 
that correspond to a load at offset 0 are recognized. I kept for GCC 7 to 
change that to allow offset and recognize all sub-load and sub-bswap.

> The only
> offset we can introduce is the big-endian correction, but that assumes
> we're always dealing with lowparts.
> 
> So, I think the correct/conservative fix for both bugs is to revert the
> earlier change for PR67781, and then apply the following on top:
> 
> --- revert.tree-ssa-math-opts.c	2016-02-12 15:22:57.098895058 +0100
> +++ tree-ssa-math-opts.c	2016-02-12 15:23:08.482228474 +0100
> @@ -2473,10 +2473,14 @@ find_bswap_or_nop (gimple *stmt, struct
>     /* Find real size of result (highest non-zero byte).  */
>     if (n->base_addr)
>       {
> -      int rsize;
> +      unsigned HOST_WIDE_INT rsize;
>         uint64_t tmpn;
> 
>         for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> rsize++);
> +      if (BYTES_BIG_ENDIAN && n->range != rsize)
> +	/* This implies an offset, which is currently not handled by
> +	   bswap_replace.  */
> +	return NULL;
>         n->range = rsize;
>       }

This works too yes with less optimizations for big endian. I'm fine with 
either solutions. This one is indeed a bit more conservative so I see the 
appeal to use it for GCC 5 and 6.

Best regards,

Thomas


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]