This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Two -mxop wrong-code fixes (PR target/56866)
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Richard Henderson <rth at redhat dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Sat, 27 Apr 2013 10:10:11 +0200
- Subject: Re: [PATCH] Two -mxop wrong-code fixes (PR target/56866)
- References: <20130426155035 dot GS28963 at tucnak dot redhat dot com>
On Fri, Apr 26, 2013 at 5:50 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> This patch fixes two wrong-code bugs with -mxop.
> The first is that the vpmacsdqh instruction can only be used for
> vec_widen_smult_odd_v4si, not for vec_widen_umult_odd_v4si. Consider an
> unsigned V4SImode h* (highpart multiply) with the arguments
> { 3, 3, 3, 3 } h* { 0xaaaaaaab, 0xaaaaaaab, 0xaaaaaaab, 0xaaaaaaab }
> (not known at compile time). vpmacsdqh sign-extends its operands
> and thus computes (3 * 0xffffffffaaaaaaabULL) >> 32,
> i.e. 0xffffffff, while we want (3 * 0xaaaaaaabULL) >> 32, i.e. 2.
>
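The arithmetic in the paragraph above can be checked with a small scalar model. This is only an illustrative sketch, not code from the patch: the function names umul_high and smul_high are hypothetical, and the model covers a single 32-bit lane rather than the vector instruction itself.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of an unsigned 32x32->64 multiply: what an
   unsigned highpart multiply (h*) is expected to return.  */
uint32_t umul_high(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* High 32 bits when both operands are sign-extended first, which
   models what a signed widening multiply does to one lane.
   The uint32_t -> int32_t conversion is implementation-defined
   in ISO C; GCC wraps modulo 2^32, which is assumed here.  */
uint32_t smul_high(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}

/* Reproduces the two numbers quoted in the mail.  */
void check_widen_example(void)
{
    assert(umul_high(3, 0xaaaaaaabu) == 2);           /* desired result */
    assert(smul_high(3, 0xaaaaaaabu) == 0xffffffffu); /* sign-extended result */
}
```

The two functions agree whenever both inputs have their top bit clear, so the difference only shows up for operands such as 0xaaaaaaab.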
> The second bug is a wrong shift count for the immediate form of xop_rotr.
> To turn r>> immediate into an equivalent r<< we need
> (element bitsize - immediate) as the count, but (<ssescalarnum> * 8) equals
> the element bitsize only for V4SImode (32). For V2DImode it is 16 instead
> of the desired 64, for V8HImode it is 64 instead of the desired 16, and for
> V16QImode it is 128 instead of the desired 8.
>
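The rotate identity behind the second fix can be sketched in scalar C. This is a hedged illustration under stated assumptions, not the sse.md change: rotl_bits, rotr_bits and rotr_to_rotl_count are hypothetical helper names, and each call models one vector element of width 8, 16, 32 or 64 bits.

```c
#include <stdint.h>

/* All-ones mask covering the low `bits` bits (bits is 8, 16, 32 or 64).  */
static uint64_t elem_mask(unsigned bits)
{
    return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* Rotate the low `bits` bits of x left by n, with 0 < n < bits.  */
uint64_t rotl_bits(uint64_t x, unsigned bits, unsigned n)
{
    uint64_t mask = elem_mask(bits);
    x &= mask;
    return ((x << n) | (x >> (bits - n))) & mask;
}

/* Rotate the low `bits` bits of x right by n, with 0 < n < bits.  */
uint64_t rotr_bits(uint64_t x, unsigned bits, unsigned n)
{
    uint64_t mask = elem_mask(bits);
    x &= mask;
    return ((x >> n) | (x << (bits - n))) & mask;
}

/* Left-rotate count equivalent to a right rotate by n: the count
   must be derived from the element width in bits, not from the
   number of elements in the vector.  */
unsigned rotr_to_rotl_count(unsigned elem_bits, unsigned n)
{
    return elem_bits - n;
}
```

For a 16-bit element, rotr_bits(0x1234, 16, 4) and rotl_bits(0x1234, 16, rotr_to_rotl_count(16, 4)) both give 0x4123. Going by the figures in the mail, the old expression would instead have started from 64 for V8HImode, which is not the element width.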
> Bootstrapped/regtested on x86_64-linux, configured --with-arch=bdver2,
> fixes:
>
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution, -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution, -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution, -O3 -g
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution, -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution, -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution, -O3 -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -O1
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -O2
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -O3 -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -Os
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -Og -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none
> -FAIL: gcc.c-torture/execute/pr53645.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> -FAIL: gcc.c-torture/execute/pr56866.c execution, -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr56866.c execution, -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr56866.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr56866.c execution, -O3 -g
> -FAIL: gcc.dg/vect/pr51581-1.c execution test
> -FAIL: gcc.dg/vect/pr51581-2.c execution test
> -FAIL: gcc.dg/vect/pr51581-3.c execution test
> -FAIL: gcc.dg/vect/pr51581-1.c -flto execution test
> -FAIL: gcc.dg/vect/pr51581-2.c -flto execution test
> -FAIL: gcc.dg/vect/pr51581-3.c -flto execution test
> -FAIL: gcc.target/i386/avx-mul-1.c execution test
> -FAIL: gcc.target/i386/avx-pr51581-1.c execution test
> -FAIL: gcc.target/i386/avx-pr51581-2.c execution test
> -FAIL: gcc.target/i386/pr56866.c execution test
> -FAIL: gcc.target/i386/sse2-mul-1.c execution test
> -FAIL: gcc.target/i386/sse4_1-mul-1.c execution test
> -FAIL: gcc.target/i386/xop-mul-1.c execution test
>
> These are failures that appear with stock gcc when only the testsuite/
> part of the patch is applied. Ok for trunk/4.8 and partly for 4.7
> (the i386.c bug was introduced on 2012-06-25, but the sse.md
> bug already existed in 4.7)?
>
> 2013-04-26 Jakub Jelinek <jakub@redhat.com>
>
> PR target/56866
> * config/i386/i386.c (ix86_expand_mul_widen_evenodd): Don't
> use xop_pmacsdqh if uns_p.
> * config/i386/sse.md (xop_rotr<mode>3): Fix up computation of
> the immediate rotate count.
>
> * gcc.c-torture/execute/pr56866.c: New test.
> * gcc.target/i386/pr56866.c: New test.
>
> --- gcc/config/i386/i386.c.jj 2013-04-22 10:26:22.000000000 +0200
> +++ gcc/config/i386/i386.c 2013-04-26 10:28:51.793534370 +0200
> @@ -40841,7 +40841,7 @@ ix86_expand_mul_widen_evenodd (rtx dest,
> the even slots. For some cpus this is faster than a PSHUFD. */
> if (odd_p)
> {
> - if (TARGET_XOP && mode == V4SImode)
> + if (TARGET_XOP && mode == V4SImode && !uns_p)
Please add a small comment on why !uns_p is needed here.
OK everywhere with the above addition.
Thanks,
Uros.