This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

PATCH: Properly check the end of basic block


On Wed, Nov 17, 2010 at 11:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>
>>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>>
>>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>>> move_or_delete_vzeroupper_2. ?This patch does it.
>>>
>>> Huh? The loop does simple linear scan of all insns in the bb, so it
>>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>>> (bb), but it has NEXT_INSN (BB_END (bb))?
>>
>> It has BB_END, but it won't be visited by NEXT_INSN starting from
>> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
>> end of the BB everywhere in gcc.
>>
>>> Can you please provide a test case that illustrates this?
>>>
>>
>> I am enclosing a work in progress. ?We noticed that we are
>> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
>> One isssue is we may have
>>
>> foo:
>>
>> ? ? ? call bar <<<<< Missing vzeroupper
>>
>> ? ? ? 256bit vectorized insn
>> ? ? ? goto foo
>>
>> We miss vzeroupper before call bar. ?We don't have a small testcase.
>> But this patch fixes this case by inspection. We are checking other
>> cases.
>
> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
> upper_128bits_set)
> ? ? ? ? ? ? bb->index, upper_128bits_set);
>
> ? insn = BB_HEAD (bb);
> + ?last = NEXT_INSN (BB_END (bb));
> ? while (insn != BB_END (bb))
> ? ? {
> ? ? ? insn = NEXT_INSN (insn);
> + ? ? ?if (insn == last)
> + ? ? ? break;
>
> ? ? ? if (!NONDEBUG_INSN_P (insn))
> ? ? ? ?continue;
>
> The change above is not needed. The new check is never triggered - the
> loop terminates when "insn == BB_END (bb)" at "while", so I fail to
> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
> (BB_END (bb))" is needed.

Here is the patch for

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519

We have 2 blocks pointing to each others. This patch first scans
all blocks without moving vzeroupper so that we can have accurate
information about upper 128bits at block entry.

> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
> with NEXT_INSN.)

Please try gcc.target/i386/avx-vzeroupper-20.c.  It will
trigger this condition.

> @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)
>
> ? /* Emit vzeroupper if needed. ?*/
> ? if (TARGET_VZEROUPPER
> - ? ? ?&& cfun->machine->use_avx256_p
> + ? ? ?&& (cfun->machine->use_avx256_p || flag_tree_vectorize)
> ? ? ? && !cfun->machine->caller_return_avx256_p)
> ? ? {
> ? ? ? cfun->machine->use_vzeroupper_p = 1;
> @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
> ? ? }
>
> ? /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration. ?*/
> - ?if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
> + ?if (TARGET_VZEROUPPER
> + ? ? ?&& (cfun->machine->use_avx256_p || flag_tree_vectorize))
>
> Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with
> -ftree-vectorize?! It looks that parts (or state machine) that set
> ...->use_avx256_p flag should be fixed.

There are:

foo:

      call bar <<<<< Missing vzeroupper

      256bit vectorized insn
      goto foo

I couldn't find a hook to set use_avx256_p before RTL expansion
starts.

> ? ? {
> ? ? ? rtx unspec;
> ? ? ? int avx256;
> diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> new file mode 100644
> index 0000000..3301083
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
> +
> +extern void free (void *);
> +void
> +bar (void *ncstrp)
> +{
> + ?if(ncstrp==((void *)0))
> + ? ?return;
> + ?free(ncstrp);
> +}
> +
> +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
>
> Hm, this testcase doesn't go together with the above change. There is
> no vectorization involved, and the scan checks that vzeroupper is NOT
> emitted.
>

This testcase is for

insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))

-- 
H.J.
---
gcc/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (block_info_def): Add scaned and no_avx256.
	(move_or_delete_vzeroupper_2): Properly check the end of basic
	block.  Call note_stores only if no_avx256 is false.
	(scan_live_upper_128bits_2): New.
	(scan_live_upper_128bits_1): Likewise.
	(move_or_delete_vzeroupper): Call scan_live_upper_128bits_1 to
	scan predecessor blocks of all exit points.
	(ix86_expand_epilogue): Also check flag_tree_vectorize when
	generating vzeroupper.
	(ix86_expand_call): Likewise.

gcc/testsuite/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-20.c: New.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]