This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

PATCH: PR target/46519: Missing vzeroupper


On Wed, Nov 17, 2010 at 8:11 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 3:36 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Nov 17, 2010 at 11:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>>>
>>>>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>>>>
>>>>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>>>>> move_or_delete_vzeroupper_2. ?This patch does it.
>>>>>
>>>>> Huh? The loop does simple linear scan of all insns in the bb, so it
>>>>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>>>>> (bb), but it has NEXT_INSN (BB_END (bb))?
>>>>
>>>> It has BB_END, but it won't be visited by NEXT_INSN starting from
>>>> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
>>>> end of the BB everywhere in gcc.
>>>>
>>>>> Can you please provide a test case that illustrates this?
>>>>>
>>>>
>>>> I am enclosing a work in progress. ?We noticed that we are
>>>> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
>>>> One isssue is we may have
>>>>
>>>> foo:
>>>>
>>>> ? ? ? call bar <<<<< Missing vzeroupper
>>>>
>>>> ? ? ? 256bit vectorized insn
>>>> ? ? ? goto foo
>>>>
>>>> We miss vzeroupper before call bar. ?We don't have a small testcase.
>>>> But this patch fixes this case by inspection. We are checking other
>>>> cases.
>>>
>>> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
>>> upper_128bits_set)
>>> ? ? ? ? ? ? bb->index, upper_128bits_set);
>>>
>>> ? insn = BB_HEAD (bb);
>>> + ?last = NEXT_INSN (BB_END (bb));
>>> ? while (insn != BB_END (bb))
>>> ? ? {
>>> ? ? ? insn = NEXT_INSN (insn);
>>> + ? ? ?if (insn == last)
>>> + ? ? ? break;
>>>
>>> ? ? ? if (!NONDEBUG_INSN_P (insn))
>>> ? ? ? ?continue;
>>>
>>> The change above is not needed. The new check is never triggered - the
>>> loop terminates when "insn == BB_END (bb)" at "while", so I fail to
>>> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
>>> (BB_END (bb))" is needed.
>>
>> Here is the patch for
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>>
>> We have 2 blocks pointing to each others. This patch first scans
>> all blocks without moving vzeroupper so that we can have accurate
>> information about upper 128bits at block entry.
>>
>>> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
>>> with NEXT_INSN.)
>>
>> Please try gcc.target/i386/avx-vzeroupper-20.c. ?It will
>> trigger this condition.
>>
>>> @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)
>>>
>>> ? /* Emit vzeroupper if needed. ?*/
>>> ? if (TARGET_VZEROUPPER
>>> - ? ? ?&& cfun->machine->use_avx256_p
>>> + ? ? ?&& (cfun->machine->use_avx256_p || flag_tree_vectorize)
>>> ? ? ? && !cfun->machine->caller_return_avx256_p)
>>> ? ? {
>>> ? ? ? cfun->machine->use_vzeroupper_p = 1;
>>> @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
>>> ? ? }
>>>
>>> ? /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration. ?*/
>>> - ?if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
>>> + ?if (TARGET_VZEROUPPER
>>> + ? ? ?&& (cfun->machine->use_avx256_p || flag_tree_vectorize))
>>>
>>> Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with
>>> -ftree-vectorize?! It looks that parts (or state machine) that set
>>> ...->use_avx256_p flag should be fixed.
>>
>> There are:
>>
>> foo:
>>
>> ? ? ?call bar <<<<< Missing vzeroupper
>>
>> ? ? ?256bit vectorized insn
>> ? ? ?goto foo
>>
>> I couldn't find a hook to set use_avx256_p before RTL expansion
>> starts.
>>
>>> ? ? {
>>> ? ? ? rtx unspec;
>>> ? ? ? int avx256;
>>> diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>>> b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>>> new file mode 100644
>>> index 0000000..3301083
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
>>> +
>>> +extern void free (void *);
>>> +void
>>> +bar (void *ncstrp)
>>> +{
>>> + ?if(ncstrp==((void *)0))
>>> + ? ?return;
>>> + ?free(ncstrp);
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
>>>
>>> Hm, this testcase doesn't go together with the above change. There is
>>> no vectorization involved, and the scan checks that vzeroupper is NOT
>>> emitted.
>>>
>>
>> This testcase is for
>>
>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>

I sent the patch without comments too soon.

As discussed in PR, setting and checking use_avx256_p isn't reliable.
This patch removes use_avx256_p.  Any comments?

Thanks.


-- 
H.J.
---
gcc/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (block_info_def): Add scanned and no_avx256.
	(move_or_delete_vzeroupper_2): Properly check the end of basic
	block.  Call note_stores only if no_avx256 is false.
	(scan_live_upper_128bits_2): New.
	(scan_live_upper_128bits_1): Likewise.
	(move_or_delete_vzeroupper): Call scan_live_upper_128bits_1 to
	scan predecessor blocks of all exit points.
	(use_avx256_p): Removed.
	(init_cumulative_args): Don't set use_avx256_p.
	(ix86_function_arg): Likewise.
	(ix86_expand_move): Likewise.
	(ix86_expand_vector_move_misalign): Likewise.
	(ix86_local_alignment): Likewise.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_epilogue): Don't check use_avx256_p when generating
	vzeroupper.
	(ix86_expand_call): Likewise.

	* config/i386/i386.h (machine_function): Remove use_avx256_p.

gcc/testsuite/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-20.c: New.

Attachment: gcc-pr46519-2.patch
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]