This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Question about PR 48814 and ivopts and post-increment
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Steve Ellcey <sellcey at imgtec dot com>
- Cc: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Fri, 4 Dec 2015 10:48:17 +0800
- Subject: Re: Question about PR 48814 and ivopts and post-increment
- References: <b579d986-4948-42fe-817a-939807204ad4 at BAMAIL02 dot ba dot imgtec dot org>
On Wed, Dec 2, 2015 at 5:11 AM, Steve Ellcey <sellcey@imgtec.com> wrote:
>
> I have a question involving ivopts and PR 48814, which was a fix for
> the post-increment operation. Prior to the fix for PR 48814, MIPS
> would generate this loop for strcmp (C code from glibc):
>
> $L4:
> lbu $3,0($4)
> lbu $2,0($5)
> addiu $4,$4,1
> beq $3,$0,$L7
> addiu $5,$5,1 # This is a branch delay slot
> beq $3,$2,$L4
> subu $2,$3,$2 # This is a branch delay slot (only used after loop)
>
>
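For reference, here is a C sketch of a byte-at-a-time strcmp loop that lowers to code of this shape. It is modeled on glibc's generic C strcmp, but the function name `byte_strcmp` is hypothetical and the exact glibc source may differ. The comments map each statement to the MIPS instructions in the pre-PR 48814 listing; that mapping is informal, not actual compiler output.

```c
#include <assert.h>

/* Hypothetical name; modeled on glibc's generic byte-at-a-time strcmp.
   The instruction comments are an informal mapping to the old MIPS loop. */
int byte_strcmp(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    assert(p1 && p2);          /* sketch-only sanity check, not in glibc */

    do {
        c1 = *s1++;            /* lbu $3,0($4); addiu $4,$4,1 */
        c2 = *s2++;            /* lbu $2,0($5); addiu $5,$5,1 */
        if (c1 == '\0')
            return c1 - c2;    /* beq $3,$0,$L7 */
    } while (c1 == c2);        /* beq $3,$2,$L4 */

    return c1 - c2;            /* subu $2,$3,$2 */
}
```

Both loads happen before either pointer is used again. That is why the old code could keep a 0 offset and bump `$4`/`$5` afterwards.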
> With the current top-of-tree we now generate:
>
> addiu $4,$4,1
> $L8:
> lbu $3,-1($4)
> addiu $5,$5,1
> beq $3,$0,$L7
> lbu $2,-1($5) # This is a branch delay slot
> beq $3,$2,$L8
> addiu $4,$4,1 # This is a branch delay slot
>
> subu $2,$3,$2 # Done only once now after exiting loop.
>
> The main problem with the new loop is that the beq comparing $2 and $3
> immediately follows the load of $2 (which sits in the delay slot of the
> preceding branch), so there can be a delay while the load completes. The
> ideal code would probably be:
>
> addiu $4,$4,1
> $L8:
> lbu $3,-1($4)
> lbu $2,0($5)
> beq $3,$0,$L7
> addiu $5,$5,1 # This is a branch delay slot
> beq $3,$2,$L8
> addiu $4,$4,1 # This is a branch delay slot
>
> subu $2,$3,$2 # Done only once now after exiting loop.
>
> Here we load $2 earlier (using a 0 offset instead of a -1 offset) and
> then increment $5 after using it in the load. The problem
> is that this isn't something that can just be done in the instruction
> scheduler, because we are changing one of the instructions (to modify the
> offset) in addition to rearranging them, and I don't think the instruction
> scheduler supports that.
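To make the load-use spacing concrete, here is a minimal C model. For each load, it counts how many instructions separate the load from the first instruction that reads its result, using the straight-line order of the two loop bodies and ignoring the back edge. The struct layout and function name are hypothetical. Whether a distance of 1 actually costs a cycle depends on the MIPS core. The model only shows that the ideal ordering moves the first use of `$2` from 1 to 3 instructions after its load.

```c
#include <assert.h>

#define NO_REG (-1)
#define LOOP_LEN 6

/* Hypothetical, simplified instruction record: at most one destination and
   two source registers, numbered as in the MIPS listings ($0..$31). */
struct insn {
    const char *text;   /* assembly text, for readability only */
    int is_load;
    int dest;
    int src1, src2;
};

/* Number of instructions from the load at LOAD_IDX to the first later
   instruction (listing order, loop back edge ignored) that reads the loaded
   register.  Returns 0 if nothing reads it before it is overwritten or the
   sequence ends. */
int load_use_distance(const struct insn *seq, int n, int load_idx)
{
    assert(load_idx >= 0 && load_idx < n && seq[load_idx].is_load);
    int r = seq[load_idx].dest;
    for (int i = load_idx + 1; i < n; i++) {
        if (seq[i].src1 == r || seq[i].src2 == r)
            return i - load_idx;
        if (seq[i].dest == r)
            return 0;
    }
    return 0;
}

/* Loop body of the current top-of-tree code. */
const struct insn tot_loop[LOOP_LEN] = {
    { "lbu   $3,-1($4)", 1, 3, 4, NO_REG },
    { "addiu $5,$5,1",   0, 5, 5, NO_REG },
    { "beq   $3,$0,$L7", 0, NO_REG, 3, 0 },
    { "lbu   $2,-1($5)", 1, 2, 5, NO_REG },  /* delay slot */
    { "beq   $3,$2,$L8", 0, NO_REG, 3, 2 },
    { "addiu $4,$4,1",   0, 4, 4, NO_REG },  /* delay slot */
};

/* Loop body of the proposed ideal code. */
const struct insn ideal_loop[LOOP_LEN] = {
    { "lbu   $3,-1($4)", 1, 3, 4, NO_REG },
    { "lbu   $2,0($5)",  1, 2, 5, NO_REG },
    { "beq   $3,$0,$L7", 0, NO_REG, 3, 0 },
    { "addiu $5,$5,1",   0, 5, 5, NO_REG },  /* delay slot */
    { "beq   $3,$2,$L8", 0, NO_REG, 3, 2 },
    { "addiu $4,$4,1",   0, 4, 4, NO_REG },  /* delay slot */
};
```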
Hmm, I think Bernd introduced scheduler support for breaking dependences
by modifying the address expression (used when the DONT_BREAK_DEPENDENCIES
sched_flag is not set). I think this is the same problem; what's needed is
to model the dependence using that framework. Maybe the delay slot is
special here?
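The kind of rewrite alluded to here can be sketched as a minimal C model. It assumes only two instructions are involved: a load and an immediate add to that load's base register. The struct and function names are hypothetical, and this is not GCC's scheduler code. It only shows why moving `lbu $2,-1($5)` above `addiu $5,$5,1` turns its offset into 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of "lbu dest, offset(base)". */
struct load_insn {
    int dest;
    int base;
    long offset;
};

/* Hypothetical model of "addiu reg, reg, imm". */
struct inc_insn {
    int reg;
    long imm;
};

/* Try to move LOAD so it executes before INC.  After the increment the load
   reads (base + imm) + offset, so before the increment it must use
   offset + imm.  The swap is refused when the load writes the incremented
   register, since the increment would then read the loaded value. */
bool hoist_load_above_inc(struct load_insn *load, const struct inc_insn *inc)
{
    if (load->dest == inc->reg)
        return false;              /* real dependence; cannot reorder */
    if (load->base == inc->reg)
        load->offset += inc->imm;  /* compensate for the not-yet-done add */
    return true;
}
```

Usage: with `{ 2, 5, -1 }` for `lbu $2,-1($5)` and `{ 5, 1 }` for `addiu $5,$5,1`, the hoisted load ends up with offset 0, which matches the ideal loop.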
>
> It looks like it is the ivopts code that decided to increment the
> registers first and use -1 offsets in the loads afterwards, instead of
> using 0 offsets and incrementing the registers after the loads, but I
> can't figure out how or why ivopts made that decision.
>
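The two choices described here can be sketched at the C level. This sketch assumes the two loop shapes correspond to these source forms. The function names are hypothetical, and ivopts actually works on GIMPLE, not C source. Both forms compute the same result. They differ only in whether each pointer is bumped after the loads (offset 0) or before them (offset -1).

```c
#include <assert.h>

/* Form 1: load at offset 0, then increment (shape of the pre-PR 48814 loop). */
int cmp_load_then_inc(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;
    assert(p1 && p2);
    do {
        c1 = s1[0];
        c2 = s2[0];
        s1 += 1;
        s2 += 1;
        if (c1 == '\0')
            return c1 - c2;
    } while (c1 == c2);
    return c1 - c2;
}

/* Form 2: increment first, then load at offset -1 (shape of the ToT loop). */
int cmp_inc_then_load(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;
    assert(p1 && p2);
    do {
        s1 += 1;
        s2 += 1;
        c1 = s1[-1];
        c2 = s2[-1];
        if (c1 == '\0')
            return c1 - c2;
    } while (c1 == c2);
    return c1 - c2;
}
```

Because the forms are semantically identical, the decision between them comes down to how ivopts costs each candidate's address expressions.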
> Does anyone have any ideas on how I could 'fix' GCC to make it generate
> the ideal code? Is there some way to do it in the instruction scheduler?
> Is there some way to modify ivopts to fix this by modifying the cost
> analysis somehow?
It's likely IVO just picks the first candidate when it runs into a
tie. Could you please post the preprocessed source code so that I can
have a look? I am not familiar with glibc. Thanks.
> Could I (partially) undo the fix for PR 48814?
> According to the final comment in that Bugzilla report, the change is
> really only needed for C11, and it does degrade the optimizer, so could
> we go back to the old behaviour for C89/C99? The code in ivopts has
> changed enough since the patch was applied that I couldn't immediately
> see how to do that in the ToT sources.
I saw that this change caused code size regressions on ARM embedded
processors.
Thanks,
bin
>
> Steve Ellcey
> sellcey@imgtec.com
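The tie hypothesis in the reply ("IVO just picks the first candidate") can be illustrated with a minimal sketch. It assumes each candidate is compared by a single integer cost. The struct, names, and cost values are hypothetical and are not GCC's ivopts data structures or real cost numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical induction-variable candidate with a single scalar cost. */
struct iv_candidate {
    const char *name;
    int cost;
};

/* Return the index of the cheapest candidate.  The comparison is a strict
   '<', so a later candidate with an equal cost never replaces an earlier
   one: on a tie, the first candidate in the list wins. */
size_t pick_cheapest(const struct iv_candidate *cands, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (cands[i].cost < cands[best].cost)
            best = i;
    return best;
}
```

Under this model, the choice between the offset -1 and offset 0 shapes would only change if the costs differed or the candidate order changed. That is why looking at the cost analysis on the preprocessed source would be the next step.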