This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Question about PR 48814 and ivopts and post-increment


On Wed, Dec 2, 2015 at 5:11 AM, Steve Ellcey <sellcey@imgtec.com> wrote:
>
> I have a question involving ivopts and PR 48814, which was a fix for
> the post increment operation.  Prior to the fix for PR 48814, MIPS
> would generate this loop for strcmp (C code from glibc):
>
> $L4:
>         lbu     $3,0($4)
>         lbu     $2,0($5)
>         addiu   $4,$4,1
>         beq     $3,$0,$L7
>         addiu   $5,$5,1    # This is a branch delay slot
>         beq     $3,$2,$L4
>         subu    $2,$3,$2   # This is a branch delay slot (only used after loop)
>
>
> With the current top-of-tree we now generate:
>
>         addiu   $4,$4,1
> $L8:
>         lbu     $3,-1($4)
>         addiu   $5,$5,1
>         beq     $3,$0,$L7
>         lbu     $2,-1($5)  # This is a branch delay slot
>         beq     $3,$2,$L8
>         addiu   $4,$4,1    # This is a branch delay slot
>
>         subu    $2,$3,$2   # Done only once now after exiting loop.
>
> The main problem with the new loop is that the beq comparing $2 and $3
> comes right after the load of $2, so the branch can stall while it waits
> for the load to complete.  The ideal code would probably be:
>
>         addiu   $4,$4,1
> $L8:
>         lbu     $3,-1($4)
>         lbu     $2,0($5)
>         beq     $3,$0,$L7
>         addiu   $5,$5,1    # This is a branch delay slot
>         beq     $3,$2,$L8
>         addiu   $4,$4,1    # This is a branch delay slot
>
>         subu    $2,$3,$2   # Done only once now after exiting loop.
>
> Here we load $2 earlier (using a 0 offset instead of a -1 offset) and
> then increment $5 after using it in the load.  The problem is that this
> isn't something that can just be done in the instruction scheduler,
> because we are changing one of the instructions (to modify the offset)
> in addition to rearranging them, and I don't think the instruction
> scheduler supports that.
Hmm, I think Bernd introduced sched_flag !DONT_BREAK_DEPENDENCIES to
resolve dependences by modifying the address expression.  I think this
is the same problem; what's needed is to model the dependence using that
framework.  Maybe the delay slot is special here?
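
To make the dependence in question concrete, here is a small C sketch of
the two equivalent addressing forms.  It is only an illustration: the
function names len_postinc and len_preinc are hypothetical, and a simple
byte-counting loop stands in for the real strcmp loop.  Turning one form
into the other means moving the increment across the load and adjusting
the load's offset by the same amount at the same time.

```c
#include <stddef.h>

/* Post-increment form: load at offset 0, then advance the pointer.
   Corresponds to "lbu $2,0($5)" followed by "addiu $5,$5,1". */
size_t len_postinc(const unsigned char *p)
{
    size_t n = 0;
    for (;;) {
        unsigned char c = p[0];
        p = p + 1;
        if (c == 0)
            return n;
        n++;
    }
}

/* Pre-increment form: advance the pointer first, then load at offset -1.
   Corresponds to "addiu $5,$5,1" followed by "lbu $2,-1($5)". */
size_t len_preinc(const unsigned char *p)
{
    size_t n = 0;
    for (;;) {
        p = p + 1;
        unsigned char c = p[-1];
        if (c == 0)
            return n;
        n++;
    }
}
```

Both functions read the same bytes in the same order; only the position
of the increment relative to the load, and the matching offset, differ.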

>
> It looks like it is the ivopts code that decided to increment the
> registers first and use -1 offsets in the loads afterwards, instead of
> using 0 offsets and then incrementing the registers after the loads, but
> I can't figure out how or why ivopts made that decision.
>
> Does anyone have any ideas on how I could 'fix' GCC to make it generate
> the ideal code?  Is there some way to do it in the instruction scheduler?
> Is there some way to modify ivopts to fix this by modifying the cost
It's likely that IVO just picks the first candidate when it runs into a
tie.  Could you please post the preprocessed source code so that I can
have a look?  I am not familiar with glibc.  Thanks.
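
For readers without the glibc sources at hand, the loop being discussed
has roughly the following shape.  This is a sketch reconstructed from the
assembly quoted above, assuming a generic byte-at-a-time strcmp; the name
strcmp_sketch is hypothetical, and this is not the preprocessed source
requested here.

```c
/* Hypothetical stand-in for the loop under discussion; not copied from
   glibc.  Each iteration loads one byte from each string with a
   post-increment, exits if the first byte is NUL, and loops while the
   two bytes are equal. */
int strcmp_sketch(const char *p1, const char *p2)
{
    const unsigned char *s1 = (const unsigned char *) p1;
    const unsigned char *s2 = (const unsigned char *) p2;
    unsigned char c1, c2;

    do {
        c1 = *s1++;              /* lbu $3 plus increment of $4 */
        c2 = *s2++;              /* lbu $2 plus increment of $5 */
        if (c1 == '\0')
            return c1 - c2;      /* beq $3,$0,$L7 */
    } while (c1 == c2);          /* beq $3,$2,<loop> */

    return c1 - c2;              /* subu $2,$3,$2 */
}
```

The two post-increments are the induction variables that ivopts has to
choose candidates for.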

> analysis somehow?  Could I (partially) undo the fix for PR 48814?
> According to the final comment in that Bugzilla report, the change is
> really only needed for C11 and it does degrade the optimizer, so could
> we go back to the old behaviour for C89/C99?  The code in ivopts
I saw that this change caused a code size regression on ARM embedded processors.

Thanks,
bin

> has changed enough since the patch was applied that I couldn't
> immediately see how to do that in the ToT sources.
>
> Steve Ellcey
> sellcey@imgtec.com

