Bug 89794 - combine incorrectly forwards register value through auto-inc operation
Summary: combine incorrectly forwards register value through auto-inc operation
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 9.0
: P3 normal
Target Milestone: ---
Assignee: Segher Boessenkool
URL:
Keywords: wrong-code
Depends on:
Blocks: ohgee
  Show dependency treegraph
 
Reported: 2019-03-22 12:28 UTC by Zdenek Sojka
Modified: 2019-04-15 11:34 UTC (History)
5 users (show)

See Also:
Host: x86_64-pc-linux-gnu
Target: armv7a-hardfloat-linux-gnueabi
Build:
Known to work:
Known to fail: 5.5.0, 6.5.0, 7.4.1, 8.3.1, 9.0
Last reconfirmed: 2019-03-25 00:00:00


Attachments
reduced testcase (236 bytes, text/plain)
2019-03-22 12:28 UTC, Zdenek Sojka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Zdenek Sojka 2019-03-22 12:28:09 UTC
Created attachment 46010 [details]
reduced testcase

Output:
$ armv7a-hardfloat-linux-gnueabi-gcc -Og -fno-forward-propagate testcase.c -static
$ ./a.out 
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault

The instruction:
...
@ testcase.c:12:   __builtin_memset (&i, c, 2);
	movw	r1, #:lower16:c	@ tmp132,
	movt	r1, #:upper16:c	@ tmp132,
	ldr	ip, [r1]	@ c.2_7, c
	uxtb	r3, ip	@ c.2_7, c.2_7
	add	r3, r3, r3, lsl #8	@ tmp140, c.2_7, c.2_7,
	strh	r3, [sp]	@ movhi	@ tmp140, MEM[(void *)&i]

overwrites return address


$ armv7a-hardfloat-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-armv7a-hardfloat/bin/armv7a-hardfloat-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-269795-checking-yes-rtl-df-extra-armv7a-hardfloat/bin/../libexec/gcc/armv7a-hardfloat-linux-gnueabi/9.0.1/lto-wrapper
Target: armv7a-hardfloat-linux-gnueabi
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --with-float=hard --with-fpu=vfpv4 --with-arch=armv7-a --with-sysroot=/usr/armv7a-hardfloat-linux-gnueabi --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=armv7a-hardfloat-linux-gnueabi --with-ld=/usr/bin/armv7a-hardfloat-linux-gnueabi-ld --with-as=/usr/bin/armv7a-hardfloat-linux-gnueabi-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-269795-checking-yes-rtl-df-extra-armv7a-hardfloat
Thread model: posix
gcc version 9.0.1 20190319 (experimental) (GCC)
Comment 1 Andrew Pinski 2019-03-22 23:12:24 UTC
Most likely related to PR 89434.
Comment 2 Jakub Jelinek 2019-03-25 09:22:08 UTC
ICEs already in r217911, so likely started with the introduction of __builtin_sub_overflow and before that has been just latent.
Comment 3 Jakub Jelinek 2019-03-25 11:11:56 UTC
Guess with PR89475 fix this will be latent, unless one disables ccp.
Anyway, to me this looks like a backend bug.  The function is leaf, but for some strange reason LRA uses the lr register and so lr needs to be pushed and poped, but that push/pop doesn't seem to be accounted for in the afp to sp elimination offset computation.
Comment 4 Richard Earnshaw 2019-04-09 10:30:03 UTC
(In reply to Jakub Jelinek from comment #3)
> Guess with PR89475 fix this will be latent, unless one disables ccp.
> Anyway, to me this looks like a backend bug.  The function is leaf, but for
> some strange reason LRA uses the lr register and so lr needs to be pushed
> and poped, but that push/pop doesn't seem to be accounted for in the afp to
> sp elimination offset computation.

I'm still seeing it in a build from 2019/04/04, so not latent.

Current suspect is the code in arm_compute_elimination_offset (in arm.c), where we eliminate from the arg pointer to the stack pointer.  The comment says that if there has been nothing pushed on the stack at all, then the offset result should be '-4' (and asserts strongly in the comments that this is the correct result) --- I don't understand why that should be the case.  However, that code is essentially 18 years old, so I'm not going to try messing with it until I understand it better.
Comment 5 Richard Earnshaw 2019-04-09 16:02:55 UTC
This appears to be combine missing a PRE_MODIFY operation.

After expand we have:

(insn 10 7 11 2 (set (reg:DI 127)
        (zero_extend:DI (mem/c:HI (plus:SI (reg/f:SI 103 afp)
                    (const_int 8 [0x8])) [1 i+0 S2 A32]))) "/tmp/test3.c":10:7 160 {zero_extendhidi2}
     (nil))
...
(insn 24 23 25 2 (set (reg:SI 133)
        (plus:SI (reg/f:SI 103 afp)
            (const_int 8 [0x8]))) "/tmp/test3.c":12:3 4 {*arm_addsi3}
     (nil))
...
(insn 33 32 34 2 (set (mem/c:HI (reg:SI 133) [0 MEM[(void *)&i]+0 S2 A16])
        (reg:HI 141)) "/tmp/test3.c":12:3 189 {*movhi_insn_arch4}
     (nil))

The auto-inc-dec pass transforms this into:

(insn 50 7 10 2 (set (reg/f:SI 133)
        (reg/f:SI 103 afp)) "/tmp/test3.c":10:7 -1
     (nil))
(insn 10 50 12 2 (set (reg:DI 127 [ i ])
        (zero_extend:DI (mem/c:HI (pre_modify:SI (reg/f:SI 133)
                    (plus:SI (reg/f:SI 133)
                        (const_int 8 [0x8]))) [1 i+0 S2 A32]))) "/tmp/test3.c":10:7 160 {zero_extendhidi2}
     (expr_list:REG_INC (reg/f:SI 133)
        (nil)))
...
(insn 33 49 34 2 (set (mem/c:HI (reg/f:SI 133) [0 MEM[(void *)&i]+0 S2 A16])
        (subreg:HI (reg:SI 140) 0)) "/tmp/test3.c":12:3 189 {*movhi_insn_arch4}
     (expr_list:REG_DEAD (reg:SI 140)
        (expr_list:REG_DEAD (reg/f:SI 133)
            (nil))))

And combine, missing the pre_modify, then substitutes insn 50 directly into insn 33

Trying 50 -> 33:
   50: r133:SI=afp:SI
   33: [r133:SI]=r140:SI#0
      REG_DEAD r140:SI
      REG_DEAD r133:SI
Successfully matched this instruction:
(set (mem/c:HI (reg/f:SI 103 afp) [0 MEM[(void *)&i]+0 S2 A16])
    (subreg:HI (reg:SI 140) 0))

Which is clearly wrong as it has now lost the pre-modify operation.
Comment 6 Richard Earnshaw 2019-04-09 17:21:03 UTC
There seems to be more to this than initially thought.  Another insn is in play.

(insn 12 10 14 2 (set (reg:SI 129)
        (bswap:SI (subreg:SI (reg:DI 127 [ i ]) 4))) "/tmp/test3.c":10:7 331 {*arm_rev}
     (expr_list:REG_DEAD (reg:DI 127 [ i ])
        (nil)))

Which uses the value loaded by the pre-modify instruction.

Combine manages to combine (and simplify insns 10 and 12, but the simplification is to

(set (reg:SI 129) (const_int 0))

and we've lost the pre-inc entirely.
Comment 7 Segher Boessenkool 2019-04-10 07:35:54 UTC
I have a patch.
Comment 8 Segher Boessenkool 2019-04-15 11:34:00 UTC
Author: segher
Date: Mon Apr 15 11:33:29 2019
New Revision: 270368

URL: https://gcc.gnu.org/viewcvs?rev=270368&root=gcc&view=rev
Log:
combine: Count auto_inc properly (PR89794)

The code that checks if an auto-increment from i0 or i1 is not lost is
a bit shaky.  The code to check the same for i2 is non-existent, and
cannot be implemented in a similar way at all.  So, this patch counts
all auto-increments, and makes sure we end up with the same number as
we started with.  This works because we still have a check that we
will not duplicate any.

We should do this some better way, but not while we are in stage 4.


	PR rtl-optimization/89794
	* combine.c (count_auto_inc): New function.
	(try_combine): Count how many auto_inc expressions there were in the
	original instructions.  Ensure we have the same number in the new
	instructions.  Remove the code that tried to ensure auto_inc side
	effects on i1 and i0 are not lost.

gcc/testsuite/
	PR rtl-optimization/89794
	* gcc.dg/torture/pr89794.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr89794.c
Modified:
    trunk/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/testsuite/ChangeLog