Bug 30798 - mode_dependent_address_p not checked when simplifying subreg
mode_dependent_address_p not checked when simplifying subreg
Status: UNCONFIRMED
Product: gcc
Classification: Unclassified
Component: middle-end
4.2.0
: P3 normal
: ---
Assigned To: Not yet assigned to anyone
: wrong-code
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-02-14 22:12 UTC by Charles J. Tabony
Modified: 2014-08-27 23:07 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Charles J. Tabony 2007-02-14 22:12:15 UTC
GCC is producing wrong code for the following C code on my port:

int f(short *p){
  int sum, i;
  sum = 0;
  for(i = 0; i < 256; i++){
    sum += *p++ & 0xFF;
  }
  return sum;
}

The problem boils down to this RTL

(zero_extend:SI (subreg:QI (mem:HI (post_inc:SI (reg/v/f:SI 61 [ p ])) [2 S2 A16]) 0))

being replaced by this RTL

(zero_extend:SI (mem:QI (post_inc:SI (reg/v/f:SI 0 r0 [orig:61 p ] [61])) [2 S1 A16]))

which changes the post-increment amount from 2 to 1.  I tracked down the problem.  Reload calls cleanup_subreg_operands for each insn.  This leads to the following sequence of nested calls:

reload1.c: reload
final.c: cleanup_subreg_operands
final.c: alter_subreg
emit-rtl.c: adjust_address_1
emit-rtl.c: change_address_1

None of these check mode_dependent_address_p.

At some point, the post_inc should be replaced with a post_modify if available or a separate add insn.  Theoretically, a machine could have neither post_modify nor add with immediate and the separate add could need reloading.  I believe the question is "Who's responsibility is it to ensure the correct code?"  Combine creates the subreg.  Perhaps combine should instead create the post_modify or separate add.  Perhaps this job should be tossed onto the pile that is reload.  Perhaps some other pass in between should ensure that any such subregs have been transformed.

I believe this is one of those bugs that has been there forever but is only exposed when more optimization is done in tree land.
Comment 1 Charles J. Tabony 2007-03-04 02:25:37 UTC
For some reason, defining WORD_REGISTER_OPERATIONS prevents Combine from transforming the HI load followed by the AND with 0xFF into a zero-extending QI load.  I don't know why this would be.  I think not defining WORD_REGISTER_OPERATIONS should always be safe.  I'm sure defining WORD_REGISTER_OPERATIONS is somehow masking the real problem.

Another odd thing is that BlackFin (bfin) produces the same code up to Life1, just before Combine, but Combine does not perform the aforementioned transformation even though bfin does not define WORD_REGISTER_OPERATIONS.  bfin has a zero-extending QI load pattern and GO_IF_LEGITIMATE_ADDRESS accepts a QI mode POST_INC.
Comment 2 Charles J. Tabony 2007-03-11 18:28:22 UTC
Hhmm.  I have made a few other changes and now when I don't define WORD_REGISTER_OPERATIONS, I get correct code.  I get an HI load followed by a QI -> SI zero extend, not the preferred POST_MODIFY.  I looked at all the changes I made since my last comment and I don't see how any could affect this.  I think this is a latent bug waiting to be exposed that for some reason is difficult to trigger.

Now combine creates the subreg, but the subreg is spilled during the greg pass.
Comment 3 Charles J. Tabony 2007-03-11 18:57:26 UTC
By the way, I think emit-rtl.c: change_address_1 could at least use an assert that if the modes are different and the addresses are the same, then mode_dependent_address_p is not true.  That is unless change_address_1 can be used to change the semantics of a mem.
Comment 4 Andrew Pinski 2014-08-27 23:07:02 UTC
I have a similar issue with a GCC 4.7 based one with the aarch64 target backported to it (and a few other patches installed too).

We start out with:
(insn 6074 7812 7824 701 (set (reg:DI 5635)
        (ashift:DI (reg:DI 5253)
            (subreg:QI (mem:HI (post_inc:DI (reg:DI 5418)) [0 MEM[base: D.35344_2, offset: 0B]+0 S2 A16]) 0))) reload1.c:1280 526 {*aarch64_ashl_sisd_or_int_di3}
     (expr_list:REG_INC (reg:DI 5418)
        (nil)))

And reload comes along and changes that to:
(insn 12175 7812 6074 701 (set (reg:QI 7 x7)
        (mem:QI (post_inc:DI (reg:DI 3 x3 [5418])) [0 MEM[base: D.35344_2, offset: 0B]+0 S1 A16])) reload1.c:1280 30 {*movqi_aarch64}
     (expr_list:REG_INC (reg:DI 3 x3 [5418])
        (nil)))

(insn 6074 12175 7824 701 (set (reg:DI 6 x6 [5635])
        (ashift:DI (reg:DI 2 x2 [5253])
            (reg:QI 7 x7))) reload1.c:1280 526 {*aarch64_ashl_sisd_or_int_di3}
     (nil))