42040 – [ia64] Inappropriate address spills

Bug 42040 - [ia64] Inappropriate address spills

Summary: [ia64] Inappropriate address spills

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	4.5.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-11-14 02:38 UTC by Richard Henderson
Modified:	2018-03-10 02:39 UTC (History)
CC List:	3 users (show)

See Also:	41567 28490
Host:
Target:	ia64-linux
Build:
Known to work:
Known to fail:	4.4.3, 4.5.0
Last reconfirmed:	2010-03-12 19:22:03

Attachments
Untested patch that works for sje's testcase (313 bytes, patch) 2010-03-13 03:06 UTC, Jim Wilson	Details \| Diff
untested patch, imperfect solution (475 bytes, patch) 2010-03-15 04:11 UTC, Jim Wilson	Details \| Diff
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Richard Henderson 2009-11-14 02:38:34 UTC

Looking at msmpeg4.s from PR 41567, build with -O2 -fPIC, we see


.LC2:
        data8   mv_tables#+72
...
        addl r14 = @gprel(.LC2), gp

We should not have spilled that address to the constant pool.

Add -mno-sdata and it gets worse -- LC2 moves from .sdata to
the .data.rel.ro section but we *still* use @gprel to address
it, and that relocation may well be out of range.

Comment 1 Steve Ellcey 2010-03-12 19:22:03 UTC

I once asked Jim Wilson about this but didn't get an answer.  Here is a pointer to that email.  Also included here is a short example that generates the gprel load.

My earlier example and question can be seen in:

http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00170.html


Short test case:

typedef struct table { int a; int b; int c; } table;
extern table mv_tables[2];
void foo()
{
    init_mv_table(&mv_tables[1]);
}

Comment 2 wilson@codesourcery.com 2010-03-13 03:03:50 UTC

Subject: Re:  [ia64] Inappropriate address spills

This broke between gcc-4.0.0 and gcc-4.1.2.  It appears to be the patch
for PR 28490.  There is a test in ia64_legitimate_constant_p for symbol
+offset, where we require that the offset have the low 14-bits be zero
which is not at all what we intended to test for.  This should be a test
for an offset that fits into a short add immediate instruction, which
takes a signed-14 bit value, aka CONST_OK_FOR_I, or, hmm, let's look
that up, I guess that would be satisfies_constraint_I nowadays.  Except
that satisfies_constraint_I wants an rtx, and we have an integer, so it
might be simpler to just substitute in the right arithmetic for the
constant check.  I'll attach an untested patch that works for the
testcase.

Jim

Comment 3 Jim Wilson 2010-03-13 03:06:30 UTC

Created attachment 20097 [details]
Untested patch that works for sje's testcase

Comment 4 wilson@codesourcery.com 2010-03-13 03:16:30 UTC

Subject: Re:  [ia64] Inappropriate address spills

Or maybe we should just accept any constant here?  I tried that, and for

typedef struct table { int a; int b; int c; } table;
extern table mv_tables[100000];
void foo()
{
  init_mv_table(&mv_tables[50000]);
}

I get
	addl r35 = @ltoff(mv_tables#+606208), r1
	;;
	ld8 r35 = [r35]
	;;
	adds r35 = -6208, r35
which is perhaps still better than using @gprel and the constant pool.

In this case, in ia64_legitimate_constant_p we can just return true if
aligned_offset_symbol_operand is true, and there is no need for the
addend local variable.

Jim

Comment 5 wilson@codesourcery.com 2010-03-13 08:23:24 UTC

Subject: Re:  [ia64] Inappropriate address spills

On third thought...
The code here makes sense if we were having problems with invalid
constant recombinations.  symbol+const gets split by ia64_expand_move
into symbol+highpart which is put in the got, and lowpart which is added
with adds.  If a late optimization pass saw a REG_EQUAL note and tried
to put the constant back together again, that would be inconvenient.
The ia64_legitimate_constant_p code would prevent that.  Unfortunately,
this code also prevents us to accepting a constant before
ia64_expand_move is called, which prevents us from splitting it in the
first place.  This looks to be more complicated than I hoped.

There is a variable we can check to see if we are in the expand phase,
currently_expanding_to_rtl.  Perhaps we need to accept any constant if
currently_expanding_to_rtl is true, and only accept symbol+highpart (the
current code) if it is false.  Or alternatively, if this problem only
happened at reload time, then checking reload_in_progress and/or
reload_completed might work.

I will have to look at PR28490 again and try to remember what the
original problem that we were fixing was.

Jim

Comment 6 Jim Wilson 2010-03-15 04:08:27 UTC

I can reproduce the original problem with 28490 by changing the line in ia64_legitimate_constant_p from
          return (addend & 0x3fff) == 0;
to
            return true;
and compiling the testcase with -O.  What happens is that in reload.c we see a REG_EQUAL note with symbol+8 and put it in reg_equiv_constant because it is accepted by LEGITIMATE_CONSTANT_P.  This then gets substituted into a SET_SRC.  Except that this insn now requires a scratch register, and we are after reload, so we end up aborting.

The traditional solution for this is to define LEGITIMATE_PIC_OPERAND_P so that it rejects symbol+const when it would require a scratch register.  See the sparc port for instance.  However, this works only if flag_pic is set.  IA-64 is PIC always, and we use flag_pic for shared libraries, so we can't just set it.  Or at least, I'm not sure what will happen if we set it, and I don't want to spend that much time looking into it.

Another traditional solution is to rewrite pic symbols in such a way that they can't be recognized as constants anymore, for instance, by putting them inside unspecs.  This is ugly, and I only mention it for completeness.  I don't recommend fixing the problem this way.

It might be tempting to check for reload_in_progress, but unfortunately the LEGITIMATE_CONSTANT_P check in reload1.c comes before reload_in_progress is set, so that won't work.  And messing with reload isn't a good idea, so we shouldn't try moving where reload_in_progress is set.

That leaves the currently_expanding_to_rtl solution.  We can accept any constant if that var is true, and we use the current code if that var is false.  I think this is the second best solution after the LEGITIMATE_PIC_OPERAND_P solution.

I tried this, and found that it is an imperfect solution.  I don't get a core dump for 28490, and I don't get a constant pool entry for the 42040 testcase.  Unfortunately, I do now get a constant pool entry for the 28490 testcase.  Because symbol+8 is no longer a LEGITIMATE_CONSTANT_P, reload decides to call force_const_mem and put it in reg_equiv_memory_loc.  So this solution doesn't completely eliminate the constant pool entries, but it will eliminate most of them.

To eliminate all of them, we would have to get the LEGITIMATE_PIC_OPERAND_P solution working.

Comment 7 Jim Wilson 2010-03-15 04:11:29 UTC

Created attachment 20106 [details]
untested patch, imperfect solution

Comment 8 Steve Ellcey 2010-03-17 22:09:24 UTC

I tried the patch and didn't have any problem bootstrapping and I didn't see any regressions.  It also fixed my small test case, but when I went back and tried some of the other tests from PR 28490 I still saw some of the bad gprel usages.  Here is a slightly more cutdown testcase from 28490 that still fails with the patch applied and when compiling with -O2.

extern const char basesyntax[];
extern const char arisyntax[];
typedef struct __jmp_buf_tag { }
jmp_buf[1];
struct jmploc { jmp_buf loc; };
readtoken1 (int firstc, char const *syntax, char *eofmark, int striptabs)
{
  int c = firstc;
    for (;;)
      {
        switch (syntax[c])
          {
          case 1:
            if (syntax == (basesyntax + 130))
              goto endword;
            syntax = (basesyntax + 130);
            parsebackq_oldreturn:;
          }
      }
endword:
if (syntax == (arisyntax + 130)) {
    struct jmploc jmploc;
    if (_setjmp (jmploc.loc))
      goto parsebackq_oldreturn;
}
}

Comment 9 wilson@codesourcery.com 2010-03-17 23:25:30 UTC

Subject: Re:  [ia64] Inappropriate address spills

On Wed, 2010-03-17 at 22:09 +0000, sje at cup dot hp dot com wrote:
> I tried the patch and didn't have any problem bootstrapping and I didn't see
> any regressions.  It also fixed my small test case, but when I went back and
> tried some of the other tests from PR 28490 I still saw some of the bad gprel
> usages.  Here is a slightly more cutdown testcase from 28490 that still fails
> with the patch applied and when compiling with -O2.

I already mentioned in my previous message that the patch does not
eliminate all instances of the gprel usages (constant-pool references).
It just eliminates most of them.  If you want to eliminate all of them,
then you need to get LEGITIMATE_PIC_OPERAND_P working, which in turn
will require setting flag_pic by default. which in turn will require
verifying that this doesn't break anything.  That is probably more work
than I have time for.

And by "fails", you mean that the compiler is still emitting undesirable
code in some cases.  The code isn't incorrect, just non-optimal.  And we
get less non-optimal code with the patch, so it seems to be better than
nothing.  Unless you want to try to write the better
LEGITIMATE_PIC_OPERAND_P solution.

Jim

Comment 10 Steve Ellcey 2010-03-17 23:47:22 UTC

Reading Richard's initial comment I thought the problem was that the code was (or could be) illegal because the relocation may be out of range and we shouldn't
use the gprel relocation for any of these constant pool references.

Comment 11 wilson@codesourcery.com 2010-03-18 00:12:43 UTC

Subject: Re:  [ia64] Inappropriate address spills

On Wed, 2010-03-17 at 23:47 +0000, sje at cup dot hp dot com wrote:
> Reading Richard's initial comment I thought the problem was that the code was
> (or could be) illegal because the relocation may be out of range and we
> shouldn't
> use the gprel relocation for any of these constant pool references.

The code is invalid only if you use -mno-sdata.

The only reason why Richard wanted to use -mno-sdata is because he saw
the non-optimal code, and hoped that -mno-sdata would fix that.  It
didn't.  It just made the problem worse by generating invalid code.  So
there is probably also a bug here in the handling of -mno-sdata, but
fixing that doesn't help Richard, because it doesn't solve the real
problem he was complaining about, which was the non-optimal code.

I suspect that the -mno-sdata bug is that ia64_expand_load_address does
  else if (sdata_symbolic_operand (src, VOIDmode))
    emit_insn (gen_load_gprel (dest, src));
and sdata_symbolic_operand fails to check TARGET_NO_SDATA, so it assumes
that constant pool entries are still in the sdata section, and we end up
with gprel references to items that aren't in the sdata section anymore,
which will fail at link time.  If we fix that, -mno-sdata will work, but
we will still have all of the inappropriate address spills (i.e.
non-optimal code) which is what the PR summary is complaining about.

Jim

Comment 12 Steve Ellcey 2010-03-22 20:48:41 UTC

Since the proposed patch to meant to address non-optimimal code generation I decided to try the patch with SPEC2006 and see if it helped the performance. On SPECint, I got a 3% slowdown, mostly due to 429.mcf slowing down a lot (20%).  471.omnetap also slowed down (7%) and nothing else changed much.  With SPECfp, we got a 0.36% speedup, with 434.zeusmp (4.4%) and 459.GemsFTDT (3.39%) speeding up the most and no significant slowdowns.  I will try to see what happened with 429.mcf and see why that slowed down so much.

Comment 13 wilson@codesourcery.com 2010-03-23 02:11:37 UTC

Subject: Re:  [ia64] Inappropriate address spills

On Mon, 2010-03-22 at 20:48 +0000, sje at cup dot hp dot com wrote:
> Since the proposed patch to meant to address non-optimimal code generation I
> decided to try the patch with SPEC2006 and see if it helped the performance.

This isn't a simple issue, performance wise.

If the compiler puts an address+offset into sdata, then we get something
like
        .sdata
        .align 8
.LC2:
        data8   mv_tables#+72
	.text
        addl r14 = @gprel(.LC2), gp
        ;;                                                                      
        ld8 r40 = [r14]

If the compiler emits the code that the ABI intended here, we get
something like
        addl r14 = @ltoff(mv_tables#), r1
        ;;                                                                      
        ld8 r14 = [r14]
        ;;                                                                      
        adds r40 = 72, r14
This sequence is one instruction longer.

We deliberately do this to avoid accidentally overflowing the got.  If
you access the same array with a thousand different offsets, then
instead of putting a thousand entries into the got, we put one entry in
the got, and then add in the offset.  So this is a smaller got, but an
extra add instruction that may add to latency if we can't hide it.

This code is working for got entries, but not when an address is forced
into constant pool.  So which one is better depends on what the code
looks like.  If you have a small number of variables and a large number
of offsets, then you will get a lot more memory references and a larger
got with the first alternative.  If you have a large number of variables
and a small number of offsets, then you will get more add instructions
with the second alternative.

This issue originally came up via PR 41567, which triggers a linker
error.  Perhaps that testcase would not have triggered the linker error
if we were using the second alternative.  I haven't looked at this PR as
yet, so I don't know the details of what is going on here.

Jim

Comment 14 Steve Ellcey 2010-03-24 21:34:53 UTC

Well it looks like the big SPEC slowdowns were a glitch.  The generated code is only slightly different and when I tried to reproduce the results I saw a slight
speed up in mcf instead of a slowdown.  I still got an overall slowdown in SPEC, but it was only 0.25%.