Bug 65215 - [5 Regression] Bswap load miscompilation
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 5.0
Importance: P1 normal
Target Milestone: 5.0
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2015-02-25 22:48 UTC by Jakub Jelinek
Modified: 2015-02-26 21:04 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-02-25 00:00:00


Attachments
gcc5-pr65215.patch (2.45 KB, patch)
2015-02-26 09:56 UTC, Jakub Jelinek
gcc5-pr65215.patch (2.89 KB, patch)
2015-02-26 10:19 UTC, Jakub Jelinek

Description Jakub Jelinek 2015-02-25 22:48:17 UTC
Starting with r210843, the following testcase is miscompiled on big-endian targets, e.g. at -O2:
static inline unsigned int
foo (unsigned int x)
{
  return (x >> 24) | ((x >> 8) & 0xFF00) | ((x << 8) & 0xFF0000) | (x << 24);
}

__attribute__((noinline, noclone)) unsigned int
bar (unsigned long *x)
{
  return foo (*x);
}

int
main ()
{
  unsigned long l = foo (0xdeadbeefU) | 0xfeedbea800000000ULL;
  if (bar (&l) != 0xdeadbeefU)
    __builtin_abort ();
  return 0;
}
Comment 1 Jakub Jelinek 2015-02-26 09:56:31 UTC
Created attachment 34879 [details]
gcc5-pr65215.patch

Untested fix.  There are still issues left, e.g. I can't understand the "bswap &&" part in
      if (bswap
          && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
          && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
        return false;
Don't you use the new MEM_REF even for the !bswap (aka nop) case?  So, I don't see how it would be safe to generate that.
And the testsuite coverage of this is definitely suboptimal, from the endianness POV, bitfields, etc.
Comment 2 Jakub Jelinek 2015-02-26 10:07:00 UTC
I mean something like:
unsigned int
foo (unsigned char *p)
{
  return ((unsigned int) p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3];
}
on strict-alignment big-endian machines; let me add another testcase.
Comment 3 Jakub Jelinek 2015-02-26 10:19:48 UTC
Created attachment 34880 [details]
gcc5-pr65215.patch

Updated patch that performs the alignment check unconditionally.
Comment 4 Richard Biener 2015-02-26 10:33:33 UTC
(In reply to Jakub Jelinek from comment #1)
> Created attachment 34879 [details]
> gcc5-pr65215.patch
> 
> Untested fix.  There are still issues left, e.g. I can't understand the
> "bswap &&" part in
>       if (bswap
>           && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
>           && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
>         return false;
> Don't you use the new MEM_REF even for the !bswap (aka nop) case?  So, I
> don't see how it would be safe to generate that.
> And the testsuite coverage of this is definitely suboptimal, from the
> endianness POV, bitfields, etc.

I suggested that change (remove bswap &&) multiple times, but apparently it got lost.  (I even remember applying that change myself!?)
Comment 5 Thomas Preud'homme 2015-02-26 11:07:43 UTC
(In reply to Richard Biener from comment #4)
> (In reply to Jakub Jelinek from comment #1)
> > Created attachment 34879 [details]
> > gcc5-pr65215.patch
> > 
> > Untested fix.  There are still issues left, e.g. I can't understand the
> > "bswap &&" part in
> >       if (bswap
> >           && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
> >           && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
> >         return false;
> > Don't you use the new MEM_REF even for the !bswap (aka nop) case?  So, I
> > don't see how it would be safe to generate that.
> > And the testsuite coverage of this is definitely suboptimal, from the
> > endianness POV, bitfields, etc.
> 
> I suggested that change (remove bswap &&) multiple times, but apparently
> it got lost.  (I even remember applying that change myself!?)


I remember the contrary, as this was the reason PR61320 was investigated in the first place. The idea is that if the access is unaligned, the expander will take care of it, and the result would be at least as efficient as what the user code was doing to avoid an unaligned load.

I'm happy to revisit that position though.

Best regards,

Thomas
Comment 6 Jakub Jelinek 2015-02-26 11:17:29 UTC
I can certainly remove that hunk from the patch, if the expander and other passes handle it well.  The test can stay I guess.
Comment 7 Thomas Preud'homme 2015-02-26 11:32:08 UTC
(In reply to Jakub Jelinek from comment #6)
> I can certainly remove that hunk from the patch, if the expander and other
> passes handle it well.  The test can stay I guess.

Things are at least working on x86 (obviously), ARM, and SPARC (after PR61320's fix). Also, this code has been there for a long time without any bug reports about problems due to misalignment.

One question about the patch: is there a reason not to use n->range instead of GET_MODE_BITSIZE (TYPE_MODE (load_type))?

Thanks for finding and fixing this.

Best regards.
Comment 8 Jakub Jelinek 2015-02-26 11:36:41 UTC
(In reply to Thomas Preud'homme from comment #7)
> (In reply to Jakub Jelinek from comment #6)
> > I can certainly remove that hunk from the patch, if the expander and other
> > passes handle it well.  The test can stay I guess.
> 
> Things are at least working on x86 (obviously), ARM, and SPARC (after
> PR61320's fix). Also, this code has been there for a long time without
> any bug reports about problems due to misalignment.

Ok.

> One question about the patch: is there a reason not to use n->range instead
> of GET_MODE_BITSIZE (TYPE_MODE (load_type))?

Supposedly that can be used too; that should probably always be the case.
Comment 9 Richard Biener 2015-02-26 11:44:04 UTC
(In reply to Thomas Preud'homme from comment #5)
> (In reply to Richard Biener from comment #4)
> > (In reply to Jakub Jelinek from comment #1)
> > > Created attachment 34879 [details]
> > > gcc5-pr65215.patch
> > > 
> > > Untested fix.  There are still issues left, e.g. I can't understand the
> > > "bswap &&" part in
> > >       if (bswap
> > >           && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
> > >           && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
> > >         return false;
> > > Don't you use the new MEM_REF even for the !bswap (aka nop) case?  So, I
> > > don't see how it would be safe to generate that.
> > > And the testsuite coverage of this is definitely suboptimal, from the
> > > endianness POV, bitfields, etc.
> > 
> > I suggested that change (remove bswap &&) multiple times, but apparently
> > it got lost.  (I even remember applying that change myself!?)
> 
> 
> I remember the contrary, as this was the reason PR61320 was investigated in
> the first place. The idea is that if the access is unaligned, the expander
> will take care of it, and the result would be at least as efficient as what
> the user code was doing to avoid an unaligned load.

Ah indeed.  The idea is that the expander should be able to produce the
optimal and canonical form for an unaligned load on those targets,
especially for, say, loading a 4-byte value from a 2-byte-aligned source
where the source had 4 individual char loads.

> I'm happy to revisit that position though.

No - I just stand corrected.

> Best regards,
> 
> Thomas
Comment 10 Richard Biener 2015-02-26 11:45:45 UTC
(In reply to Richard Biener from comment #9)
> (In reply to Thomas Preud'homme from comment #5)
> > (In reply to Richard Biener from comment #4)
> > > (In reply to Jakub Jelinek from comment #1)
> > > > Created attachment 34879 [details]
> > > > gcc5-pr65215.patch
> > > > 
> > > > Untested fix.  There are still issues left, e.g. I can't understand the
> > > > "bswap &&" part in
> > > >       if (bswap
> > > >           && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
> > > >           && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
> > > >         return false;
> > > > Don't you use the new MEM_REF even for the !bswap (aka nop) case?  So, I
> > > > don't see how it would be safe to generate that.
> > > > And the testsuite coverage of this is definitely suboptimal, from the
> > > > endianness POV, bitfields, etc.
> > > 
> > > I suggested that change (remove bswap &&) multiple times, but apparently
> > > it got lost.  (I even remember applying that change myself!?)

And what I changed was

Index: tree-ssa-math-opts.c
===================================================================
--- tree-ssa-math-opts.c        (revision 212067)
+++ tree-ssa-math-opts.c        (revision 212068)
@@ -2179,7 +2179,9 @@ bswap_replace (gimple cur_stmt, gimple_s
       unsigned align;
 
       align = get_object_alignment (src);
-      if (bswap && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
+      if (bswap
+         && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
+         && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
        return false;
 
       gsi_move_before (&gsi, &gsi_ins);

> > 
> > I remember the contrary, as this was the reason PR61320 was investigated
> > in the first place. The idea is that if the access is unaligned, the
> > expander will take care of it, and the result would be at least as
> > efficient as what the user code was doing to avoid an unaligned load.
> 
> Ah indeed.  The idea is that the expander should be able to produce the
> optimal and canonical form for an unaligned load on those targets,
> especially for, say, loading a 4-byte value from a 2-byte-aligned source
> where the source had 4 individual char loads.
> 
> > I'm happy to revisit that position though.
> 
> No - I just stand corrected.
> 
> > Best regards,
> > 
> > Thomas
Comment 11 Jakub Jelinek 2015-02-26 21:02:31 UTC
Author: jakub
Date: Thu Feb 26 21:01:59 2015
New Revision: 221033

URL: https://gcc.gnu.org/viewcvs?rev=221033&root=gcc&view=rev
Log:
	PR tree-optimization/65215
	* tree-ssa-math-opts.c (find_bswap_or_nop_load): Return false
	for PDP endian targets.
	(perform_symbolic_merge, find_bswap_or_nop_1, find_bswap_or_nop):
	Fix up formatting issues.
	(bswap_replace): Likewise.  For BYTES_BIG_ENDIAN, if the final access
	size is smaller than the original, adjust MEM_REF offset by the
	difference of sizes.  Use is_gimple_mem_ref_addr instead of
	is_gimple_min_invariant test to avoid adding address temporaries.

	* gcc.c-torture/execute/pr65215-1.c: New test.
	* gcc.c-torture/execute/pr65215-2.c: New test.
	* gcc.c-torture/execute/pr65215-3.c: New test.
	* gcc.c-torture/execute/pr65215-4.c: New test.
	* gcc.c-torture/execute/pr65215-5.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr65215-1.c
    trunk/gcc/testsuite/gcc.c-torture/execute/pr65215-2.c
    trunk/gcc/testsuite/gcc.c-torture/execute/pr65215-3.c
    trunk/gcc/testsuite/gcc.c-torture/execute/pr65215-4.c
    trunk/gcc/testsuite/gcc.c-torture/execute/pr65215-5.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-math-opts.c
Comment 12 Jakub Jelinek 2015-02-26 21:04:01 UTC
Fixed.