User account creation filtered due to spam.

Bug 65162 - [5/6/7/8 Regression][SH] Redundant tests when storing bswapped T bit result in unaligned mem
Summary: [5/6/7/8 Regression][SH] Redundant tests when storing bswapped T bit result i...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 5.0
: P4 normal
Target Milestone: 5.5
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-22 11:33 UTC by Oleg Endo
Modified: 2016-06-03 10:03 UTC (History)
0 users

See Also:
Host:
Target: sh*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-02-22 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Oleg Endo 2015-02-22 11:33:29 UTC
The following example is taken from libav, which contains quite some uses of this code pattern.

typedef unsigned short int uint16_t;
union unaligned_16 { uint16_t l; } __attribute__((packed));

int
test (unsigned char* buf, int bits_per_component)
{
  (((union unaligned_16 *)(buf))->l) =
    __builtin_bswap16 (bits_per_component == 10 ? 1 : 0);

  return 0;
}

compiled with 4.8 / 4.9 and -m4 -O2:

        mov     r5,r0
        cmp/eq  #10,r0
        movt    r0
        swap.b  r0,r0
        extu.w  r0,r0
        extu.b  r0,r1
        shlr8   r0
        mov.b   r1,@r4
        mov.b   r0,@(1,r4)
	rts
	mov     #0,r0

compiled with 5.0 r220892:

        mov     r5,r0
        cmp/eq  #10,r0
        movt    r1
        swap.b  r1,r1      // r1 = T << 8
        extu.w  r1,r1
        tst     r1,r1      // T = r1 == 0 = 1-T
        mov     #-1,r0
        negc    r0,r0      // r0 = 1-T = r1 != 0 = cmp/eq #10,r0
        extu.b  r1,r2
        mov.b   r0,@(1,r4) // @(1,r4) = cmp/eq #10,r0
        mov.b   r2,@r4     // @r4 = 0
        rts
        mov     #0,r0

This is caused by the introduction of the treg_set_expr stuff.
For some reason, combine tries to recalculate the 'other' stored byte and tries this pattern:

Successfully matched this instruction:
(set (reg:SI 179)
    (ne:SI (reg:SI 164 [ D.1476 ])
        (const_int 0 [0])))

The resulting insn sequence looks roughly like this:

(set (reg:SI 169) (eq:SI (reg:SI 5 r5) (const_int 10)))
(set (reg:HI 170) (rotate:HI (subreg:HI (reg:SI 169) 0) (const_int 8)))
(set (reg:SI 164) (zero_extend:SI (reg:HI 170)))
(set (reg:SI 171) (zero_extend:SI (subreg:QI (reg:SI 164) 0)))
(set (mem:QI (reg/v/f:SI 166) (subreg:QI (reg:SI 171) 0)))
(set (reg:SI 179) (ne:SI (reg:SI 164) (const_int 0)))
(set (mem:QI (plus:SI (reg/v/f:SI 166) (const_int 1)))
     (subreg:QI (reg:SI 179) 0))


If the ne:SI pattern is not matched, the redundant comparison/test is not emitted.

The sh_treg_combine.cc RTL was actually meant to handle such cases of repeated T bit inversions/tests.  However, at the moment it is triggered by conditional insns only.  Moreover, it currently will not look through zero_extend and the rotate insns.

On the other hand, this seems to happen only for unaligned stores.  Examples such as 

typedef unsigned short int uint16_t;
int
test_00 (unsigned short* buf, int bits_per_component)
{
  buf[0] =
    __builtin_bswap16 (bits_per_component == 10 ? 1 : 0);

  return 0;
}

int
test_01 (unsigned char* buf, int bits_per_component)
{
  buf[0] =
    __builtin_bswap16 (bits_per_component == 10 ? 1 : 0);

  return 0;
}

int
test_02 (unsigned char* buf, int bits_per_component)
{
  buf[0] =
    __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 8;

  return 0;
}

int
test_03 (unsigned char* buf, int bits_per_component)
{
  buf[1] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 0;
  buf[0] = __builtin_bswap16 (bits_per_component == 10 ? 1 : 0) >> 8;

  return 0;
}

.. do not suffer from the problem.

Probably this problem will not be triggered after unaligned loads/stores have been improved (PR 64306).
Comment 1 Oleg Endo 2015-02-22 12:56:31 UTC
(In reply to Oleg Endo from comment #0)
> The following example is taken from libav, which contains quite some uses of
> this code pattern.
> 
> typedef unsigned short int uint16_t;
> union unaligned_16 { uint16_t l; } __attribute__((packed));
> 
> int
> test (unsigned char* buf, int bits_per_component)
> {
>   (((union unaligned_16 *)(buf))->l) =
>     __builtin_bswap16 (bits_per_component == 10 ? 1 : 0);
> 
>   return 0;
> }
> 

BTW, it should actually translate to something like:

	mov	r6,r0
	cmp/eq	#10,r0
	movt	r0
	mov.b	r0,@(1,r4)
	mov	#0,r0
	rts
	mov.b	r0,@r4

or
	mov	r6,r0
	cmp/eq	#10,r0
	movt	r0
	mov.b	r0,@(1,r4)
	shlr8   r0
	rts
	mov.b	r0,@r4
Comment 2 Jakub Jelinek 2015-04-22 11:57:48 UTC
GCC 5.1 has been released.
Comment 3 Richard Biener 2015-07-16 09:10:38 UTC
GCC 5.2 is being released, adjusting target milestone to 5.3.
Comment 4 Richard Biener 2015-12-04 10:43:13 UTC
GCC 5.3 is being released, adjusting target milestone.
Comment 5 Richard Biener 2016-06-03 10:03:28 UTC
GCC 5.4 is being released, adjusting target milestone.