Bug 45274 - __restrict__ type qualifier does not work on pointers to bitfields
Summary: __restrict__ type qualifier does not work on pointers to bitfields
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Andrew Pinski
URL:
Keywords: missed-optimization
Depends on:
Blocks: bitfield restrict 113395
Reported: 2010-08-13 07:06 UTC by Anton Blanchard
Modified: 2024-01-26 23:01 UTC (History)
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-12-04 00:00:00


Description Anton Blanchard 2010-08-13 07:06:14 UTC
I tested an svn build from 20100813 with the following code:

struct bar {
        unsigned int a:1, b:1, c:1, d:1, e:28;
};

void foo(struct bar * __restrict__ src, struct bar * __restrict__ dst)
{
        dst->a = src->a;
        dst->b = src->b;
        dst->c = src->c;
        dst->d = src->d;
        dst->e = src->e;
}

Built as 32bit, we see src reloaded after every store to dst, as if the compiler assumes the two pointers may alias:

# gcc -m32 -O2 -S foo.c 

foo:
	lwz 9,0(3)
	lwz 0,0(4)
	rlwimi 0,9,0,0,0
	stw 0,0(4)
	lwz 9,0(3)
	rlwimi 0,9,0,1,1
	stw 0,0(4)
	lwz 9,0(3)
	rlwimi 0,9,0,2,2
	stw 0,0(4)
	lwz 9,0(3)
	rlwimi 0,9,0,3,3
	stw 0,0(4)
	lwz 9,0(3)
	rlwimi 0,9,0,4,31
	stw 0,0(4)
	blr
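
For comparison, the effect one might hope for from __restrict__ can be sketched by hand at the source level: since src and dst are promised not to overlap, *src can be read once up front instead of being reloaded after each store. The name foo_hoisted is hypothetical and not part of the test case above, and this is only an illustration of the expected result, not something GCC is confirmed to emit.

```c
#include <assert.h>

struct bar {
        unsigned int a:1, b:1, c:1, d:1, e:28;
};

/* Hypothetical variant of foo(): read *src once into a local copy, so
   no reload of src is needed after each store to dst. */
void foo_hoisted(struct bar * __restrict__ src, struct bar * __restrict__ dst)
{
        struct bar tmp;

        assert(src != dst);     /* restrict promises the objects do not overlap */
        tmp = *src;             /* single load of the source word */
        dst->a = tmp.a;
        dst->b = tmp.b;
        dst->c = tmp.c;
        dst->d = tmp.d;
        dst->e = tmp.e;
}
```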

Apologies if I am misusing or misinterpreting the use of __restrict__ here.

Also, when built as 64bit things are considerably more complex. Is there a reason why we can't use the same code as 32bit?

# gcc -m64 -O2 -S foo.c
...
.L.foo:
	lwz 9,0(4)
	lwz 0,0(3)
	rlwinm 9,9,0,1,31
	rlwinm 0,0,0,0,0
	or 0,9,0
	stw 0,0(4)
	rlwinm 0,0,1,1,31
	rlwinm 0,0,31,0xffffffff
	lwz 9,0(3)
	rldicl 9,9,34,63
	slwi 9,9,30
	or 0,0,9
	stw 0,0(4)
	rlwinm 9,0,2,1,31
	rlwinm 9,9,30,0xffffffff
	lwz 0,0(3)
	rldicl 0,0,35,63
	slwi 0,0,29
	or 0,9,0
	stw 0,0(4)
	rlwinm 0,0,3,1,31
	rlwinm 0,0,29,0xffffffff
	lwz 9,0(3)
	rldicl 9,9,36,63
	slwi 9,9,28
	or 0,0,9
	stw 0,0(4)
	rlwinm 0,0,0,0,3
	lwz 9,0(3)
	rlwinm 9,9,0,4,31
	or 0,0,9
	stw 0,0(4)
	blr
Comment 1 Jakub Jelinek 2010-08-13 08:01:29 UTC
I don't think this has anything to do with restrict and all with lowering bitfield accesses only during expansion, and at RTL level the bitfield operations being too big for combiner to optimize them.
Comment 2 Andrew Pinski 2010-10-28 20:05:42 UTC
Expand has the issue:

(insn 7 6 8 t2.c:7 (set (reg:SI 197)
        (mem/s:SI (reg/v/f:DI 193 [ src ]) [0 S4 A32])) -1 (nil))

Notice the aliasing set of 0.

Confirmed.
Comment 3 Andrew Pinski 2011-04-11 20:13:25 UTC
(In reply to comment #1)
> I don't think this has anything to do with restrict and all with lowering
> bitfield accesses only during expansion, and at RTL level the bitfield
> operations being too big for combiner to optimize them.

No, this is unrelated to the combiner not being able to optimize the bitfield accesses.  Rather, it is related to how stores and loads happen on bitfields. We don't try to keep track of the individual bits changed by a store.
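
To illustrate what a bitfield store amounts to at this level, here is a minimal sketch that uses explicit 32-bit words and masks instead of real bitfields. The helper insert_bits, the function copy_fields_rmw and the bit positions are assumptions made for illustration (actual bitfield layout is ABI-dependent), and this is not GCC's internal representation. Each field store is a load, mask, merge and store of the whole containing word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replace `width` bits of `word`, starting at bit
   `shift`, with the bits of `from` at the same positions. */
static uint32_t insert_bits(uint32_t word, uint32_t from,
                            unsigned shift, unsigned width)
{
        uint32_t mask;

        assert(width >= 1 && width <= 32 && shift + width <= 32);
        mask = (width == 32 ? 0xffffffffu : ((1u << width) - 1u)) << shift;
        return (word & ~mask) | (from & mask);
}

/* Field-by-field copy written as five read-modify-write steps on the
   containing word.  Assumed layout: a, b, c, d in bits 0..3 and e in
   bits 4..31. */
void copy_fields_rmw(const uint32_t *src, uint32_t *dst)
{
        *dst = insert_bits(*dst, *src, 0, 1);   /* a */
        *dst = insert_bits(*dst, *src, 1, 1);   /* b */
        *dst = insert_bits(*dst, *src, 2, 1);   /* c */
        *dst = insert_bits(*dst, *src, 3, 1);   /* d */
        *dst = insert_bits(*dst, *src, 4, 28);  /* e */
}
```

Because the five masks together cover all 32 bits, the net effect of the five steps is the same as storing the source word directly, but that is only visible if something tracks which bits each store changes.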
Comment 4 Richard Biener 2016-06-29 14:06:12 UTC
The alias-set issue hasn't occurred for quite some time (it's using alias-set 1
for me).  Also, restrict is working.

GCC 6 optimizes this on x86_64 to

foo:
.LFB0:
        .cfi_startproc
        movzbl  (%rdi), %edx
        movzbl  (%rsi), %eax
        movl    %edx, %r8d
        movl    %edx, %ecx
        andl    $-4, %eax
        andl    $1, %r8d
        andl    $2, %ecx
        orl     %r8d, %eax
        orl     %ecx, %eax
        movl    %edx, %ecx
        andl    $8, %edx
        andl    $4, %ecx
        andl    $-13, %eax
        orl     %ecx, %eax
        orl     %edx, %eax
        movb    %al, (%rsi)
        movl    (%rdi), %eax
        andl    $-16, %eax
        movl    %eax, %edx
        movl    (%rsi), %eax
        andl    $15, %eax
        orl     %edx, %eax
        movl    %eax, (%rsi)
        ret

thus it is doing a good job in piecewise copying of the struct.  It doesn't
detect that it can simply use a 32bit load/store.  But there are duplicates
in bugzilla for that issue.
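
The missed single load/store can be stated at the source level: because the five fields cover all 32 bits of struct bar, copying them one by one has the same effect as assigning the whole struct. The sketch below assumes struct bar occupies exactly one unsigned int with no padding bits, which holds on the usual GCC targets but is not guaranteed by the C standard; foo_whole and same_bits are hypothetical names.

```c
#include <assert.h>
#include <string.h>

struct bar {
        unsigned int a:1, b:1, c:1, d:1, e:28;
};

/* Hypothetical whole-struct version of foo(): one aggregate copy. */
void foo_whole(struct bar * __restrict__ src, struct bar * __restrict__ dst)
{
        assert(src != dst);     /* restrict promises the objects do not overlap */
        *dst = *src;
}

/* Compare the object representations; meaningful only under the
   no-padding assumption stated above. */
int same_bits(const struct bar *x, const struct bar *y)
{
        return memcmp(x, y, sizeof *x) == 0;
}
```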

Interestingly, with some bitfield lowering work plus some match.pd hackery
I get this:

foo:
.LFB0:
        .cfi_startproc
        movl    (%rdi), %eax
        movl    %eax, (%rsi)
        ret

whee.
Comment 5 Andrew Pinski 2020-01-14 08:06:34 UTC
Mine.  With my bit-field lowering and a patch to reassociation to handle some BIT_INSERT optimizations, we are able to optimize this to just a load/store.