This is caused by early SRA splitting elem's assignment into separate per-field assignments.

struct x {
  unsigned a : 6;
  unsigned b : 26;
};

int f(struct x *x, unsigned a, unsigned b)
{
  struct x elem = { .a = a, .b = b };
  int i;
  for (i = 0; i < 512; i++)
    x[i] = elem;
}

Generated code:

.LFB0:
        .cfi_startproc
        leaq    2048(%rdi), %rcx
        andl    $63, %esi
        sall    $6, %edx
        .p2align 4,,10
        .p2align 3
.L2:
        movzbl  (%rdi), %eax
        addq    $4, %rdi
        andl    $-64, %eax
        orl     %esi, %eax
        movb    %al, -4(%rdi)
        movl    -4(%rdi), %eax
        andl    $63, %eax
        orl     %edx, %eax
        movl    %eax, -4(%rdi)
        cmpq    %rcx, %rdi
        jne     .L2
        rep ret
        .cfi_endproc

So at -O2 we get decent code from GCC 9+ thanks to store merging, which "undoes" what SRA did. But at -O3 the loop gets split into two.
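
For illustration, here is a rough source-level sketch of the shapes involved. It is not a dump of GCC's intermediate representation. The function names (fill_aggregate, fill_per_field, fill_split, fills_agree, self_check) and the constant N are made up for this example. fill_per_field approximates the loop after SRA has replaced the aggregate copy with per-field stores, and fill_split approximates the two-loop form seen at -O3. All three leave the array in the same state; only the number of read-modify-write accesses to each word differs.

```c
#include <assert.h>

struct x { unsigned a : 6; unsigned b : 26; };

enum { N = 512 };  /* hypothetical name for the trip count in the testcase */

/* Original form: one whole-struct (32-bit) copy per iteration. */
void fill_aggregate(struct x *x, unsigned a, unsigned b)
{
    struct x elem = { .a = a, .b = b };
    for (int i = 0; i < N; i++)
        x[i] = elem;
}

/* Approximation of the loop after SRA: two bit-field stores per
   iteration, each a read-modify-write of the containing word. */
void fill_per_field(struct x *x, unsigned a, unsigned b)
{
    for (int i = 0; i < N; i++) {
        x[i].a = a;
        x[i].b = b;
    }
}

/* Approximation of the -O3 result: the per-field stores end up in
   two separate loops over the same array. */
void fill_split(struct x *x, unsigned a, unsigned b)
{
    for (int i = 0; i < N; i++)
        x[i].a = a;
    for (int i = 0; i < N; i++)
        x[i].b = b;
}

/* Returns 1 if all three variants leave identical field values. */
int fills_agree(unsigned a, unsigned b)
{
    static struct x p[N], q[N], r[N];
    fill_aggregate(p, a, b);
    fill_per_field(q, a, b);
    fill_split(r, a, b);
    for (int i = 0; i < N; i++) {
        if (p[i].a != q[i].a || p[i].b != q[i].b) return 0;
        if (p[i].a != r[i].a || p[i].b != r[i].b) return 0;
    }
    return 1;
}

/* Small self-check; call it from a test driver. */
void self_check(void)
{
    assert(fills_agree(5, 1000));
    assert(fills_agree(63, (1u << 26) - 1));
}
```

Compiling the three fill_* functions with -O2 and -O3 and comparing the assembly should show whether a given GCC version merges the per-field stores back into a single 32-bit store.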