This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/82729] adjacent small objects can be initialized with a single store (but aren't for char a[] = "a")
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 26 Oct 2017 11:29:01 +0000
- Subject: [Bug rtl-optimization/82729] adjacent small objects can be initialized with a single store (but aren't for char a[] = "a")
- Auto-submitted: auto-generated
- References: <bug-82729-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82729
--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Richard Biener from comment #1)
> The issue is we have no merging of stores at the RTL level and the GIMPLE
> level doesn't know whether the variables will end up allocated next to each
> other.
Are bug reports like this useful at all? It seems that a good fraction of the
missed-optimization bugs I file are things that gcc doesn't really have the
infrastructure to find. I'm hoping it's helping to improve gcc in the long
run, at least. I guess I could try to learn more about gcc internals to find
out why it misses them on my own before filing, but either way it seems
potentially useful to document efficient asm possibilities even if gcc's
current design makes it hard to take advantage.
Anyway, could GIMPLE notice that multiple small objects are being written and
hint to RTL that it would be useful to allocate them in a certain way? (And
give RTL a merged store that RTL would have to split if it decides not to?)
Or a more conservative approach could still be an improvement. Can RTL realize
that it can use 4-byte stores that overlap into not-yet-initialized or
otherwise dead memory?
For -march=haswell or generic we get
movl $97, %edx
movl $25185, %eax # avoid an LCP stall on Nehalem or earlier
movw %dx, 7(%rsp)
... lea
movl $6513249, 12(%rsp)
movw %ax, 9(%rsp)
movb $0, 11(%rsp)
This is pretty bad for code-size. The following does the same thing with no
merging between objects, just by knowing when it is safe to let a store overlap
into another object:
movl $0x61, 7(%rsp) # imm32 still shorter than a mov imm32 -> reg and 16-bit store
movl $0x6261, 9(%rsp)
movl $0x636261, 12(%rsp)
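To sanity-check that the overlapping version really leaves the same bytes in memory, here is a minimal C sketch that simulates both store sequences on a byte buffer. The offsets and immediates are the ones from the asm above; the 16-byte buffer standing in for the stack frame, the 0xAA "garbage" fill, and all helper names (store_le, gcc_sequence, and so on) are assumptions made up for illustration. It models little-endian x86 stores only and says nothing about what gcc's passes actually do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FRAME_SIZE = 16 };   /* assumed: just big enough for offsets 7..15 */

/* Hypothetical helper: little-endian store of `width` bytes at frame+off,
   modelling movb/movw/movl to off(%rsp). */
static void store_le(unsigned char *frame, size_t off, uint32_t value, size_t width)
{
    assert(width <= 4 && off + width <= FRAME_SIZE);
    for (size_t i = 0; i < width; i++)
        frame[off + i] = (unsigned char)(value >> (8 * i));
}

/* The sequence gcc currently emits (values as in the asm above). */
static void gcc_sequence(unsigned char *frame)
{
    store_le(frame, 7, 97, 2);        /* movw %dx, 7(%rsp)       "a"   */
    store_le(frame, 12, 6513249, 4);  /* movl $6513249, 12(%rsp) "abc" */
    store_le(frame, 9, 25185, 2);     /* movw %ax, 9(%rsp)       "ab"  */
    store_le(frame, 11, 0, 1);        /* movb $0, 11(%rsp)             */
}

/* The proposed sequence: three 4-byte stores in ascending address order,
   each overlapping into bytes the next store rewrites. */
static void overlapping_sequence(unsigned char *frame)
{
    store_le(frame, 7, 0x61, 4);
    store_le(frame, 9, 0x6261, 4);
    store_le(frame, 12, 0x636261, 4);
}

/* Same three stores in the opposite order, to show that order matters. */
static void overlapping_reversed(unsigned char *frame)
{
    store_le(frame, 12, 0x636261, 4);
    store_le(frame, 9, 0x6261, 4);
    store_le(frame, 7, 0x61, 4);
}

/* Run `seq` on a garbage-filled frame and compare bytes 7..15 against
   the result of gcc's sequence.  Returns 1 on a match, 0 otherwise. */
static int matches_gcc(void (*seq)(unsigned char *))
{
    unsigned char ref[FRAME_SIZE], test[FRAME_SIZE];
    memset(ref, 0xAA, sizeof ref);
    memset(test, 0xAA, sizeof test);
    gcc_sequence(ref);
    seq(test);
    return memcmp(ref + 7, test + 7, 9) == 0;
}

int sequences_match(void)        { return matches_gcc(overlapping_sequence); }
int reversed_order_matches(void) { return matches_gcc(overlapping_reversed); }
```

Under these assumptions sequences_match() returns 1 and reversed_order_matches() returns 0. The overlap is only safe because each wide store's extra zero bytes land on memory that a later store overwrites, or that is dead anyway, so the stores have to stay in this order.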
(Teaching gcc that mov $imm16 is safe on Sandybridge-family is a separate bug,
I guess. It's only other instructions with an imm16 that LCP stall, unlike on
Nehalem and earlier where mov $imm16 is a problem too. Silvermont marks
instruction lengths in the cache to avoid LCP stalls entirely, and gcc knows
that.)