Using 'gcc -Os -fomit-frame-pointer -march=core2 -mtune=core2' for

unsigned short mul_high_c(unsigned short a, unsigned short b)
{
    return (unsigned)(a * b) >> 16;
}

unsigned short mul_high_asm(unsigned short a, unsigned short b)
{
    unsigned short res;
    asm("mulw %w2" : "=d"(res), "+a"(a) : "rm"(b));
    return res;
}

I get:

_mul_high_c:
        subl    $12, %esp
        movzwl  20(%esp), %eax
        movzwl  16(%esp), %edx
        addl    $12, %esp
        imull   %edx, %eax
        shrl    $16, %eax
        ret

_mul_high_asm:
        subl    $12, %esp
        movl    16(%esp), %eax
        mulw    20(%esp)
        addl    $12, %esp
        movl    %edx, %eax
        ret

mulw puts its result in dx:ax, so dx already contains (dx:ax)>>16 and the explicit shift is avoided. Ignoring the odd Darwin stack-adjustment code, the mulw version is somewhat shorter and avoids a movzwl. I'm not sure what the performance difference is; mulw is listed in Agner Fog's instruction tables as fairly low latency, but it requires a length-changing prefix when the operand is in memory. This type of operation is useful in fixed-point math, such as embedded audio codecs or arithmetic coders.
Confirmed. It's probably difficult to expose this to the combine pass, so a peephole may be the only way to catch it.