I just tried a simple unoptimized compile. -ftime-report said that
the final pass took 5% of the time (obviously final does more than
formatting). The assembler took 4% of the total user time, and system
time was 16% of wall clock time. Even cutting those numbers in half,
1% seems plausible to me, maybe even low.
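
To spell out the back-of-the-envelope arithmetic, here is a rough sketch in Python. The labels and the 50% discount are just my reading of the numbers above, not anything -ftime-report prints, and since the three percentages have different bases they are only halved one at a time, never summed.

```python
# Percentages quoted above for one unoptimized compile.
# The bases differ (compile time, user time, wall-clock time),
# so the figures are halved individually rather than added up.
measured = {
    "final (share of compile time)": 5.0,
    "assembler (share of total user time)": 4.0,
    "system time (share of wall clock)": 16.0,
}

CLAIMED_SAVING = 1.0  # the 1% figure under discussion

# Assumption: only about half of each cost is actually avoidable.
halved = {name: pct / 2 for name, pct in measured.items()}

for name, pct in halved.items():
    print(f"{name}: {pct:.1f}%")

# Check whether every halved figure is still above the 1% claim.
all_exceed = all(pct > CLAIMED_SAVING for pct in halved.values())
print("each halved figure exceeds 1%:", all_exceed)
```

None of this says the saving really is any one of those halved figures, only that 1% is not out of line with them.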

I'm looking at an unoptimized compile because that is where the
assembler makes the most difference: the compiler itself runs faster,
and the assembly output probably tends to be longer. An unoptimized
compile is also when people care most about compile speed. For an
optimizing compile, the assembler is going to be less of a factor.