This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.
Re: [Patch, libfortran] Improve performance of byte swapped IO
PING**2
On Mon, Jan 14, 2013 at 12:44 AM, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> PING**1.2
>
> Yet another slightly updated patch attached. Compared to the previous
> version, now with specializations for size 12 and 16 as well. For the
> real(10) benchmark, with the previous v3 patch (please disregard the
> absolute values in the post quoted below; they were wrong due to a
> bug):
>
> Unformatted sequential write/read performance test
> Record size   Write MB/s           Read MB/s
> ==========================================================
>           4   80.578833140738340   127.33074266188656
>           8   137.61682156650559   184.49033790407984
>          16   202.72871312800621   275.98801561061816
>          32   275.33538767460863   413.43956672052303
>          64   341.04488670485119   555.13744525826564
>         128   384.77917051919820   671.44655208024699
>         256   410.97208129045833   763.97660513918527
>         512   425.76619227779878   826.41086693364593
>        1024   430.77035999730009   840.30757120448550
>        2048   438.30318459339475   885.50033810296600
>        4096   455.79422809097599   919.78265920652086
>        8192   465.74499205886326   959.06963983370918
>       16384   472.48133493971142   991.11244162081744
>       32768   471.00024619567603   1015.7428144049615
>       65536   474.91235280949985   1021.2150519080892
>      131072   475.18664487440901   1006.3701982554830
>      262144   478.00435092846868   985.17141300594039
>      524288   476.72837201590363   991.74226579987987
>
> With the new v4 patch:
>
> Unformatted sequential write/read performance test
> Record size   Write MB/s           Read MB/s
> ==========================================================
>           4   87.353141847504133   145.09410391177835
>           8   166.95093628370549   223.60877830048437
>          16   272.20937208187746   364.91673986840277
>          32   415.26016354252715   599.41744252952310
>          64   592.97676703528009   900.53345964312450
>         128   748.27218547147686   1189.7131837787238
>         256   874.83098506714384   1561.3649529261234
>         512   935.69494481144284   1823.1760143164879
>        1024   983.51689491813215   1931.8773088107300
>        2048   1009.5491761651396   1971.6978586130062
>        4096   1115.5862027658552   2119.4151169997808
>        8192   1172.9400229568287   2184.1403983641089
>       16384   1222.6659284153168   2258.5490449229878
>       32768   1242.2417626697293   2251.8159046253918
>       65536   1227.9967555594396   2313.4106672387143
>      131072   1204.4295656544052   2129.1309150039478
>      262144   1135.7905614378458   2154.7146453789856
>      524288   1075.5769074402640   2170.5151501933169
>
>
> On Fri, Jan 11, 2013 at 10:41 PM, Janne Blomqvist
> <blomqvist.janne@gmail.com> wrote:
>> PING.
>>
>> Slightly updated patch attached, which further improves the generic
>> size fallback that is used when the element size is not 2/4/8 bytes.
>> Changing the us_perf benchmark to use real(10), with the v2 patch the
>> performance is:
>>
>> Unformatted sequential write/read performance test
>> Record size   Write MB/s           Read MB/s
>> ==========================================================
>>           4   59.028550429522085   86.019754350948787
>>           8   79.028327063130590   95.803502000733374
>>          16   99.980457395413296   138.68367462874946
>>          32   122.56886206338788   180.05609910155042
>>          64   152.00478266944486   212.69931319407567
>>         128   197.74137934940202   235.19728791956828
>>         256   155.36245780017779   244.60578379215929
>>         512   157.13385845966246   245.07467397691480
>>        1024   177.26553799130201   260.44908357795623
>>        2048   208.22852888945587   260.21587143113527
>>        4096   222.88410474980634   262.66162209490591
>>        8192   226.71167580652920   265.81191407123663
>>       16384   206.51818241747065   263.59395165591724
>>       32768   230.18707026455866   265.88990325026526
>>       65536   229.19783089391504   268.04485112932684
>>      131072   231.12215662044449   267.40543904427710
>>      262144   230.72012123598142   267.60086931504122
>>      524288   230.48959460456055   268.78750211303725
>>
>> With the new v3 patch I get
>>
>> Unformatted sequential write/read performance test
>> Record size   Write MB/s           Read MB/s
>> ==========================================================
>>           4   59.779061121239941   92.777125264010024
>>           8   92.727504266051341   126.64775563782673
>>          16   128.94793911163904   184.69194300482837
>>          32   169.78916283536847   267.06752001266767
>>          64   209.50296476919556   341.60515130910238
>>         128   236.36709738360679   416.73212655882151
>>         256   251.79029695383340   465.46804746749740
>>         512   259.62269939828633   500.87346060356265
>>        1024   265.08842337586458   508.95530627428275
>>        2048   268.71795530051884   532.12211365683640
>>        4096   280.86546884821030   546.88907054369884
>>        8192   286.96049684823578   569.60958187426183
>>       16384   292.04368984868103   608.11503416324865
>>       32768   292.96677387959392   629.80651297065833
>>       65536   291.69098580137114   624.27103478079641
>>      131072   292.75666234956418   605.99766136491496
>>      262144   291.35520038228975   611.59061455535834
>>      524288   292.15446100501691   623.76232623081580
>>
>>
>> On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist
>> <blomqvist.janne@gmail.com> wrote:
>>> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>>>> Janne Blomqvist <blomqvist.janne@gmail.com> writes:
>>>>>
>>>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>>>> index c8ecc3a..bf2250a 100644
>>>>>> --- a/libgfortran/io/file_pos.c
>>>>>> +++ b/libgfortran/io/file_pos.c
>>>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>>>> }
>>>>>> else
>>>>>> {
>>>>>> + uint32_t u32;
>>>>>> + uint64_t u64;
>>>>>> switch (length)
>>>>>> {
>>>>>> case sizeof(GFC_INTEGER_4):
>>>>>> - reverse_memcpy (&m4, p, sizeof (m4));
>>>>>> + memcpy (&u32, p, sizeof (u32));
>>>>>> + u32 = __builtin_bswap32 (u32);
>>>>>> + m4 = *(GFC_INTEGER_4*)&u32;
>>>>>
>>>>> Isn't that an aliasing violation?
>>>>
>>>> It looks like one. Why not simply do
>>>>
>>>> m4 = (GFC_INTEGER_4) u32;
>>>>
>>>> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>>>
>>> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
>>> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
>>> the range of the signed variable, the result is
>>> implementation-defined. Though I suppose the sensible
>>> "implementation-defined behavior" in this case on a two's complement
>>> target is to just do a bitwise copy.
>>>
>>> Anyway, to be really safe one could use memcpy instead; the compiler
>>> optimizes small fixed size memcpy's just fine. Updated patch attached.
>>>
>>>
>>> --
>>> Janne Blomqvist
>>
>>
>>
>> --
>> Janne Blomqvist
>
>
>
> --
> Janne Blomqvist
--
Janne Blomqvist
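
For readers following the aliasing discussion quoted above, here is a minimal sketch of what the memcpy-based variant might look like. It is not taken from the attached patch. The names load_swapped_i32 and bswap_elements are hypothetical, int32_t stands in for GFC_INTEGER_4 (which the thread says is a typedef for int32_t), and __builtin_bswap32 is the GCC/Clang builtin used in the quoted diff. The second function is only a naive illustration of a generic per-element byte reversal for sizes without a dedicated builtin; the actual fallback in the patch may well be implemented differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit record marker stored in the
   opposite byte order at p.  Both copies go through memcpy, so no
   object is accessed through an incompatible pointer type (no
   aliasing violation) and no unsigned-to-signed conversion is
   performed (so C99 6.3.1.3(3) does not come into play).  */
static int32_t
load_swapped_i32 (const void *p)
{
  uint32_t u32;
  int32_t m4;

  memcpy (&u32, p, sizeof (u32));   /* raw bytes -> unsigned */
  u32 = __builtin_bswap32 (u32);    /* reverse the byte order */
  memcpy (&m4, &u32, sizeof (m4));  /* bitwise copy -> signed */
  return m4;
}

/* Hypothetical generic fallback: reverse the bytes of each of nelems
   elements of size bytes, copying from src to dest.  The two buffers
   must not overlap.  */
static void
bswap_elements (void *dest, const void *src, size_t size, size_t nelems)
{
  const unsigned char *s = src;
  unsigned char *d = dest;

  for (size_t i = 0; i < nelems; i++)
    for (size_t j = 0; j < size; j++)
      d[i * size + j] = s[i * size + (size - 1 - j)];
}
```

Since the sizes passed to memcpy are small compile-time constants, the compiler can be expected to turn both calls in load_swapped_i32 into plain register moves, as noted in the thread.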