Bug 62025 - [4.9/5 Regression] Miscompilation of openssl sha512.c
Summary: [4.9/5 Regression] Miscompilation of openssl sha512.c
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.9.1
: P3 normal
Target Milestone: 4.9.2
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2014-08-05 14:46 UTC by Jakub Jelinek
Modified: 2014-10-14 14:01 UTC (History)
6 users (show)

See Also:
Host:
Target: s390-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-08-12 00:00:00


Attachments
sha.c (2.93 KB, text/plain)
2014-08-05 14:46 UTC, Jakub Jelinek
Details
pr62025.c (1.41 KB, text/plain)
2014-08-13 16:25 UTC, Jakub Jelinek
Details
pr62025-2.c (76 bytes, text/plain)
2014-08-13 16:27 UTC, Jakub Jelinek
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2014-08-05 14:46:39 UTC
Created attachment 33248 [details]
sha.c

The following testcase is miscompiled on s390-linux, starting with r207605 (including latest trunk), with e.g. -O2 -m31 -march=z196 -mtune=z10 options.
Comment 1 Jakub Jelinek 2014-08-05 14:58:15 UTC
In *.sched1 I see no significant insn changes in between r207604 (works) and r207605 (aborts), just r14 is mentioned live on entry of various bbs.
The testcase fails at r207605 also with -O2 -m31 -march=z196 -mtune=z10 -mno-lra.
Richard or Andreas, can you please have a look?
Comment 2 Jakub Jelinek 2014-08-05 15:02:43 UTC
-fno-shrink-wrap doesn't help.
Comment 3 Jakub Jelinek 2014-08-11 14:02:36 UTC
Note, when I replace the if+abort with a printout of all the 8 c.h[] values and take a poll of the results from:
./cc1.208700 -m31 -O0 sha.c -o sha0.s -quiet -nostdinc
./cc1.207604 -nostdinc -quiet -O2 -m31 -march=z196 -mtune=z10 sha.c -o sha1.s
./cc1.207605 -nostdinc -quiet -O2 -m31 -march=z196 -mtune=z10 sha.c -o sha2.s
./cc1.207604 -nostdinc -quiet -O2 -mno-lra -m31 -march=z196 -mtune=z10 sha.c -o sha3.s
./cc1.207605 -nostdinc -quiet -O2 -m31 -march=z196 -mno-lra -mtune=z10 sha.c -o sha4.s
/usr/src/gcc/objz/gcc/cc1 -nostdinc -quiet -O2 -m31 -march=z196 -mtune=z10 sha.c -o sha5.s
/usr/src/gcc/objz/gcc/cc1 -nostdinc -quiet -O2 -m31 -march=z196 -mtune=z10 sha.c -o sha6.s -mno-lra
where /usr/src/gcc/objz/gcc/cc1 is trunk from around Aug 5th, I get different results from the testcase, and also if I remove the last line or two lines from the loop in the function.  But which function have different result from the -O0 results depends on how many lines are removed.
With no lines removed (G for results matching -O0, B for different result):
GGBGBBB
With the last line removed the results are GBGBBGB.
With the last two lines removed GBGBBGG, when 3 or more lines from the loop are removed, all results are the same (i.e. all Gs).
As I believe the testcase doesn't have undefined behavior with any of the loop lines removed, this shows that perhaps the shrink wrapping case just changed things big enough that some RA or later issue either started appearing, or went away.
Comment 4 Jakub Jelinek 2014-08-11 16:47:06 UTC
I've applied on top of r207604 just the
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/s390/s390.h?r1=207605&r2=207604&pathrev=207605
change and the testcase was miscompiled too, so the bug is just triggered by different RA decisions from having reg 14 live on the epilogue.
The miscompilation goes away with -fno-schedule-insns2, but also with
-fno-schedule-insns -fschedule-insns2, but as I said earlier, appart from different live in/out lr in/out (including r14 there) *.sched1 dump doesn't contain any real insn changes.
Note only -mtune=z10 is miscompiled, other tuning options are ok.
Comment 5 Jeffrey A. Law 2014-08-12 04:07:02 UTC
While r207605 may be the trigger for the failure for this test.  I can twiddle the test so that it fails with 207604 and 207605.  I'm starting to wonder if 207605 just tripped something latent.

I'm still looking, but we may have been totally barking up the wrong tree focusing on 207605.
Comment 6 Jeffrey A. Law 2014-08-12 06:43:37 UTC
It's late and I need to catch some zzzs. But as I hinted in my prior update, I think we may chasing something latent.   I would recommend looking very closely at r204497, which my bisecting implicated as the failing commit for a severely hacked up test.

Reverting r204497 by hand on the trunk results in the original testcase working properly.  I'm too tired to really analyze, but I think it's worth a looksie.
Comment 7 bin cheng 2014-08-12 06:52:41 UTC
Oh, it's my commit, I will have a look if you are ok with that.
Comment 8 Jeffrey A. Law 2014-08-12 06:53:28 UTC
More eyes never hurt :-)  This pair is going to bed.
Comment 9 Jakub Jelinek 2014-08-12 08:17:38 UTC
(In reply to Jeffrey A. Law from comment #6)
> It's late and I need to catch some zzzs. But as I hinted in my prior update,
> I think we may chasing something latent.   I would recommend looking very
> closely at r204497, which my bisecting implicated as the failing commit for
> a severely hacked up test.
> 
> Reverting r204497 by hand on the trunk results in the original testcase
> working properly.  I'm too tired to really analyze, but I think it's worth a
> looksie.

I never said r207605 was the buggy change, after all, as only the EPILOGUE_USES
definition itself makes the code misbehave, it is clear it only triggers a latent issue.  For any case we need to find out what is actually miscompiled in the testcase and from that why.
I've tried this morning following hack on top of r207605:
--- sched-deps.c.xx	2014-08-11 12:05:39.000000000 +0200
+++ sched-deps.c	2014-08-12 09:51:48.540485357 +0200
@@ -3034,6 +3034,14 @@ sched_analyze_insn (struct deps_desc *de
       || (NONJUMP_INSN_P (insn) && control_flow_insn_p (insn)))
     reg_pending_barrier = MOVE_BARRIER;
 
+  if (!DEBUG_INSN_P (insn) && reload_completed && getenv ("SKIPB") && getenv ("SKIPE"))
+    {
+      static int cntskip;
+      ++cntskip;
+      if (cntskip > atoi (getenv ("SKIPB")) && cntskip < atoi (getenv ("SKIPE")))
+	reg_pending_barrier = TRUE_BARRIER;
+    }
+
   if (sched_pressure != SCHED_PRESSURE_NONE)
     {
       setup_insn_reg_uses (deps, insn);

and it seems that with:
./cc1 -m31 -O2 -march=z196 -mtune=z10 sha.c -o sha1.s -quiet -nostdinc; SKIPB=6261 SKIPE=6263 ./cc1 -m31 -O2 -march=z196 -mtune=z10 sha.c -o sha2.s -quiet -nostdinc
sha1.s is expectedly miscompiled, while sha2.s gives the correct result and differs just in a single true barrier during sched2, i.e. no changes in split2 dump, and just small amount of changes in the sched2 dump.
Comment 10 Jakub Jelinek 2014-08-12 08:54:58 UTC
And with:

--- sched-deps.c	(revision 207605)
+++ sched-deps.c	(working copy)
@@ -3034,6 +3034,21 @@ sched_analyze_insn (struct deps_desc *de
       || (NONJUMP_INSN_P (insn) && control_flow_insn_p (insn)))
     reg_pending_barrier = MOVE_BARRIER;
 
+  if (!DEBUG_INSN_P (insn) && reload_completed && getenv ("SKIPB") && getenv ("SKIPE"))
+    {
+      static int cntskip;
+      ++cntskip;
+      if (cntskip > atoi (getenv ("SKIPB")) && cntskip < atoi (getenv ("SKIPE")))
+	reg_pending_barrier = TRUE_BARRIER;
+    }
+  if (!DEBUG_INSN_P (insn) && reload_completed && getenv ("FLUSHB") && getenv ("FLUSHE"))
+    {
+      static int cntskip;
+      ++cntskip;
+      if (cntskip > atoi (getenv ("FLUSHB")) && cntskip < atoi (getenv ("FLUSHE")))
+	flush_pending_lists (deps, insn, true, true);
+    }
+
   if (sched_pressure != SCHED_PRESSURE_NONE)
     {
       setup_insn_reg_uses (deps, insn);

again, on top of r207605, and -O2 -m31 -march=z196 -mtune=z10, I get no miscompilation with
FLUSHB=6261 FLUSHE=6263 ./cc1 -m31 -O2 -march=z196 -mtune=z10 sha.c -o sha3.s -quiet -nostdinc
diff -up -U8 sha1.s sha3.s
--- sha1.s	2014-08-12 10:40:14.813740998 +0200
+++ sha3.s	2014-08-12 10:46:58.425686712 +0200
@@ -6246,38 +6246,38 @@ sha512_block_data_order:
 	lm	%r2,%r3,312(%r15)
 	x	%r11,152(%r15)
 	or	%r5,%r8
 	l	%r8,276(%r15)
 	n	%r9,216(%r15)
 	srl	%r4,7
 	n	%r11,184(%r15)
 	al	%r3,252(%r8)
-	ahi	%r8,128
-	alc	%r2,120(%r8)
+	alc	%r2,248(%r8)
+	x	%r0,1104(%r15)
 	xr	%r11,%r9
 	st	%r11,1072(%r15)
-	xr	%r1,%r5
 	n	%r7,188(%r15)
-	x	%r0,1104(%r15)
-	st	%r8,276(%r15)
+	xr	%r1,%r5
 	stm	%r2,%r3,120(%r15)
 	st	%r0,1084(%r15)
 	l	%r3,116(%r15)
 	sll	%r3,25
 	lr	%r6,%r3
 	l	%r3,224(%r15)
 	xr	%r3,%r10
 	st	%r3,1068(%r15)
 	l	%r3,156(%r15)
 	l	%r2,232(%r15)
 	xr	%r2,%r12
 	st	%r2,1088(%r15)
-	lm	%r10,%r11,120(%r15)
 	n	%r3,220(%r15)
+	ahi	%r8,128
+	lm	%r10,%r11,120(%r15)
+	st	%r8,276(%r15)
 	al	%r11,164(%r15)
 	alc	%r10,160(%r15)
 	xr	%r7,%r3
 	st	%r7,1076(%r15)
 	lr	%r3,%r11
 	or	%r6,%r4
 	lm	%r4,%r5,304(%r15)
 	al	%r3,1068(%r15)
Comment 11 Jakub Jelinek 2014-08-12 09:56:55 UTC
And the bug in that is that in sha1.s (the miscompiled one) sched2 wrongly scheduled ahi %r8,128 which clobbers %cc in between a %cc producer and %cc consumer.  If I manually fix that up as follows:

--- sha1.s	2014-08-12 08:39:50.694224146 -0400
+++ sha4.s	2014-08-12 09:38:58.914224146 -0400
@@ -6251,8 +6251,8 @@ sha512_block_data_order:
 	srl	%r4,7
 	n	%r11,184(%r15)
 	al	%r3,252(%r8)
+	alc	%r2,248(%r8)
 	ahi	%r8,128
-	alc	%r2,120(%r8)
 	xr	%r11,%r9
 	st	%r11,1072(%r15)
 	xr	%r1,%r5

then the testcase succeeds.

In *.split2 we have:
(insn 11376 7193 11377 5 (parallel [
            (set (reg:CCL1 33 %cc)
                (compare:CCL1 (plus:SI (reg:SI 3 %r3 [orig:2900 D.1817+4 ] [2900])
                        (mem:SI (plus:SI (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
                                (const_int 252 [0xfc])) [2 MEM[base: _25, offset: 248B]+4 S4 A32]))
                    (reg:SI 3 %r3 [orig:2900 D.1817+4 ] [2900])))
            (set (reg:SI 3 %r3 [orig:2900 D.1817+4 ] [2900])
                (plus:SI (reg:SI 3 %r3 [orig:2900 D.1817+4 ] [2900])
                    (mem:SI (plus:SI (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
                            (const_int 252 [0xfc])) [2 MEM[base: _25, offset: 248B]+4 S4 A32])))
        ]) 329 {*addsi3_carry1_cc}
     (nil))
(insn 11377 11376 7194 5 (parallel [
            (set (reg:SI 2 %r2 [ D.1817 ])
                (plus:SI (plus:SI (ltu:SI (reg:CCL1 33 %cc)
                            (const_int 0 [0]))
                        (reg:SI 2 %r2 [ D.1817 ]))
                    (mem:SI (plus:SI (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
                            (const_int 248 [0xf8])) [2 MEM[base: _25, offset: 248B]+0 S4 A64])))
            (clobber (reg:CC 33 %cc))
        ]) 407 {*addsi3_alc}
     (expr_list:REG_DEAD (reg:CCL1 33 %cc)
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))
...
(insn 2799 7206 7208 5 (parallel [
            (set (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
                (plus:SI (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
                    (const_int 128 [0x80])))
            (clobber (reg:CC 33 %cc))
        ]) 327 {*addsi3}
     (expr_list:REG_UNUSED (reg:CC 33 %cc)
        (nil)))

In *.sched2 dump this is (+ is from normal r207605 compilation, - from FLUSHB=6261 FLUSHE=6263 (i.e. - is working one, + is miscompiled one):

+(insn:TI 11376 2759 2799 5 (parallel [
             (set (reg:CCL1 33 %cc)
                 (compare:CCL1 (plus:SI (reg:SI 3 %r3 [orig:2900 D.1817+4 ] [2900])
                         (mem:SI (plus:SI (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
                                 (const_int 252 [0xfc])) [2 MEM[base: _25, offset: 248B]+4 S4 A32]))
                     (reg:SI 3 %r3 [orig:2900 D.1817+4 ] [2900])))
             (set (reg:SI 3 %r3 [orig:2900 D.1817+4 ] [2900])
                 (plus:SI (reg:SI 3 %r3 [orig:2900 D.1817+4 ] [2900])
                     (mem:SI (plus:SI (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
                             (const_int 252 [0xfc])) [2 MEM[base: _25, offset: 248B]+4 S4 A32])))
         ]) 329 {*addsi3_carry1_cc}
      (nil))
-(insn:TI 11377 11376 2773 5 (parallel [
+(insn:TI 2799 11376 11377 5 (parallel [
+            (set (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
+                (plus:SI (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
+                    (const_int 128 [0x80])))
+            (clobber (reg:CC 33 %cc))
+        ]) 327 {*addsi3}
+     (expr_list:REG_UNUSED (reg:CC 33 %cc)
+        (nil)))
+(insn:TI 11377 2799 2763 5 (parallel [
             (set (reg:SI 2 %r2 [ D.1817 ])
                 (plus:SI (plus:SI (ltu:SI (reg:CCL1 33 %cc)
                             (const_int 0 [0]))
                         (reg:SI 2 %r2 [ D.1817 ]))
                     (mem:SI (plus:SI (reg:SI 8 %r8 [orig:65 ivtmp.30 ] [65])
-                            (const_int 248 [0xf8])) [2 MEM[base: _25, offset: 248B]+0 S4 A64])))
+                            (const_int 120 [0x78])) [2 MEM[base: _25, offset: 248B]+0 S4 A64])))
             (clobber (reg:CC 33 %cc))
         ]) 407 {*addsi3_alc}
      (expr_list:REG_DEAD (reg:CCL1 33 %cc)
         (expr_list:REG_UNUSED (reg:CC 33 %cc)
             (nil))))
Comment 12 bin cheng 2014-08-12 10:38:05 UTC
Hi,

I can reproduce the exact mis-scheduled instruction issue as in Jakub's comment with/without the ivopt patch (204497).  Turns out gcc chooses candidate like {&K512, 128}_loop with the patch while candidate {&K512[16], 128}_loop without the patch.  This is caused by candidate cost change.  As far as I can see, both candidate are good for the iv use and shouldn't generate wrong code either.  
The interesting thing is, mis-scheduled instruction is the only code difference (apart from some offset difference as base of candidates are different) caused by ivopt patch.  Normally middle-end patch should have larger impact on generated code if there is.  I will have a look why different iv candidate can result in one instruction mis-scheduled.  Maybe it's useful for investigation.

Anyway, I think 204497 is generally fine in this case.

Thanks,
bin
Comment 13 Jakub Jelinek 2014-08-12 11:06:22 UTC
Just verified the trunk and it is the same thing there, also sched2 generating:
        al      %r3,252(%r8)
        ahi     %r8,128
        alc     %r2,120(%r8)
where the first insn is cc setter, second cc clobber and third cc user and clobber, so the r8=r8+128 insn should not have been scheduled in between those two.  It seems scheduler has some code to adjust the address (248 -> 120) based on the r8 adjustment, otherwise it would be wrong to move it even because of r8 register, perhaps if we take that path we don't fully check other side effects (cc clobber in this case).  s390 AFAIK doesn't have addition which doesn't clobber flags, unlike many other architectures.
Comment 14 Jakub Jelinek 2014-08-12 11:47:29 UTC
Seems that is the http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00155.html minipass, supposedly when it sees that it could break a reg dependency (on r8 in this case), it just ignores the fact that it can have other side effects.
parse_add_or_inc uses single_set which will return the increment SET even if there are extra clobbers.  Giving up would make the minipass useless on many architectures, so I think we just need to make sure that we ignore the reg dependency only on the address register (r8 here), but not for the clobbered regs.
Comment 15 Jakub Jelinek 2014-08-12 12:16:42 UTC
So, what about this completely untested fix?  Unfortunately the scheduler change has been committed with no testcases.  I guess I'll do a bootstrap/regtest with some printout where this patch changes things and where we consider it at all.

--- gcc/sched-deps.c.jj	2014-08-06 10:34:13.000000000 +0200
+++ gcc/sched-deps.c	2014-08-12 14:12:06.625193731 +0200
@@ -4751,6 +4751,24 @@ find_inc (struct mem_inc_info *mii, bool
 			   "inc conflicts with store failure.\n");
 		goto next;
 	      }
+
+	  /* The inc instruction could have clobbers, make sure those
+	     registers are not used in mem insn.  */
+	  FOR_EACH_INSN_DEF (def, mii->inc_insn)
+	    if (!reg_overlap_mentioned_p (DF_REF_REG (def), mii->mem_reg0))
+	      {
+		df_ref use;
+		FOR_EACH_INSN_USE (use, mii->mem_insn)
+		  if (reg_overlap_mentioned_p (DF_REF_REG (def),
+					       DF_REF_REG (use)))
+		    {
+		      if (sched_verbose >= 5)
+			fprintf (sched_dump,
+				 "inc clobber used in store failure.\n");
+		      goto next;
+		    }
+	      }
+
 	  newaddr = mii->inc_input;
 	  if (mii->mem_index != NULL_RTX)
 	    newaddr = gen_rtx_PLUS (GET_MODE (newaddr), newaddr,
Comment 16 Jeffrey A. Law 2014-08-12 14:29:58 UTC
In reference to c#12.   I think the ivopts changes are just setting up the situation that is mis-handled later.  I'd gotten as far as seeing the +128 increment moving in the scheduler, but hadn't looked to see if it was valid.

Anyway, so yes I think the ivopts stuff is fine.

I should have realized I was chasing something of that nature when the bisection settled on the ivopts code as the trigger.
Comment 17 Jakub Jelinek 2014-08-12 21:25:12 UTC
Author: jakub
Date: Tue Aug 12 21:24:40 2014
New Revision: 213887

URL: https://gcc.gnu.org/viewcvs?rev=213887&root=gcc&view=rev
Log:
	PR target/62025
	* sched-deps.c (find_inc): Check if inc_insn doesn't clobber
	any registers that are used in mem_insn.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sched-deps.c
Comment 18 Jakub Jelinek 2014-08-12 22:04:10 UTC
Author: jakub
Date: Tue Aug 12 22:03:37 2014
New Revision: 213888

URL: https://gcc.gnu.org/viewcvs?rev=213888&root=gcc&view=rev
Log:
	PR target/62025
	* sched-deps.c (find_inc): Check if inc_insn doesn't clobber
	any registers that are used in mem_insn.

Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/sched-deps.c
Comment 19 bin.cheng 2014-08-13 04:21:56 UTC
(In reply to Jeffrey A. Law from comment #16)
> In reference to c#12.   I think the ivopts changes are just setting up the
> situation that is mis-handled later.  I'd gotten as far as seeing the +128
> increment moving in the scheduler, but hadn't looked to see if it was valid.
> 
> Anyway, so yes I think the ivopts stuff is fine.
> 
> I should have realized I was chasing something of that nature when the
> bisection settled on the ivopts code as the trigger.

Yes, The scheduling behavior is triggered by specific offset in this case.  It changes below insn sequence:

 11405: ......
 11406: {%r2:SI=ltu(%cc:CCL1,0)+%r2:SI+[%r8:SI+0xf8];clobber %cc:CC;}
        ......
 2803: {%r8:SI=%r8:SI+0x80;clobber %cc:CC;}
      REG_UNUSED %cc:CC


into:
 11405: ......
 2803: {%r8:SI=%r8:SI+0x80;clobber %cc:CC;}
      REG_UNUSED %cc:CC
 11406: {%r2:SI=ltu(%cc:CCL1,0)+%r2:SI+[%r8:SI+0x78];clobber %cc:CC;}

by changing the offset in insn 11406.

The problem is why minipass in http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00155.html would do this transformation.  According to the description, it is to change
  rn++
  rm=[rn]
into
  rm=[rn+4]
  rn++

Here it is exactly the opposite tranformation and introducing more dependency.

Thanks.
Comment 20 Jakub Jelinek 2014-08-13 16:25:07 UTC
Created attachment 33314 [details]
pr62025.c
Comment 21 Jakub Jelinek 2014-08-13 16:27:13 UTC
Created attachment 33315 [details]
pr62025-2.c

Possible testcase for the testsuite.  Except this one only used to be miscompiled with trunk and not 4.9 branch, and unfortunately also needs -march=z900 at least, so we'd need effective-target for 64-bit s390-*-*.
Perhaps not worth adding this into the testsuite.
Comment 22 Jakub Jelinek 2014-09-01 11:16:13 UTC
Author: jakub
Date: Mon Sep  1 11:15:41 2014
New Revision: 214786

URL: https://gcc.gnu.org/viewcvs?rev=214786&root=gcc&view=rev
Log:
	PR target/62025
	* sched-deps.c (add_or_update_dep_1): If ask_dependency_caches
	returned DEP_PRESENT, make sure to set DEP_MULTIPLE on present_dep.
	(find_inc): Revert 2014-08-13 change.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/sched-deps.c
Comment 23 Jakub Jelinek 2014-09-01 11:50:08 UTC
Author: jakub
Date: Mon Sep  1 11:49:36 2014
New Revision: 214790

URL: https://gcc.gnu.org/viewcvs?rev=214790&root=gcc&view=rev
Log:
	PR target/62025
	* sched-deps.c (add_or_update_dep_1): If ask_dependency_caches
	returned DEP_PRESENT, make sure to set DEP_MULTIPLE on present_dep.
	(find_inc): Revert 2014-08-12 change.

Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/sched-deps.c
Comment 24 Jakub Jelinek 2014-10-14 14:01:07 UTC
Should be fixed for 4.9.2+.