Defining acceptable maintenance "window" times


#1

Hi team,

Can you please give us the ability to nominate time windows that the school has defined as acceptable for a planned outage?
At present, our device seems to restart frequently and at the most inconvenient times - this is causing an unacceptable level of frustration amongst our users.

I should add that if emergency maintenance needs to be performed outside of these windows, could you please communicate with us before “bouncing the box”?

Thanks guys, appreciate your help in getting this under control!

Cheers,
Andy


#2

Hi Andy,
Fair call!!

Your box was one of the units affected by a bug we had in some monitoring. Ping checks were failing spontaneously, which is not generally an issue, but each failure leaked a file descriptor. Once the number of leaked file descriptors exceeded the OS limit, it triggered a restart, so your box was restarting a few times per day.
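For anyone curious what that class of bug looks like, here is a minimal sketch in Python. It is not our actual monitoring code: the function names are made up, and an empty file read stands in for a failed ping. The point is only that an early return on the failure path skips the close, while try/finally closes the descriptor on every path.

```python
import os

def is_open(fd):
    """Return True if fd still refers to an open descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False

def leaky_check(path, seen):
    """Hypothetical check: the failure path returns before closing fd."""
    fd = os.open(path, os.O_RDONLY)
    seen.append(fd)              # recorded only so the demo can inspect it
    if os.read(fd, 1) == b"":    # an empty read stands in for a failed ping
        return False             # bug: fd is never closed on this path
    os.close(fd)
    return True

def fixed_check(path, seen):
    """Same check, but try/finally closes fd on success and on failure."""
    fd = os.open(path, os.O_RDONLY)
    seen.append(fd)
    try:
        return os.read(fd, 1) != b""
    finally:
        os.close(fd)
```

Every failing call to leaky_check leaves one more descriptor open, so a check that fails often enough will eventually hit the per-process limit.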

We patched this issue last night, but your device has automatic updates disabled (we can change that), and I did not realise that your box had missed the update until it showed up on my monitoring this morning. So I triggered the update to prevent further bug-related restarts during the day. Sorry for not giving you a heads-up about this, how rude huh!!!

We have had some issues perfecting our update process, which has meant we have historically pushed updates out much earlier than we perhaps should have so that we could monitor them. But the last 4 updates have been 100% successful (WAAAHOOOO), which means we have got the transactional update process in SphireOS down pat, so we can start triggering updates at 2am local time.

For critical updates that just have to go out, we can announce them if we have to push them out during the day.
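To make the policy concrete, here is a rough sketch in Python of how the decision could work. This is illustrative only and not how SphireOS is implemented: the function names, the returned labels, and the idea of a list of (start, end) windows are all assumptions, and `now` is assumed to already be in the device's local time.

```python
from datetime import datetime, time

def in_window(now, start, end):
    """True if now's time of day falls in [start, end); handles windows crossing midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end

def decide(now, windows, critical=False):
    """Hypothetical policy: apply inside a window, otherwise defer or announce first."""
    if any(in_window(now, start, end) for start, end in windows):
        return "apply"
    return "announce-then-apply" if critical else "defer"
```

With a window of, say, 02:00 to 04:00 (the end time is an assumption), `decide(datetime(2024, 1, 1, 14, 0), [(time(2, 0), time(4, 0))])` defers a routine update until the window opens, while the same call with `critical=True` asks for an announcement first.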

Cheers,
Michael