Intl. connectivity: CloudFlare

Clients who use CloudFlare for their websites might be affected by an outage involving two undersea cables and CloudFlare itself.

CloudFlare status: https://www.cloudflarestatus.com

More information: http://www.techcentral.co.za/seacom-wacs-problems-hit-sa-internet/62649/

Arryn: offline

8:50PM There is a problem in the Hetzner Datacenter, which they are working on. Arryn is offline until they get it fixed, which hopefully will not be too long.

9:30PM Hetzner has fixed the problem – I’m waiting for an RFO (reason for outage) to find out exactly what the problem was.

Update: Hetzner says that a “network cable was loose”.

Arryn: apache problem

11:30AM All sites on Arryn are displaying a 500 server error. We’re working on the problem.

11:40 We’ve found the problem and are working on a fix.

11:45 Server is being rebooted after the fix.

11:57 Server is up after reboot, but the problem remains. An emergency ticket has been submitted to CloudLinux, who have logged into the server to fix the problem with their software.

12:01 The problem has been fixed. Apologies for the downtime.

Brienne: server offline

2:40AM Scheduled maintenance was completed on Brienne at 9:10PM on Friday 28 November, but the server failed to reboot. Hetzner’s datacenter technicians are still working on the problem, almost 6 hours later. If they fail to get the server up and running, I will ask them to remove the hard drives so that we can move all the data to another server.

3:45AM The Hetzner technicians have been unable to boot the server, even after swapping out multiple pieces of hardware. We will attach the hard drives to another server and start copying the data to one of our other servers. Unfortunately, this will take many hours, during which time the sites formerly hosted on Brienne will be offline.

4:55AM The technicians have attached the hard drives to another server, and we have started copying the data.

Dantari: offline

11:15 There’s a problem on Dantari, which we’re working on.

11:39 The problem has been fixed, and Dantari is rebooting. Because the server has been online for so long (327 days), the OS is doing a filesystem check, which cannot be avoided. This should be completed within 30 minutes, and then the server will finish booting. Sincere apologies for the inconvenience this is causing.

11:49 Filesystem check is at 64%

11:55 66%

12:03 71% completed

12:08 83%

12:13 The filesystem check is 90% complete

12:17 Filesystem check has completed, and boot will continue

12:20 Dantari is back online. Load will be a bit high (sites may be slow) while the disk I/O settles down after the reboot.