Intl. connectivity: CloudFlare

Clients using CloudFlare for their websites may be affected by an outage involving two undersea cables and CloudFlare itself.

CloudFlare status: https://www.cloudflarestatus.com

More information: http://www.techcentral.co.za/seacom-wacs-problems-hit-sa-internet/62649/

Emergency maintenance

Tue 11 August 4:40PM Hetzner has informed us that the RAID alarm is sounding, and that Keats needs to be switched off in order to diagnose and fix the problem.

5:28PM Keats is being shut down now.

6:02PM Response from Hetzner:

This mail serves to confirm that the maintenance on your server tex001_truservcomm_jhb1_009, was completed successfully. 
SDA was swapped, and RAID is currently rebuilding.

6:15PM The RAID rebuild is at 24%

6:27PM RAID rebuild is at 30%

6:41PM 41%

6:53PM RAID rebuild is at 50%

7:19PM 65%

7:51PM 79%

8:07PM RAID rebuild is at 90%

10:36PM Hetzner says that the server is “fixed”. Unfortunately, it won’t boot. I am therefore going to reinstall the server, and then restore all hosting accounts from backup. Please accept my sincere apologies for this. I will work through the night and tomorrow to get all sites up as soon as possible.

2AM Wed 12 August The OS has been rebuilt, and cPanel and CloudLinux have been installed. Restoration of hosting accounts is starting now.

4:30AM All accounts have been restored from backup.
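
For anyone wanting a rough idea of how long a rebuild like the one above takes, the logged percentages can be turned into an average rate. This is only a sketch: it assumes the rebuild progresses roughly linearly, which a real RAID rebuild does not guarantee, and the function names are made up for illustration. The readings are copied from the 6:15PM to 8:07PM entries above.

```python
from datetime import datetime

# (time, percent complete) readings copied from the log entries above
READINGS = [
    ("6:15PM", 24), ("6:27PM", 30), ("6:41PM", 41), ("6:53PM", 50),
    ("7:19PM", 65), ("7:51PM", 79), ("8:07PM", 90),
]


def minutes_between(start, end):
    """Minutes elapsed between two same-day times such as '6:15PM'."""
    fmt = "%I:%M%p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def average_rate(readings):
    """Average progress in percentage points per minute, first to last reading."""
    (t0, p0), (t1, p1) = readings[0], readings[-1]
    return (p1 - p0) / minutes_between(t0, t1)


def minutes_remaining(readings):
    """Estimated minutes to 100%, ASSUMING the average rate holds (hypothetical)."""
    return (100 - readings[-1][1]) / average_rate(readings)


if __name__ == "__main__":
    print(f"Average rate: {average_rate(READINGS):.2f} points per minute")
    print(f"Estimated minutes remaining: {minutes_remaining(READINGS):.0f}")
```

The estimate only covers the rebuild itself. As the 10:36PM entry shows, a finished rebuild does not mean the server will boot.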

Ongoing maintenance: Tyrion

10:05 AM Tyrion will be offline for short periods of time throughout the day as Hetzner technicians attempt to troubleshoot a hardware error.

Most clients have been moved off Tyrion and onto Lannister, so this will affect only the 10 or so domains which are still pointing to Tyrion.

13:15 Troubleshooting has been completed. Tyrion will go offline at 6PM tonight (14 July 2015) for up to 12 hours while the OS is reinstalled.

Arryn: offline

8:50PM There is a problem in the Hetzner datacenter, which they are working on. Arryn will be offline until it is fixed, which we hope will not take long.

9:30PM Hetzner has fixed the problem – I’m waiting for an RFO to find out exactly what their problem was.

Update: Hetzner says that a “network cable was loose”.

Arryn: Apache problem

11:30AM All sites on Arryn are displaying a 500 server error. We’re working on the problem.

11:40 We’ve found the problem and are working on a fix.

11:45 Server is being rebooted after the fix.

11:57 Server is up after reboot, but the problem remains. An emergency ticket has been submitted to CloudLinux, who have logged into the server to fix the problem with their software.

12:01 The problem has been fixed. Apologies for the downtime.