What’s Up With The KTC Forums?
If you’re a member of our community, you’re certainly aware that our forums have been having quite a few issues today (Monday, November 21st, 2016). Without going into too many technical details, I wanted to let you know what’s going on and what’s being done to mitigate the issues. This post will be updated as further information becomes available.
What’s Going On With The Forums
Our forums are hosted by a company called ZetaBoards (www.zetaboards.com). We’ve been with this company since the site’s inception in November of 2006. This morning they experienced an outage that affected users on multiple servers. The server that our forums reside on was one of those affected.
This outage is NOT specific to KillTheCan.org and is affecting all users on a few of ZetaBoards’ servers.
What’s Being Done
ZetaBoards has identified the problem as a hardware failure and is working to restore access as quickly as possible. The failure this morning corrupted the database on a server that supports communities on several additional servers. At this point they have repaired the hardware and are working on restoring both the software and the database. Thanks to a variety of their backup protection measures, they currently do not believe any data will be lost. However, the process of rebuilding the server and database is unfortunately a slow one, and it has forced them to keep communities offline while the work is being completed.
For our part, we want to make sure that people still have a place to post roll on a daily basis and give their promise to their fellow KTC brothers and sisters for that added accountability that makes our site so successful. Feel free to use the daily roll call posts here on the blog to post roll by leaving a comment with your forum name, days and quit group.
Update – 11.22.2016 @ 10:28 AM
The following note was just posted to our host’s support forum:
We are continuing our work to restore access to the communities affected by yesterday’s outage. The outage itself was caused by an abrupt hardware failure which led to a corrupted database. The defective hardware has been replaced and we’re working to fix the database.
We maintain multiple layers of backup systems in the event of issues like this. However in this case, our primary backup, which would have allowed us to restore access much more quickly, was also unusable.
As a result we’ve had to move to our secondary backup systems. While these are complete backups and will restore all data, they take much longer to bring back online.
We currently have 2 steps remaining in the restoration process. Based on performance in the second to last step we expect that step to finish this afternoon. Once that is done and the final step begins we’ll be able to provide a clearer estimate for full access to be restored.
We understand that this is a difficult experience for your communities and we’re doing everything we can to restore them as quickly as possible. Thank you again for your patience.
Update – 11.22.2016 @ 7:30 PM
The following note was just posted to our host’s support forum:
We have moved on to the final step. The initial completion estimate is the morning of Wednesday (Eastern Time). I will continue to provide updates throughout the evening as I am able to provide more accurate estimates.
Update – 11.23.2016 @ 3:34 AM
The following note was just posted to our host’s support forum:
The final step continues to run smoothly. The current completion estimate is around 10AM (Eastern Time).
Update – 11.23.2016 @ 6:30 PM
The following note was just posted to our host’s support forum:
All servers are now back online. Thanks for your patience while we worked to resolve this outage.
Thank you for your patience as we work through these issues.