Postscript on the outage

Two weeks ago, when I was in California preparing my house to go on the market, moving almost 40 years of possessions (or giving them away, or selling them), and preparing to drive cross-country — while all that was going on — stopped working.

It appeared the problem might be with the host. The server was running on Amazon, and the people at Amazon went above and beyond the call of duty trying to find a problem on their end, to no avail. I left town with the server turned off and most of the content and apps ported to run on an older server hosted elsewhere.

On the drive, with time to think, I had an idea of what the problem was, and found a way to get a look at the server. That was a big part of the problem: I couldn’t get through to the server from Remote Desktop Connection. It turns out one of Apache’s log files had grown to over a gigabyte. Apache doesn’t run at all with a log file that large.

Performance degraded over time until the server stopped taking hits altogether, with all of its CPU cycles consumed by maintaining the bloated log file.

I deleted the log file, reattached the EBS volume, restarted the server, and it’s been working perfectly for 72 hours. I’d say with a high degree of confidence that the problem is solved.
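A log that quietly grows past a gigabyte is easy to miss until it takes the server down. Apache’s own answer is piped logging through its rotatelogs utility; the sketch below is not what I ran, just a minimal illustration of catching the problem earlier by flagging oversized log files. The directory, the `*.log` naming, the 100 MB threshold, and the function name `find_oversized_logs` are all assumptions for illustration.

```python
import sys
from pathlib import Path

# Hypothetical threshold: flag any log larger than 100 MB. The post only says
# the offending file had passed 1 GB; the right limit is a judgment call.
DEFAULT_LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized_logs(log_dir, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (path, size_in_bytes) for each *.log file in log_dir over the limit.

    log_dir is an assumption: point it at wherever your Apache install writes
    its logs. Results are sorted largest first.
    """
    oversized = []
    for path in Path(log_dir).glob("*.log"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                oversized.append((path, size))
    # Biggest offenders first, so the worst file is at the top of the report.
    oversized.sort(key=lambda item: item[1], reverse=True)
    return oversized


if __name__ == "__main__":
    # Usage (hypothetical path): python check_logs.py /path/to/apache/logs
    for path, size in find_oversized_logs(sys.argv[1]):
        print(f"{path}: {size / (1024 * 1024):.1f} MB")
```

Run on a schedule, a check like this only reports; deleting or rotating the file is still a deliberate step, since Apache may be holding it open.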

Bottom line: The outage was my fault.
