why it’s good to have more than one blog in your life
Shared server ‘perseus’ offline for unscheduled hardware replacement
Posted (August 8th, 2011 at 12:36 am PST) by patrickm
We regret to report that the shared server ‘perseus’ had to be taken offline to perform a file-system integrity check, as it came back from a reboot with a read-only file-system too many times. Unfortunately, this process did not work, so we are now replacing the hardware for this server. Once that is complete, we will restore the server from backups. Once this begins, the server will remain online during the restore. We will update this post with details!
Update Aug 8th, 4:35pm PDT: The file system check continues, unabated. We have no ETA on when this will finish, but you can be sure that we will keep you up to date as this issue progresses. – Jason
Update Aug 8th, 1:00pm PDT: The file system check seems to have restarted due to errors, and is back to 25%. We are working on alternative plans to get this server back up, such as restoring from backup. More information to come. -Justin
Update Aug 8th, 11:00am PDT: The file system check is still running normally. It is now up to 46.2% completed. -Justin
Update Aug 8th, 10:00am PDT: The file system check is still progressing nicely. It is now up to 30% completed. -Justin
Update Aug 8th, 9:30am PDT: The hardware replacement has been successful, and the server has been running stably so far. Unfortunately, the many reboots caused by the old hardware issues left the filesystem inconsistent, so it now has to run a file system check. It is currently 17.6% done. Now that the hardware is working and we can give percentages for the progress of the fsck, we will update this post more regularly. -Justin
Update Aug 8th, 8:00am PDT: Our apologies for the extended downtime. We have our admin team on the case, and they are trying to replace the problematic hardware (RAID card), which should be the faster way to fix this server, with less downtime. When we have a definite answer on whether this replacement will be a permanent fix or whether more serious hardware replacement is needed, we will update this post. -Justin
Update Aug 8th, 6:00am PDT: We are now replacing the hardware for this server. We will update this post with a status on the restoration of data! -PatrickM
Update Aug 8th, 5:54am PDT: We have replaced the RAID controller for this server and restarted the file-system integrity check. We do not yet have an ETA for this, but we will update this post shortly with more details! -PatrickM
Update Aug 8th, 5:00am PDT: We are in the process of replacing the RAID controller for this server in the hopes that it helps us to complete this check. We will update this post shortly with news! -PatrickM