No one wants to hear the words, “The site is down,” because it ultimately means loss of income. Income losses might continue adding up long after the website is live again. Why? Because customers don’t tolerate downtime very well. There are too many other sites from which to choose on this ever-growing Internet of ours.
Your customers have an expectation of 100 percent uptime. They don’t care about maintenance, malware, or other issues affecting your underlying infrastructure; they want your site to be there and available when they’re ready to see it. If your site isn’t available, you might lose the customer’s loyalty forever.
Some providers offer a financial remedy for outages when the provider is found to be at fault. However, no amount of remedy can bring back customers who find an outage unacceptable in severity, duration, or cause. Other providers make limited promises in their contracts, stating that they do not guarantee that your service will be uninterrupted, error-free, or completely secure. By signing such an agreement, you also acknowledge that you accept the risks inherent in Internet connectivity, which could result in the loss of your privacy, information, and property.
It might surprise you to learn that a 99% uptime guarantee still allows 3 days, 15 hours, and 36 minutes of lost business per year, or 7 hours and 12 minutes per month. At 99.999% (known as “five 9s”), allowed downtime drops to 5 minutes and 15 seconds per year.
Downtime Explained
Downtime, simply put, is the length of time that your site is unavailable to external users. External users are customers or other people who want to view your site for education, research, job hunting, or competitive reconnaissance. Whatever their reason for visiting, your website needs to be available and functional. For hosted sites and applications, there are a few possible reasons, or so-called “root causes,” for site downtime:
- Hardware failures
- Human error
- Internet outage
- Malware attacks
- Negligence
As a hosted site and services customer, you need to find the root cause of any downtime, because it should determine whether you remain loyal to the provider.
Hardware Failures
You might think that hardware failures are inevitable and therefore acceptable. They are certainly inevitable. However, in terms of downtime, hardware failures are generally not an acceptable root cause. How can that be, since failures of this type are inevitable? The reasons hardware failure is not an acceptable excuse for an outage boil down to three terms: redundancy, failover, and elasticity. The underlying hardware in contemporary data centers has a high level of redundancy built into it. Redundant power supplies, backup batteries, storage interfaces, disk array configurations, network interfaces, and network paths make the hardware-failure excuse very difficult to rationalize. From server architecture to the data center’s own infrastructure, redundant systems are in place to prevent hardware outages.
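The arithmetic behind that claim can be sketched with a simple textbook model. If each of n redundant components is independently available a fraction a of the time, the group fails only when every copy fails at once, so its availability is 1 - (1 - a)^n. The Python sketch below assumes fully independent failures and uses a hypothetical 99% component availability; real hardware only approximates independence.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where any single one suffices.

    Assumes failures are independent of each other, an idealization.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    # Hypothetical component availability of 99% (0.99).
    for n in (1, 2, 3):
        print(n, f"{parallel_availability(0.99, n):.6f}")
```

Under these assumptions, two 99% components give 99.99% and three give 99.9999%. Correlated events, such as the cascading failures described next, break the independence assumption and are why real-world figures fall short of this ideal.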
There are instances of what’s known as “cascading failures” that are catastrophic in nature and affect a wide range of systems in a data center. Although they are rare, they do happen. An in-depth root cause investigation should identify the reasons for the failures.
Human Error
Proper training, written procedures, and change management systems exist to prevent human error. Data center personnel must follow strict procedures and protocols when patching, repairing, and updating any system’s hardware and software components. Like hardware failure, human error is normal, natural, and an unfortunate fact, but it is not an acceptable excuse for outages. To prevent downtime caused by human error, many providers now use automated systems for updates, maintenance, and other repairs. If human error is the root cause of an outage, customers need to take a serious look at their hosting contract’s bailout clauses for breaches of service level agreements (SLAs) and the notice periods required to terminate the agreement.
Internet Outages
Internet outages occur without notice and can have a major impact on your business. The only proactive remedy is either to keep a hot backup site ready to fail over to or to set up your site with cloud resiliency. In other words, don’t allow a localized outage to kill your business for any length of time. A localized outage will prevent some customers from accessing your site, but certainly not all of them. Your reputation is on the line, and letting a local outage take your site down is shortsighted.
For example, if your website is physically located in a data center in Virginia and a major Internet line is cut, or some other outage takes that data center offline, your revenue is now in the hands of the crews working to restore service. However, if your website is hosted in more than one location, such as California and Virginia, your hosting provider can reroute all traffic to the California data center when the Virginia site goes down. Your site stays up and experiences little, if any, downtime. Geographic diversification via the cloud is nothing new and is often a standard offering for hosted sites. Don’t allow a single point of failure to affect your bottom line.
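The rerouting decision itself is simple to sketch. The Python example below is a minimal, hypothetical illustration of priority-based failover: probe each location’s health endpoint in order and send traffic to the first one that responds. The URLs and the `choose_site` and `http_health_check` helpers are assumptions for illustration; in practice, providers implement this with DNS failover or global load balancers rather than a script like this.

```python
import urllib.request
from typing import Callable, Optional, Sequence


def http_health_check(url: str, timeout: float = 3.0) -> bool:
    """Treat any 2xx response from a (hypothetical) health URL as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx), and timeouts are all OSError subclasses: count as down.
        return False


def choose_site(sites: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy site in priority order, or None if every site is down."""
    for site in sites:
        if is_healthy(site):
            return site
    return None


if __name__ == "__main__":
    # Hypothetical health endpoints: primary in Virginia, secondary in California.
    sites = ["https://va.example.com/health", "https://ca.example.com/health"]
    # Simulated results so the example runs offline; pass http_health_check for real probes.
    simulated = {sites[0]: False, sites[1]: True}
    print("Serving traffic from:", choose_site(sites, simulated.get))
```

Passing the health check in as a function keeps the failover rule separate from how health is measured, so the same logic works with HTTP probes, ping checks, or a provider’s monitoring data.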
Malware Attacks
Malware is a problem that everyone exposed to the Internet faces, and it’s getting worse. To confirm that, you only have to look at the frequency of security updates from hardware and software vendors. Have you ever heard of Patch Tuesday? It is Microsoft’s regular monthly release of security updates, on the second Tuesday of each month, for its current operating systems and enterprise software. If you’re on the right mailing lists, you’ll receive daily, if not multiple daily, emails outlining newly discovered security flaws.
A quick browser search for “ransomware attacks” turns up dozens of recent cases of businesses paying huge sums of money to reverse the effects of a ransomware attack. Viruses, Trojan horses, spyware, adware, ransomware, worms, and advanced persistent threats (APTs) are continuously launched against servers, routers, firewalls, Internet of Things (IoT) devices, mobile devices, and operating systems. Your hosting provider needs to have preventive measures and action plans in place to deal with malware attacks.
Negligence
Hosting provider negligence is perhaps the most unforgivable of all provider sins. Negligence can range from support staff not answering customer requests to not keeping systems updated with patches to failing to take regular backups of customer data. Negligence leaves your data open to loss by theft, destruction, malware attacks, and preventable outages.
The best that you can do as a customer is to check that updates, encryption, backups, and other regular maintenance tasks are being carried out as described in your contract. If you have access to your systems via terminal sessions or remote desktop services, you can check for yourself. You should report any violations of your terms of service (TOS) and SLAs as soon as you find a problem.
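For example, if your plan includes shell access, a short script can confirm that backups are actually being written on schedule. The Python sketch below is hypothetical: the backup directory, the file pattern, and the 24-hour threshold are assumptions to replace with the location and frequency stated in your contract.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: str, pattern: str = "*") -> Optional[float]:
    """Hours since the newest file matching `pattern` was modified, or None if there is none."""
    directory = Path(backup_dir)
    if not directory.is_dir():
        return None
    files = [p for p in directory.glob(pattern) if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return (time.time() - newest_mtime) / 3600.0


def backups_are_current(backup_dir: str, max_age_hours: float = 24.0, pattern: str = "*") -> bool:
    """True if at least one backup exists and the newest is no older than max_age_hours."""
    age = newest_backup_age_hours(backup_dir, pattern)
    return age is not None and age <= max_age_hours


if __name__ == "__main__":
    # Hypothetical location and file pattern; use the values from your own hosting plan.
    ok = backups_are_current("/var/backups/site", max_age_hours=24, pattern="*.tar.gz")
    print("Backups current:", ok)
```

A fresh timestamp only shows that a file was written, not that it can be restored, so periodic test restores are still worthwhile.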
Cost of Downtime Calculator
The actual cost of downtime is as individual as a fingerprint. There’s no general rule of thumb that can quantify your costs, because each site generates a different amount of revenue. For example, when Amazon went down for 63 minutes on its major Prime Day, some organizations estimated that the outage cost the number one online retailer more than $99,166,667 in lost sales.
In 2014, Gartner reported from multiple surveys that the average cost of downtime is $5,600 per minute. However, the Gartner blogger, Andrew Lerner, goes on to state that “this is just an average and there is a large degree of variance, based on the characteristics of your business and environment (i.e., your vertical, risk tolerance etc). For example, [one] study indicates the range is from $140K to $540K per hour.”
eCommerce losses are generally calculated by dividing the revenue generated over a year by the number of minutes in a year (525,600) to obtain a per-minute revenue figure, and then multiplying that figure by the number of downtime minutes.
Downtime loss ($) = (Annual revenue ÷ 525,600) × Downtime minutes
This calculation accounts only for the revenue lost during the actual downtime; other formulas can factor additional costs into the equation. Those costs may include loss of reputation, meetings, press releases, hardware and software upgrades, and the cost of changing providers given your contractual obligations. The greatest loss, and the most difficult to calculate, is loss of reputation.
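The formula translates directly into code. The Python sketch below assumes a 365-day year, as the 525,600 figure does. The annual revenue, the outage length, and the `other_costs` categories in the example are hypothetical, and reputational damage is deliberately left out because, as noted above, it resists calculation.

```python
from typing import Mapping, Optional

MINUTES_PER_YEAR = 525_600  # 365 days x 24 hours x 60 minutes


def downtime_revenue_loss(annual_revenue: float, downtime_minutes: float) -> float:
    """Lost revenue = (annual revenue / 525,600) x downtime minutes."""
    per_minute = annual_revenue / MINUTES_PER_YEAR
    return per_minute * downtime_minutes


def total_downtime_cost(
    annual_revenue: float,
    downtime_minutes: float,
    other_costs: Optional[Mapping[str, float]] = None,
) -> float:
    """Revenue loss plus any other one-off costs you can put a number on."""
    extra = sum((other_costs or {}).values())
    return downtime_revenue_loss(annual_revenue, downtime_minutes) + extra


if __name__ == "__main__":
    # Hypothetical business: $52.56M annual revenue works out to $100 of revenue per minute.
    loss = downtime_revenue_loss(52_560_000, 45)
    print(f"Revenue loss for 45 minutes: ${loss:,.2f}")  # $4,500.00
    extras = {"press_release": 2_000, "emergency_upgrade": 10_000}  # hypothetical costs
    print(f"Total: ${total_downtime_cost(52_560_000, 45, extras):,.2f}")  # $16,500.00
```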
Scheduled vs. Unscheduled Downtime
There are two types of downtime that you will see described in hosting contracts: scheduled and unscheduled. Scheduled downtime covers maintenance windows that might include patching, rebooting, hardware maintenance, security enhancements, and other tasks as required. Unscheduled downtime is everything else, including natural disasters, security breaches, malware infections, hardware failures, human error, and emergency patching.
Be sure that your hosting provider understands the nature of your business and when it is acceptable to schedule maintenance windows. Scheduled maintenance generally does not violate a service level agreement, because the outage is planned and announced in advance. Some unscheduled downtime, such as emergency security patching, might also be considered acceptable. In those cases, you will generally be notified as quickly as possible before the outage, but be prepared for these temporary outages to occur.
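How scheduled and unscheduled downtime are treated changes the availability figure your provider reports. The Python sketch below assumes a contract that, like many but not all, excludes scheduled maintenance and customer-caused outages from the SLA calculation. The `Outage` record, its `kind` labels, and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Outage:
    minutes: float
    kind: str  # hypothetical labels: "scheduled", "unscheduled", or "customer"


# Assumption: these outage kinds do not count against the SLA. Check your own agreement.
EXCLUDED_KINDS = {"scheduled", "customer"}


def sla_availability(outages: Iterable[Outage], period_minutes: float) -> float:
    """Availability in percent, counting only outages that fall under the SLA."""
    counted = sum(o.minutes for o in outages if o.kind not in EXCLUDED_KINDS)
    return 100.0 * (period_minutes - counted) / period_minutes


def meets_sla(outages: Iterable[Outage], period_minutes: float, target_percent: float) -> bool:
    """True if SLA-measured availability is at or above the target percentage."""
    return sla_availability(outages, period_minutes) >= target_percent


if __name__ == "__main__":
    month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
    events = [Outage(120, "scheduled"), Outage(30, "unscheduled")]
    print(f"{sla_availability(events, month):.3f}%")  # 99.931%
    print(meets_sla(events, month, 99.9), meets_sla(events, month, 99.99))  # True False
```

Here two hours of announced maintenance leave the month at about 99.93%, which satisfies a 99.9% target but not 99.99%. Counted as ordinary downtime, the same 150 minutes would miss even 99.9%, which is why the contract’s definitions matter.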
Availability and Service Level Agreements (SLAs)
A service level agreement (SLA) is a contract between you and the hosting company that you hire to house and optionally maintain your services and applications. Generally speaking, your availability guarantees are included in the SLA. The SLA and contractual elements should also include remedies, if any, should there be a violation of the agreed-upon availability or other services.
Such contracts and agreements are written from the perspective of the hosting company, so it is recommended that you have an attorney examine any agreements prior to signing. These contracts are not written in stone and can be amended, changed, and edited to satisfy the needs of the customer (you), so you need not accept all the words of a poorly written contract.
Service Level (%) | Allowed Downtime per Year | Allowed Downtime per Month (30 days) |
99 | 3 days 15 hours 36 minutes | 7 hours 12 minutes |
99.9 | 8 hours 45 minutes 36 seconds | 43 minutes 12 seconds |
99.99 | 52 minutes 34 seconds | 4 minutes 19 seconds |
99.999 | 5 minutes 15 seconds | 26 seconds |
99.9999 | 32 seconds | 3 seconds |
Table 1: SLAs and allowed outage durations
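The figures in Table 1 follow from multiplying the allowed unavailability (100% minus the service level) by the length of the period. The Python sketch below reproduces them, assuming a 365-day year and, for the monthly column, a 30-day month, and rounding to the nearest second; the helper names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 (30-day month)


def allowed_downtime_seconds(service_level_percent: float, period_minutes: int) -> float:
    """Seconds of downtime permitted in a period at the given service level."""
    return (1 - service_level_percent / 100) * period_minutes * 60


def format_duration(seconds: float) -> str:
    """Render a duration such as 315360 s as '3 days 15 hours 36 minutes'."""
    remaining = round(seconds)  # nearest whole second, as in the table
    days, remaining = divmod(remaining, 86_400)
    hours, remaining = divmod(remaining, 3_600)
    minutes, secs = divmod(remaining, 60)
    parts = []
    for value, unit in ((days, "day"), (hours, "hour"), (minutes, "minute"), (secs, "second")):
        if value:
            parts.append(f"{value} {unit}{'' if value == 1 else 's'}")
    return " ".join(parts) or "0 seconds"


if __name__ == "__main__":
    for level in (99, 99.9, 99.99, 99.999, 99.9999):
        annual = format_duration(allowed_downtime_seconds(level, MINUTES_PER_YEAR))
        monthly = format_duration(allowed_downtime_seconds(level, MINUTES_PER_MONTH))
        print(f"{level} | {annual} | {monthly} |")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the jump from 99.9% to 99.99% is so much harder to deliver than it looks on paper.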
What Causes Website Downtime?
The previously mentioned root cause analysis is an investigation into the cause of an outage or downtime event. In addition to finding the root cause, investigators will often assign “blame” for the outage. Sometimes the two are the same: when human error is the root cause, human error also takes the blame. In other cases they differ: if a zero-day flaw is found to be the root cause, the blame is sometimes placed on patch management problems or on human error.
Sometimes an outage will be labeled as “customer caused,” and the hosting company requires no further investigation. Customers may engage hosting company personnel to pinpoint the problem, but usually for an additional charge. There is no remedy for a customer-caused outage, and the downtime does not count against the SLA. The customer is required to fix the problem as soon as practical. Depending on the problem, hosting companies can suspend service until it is fixed if the issue endangers the underlying infrastructure or other customers’ data.
Choose Partners With Uptime in Mind
Perhaps the best way to deal with downtime is to realize that it can happen regardless of the safeguards you and your hosting company put into place and into practice. As a hosting customer, you must devise a method of handling outages. It could be a hot backup site that you can quickly switch to if the primary site fails, or it could be as simple as directing customers to a maintenance page that shows an approximate ETA. Be careful to select a hosting company with good customer reviews, a long track record of few outages, and a highly redundant data center. But realize that some outages can be your fault.
Your contract is a partnership between you and the hosting company. Be sure that the company representatives you work with understand your business, your needs, and your expectations of the services you are purchasing.
If you want to partner with hosting professionals dedicated to keeping your site running, contact ColoCrossing.