How AWS Shares SLOs With You

António Araújo
Go To Market Lead
Rely.io
April 21, 2022
5 min read

tl;dr — AWS shares availability SLOs for dozens of their services. Full list available here

Reliability Glossary

“SLIs drive SLOs which inform SLAs”

From Google Cloud Tech

Service-Level Agreement (SLA)

A promise, usually written into a contract between two parties, of the acceptable performance of a service over a certain time period. Failure to meet the promise may result in penalties, such as refunds or issuing of compensation credits. 

Example SLA: A Software as a Service (SaaS) API provider saying that 99% of all responses are delivered within 100ms, tracked on a monthly basis.

Service-Level Indicator (SLI)

The actual metric used to measure performance. 

Example SLI: For the SLA described above, the SLI is the percentage of responses that were actually delivered within 100ms during the month. If this SLI is at 98%, the provider is not in compliance with the 99% SLA.
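To make the SLI concrete, here is a minimal Python sketch that computes it from a list of response latencies. The function names, the sample data, and the choice to count a response of exactly 100ms as within target are assumptions for illustration, not part of any provider's tooling.

```python
def latency_sli(latencies_ms, threshold_ms=100.0):
    """Return the percentage of responses delivered within threshold_ms."""
    if not latencies_ms:
        raise ValueError("no responses recorded for this period")
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return 100.0 * within / len(latencies_ms)


def complies_with_sla(sli_percent, sla_percent=99.0):
    """An SLA is met when the measured SLI is at or above the target."""
    return sli_percent >= sla_percent


# Hypothetical month: 98 fast responses and 2 slow ones.
sample = [40.0] * 98 + [250.0] * 2
sli = latency_sli(sample)
print(f"SLI: {sli:.1f}%  compliant: {complies_with_sla(sli)}")
```

With this sample the SLI comes out at 98%, which falls short of the 99% SLA, matching the example above.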

Service-Level Objective (SLO)

Mathematically, an SLO and an SLA are the same concept: a target for an SLI over a given time period. SLAs are customer-facing promises around reliability and performance. SLOs use the same formula but are an internal mechanism for increasing productivity and accountability across teams and services.

Example SLO: An SLA should always be backed by a stricter internal SLO that is monitored alongside it. For instance, the 99% SLA above should have a higher SLO, such as 99.5%.
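One way to read this relationship is that the SLO acts as an early-warning line sitting above the SLA. The sketch below classifies a measured SLI against both targets. The function name and the three status labels are hypothetical; the 99.5% and 99% defaults come from the example above.

```python
def reliability_status(sli_percent, slo_percent=99.5, sla_percent=99.0):
    """Classify a measured SLI against an internal SLO and an external SLA."""
    if slo_percent < sla_percent:
        raise ValueError("the internal SLO should be at least as strict as the SLA")
    if sli_percent >= slo_percent:
        return "healthy"
    if sli_percent >= sla_percent:
        return "slo_missed"    # internal alarm; the customer promise still holds
    return "sla_breached"      # customer promise broken; penalties may apply


for measured in (99.7, 99.2, 98.0):
    print(measured, reliability_status(measured))
```

The middle state is the useful one: the team is alerted while there is still room to act before the contractual promise is broken.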

Learn more about SLAs, SLIs, and SLOs with our In-Depth SRE Guide

SLAs in SaaS

Why do you need a SaaS SLA?

As a SaaS customer, you need an SLA because the vendor’s expected reliability and performance must match your own requirements. For example, if you offer a 99% monthly uptime SLA to your customers, you probably shouldn’t use a cloud database provider that only guarantees 98% availability. 
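A rough way to check this is to multiply availabilities. The sketch below assumes that the database is a hard dependency (your service is down whenever it is down) and that failures are independent, which is a simplification. The 99.5% figure for your own stack is a hypothetical value chosen for illustration.

```python
import math


def serial_availability(*availabilities):
    """Availability of a service that needs every listed component to be up.

    Assumes independent failures and that each component is a hard dependency.
    """
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availabilities must be fractions between 0 and 1")
    return math.prod(availabilities)


own_stack = 0.995   # hypothetical availability of your own code and infrastructure
database = 0.98     # vendor guarantee from the example above
promised = 0.99     # SLA you offer to your customers

achievable = serial_availability(own_stack, database)
print(f"achievable: {achievable:.4f}  can meet SLA: {achievable >= promised}")
```

Under these assumptions the achievable availability is roughly 97.5%, so the 99% promise cannot be kept no matter how reliable your own stack is.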

Furthermore, an SLA ensures that the right incentives and commitments are in place. On the one hand, it shows that the provider is prepared to mitigate incidents should they occur; on the other hand, it creates a mechanism to terminate the relationship or be compensated if the vendor fails to comply with the SLA. 

If you provide a service, it is also in your interest to offer an SLA, because you must always manage customer expectations, including how likely the service is to be down and what happens when it is. By codifying the minimum requirements for each level of service and by clearly specifying the service parameters, you ensure clear communication and transparency about what’s being offered. In Business-to-Business (B2B) SaaS, an SLA is mandatory if you’re selling to large companies. Additionally, a company that speaks clearly and confidently about its SLAs is more likely to impress buyers.

 

Examples of public SaaS SLAs

 

Google Translate SLA

Types of SLIs: Availability

Compensation for downtime starts at: <99.9%

 

Twilio SLA

Types of SLIs: Availability

Compensation for downtime starts at: <99.95%

 

Microsoft Azure CosmosDB SLA

Types of SLIs: Availability, Throughput, Consistency, Latency

Compensation for downtime starts at: <99.9%

Does Amazon really share SLOs with customers?

Spoiler alert: Yes, they do. 

Amazon Web Services (AWS) counts some really high-profile organizations among its customers, including major healthcare providers and governments worldwide. As you’d expect, SLAs are table stakes for AWS, just as they are for any other major public cloud provider.

If you visit AWS’ SLA repository, you’ll see that they list 147 services with publicly available SLAs (at the time of writing). What many people don’t know is that many of those services also have public SLOs, mostly for availability. AWS calls it their Availability Design Goal and makes it accessible here

Having worked at AWS myself, I remember how spectacular it was to talk about the 11 9’s of durability: Amazon Simple Storage Service (S3) is designed to provide 99.999999999% durability of objects over a given year. In simple terms, if you store 10,000,000 objects with Amazon S3, you can on average expect to lose a single object once every 10,000 years (source: S3 FAQs).
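The arithmetic behind that statement can be checked in a few lines. This sketch treats the durability figure as an independent per-object probability of surviving one year, which is a simplifying assumption. It uses exact fractions because ordinary floating point cannot represent eleven 9’s precisely. The function name is hypothetical.

```python
from fractions import Fraction


def expected_annual_losses(object_count, durability):
    """Expected number of objects lost per year at the given annual durability."""
    return object_count * (1 - durability)


# 99.999999999% written as an exact fraction: (10^11 - 1) / 10^11
durability = Fraction(10**11 - 1, 10**11)
losses_per_year = expected_annual_losses(10_000_000, durability)
years_per_loss = 1 / losses_per_year

print(f"expected losses per year: {float(losses_per_year)}")
print(f"years per lost object:    {years_per_loss}")
```

Ten million objects times a one-in-a-hundred-billion annual loss rate gives 0.0001 expected losses per year, or one object every 10,000 years.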

I only recently learned about the SLO methodology, and then realized that I was already familiar with the concept because of S3. S3 is also a great example for explaining the difference between an SLA and an SLO. Let’s look at the language AWS uses to talk about availability:

The S3 Standard storage class is designed for 99.99% availability

The key term here is “designed for”, which identifies this percentage as the SLO. The availability SLA (or contractual SLO) is also shared by AWS and, as you’d expect, it is lower than the availability design goal: 99.9% versus 99.99%.
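To get a feel for the gap between the two numbers, it helps to convert each percentage into a downtime budget. The sketch below assumes a 30-day month and simply converts percentages into minutes. How AWS actually measures monthly uptime is defined in its SLA documents and is not modeled here. The function and constant names are hypothetical.

```python
from fractions import Fraction

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes; an assumed period


def allowed_downtime_minutes(availability_percent,
                             period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Minutes of downtime that still satisfy the availability target.

    Pass the percentage as a string (e.g. "99.9") to keep the math exact.
    """
    unavailable_fraction = 1 - Fraction(availability_percent) / 100
    return float(unavailable_fraction * period_minutes)


sla_budget = allowed_downtime_minutes("99.9")    # contractual SLA
slo_budget = allowed_downtime_minutes("99.99")   # design goal (SLO)
print(f"SLA budget: {sla_budget:.2f} min, SLO budget: {slo_budget:.2f} min")
```

Over a 30-day month, 99.9% leaves about 43 minutes of downtime, while 99.99% leaves just over 4 minutes: the design goal is ten times stricter than the contractual promise.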

The 11 9’s of durability I was so enthusiastic about, on the other hand, don’t make it into S3’s SLA. Don’t get me wrong: data durability might not be part of the contractual SLA, but it’s still impressive, and AWS deserves praise for making the durability SLO publicly available.

Conclusion

SLOs are a powerful mechanism for measuring and managing reliability as software, infrastructure and engineering processes grow more complex. Most organizations do not need to share them externally the way AWS does. However, products, services and user journeys should still be represented and monitored through different kinds of internal SLOs, such as availability and latency. When these SLOs have owners and are taken into account in decision making, teams are able to move faster and build more resilient systems.

If you are a B2B SaaS company, ensure that you’re monitoring SLOs that are stricter than your customer SLAs. Otherwise, you won’t be able to take proactive measures to fix issues that will eventually impact your SLA compliance. Furthermore, by always keeping a record of each SLI’s performance, you can easily verify compliance every time a customer submits a service credit request.
