Background
Site Reliability Engineering (SRE) and its related areas, such as Observability, Platform Engineering, and DevOps, have typically operated without Product Managers. I believe that happened because IT Operations was seen solely as a cost center rather than as a source of competitive advantage.
With the rise of technology giants such as Google, Amazon, and Facebook, other companies started adopting similar SRE practices that improve the efficiency, security, development speed, and reliability of large-scale systems. Everyone is trying to move at the same speed as big tech and nimble startups. Bets on SRE or DevOps are now seen as investments with positive returns, rather than sunk costs.
There’s little to no literature coming from Google describing how Product Managers can be part of an SRE team. Although there’s been lots to say about the SRE Team Lifecycles and their different topologies, there hasn’t been much around bringing non-engineers into this function. I think that’s going to change soon.
Why do SRE teams need Product Managers?
What do SRE Product Managers do?
Product Managers supporting SRE and Platform teams are asked to bring traditional product management techniques, such as user research, roadmap prioritization, and stakeholder alignment, into the reliability world. According to several job descriptions I’ve analyzed, their responsibilities often include:
- Partnering with engineering and product leads to build product roadmaps for SRE
- Creating a long-term strategy for observability and tooling investments, including managing vendor relationships
- Implementing and maintaining Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs)
- Creating profiles of users (software engineers) and ensuring SRE’s products address their needs
- Championing reliability ownership across non-SRE teams and enabling them to account for and track the reliability of the services they’re responsible for
- Owning the vision and strategy for incident management, disaster recovery, performance testing, chaos engineering, etc.
Note: Responsibilities vary from one organization to another, as do job titles (SRE Product Lead, Technical Program Manager, SRE Product Owner, etc.).
Below is a visual example of how a Product Manager might be part of an SRE team, along with some of their responsibilities. Don’t take the SRE work areas as absolute truth: I know many are missing, and some of these are always shared responsibilities across the team!
Given SRE’s principle of applying software to manage and automate IT, the function has successfully taken on many areas of responsibility. And it has been able to do so with fewer people than would normally be needed to move at the same speed reliably. That means complexity has increased drastically, and there’s now a need for a focused strategy, planning, and management function within SRE.
I believe that we will start seeing more and more product managers step into this area or, most likely, more engineers formally take on a technical product management role within reliability. My second hypothesis is that the SLO methodology will become the product manager’s best friend because it will allow them to:
- Agree with non-engineering functions on the reliability goals needed to meet or exceed customer expectations
- Communicate about reliability performance with SLIs/SLOs as a standardized language
- Prioritize the roadmap according to historical SLO performance
- Design better alerting and incident management strategies with burn-rate alerting
- Enable teams to own reliability of their services with out-of-the-box service SLIs
- Monitor data-driven KPIs/OKRs, allowing for weighted, justified and fast decision making
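To make the error-budget and burn-rate ideas in the list above concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than a description of any particular tool: the `SLO` class, the `burn_rate`, `budget_spent`, and `should_page` helpers, and the sample request counts are all hypothetical. The default threshold of 14.4 is a commonly cited fast-burn value for a one-hour window against a 30-day SLO, because at that rate roughly 2% of the monthly budget is spent in an hour. Treat it as a starting point, not a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical availability SLO: `target` fraction of good events over a window."""
    target: float      # e.g. 0.999 means 99.9% of requests should succeed
    window_days: int   # e.g. a 30-day rolling window

    def __post_init__(self) -> None:
        # A target of exactly 1.0 leaves no error budget to divide by.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")

    @property
    def error_budget(self) -> float:
        """Fraction of events allowed to fail over the whole window."""
        return 1.0 - self.target


def burn_rate(slo: SLO, good: int, total: int) -> float:
    """Observed error ratio divided by the error budget.

    1.0 means the budget would last exactly the full window;
    higher values mean it runs out sooner.
    """
    if total == 0:
        return 0.0  # no traffic, so nothing was burned
    error_ratio = (total - good) / total
    return error_ratio / slo.error_budget


def budget_spent(slo: SLO, rate: float, hours: float) -> float:
    """Fraction of the window's budget consumed after `hours` at a given burn rate."""
    return rate * hours / (slo.window_days * 24)


def should_page(slo: SLO, good: int, total: int, threshold: float = 14.4) -> bool:
    """Page on-call when the measured burn rate reaches the (assumed) threshold."""
    return burn_rate(slo, good, total) >= threshold


if __name__ == "__main__":
    checkout = SLO(target=0.999, window_days=30)  # hypothetical service
    rate = burn_rate(checkout, good=9_800, total=10_000)
    print(f"burn rate: {rate:.1f}x")
    print(f"budget spent after 1h at this rate: {budget_spent(checkout, rate, 1):.1%}")
    print("page on-call:", should_page(checkout, good=9_800, total=10_000))
```

The same three numbers (target, good events, total events) are enough for a product manager to discuss reliability with non-engineering functions, compare services on a common scale, and decide when reliability work should move up the roadmap.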
More on the above, with demos of Rely.io, in a future blog post coming soon!
Further readings
Jen Wohlner, Fastly
SRE & Product Management: How to Level up Your Team (and Career!) by Thinking like a Product Manager
Grant Smith, nextgendevops.com
Site Reliability Engineering needs product managers
Isabel Lilles, PagerDuty