Weekly Recap of All Things Site Reliability Engineering (SRE)
Welcome to the first post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at firstname.lastname@example.org.
SRE Items of the Week
INTRO TO POST
For several years I managed the 3rd line site reliability operation for many of the world’s busiest gambling sites, working for a little-known company that built and ran the core backend online software for several businesses that each at peak could take tens of millions of pounds in revenue per hour. I left a couple of years ago, so it’s a good time to reflect on what I learned in the process.
In many ways, what we did was similar to what’s now called an SRE function (I’m going to call us SREs, but the acronym didn’t exist at the time). We were on call, had to respond to incidents, made recommendations for re-engineering, provided robust feedback to developers and customer teams, managed escalations and emergency situations, ran monitoring systems, and so on.
The team I joined was around 5 engineers (all former developers and technical leaders), which grew to around 50 of more mixed experience across multiple locations by the time I left.
I’m going to focus here on process and documentation, since I don’t think they’re talked about usefully enough where I do read about them.
INTRO TO POST
What happens when DevOps methods meet hybrid environments? Following are some emerging trends and my commentary on each.
There are two major casualties as the pace of innovation in IT continues to accelerate: manual processes (non-DevOps) and tightly-coupled software stacks (non-hybrid).
We are changing some things much too quickly for developers and operators to keep up using processes that require human intervention in routine activities like integrated testing or deployment. Furthermore, monolithic platforms—our traditional “duck-and-cover” protection from pace of change—are less attractive for numerous reasons, including slower pace, vendor lock-in and lack of choice.
RECENT SRE AND DEVOPS EVENTS
- Videos of all Sessions from Woodland Hunter
CloudNativeCon + KubeCon 2017 March 29-30, 2017 in Berlin
IBM Interconnect March in Las Vegas, NV
- Christopher Ferris, IBM CTO Open Technology and Rob Hirschfeld “Open Cloud Architecture: Think You Can Out-Innovate the Best of the Rest” – SLIDES
- “Best Practices in Operating Hybrid Infrastructure that Spans Clouds and the Data Center” – BLOG / SLIDES
UPCOMING MEETUPS & PODCASTS
Continuous Discussions (#c9d9) Episode 66: Scaling Agile and DevOps in the Enterprise – April 11, 2017 at 10am PT. Rob Hirschfeld a guest in this Electric Cloud podcast.
UPCOMING EVENTS FOR RACKN
Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email email@example.com.
DockerCon 2017 : April 17 – 20, 2017 in Austin, TX
DevOpsDays Austin : May 4-5, 2017 in Austin TX
OpenStack Summit : May 8 – 11, 2017 in Boston, MA
Interop ITX : May 15 – 19, 2017 in Las Vegas, NV
Open Source IT Summit – Tuesday, May 16, 9:00 – 5:00pm : Rob Hirschfeld to speak
Gluecon : May 24 – 25, 2017 in Denver, CO
- Surviving Day 2 in Open Source Hybrid Automation – May 23, 2017 : Rob Hirschfeld and Greg Althaus