So, what is special about Engineering that leads us to drive away so many promising engineers, and allow so many others to achieve less than their potential?
We know that a strong sense of culture, shared understandings and common values are required to succeed. So we need to balance respect for our culture with an openness to change it as needed. A team – initially happy to work from home – needs to change how they work if they take on some interns. A team – proud that every engineer is on-call for their service – may need to professionalize around a smaller team of operations-focused engineers as the potential production impact of an outage grows.
We need to be thoughtful about how we balance work people love with work the company needs to get done. Good managers are proactive about moving on an engineer who is a poor fit for their team’s workload. Great managers expand their team’s remit to make better use of the engineers they have, so those engineers feel their skills and talents are valued. Engineers whose skills go unused grow frustrated. Engineers given work they are ill-equipped to succeed at will feel set up to fail.
It’s hard to give 100% if you spend mental energy pretending to be someone else. We need to make sure people can be themselves, and that means speaking up whenever we witness disrespect. David Morrison (Australia’s Chief of the Army) captured this sentiment perfectly in his “the standard you walk past is the standard you accept” speech.
Being thoughtless about people’s feelings and experiences can shut them down. Some examples where I’ve personally intervened:
· Someone welcomes a new female project manager to the team, assumes they aren’t technical and uses baby words to explain a service. I highlight the new PM has a PhD in CS. No harm was intended, and the speaker was mortified that their good-humored introduction was taken the wrong way.
· In a conversation about people’s previous positions, someone mentioned they worked for a no-longer-successful company, and a teammate mocked them for being “brave enough” to admit it. I pointed out that mocking people is unprofessional and unwelcome, and everyone present understood a ‘line’ that hadn’t been visible previously.
· A quiet, bright engineer consistently gets talked over by extroverts in meetings. I point out to the “loud” people that we are missing an important viewpoint by not ensuring everyone gets to speak. Everyone becomes more self-aware.
It’s essential to challenge a lack of respect immediately, politely, and in front of everyone who heard it. It would have been wonderful had someone reminded Karen’s director, in front of the group, that the outage wasn’t a big deal and that the team should improve their test coverage.
Some companies talk of 20% time. Intercom has “buffer” weeks between some of our six-week sprints. People often use that time to scratch an itch that has been bothering them, without impacting the external commitments the team has made. Creating an expectation that everyone on the team should think outside the box, and ensuring the whole team can go off-piste at the same time, sends a powerful message.
Be careful that “innovation time” isn’t the only time people should take chances. One company in the transport industry considers “innovation time” to be 2:30 p.m. on Tuesdays.
Imagine how grateful Karen would have been had a senior engineer at the Engineering Review offered to work on her design with her, to make it more acceptable to the team. Improve people’s ideas, rather than discounting them.
I love how my team writes goals on Post-it notes at our daily standups and weekly goal meetings. These visible marks of success can be cheered as they are moved to the “done” pile.
But we can also celebrate glorious failure. Many years ago, when I was running one of Google’s storage SRE teams, we were halfway through a three-year project to replace the old Google File System.
Through a confluence of bad batteries, bad firmware, poor tooling, untested software, an aggressive rollout schedule and two power cuts, we lost a whole storage cell for a number of hours. Although all services would have had storage in other availability zones, the team spent three long days and three long nights rebuilding the zone. Once it was done, they – and I – were dejected. Demoralized. Defeated. An amazing manager who was visiting our office realized I was down, and pointed out that we’d just learned more about our new storage stack in those three days than we had in the previous three months. He reckoned a celebration was in order.
I bought some cheap sparkling wine from the local supermarket, and with another manager, took over a big conference room for a few hours. Each time someone wrote something they learned on the whiteboard, we toasted them. The team that left that room was utterly different to the one that entered it.
I’m sure Karen would have loved some appreciation for discovering the team’s weak non-code test coverage and its undocumented love of uptime above all else.
Rather than yelling at an engineering team each time they have an outage, help them build tools to measure what an outage is, a Service Level Objective (SLO) that shows how they are doing, and a culture in which they use the gap between their objective and reality to choose the work that will have the most impact.
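One common way to turn the gap between objective and reality into something a team can act on is an error budget, a term from Google’s SRE practice. The Python sketch below assumes a simple request-based availability SLO; the `SloWindow` class, the `next_priority` policy and the numbers are hypothetical illustrations, not a description of any particular team’s tooling.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO measurement window (hypothetical structure)."""
    objective: float      # target success ratio, e.g. 0.999 for "99.9% of requests succeed"
    total_requests: int
    failed_requests: int

    def availability(self) -> float:
        """Observed success ratio in this window ("reality")."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    def error_budget_remaining(self) -> float:
        """Fraction of allowed failures still unspent; negative once the SLO is missed."""
        allowed_failures = (1 - self.objective) * self.total_requests
        if allowed_failures == 0:
            # A 100% objective (or no traffic) leaves no budget:
            # report it as full only if nothing failed, otherwise as exhausted.
            return 1.0 if self.failed_requests == 0 else 0.0
        return 1 - self.failed_requests / allowed_failures


def next_priority(window: SloWindow) -> str:
    """Hypothetical team policy: budget left means room for risk; none means fix reliability."""
    if window.error_budget_remaining() > 0:
        return "feature work"
    return "reliability work"


if __name__ == "__main__":
    # Illustrative numbers only, not real data.
    window = SloWindow(objective=0.999, total_requests=1_000_000, failed_requests=400)
    print(f"availability: {window.availability():.4%}")
    print(f"error budget left: {window.error_budget_remaining():.0%}")
    print(f"next priority: {next_priority(window)}")
```

Measuring against a budget rather than a zero-outage target lets the team treat a small number of failures as an acceptable cost of shipping, and makes the switch to reliability work a shared, pre-agreed rule rather than a reaction to whoever is shouting loudest.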
When discussing failures, people need to feel safe sharing all relevant information, with the understanding that they will be judged not on how they fail, but on how their handling of failures improved the team, their product and the organization as a whole. Teams with operational responsibilities need to come together and discuss outages and process failures. It’s essential to approach these as fun learning opportunities, not root-cause-obsessed witch hunts.
I’ve seen a team paralyzed, trying to decide whether to ship an efficiency win that would increase end-user latency by 20%. A short conversation with the product team resulted in updates to the SLO, detailing “estimated customer attrition due to different latency levels” and the impact that would have on the company’s bottom line. Anyone on the team could now see in seconds that low latency was far more important than hardware costs, and the team drastically over-provisioned instead.
If you expect someone to do something for you, ask for a specific commitment (“When might this be done?”) rather than assuming everyone agrees on its urgency. Missed commitments can destroy trust.
Karen would have enjoyed a manager who told her in advance that the team considered reliability sacred and asked her to work on reliability improvements rather than optimizations.
If you are inspired to make your team feel more psychologically safe, there are a few things you can do today:
Treat psychological safety as a key business metric, as important as revenue, cost of sales or uptime. It feeds into your team’s effectiveness, productivity, staff retention and any other business metric you value.