Friday, April 30, 2021

Measuring Productivity
by Ron Lichty

 


Never, ever, ever, let story points or velocity be used as a performance metric.

In fact, don’t share points and velocity outside the team.

They’re not only not useful outside the team, they can be counterproductive.

Because points (and velocity) are team-unique, they are useless for comparing teams. One team’s points and velocity have no validity to any other team. None. Zero. Points are a measure of pace, not a measure of performance. They’re team-useful, not of use to anyone outside the team. Not for any purpose I can think of.

Nor are they useful for externally measuring a team’s productivity, as team velocity will vary naturally based on factors outside the team’s control.

Story points derived from rapid, effective relative sizing, combined with velocity, can be very useful to teams themselves, and to delivering predictability to teams’ stakeholders. Points and velocity enable teams to be predictable: they offer the ability to walk down the team’s project backlog and draw a watermark - a predictor of where the team will likely be - three to four months from now.

Then we can do some agile product planning: if we draw a line there, a watermark, we can ask ourselves and our stakeholders, do we have the right stuff above the line?

Predictability is not a principle in agile development. Just a result. We get better predictability from agile development - from relative sizing plus velocity - than anything else I've ever seen used in software. Of course, if we’re truly agile, we’re likely to insert stories and change story order before we get to the watermark. Each of those is a conversation we can have about priorities and the effect on the watermark of swapping stories in and out.
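The watermark arithmetic described above is simple enough to sketch. The stories, points, and velocity below are hypothetical; a real team's numbers come from its own relative sizing and measured velocity:

```python
# A minimal sketch of watermark forecasting from velocity.
# Backlog, points, and velocity are hypothetical stand-ins.

backlog = [  # ordered backlog: (story, points)
    ("checkout flow", 5),
    ("saved carts", 3),
    ("gift cards", 8),
    ("wishlists", 5),
    ("referral codes", 3),
]

velocity = 6          # points the team typically completes per sprint
sprints_out = 2       # horizon: how many sprints ahead to draw the line

capacity = velocity * sprints_out

# Walk down the backlog until capacity runs out; the watermark falls
# just below the last story likely to be finished by the horizon.
done, likely = 0, []
for story, points in backlog:
    if done + points > capacity:
        break
    done += points
    likely.append(story)

print(likely)  # the stories above the watermark
```

The conversation with stakeholders then becomes: given this line, is the right stuff above it?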

Story points are also really useful to product managers and product owners (and managers!) to understand the ease or difficulty of features and stories. They give product owners particular insight into backlog ordering, so that at every sprint planning the team can be delivering the most customer delight it's able to provide.

But back to my warning: Points and velocity are not a performance measure! Attempting to use them to measure performance is not only useless but terribly, terribly counterproductive. Knowing points and velocity are being watched will cause smart people to game them. (Note: all software teams are full of smart people!) Gamed points are not only useless as a metric; worse, gaming makes them useless for helping teams be predictable!

I quickly copied down a client's rule of thumb a few years ago: What gets measured gets manipulated. It’s the most succinct summary of the biggest problem with metrics - and certainly of using points and velocity as a metric - that I've ever heard or read. (I'm a collector of useful Rules of Thumb. There are 300 of them in our book, Managing the Unmanageable, and we’ve continued to collect them online.)

Based on my client’s rule of thumb, let me re-state the previous observation: Use points as a metric, and the number of points delivered in every sprint will go up. Productivity won't, but points will. Any team of smart people who are aware that management thinks points matter will game them. Measure points, and the inevitable gaming will make points useless as a metric. With the added injury - a major one! - that gamed points are then made useless for the team to leverage internally to be predictable. Gamed, they’ve become meaningless.

As Sally Elatta, cofounder of AgilityHealth, says, "If you ever use metrics to punish or reward, you’ll never see the truth again."

Ethan Bernstein’s Harvard Business School field experiment, The Transparency Paradox, showed dramatic hits to quality and performance just from workers being aware of being watched. Teams that were encouraged to experiment to improve their process, but whose process was constantly monitored for productivity, not only gamed the system but showed significant performance and quality degradation compared with teams given the autonomy to just show results. Todd Lankford kindly translated Bernstein’s factory-floor study to software development in his post, How Transparency Can Kill Productivity in Agile.

Add to all that the cost in morale of flawed management conclusions based on points measures. "Your team isn't delivering as many points as that other team." Or "Work harder so your velocity goes up!"

Using points and velocity to measure productivity is as counterproductive as measuring lines of code or butt-hours-in-seats.

Productive teams are happy teams. Measuring team happiness - and team health - is a much better metric to gauge productivity than points and velocity.

And ultimately, what we need to care about is customer happiness: not inputs and outputs but outcomes. Are we delighting customers? Are we delivering the most value with the highest quality? Are we delivering the right things, and delivering them right?

As Sanja Bajovic pointed out when I first raised this message in an online discussion, “One of the issues may be that measuring story points is so easy. All tools support it. Measuring customers’ happiness is more complex.”

Sanja’s point is one of the core ones that Jerry Muller cites in his book, The Tyranny of Metrics. To paraphrase Muller (only slightly), the appeal of metrics is based in good part on the notion that development teams will be unresponsive if they are opaque, and more effective if they are subject to external monitoring. That’s not a useful notion. Muller quotes Andrew Natsios in defining an increasingly problematic, increasingly common human condition that Natsios labeled “Obsessive Measurement Disorder: an intellectual dysfunction rooted in the notion that counting everything… will produce better policy choices and improved management.” Muller devoted his book to debunking that belief.

In the same online discussion about points and velocity, Jeremy Pulcifer added color to my own arguments when he observed that “points are useful in helping order the backlog, the value-proposition. Leaking that metric is a very bad practice.”

Points and velocity, unwatched outside the team, give the team and its product owner the ability to, when management asks what you're planning, walk them over to your card wall and say, Things will likely change - that's the point - but with our velocity today as a measure, we're likely to be here 3 months from now. Do you agree, knowing what we know today, that this is the right order and that we have the right stuff above the line?... That's a useful conversation.

I should, perhaps, note that predictability does require stable teams and truly relative sizing to be able to leverage velocity to set predictable watermarks. Given stable teams and truly relative sizing as prerequisites, I repeatedly see teams deliver to the watermark, plus or minus 20% (with the caveat, of course, that if/when the backlog changes, they’ve adjusted the watermark to match). In software development, that’s a remarkable level of predictability.

Product owners have responsibility to keep stakeholders clued in to what to expect. Walking them up to the card wall and walking them through the ordered backlog of upcoming and future stories can be useful. Sharing velocity charts with stakeholders, on the other hand, is pretty useless: velocity is not meaningful outside the team; what stakeholders really want to know is what value they can expect and when.

So what metrics are worth focusing on?

I do find some usefulness to measuring, end of each sprint, the number of stories a team finishes vs. the number they committed to, or the number of story points a team finishes vs. the number of story points they committed to. With the caveat that what’s being measured is not productivity but the team’s ability to plan. Teams good at planning regularly finish somewhere between 85 percent and 110 percent of their points - regularly complete their plan 80-plus percent of the time.
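That commitment-vs-completion check is simple arithmetic. A sketch, with hypothetical sprint numbers; what it measures is planning accuracy, not productivity:

```python
# Sketch of the commitment-vs-completion check (numbers are hypothetical).

committed = [20, 18, 21, 19, 22, 20]   # points committed, per sprint
completed = [19, 18, 17, 20, 22, 16]   # points finished, per sprint

# A sprint "hits plan" if the team finishes 85-110% of what it committed to.
hits = sum(1 for c, d in zip(committed, completed) if 0.85 <= d / c <= 1.10)

hit_rate = hits / len(committed)
print(f"completed plan in {hit_rate:.0%} of sprints")
```

A team good at planning would see that hit rate at 80-plus percent over time.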

Everybody, just everybody, knows whether teams are, end-of-sprint, delivering what they said they would at the sprint’s beginning. When teams regularly deliver the stories they promised, when they honestly say at the beginning, this is the high-value customer stuff we believe we can deliver, and then 80+ percent of the time demo that stuff visibly end-of-sprint, everyone relaxes and lets them keep delivering value without (or with less) interference.

Teams find it terribly counterproductive when outside voices pressure teams with messages like, “c’mon, you can do more.”

As an engineering manager, I watch commitment-vs-completion primarily to make sure team members are not under some false idea that they should pressure themselves to increase velocity - and to make sure someone else isn't doing that sort of pressuring. In the absence of either of those two, it's to coach them to be more effective at planning.

Effective sprint planning is core to building trust with stakeholders. Only if the team demonstrates predictability in its sprint planning and delivery can the team be convincing to stakeholders with regard to months-out watermarks drawn in the backlog.

Again, outside forces can undercut teams. We’ve probably all experienced otherwise well-meaning managers and project managers who push their teams to plan for more points than they’ve been delivering. When you see this happening, you may want to suggest what I do: Adding paper to a printer doesn’t make it print faster.

Velocity is a measure of pace. If you think your team is capable of a higher pace, then invite the team to retrospect on what might make them more effective and happier; remove the impediment that’s standing in their way; bring in a trainer or coach to tune weak practices to be more effective; or facilitate your team’s engagement as a team (Google’s Aristotle Study calls out “psychological safety” as the differentiator, and how to watch for it; Em Campbell-Pretty calls out culture-first agile in her book Tribal Unity).

Here are other metrics I consider:

1) Outcomes. I want to see visible progress - product increments - being demo'd end-of-every-sprint - progress delivering some increment(s) of the product functionality that customers value most.

2) Happiness. I seek to find measures for both customer happiness and team happiness.

3) Tripartite metrics. I’m attracted to measures advocated by one of scrum's creators, Jeff Sutherland, who suggests measuring cycle time, escaped defects, and team happiness. Important: In my opinion, neither of the first two is useful without the other two.

4) Team engagement. Progress in team ownership and team engagement (and progress in identifying effective practices and adopting and learning them) is critical. Fundamentally, software development is a team sport; we said it in our book Managing the Unmanageable eight years ago, and it continues to hit home for me. While measures of team happiness may be representative of ownership and engagement, in my opinion evaluating progress toward team ownership and team engagement relies mostly on judgment from experienced leaders: managers, scrum masters and coaches. It's the bane of our analytical engineering brains that, in evaluating software development, we must rely on experienced judgment over metrics we can analytically measure. But I’ve seen nothing better.

5) Psychological safety. Google’s study told us we can observe it: when everyone at the table feels like they have the opportunity to speak up, we see “equality in distribution of conversational turn-taking”: no one dominates, no one is silent.

6) Finally, as I said above, I have suggested to any number of product people and teams (with caveats, mind you) that they consider measuring number of stories delivered.

As I (and so many others) have noted, when human beings know they are being measured for performance, they’ll game whatever metrics you’re measuring (even if it’s subconsciously - we innately know what's good for us!). So before we use a metric, we need to really think deeply about how it might be manipulated.

Regarding number of stories delivered… that means we must ask how people might game measuring stories-delivered. One obvious way would be to split a story into multiple, smaller stories: same work, more stories. But good news! Smaller stories are better! While splitting stories can be hard, there's pretty universal agreement that smaller stories (or if we're not doing agile, more granular requirements or smaller tasks) are better for a variety of reasons, from clarity to develop-ability to debug-ability to faster validation that we're on the right track.

Gaming story throughput by making stories smaller not only benefits a product team’s members but also benefits the software development itself. It's one of the very few metrics for which human-manipulation has such a positive side effect. (I've heard stories told of teams that proclaimed they could not split stories further, only to very creatively find useful new ways to split stories after management started measuring numbers of stories.)

But I would add a caution: this positive side effect is not the only side effect. Another way to put more stories into production is to spend less time on their quality and on testing them.

Regardless of your metrics, be very careful. Side effects will likely be rampant.

There has been some really good writing on metrics - articles and posts that explore both possibilities and concerns. Here are a few I review from time to time:
• from Pivotal Labs: don't measure velocity but volatility, cycle time, rejection rate
• genius overview of metrics by Ron Jeffries
• genius overview of metrics by Sean McHugh
• wonderful study of team productivity through the lens of DevOps metrics: Accelerate, by Nicole Forsgren, Jez Humble and Gene Kim - a must-read, in my opinion

Monday, September 07, 2020

Taking Agile Remote: Process and Tools
by Ron Lichty


“In-person is the gold standard,” I heard a colleague say. We were discussing the impact of the pandemic on team communications.

The impact of the pandemic, of course, is that there’s almost no in-person to be had. Our software development world is now almost entirely remote.

Our world was a long way from the gold standard prior to the pandemic. I was far from first to the party when, at Apple 30 years ago, I outsourced development to a programming team in Ohio. Despite the best intentions of the Agile Manifesto’s signers, when they stated, “the most efficient and effective method of conveying information to and within a development team is face-to-face conversation,” the trend continued with organizations scattering teams around the world, and some - some I managed - entirely remote.

But let me step back…

I was first introduced to agile in the form of XP 21 years ago, 1999, when I was at Schwab, and I soon began managing, leading and coaching agile teams. My first consulting engagement introducing agile was 10 years later. It was 2009, and I was advising a startup trying to find its path. While it had been following a north star that had mostly stayed the same, the path had veered every which way - a real challenge for a product team trying to execute what had been a six-month waterfall plan that was now into its 15th month.

Even a modicum of agile practices, I thought, would help this startup and its team. The team already had the building blocks: feature names in a spreadsheet. If we just ordered them effectively and developed in short iterations, the team ought to deliver a cadence of product increments with much earlier customer outcomes.

Before ordering features, I facilitated discussion of what “done” should mean - a definition team members could apply to every feature. Then we transcribed their features out of Excel onto 3x5 cards so we could relatively size them, snaking the story cards back and forth on a conference room table until we had them in size order, waiting to make a second pass to add “points” numbers. The sizing was crucial to backlog ordering, in which we stack-ranked the features so they were ordered not just by value but, taking size into account, ROI. The ordered backlog then supplied fodder for the team to plan sprints, each of which would deliver a product increment - the highest-value features we could complete in two weeks. With customer-focused plans each targeting just two weeks, the team executed.

A colocated team plus practices and mindset. Relatively easy transition to relatively agile. Dramatically better than what they’d done before.

Lots of teams weren’t colocated, of course, maybe even most, but most of the ones reaching out to me, early days, were colocated in one or several distributed locations.

Scrum for Distributed Teams


When product development was in several locations, I sometimes found myself on the road, while other times was teaching remotely. I was soon delivering the presentation parts of training using some of Zoom's predecessors - Skype, WebEx, Google Hangouts and Adobe Connect. Jira and a ton of other tools provided a facsimile of cards on a wall. Definitions of Done could be collaboratively composed in Google Docs or Confluence. Wikis worked reasonably well for capturing retrospective observations and learnings.

But a key part of agile backlog grooming relies on ordering by ROI, or “bang for the buck”, which in turn relies on relative sizing of stories to supply the “buck” — the “I” in “ROI” — the relative investment required. I wasn’t much impressed with Planning Poker - I’d learned a far more powerful technique much earlier: the Team Two-Pass Relative Sizing method that Steve Bockman devised - snaking.

Snaking is a two-step process: first the entire team sorts the stories by relative time and complexity on a conference table, resulting in a snake of 80 or 100 stories in ascending order by how long they’ll take relative to each other.


Here is what sizing looks like when the team is in the room: 120 stories in a snake, smallest to largest
Agile two-pass sizing by a colocated team: in a typical case, it takes a team 3-4 hours to snake 80-150 stories (in this case 120), from smallest story to largest epic and add points

Then, after labeling the smallest story card a ‘1’, the team continues to label stories 1s until there is a card that is clearly no longer a 1 but twice that, so labels it a ‘2’; and so on.
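A rough sketch of that second, labeling pass, with hypothetical stories and with the team's relative-effort judgment stood in for by numbers. (Pure doubling is shown here for simplicity; teams often settle onto the modified Fibonacci scale instead.)

```python
# Sketch of the second pass of two-pass relative sizing: given stories
# already snaked into ascending order of relative effort, assign points,
# doubling when a story is clearly about twice the current bucket's baseline.
# Relative-effort numbers below stand in for the team's judgment.

snaked = [  # (story, relative effort) in ascending order, from pass one
    ("fix typo", 1), ("tweak copy", 1), ("new button", 2),
    ("form validation", 2), ("search filter", 4), ("export to CSV", 5),
    ("payments integration", 9),
]

points, baseline = 1, snaked[0][1]
labeled = []
for story, effort in snaked:
    if effort >= 2 * baseline:   # clearly no longer the same size: double
        points *= 2
        baseline = effort
    labeled.append((story, points))

print(labeled)
```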

Less than half a day. Simple, fast, collaborative, and powerful when the team is in person. Even most distributed teams were in person - they were essentially groupings of in-person teams distributed from each other. But some weren’t. Some were entirely remote.

As simple as card-sorting and card-labeling seems, I’d found no tool to support it for entirely remote teams. I had been looking for years. Could card wall tools suffice? Nope, they’d never considered my use case. (No, not even Trello. Not even close.) Google Draw? Not really. Spreadsheets? Not on your life. List tools? Hard, very hard, to swap card order. PowerPoint, maybe? Put each story on a slide and switch to Slide Sorter view? But PowerPoint begins with cards in a grid - very different from starting from a stack of 3x5s and putting one at a time onto the sorting “table” in the relative position it belongs.

And then, an entirely remote team

I’d stopped looking for a workable tool when, in 2015, a team in rural Maine asked if I could fly out to help their product team be more predictable. One problem - while headquarters was in rural Maine, the programmers were not. At least most of them weren’t. Turns out there aren’t a lot of .Net developers in rural Maine. There lay the problem. The programmers were scattered across the country.

I described relative sizing to my new client - creating a snake of cards on a conference room table. And they described this tool called RealtimeBoard (now renamed Miro) that they were using for retrospectives - virtual stickies on a virtual whiteboard - that they thought might do the trick.

I was stoked.

Miro was the first tool I’d encountered that really let entirely remote teams accomplish relative sizing.

It was pretty easy to get started with Miro. Much as we’d transcribed feature names out of Excel onto a stack of 3x5 cards, now we were scribing them onto a stack of virtual cards in Miro. (A few months later, Miro and Jira had API integration, at which point Miro auto-generated a bevy of cards, each an instance of a ticket in Jira.)

Relative Sizing

To teach teams the sizing technique, I start with a warm-up exercise, asking students to size fruits. We start with 12 fruits. Agile stories typically have a “why” and the why for all 12 fruits is the same — “I want to eat some fruit” — it’s just the fruits that differ.

Fruit-sizing exercise — when teams are colocated

The exercise is for the team to put the 12 fruit cards in order: not based on how long to eat the fruit, but based on the effort required - the combined “cost” of preparing a serving of the fruit and cleaning up after eating it (much as, in software, we need to combine development and testing efforts). Twelve cards are a small enough number to get a quick first experience with snaking. I give teams five minutes to put them in order by effort, three more minutes to number them with the usual modified set of Fibonacci numbers.

It turned out that Miro was a pretty good tool for remote teams to snake the relative cost of 12 fruit.

Fruit-sizing exercise — 4 teams worked simultaneously, each on their own “sorting table”

The more complicated next exercise is a scrum-ified version of the XP Game, in which teams size then order a backlog of puzzle and game activity "stories" (for example, sorting cards or calculating a bunch of sums), then plan and execute short sprints, their goal to deliver the most customer value. Here, Miro was able to not only emulate conference-table sizing, but also a card wall from backlog to sprint plan to user-acceptance-test to done, as well as the activities themselves.

The Scrum Game — three teams working simultaneously, each with their own stories, sorting table and card wall

Finally, I facilitate workshops during which teams size stories from their own software projects. It’s common for a team to have fifty or eighty or a hundred or more stories in its project backlog. Provided we limit the number to a maximum of 150, we size them all.

The most recent Study of Product Team Performance - a survey of teams all over the world - revealed that higher performing teams tend to work from backlogs more than three months long. And those teams have sized not just the stories selected for the next iteration but all of their backlog’s features, epics and stories.

Comparing Miro with a real conference table

During snaking, when the team gets beyond a dozen stories and wants to insert a story somewhere in the middle, the difference between cards on a table and an online Scrum board becomes apparent. Making physical space on a table is something we learned to do as children, whereas with an online tool we have to learn the interface to move a bunch of cards at a time.

On a real-life table, we usually snake the cards back and forth. But I discovered that, on a virtual table, rather than snaking cards back and forth, it is easier to organize the cards in rows, one above the next, snaking from the end of each row of cards back to the beginning of the next row, each row arranged from smallest to largest, left to right.

Snaking a virtual team's stories is more easily done in rows

In my experience, where snaking is easier on a table, rows are easier in virtual space. If the team developing Miro ever delivers a feature to automatically insert a card into a matrix of rows of cards, it may make virtual sizing easier than real-space sizing! But for the moment, in-person - people proximity - the gold standard - still wins. But Miro is pretty good. It integrates with Jira. And I’m delighted to report that Miro is no longer alone in providing this functionality. A year ago I was engaged by a client already using a remarkably similar tool, Mural.

Delivering a Scrum experience

One of the things my clients love about my scrum trainings is that they're immersive. I run classes as agile projects. I put up a card wall with a backlog of relatively sized "learning stories" that have been ordered to always be delivering the next-highest-value learning. Much as we do with software project stories, my learning stories have relative story points. At the end of one-hour “sprints,” I update a Burn-Up Chart, yielding emergent velocity that predicts how many of the learning stories in our backlog we will likely complete by the end of class. This makes the class experiential — I’m not just teaching about scrum, but immersing my classes in it.
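The burn-up forecast works from emergent velocity; a sketch, with hypothetical class numbers:

```python
# Sketch of the burn-up projection: emergent velocity forecasts how many
# backlog points will likely be done by the end. Numbers are hypothetical.

total_points = 40                 # relatively sized learning stories
done_per_sprint = [6, 7, 5]       # points completed, per one-hour "sprint"
sprints_total = 6                 # class length, in one-hour sprints

velocity = sum(done_per_sprint) / len(done_per_sprint)   # emergent velocity
done = sum(done_per_sprint)
remaining_sprints = sprints_total - len(done_per_sprint)

forecast = done + velocity * remaining_sprints
print(f"forecast: {forecast:.0f} of {total_points} points by end of class")
```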

Setting all that up for real-world teaching is time-consuming. A few days beforehand, learning stories get handwritten onto scores of giant stickies for each class. Day of class, I arrive 45 minutes beforehand to transform a classroom wall into a backlog of learning stories poised to, one by one, march across the card wall, from “Backlog” through “In Progress” to “Done”. I prep a second wall with flipchart pages: one the burn-up chart, others blank to record students’ hopes and wants from the class, and later what they’ve learned and will take away.

While learning Miro and getting my first remote workshops set up was as tedious and slow as getting ready for my first classroom trainings years before, it provided a remarkable facsimile. Even better, it turned out downright handy for subsequent classes: once I'd set up my online Scrum board for the first class, I found I could save it off and reload the setup as the basis for subsequent classes.

Training scrum board of learning modules, each notated with a relative story-point size, as class begins
Scribing student hopes for their learning - and, hourly, updating a burn up chart - were similarly straightforward

Retrospectives, too

That first client five years ago that introduced me to Miro had been using it for Retrospectives. Bobbie Manson from Mingle Analytics had been leveraging Miro to run one of the best agile Retrospectives I had yet seen. (I emulate her approach when, at the end of my virtual classes, we retrospect on the training, both to help students cement their learnings and to get feedback on what I can improve.)

Retrospective by students, after three days’ agile training & workshops

Because I think it’s been the stumbling block for distributed teams, though, it’s snaking - the agile relative sizing practice, in which we use cards on a virtual whiteboard in place of cards on a table - that makes using Miro and Mural so expedient. Relative sizing, because it forms the backbone on which velocity and predictability are based, is one of the practices I see teams continue to heavily leverage long after class completes. And the tools' usefulness is significantly enhanced by their integration with Jira: epics and stories are easily exported from Jira into Miro or Mural; when the team determines relative points and writes them on cards on the Miro board, they are automatically updated through the API to the tickets in Jira.

As Steve Bockman, relative ordering’s creator, has noted, the ordering technique is equally useful for relative valuing:
  • Product owners snake stories from most-value-to-customers to least. 
  • Product organizations snake project opportunities from most contribution to company objectives to least. 
  • Tech leads and architects snake tech debt and other technical product backlog items from most urgent and highest risk to least. 
  • Engineering leaders snake engineering, infrastructure and debt projects the same.
Relative value divided by relative size yields ROI - return on investment - bang for the buck. It’s a useful guide to seeing what to do first and next and next after that.
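That ROI arithmetic can be sketched directly; the items, values, and sizes here are hypothetical:

```python
# Sketch of ROI-based backlog ordering: relative value divided by
# relative size yields "bang for the buck". Items and numbers hypothetical.

backlog = [
    # (item, relative value, relative size)
    ("single sign-on", 8, 8),
    ("dark mode", 3, 2),
    ("CSV import", 5, 3),
    ("audit log", 5, 8),
]

# Order by ROI, highest first: what to do first, next, and next after that.
ordered = sorted(backlog, key=lambda i: i[1] / i[2], reverse=True)

for item, value, size in ordered:
    print(f"{item}: ROI {value / size:.2f}")
```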

Prior to the pandemic, I had trained teams spread across as many as a dozen different geographies as well as remote teams on other continents. Then, a combination of Skype and Miro had not only let me train teams in remote locations and scattered across geographies, but also enabled those distributed teams to continue to use agile’s powerful, collaborative techniques and practices after I was gone.

Having five years of experience with a remote collaborative tool gave me a major headstart to serving suddenly remote teams with the onset of the pandemic. A combination of high quality conferencing like Zoom and virtual whiteboarding like Miro and Mural provides a serviceable stand-in for physical cards, card walls, sizing, ordering, charts and all the rest - for every team.


Saturday, May 30, 2020

High Performance Teams Know Where They’re Headed
by Ron Lichty


Our 2019 Study of Product Team Performance, released this month, reveals that high performance teams tend to work from backlogs more than three months long. And those teams have sized all of their backlog’s features, epics and stories, not just the stories selected for the next iteration.

Each Study of Product Team Performance - this is the sixth study we’ve undertaken - surveys team members on thousands of product teams around the world, asking them to characterize their team’s performance - high performing, low performing, or something in between - and to share their experiences and approaches. Our data analyst then looks for and identifies correlations between practices and team performance.

This year’s study - based on our survey that wrapped up in December - identified six practices and characteristics that highly correlate with high performance teams:
  • awareness of and alignment with their company’s business strategy (only a quarter of teams!)
  • accountability to customer satisfaction targets (barely more than half)
  • innovation process maturity company-wide (less than 10% report their companies are mature)
  • product managers spend at least 30 percent of their time in the field (only 11 percent do)
  • using profitability as a criterion to prioritize requirements (less than two-fifths do)
  • sizing all of the stories in a requirements backlog that is sufficient in size to represent more than three months’ effort (less than a fifth do)

The question we asked in the survey about the latter:
Do individual contributors size all the stories or requirements in the backlog or just those that have been selected for the next iteration?


Reading the chart from the right, the answers revealed that almost 16% of teams don’t size their stories at all. Sadly, 17.9% have only an iteration or two of stories in their backlog at any time. Two-thirds have quarter-plus-length backlogs, but only 17% size that entire backlog up front.

What stunned us were the correlations:
    ▪    the 17% that had quarter-plus-length backlogs and sized the whole backlog correlated with the highest performing teams
    ▪    the teams with only a sprint or two of stories in their backlogs correlated with the lowest performing teams

As to the low-performing teams: product managers and product owners who provide their teams with a bare minimum of stories are clearly struggling. By barely staying ahead of their teams’ development capacity, they can’t ensure the team is focused on the highest-value work; there’s no ROI-based stack-ranking to be had; they’re likely guessing what to do next; and stories are likely not well formed, their acceptance criteria incomplete.

As to the high-performing teams, we have no way of knowing whether they’re using low-cost, no-waste estimating techniques like Steve Bockman’s relative sizing method (also known as snaking, laddering, and the team estimation game). In fact, sizing a quarter-plus backlog seems counter-intuitive, given we all know that few teams deliver much more than half of a backlog that long, what with estimates being guesstimates and, particularly, the introduction of new work and adjustments to incumbent work that come from getting early feedback and iteratively delivering the highest-value increments of the product.

But given the correlation with high performance, it’s clear that the cost of sizing the whole backlog is offset by product managers being able to fold size impacts into their thinking. They’re able to avoid the waste of stack-ranking unreasonably costly stories at the top of the backlog. And given that relative sizing takes less than half a day and doubles as an exercise during chartering to familiarize the team with where product managers think development needs to be headed, sizing the entire backlog can be a low-cost entry point to high performance.

Take a look at all of our survey results - and at the six correlations with high performance - by getting a copy of the study itself. My web page devoted to the Study of Product Team Performance has a pointer to the just-released study, pointers to several earlier ones, and summaries of all the earlier studies.

There are also callouts to correlations we found in each previous year’s Study of Product Team Performance specific to software development performance, among them these practices and characteristics that correlate with high performance teams:
    ▪    definitions of done crafted by the team
    ▪    effective standups held daily
    ▪    standout team onboarding
    ▪    quality product management
    ▪    cross-functional collaboration and trust


Monday, December 16, 2019

Better Standups
by Ron Lichty

What makes a standup effective?

We know, from the Study of Product Team Performance, that the highest performing teams hold effective standups daily. Not every other day, not every third day, not occasionally. Daily.

But what constitutes an effective standup?

I’ve stepped into scores of software development organizations in the last seven years, between taking interim VP Engineering roles, advising business and product leaders on team effectiveness, and training and coaching teams and executives in agile. I’ve definitely seen highly ineffective standups: ones with no sense of urgency, no 15-minutes-or-less timebox (or no enforced timebox), no focus. Those are the egregious problems.

But there are several fundamental steps to high effectiveness I see all too few teams taking.

These days, while most teams have a notion of “the three questions” - answers to which each team member shares with the team each day - few teams address those questions effectively. And almost none have a sense of what underlies those questions - what the standup is actually for: How are we doing? Are we on track to successfully deliver the plan we set out at the beginning of our sprint? If not, how can we adjust?

Let’s start with the three questions: When the three questions are answered perfunctorily - as I’m sorry to say I mostly see - the standup is not a re-planning meeting. It’s just a status meeting. I’m even more sorry to occasionally find it relegated from face-to-face to Slack. I get it, when little more is happening than sharing status - but these teams are losing so much opportunity for it to be so much more.

Of the three questions, the one that is almost always dealt with effectively is the one about impediments: every day, every standup, sharing anything that’s standing in my way - with follow-ups by any and all who can help. Calling out impediments daily and responding to them actively will speed our team and our delivery.

But the other two of the three questions are too often answered with “What did I do yesterday?” and “What will I do today?” That’s not what I want to hear. What I want to hear is “What I accomplished yesterday” and “What I’m going to accomplish today.” “Did” and “doing” lend themselves to answers like “I worked on the subsystem yesterday” and “I’m going to keep working on the subsystem today” - which are wholly uninformative. They give our team much less insight than if we answer “What part of the subsystem did I accomplish yesterday?” and “What part will I accomplish by tomorrow?” 

The power of basing the two questions on “accomplish” is twofold:

1) It signals my team when I’m in trouble. If yesterday I told the team I intended to accomplish yy part of xxx, and I didn’t, that’s a heads-up that I’m not on track for my part of the sprint plan (and if anyone else on the team has time, maybe they might want to offer to help me).

2) By telling my team what I intend to accomplish, I’m exercising one of the core principles of time management: If I tell myself what I’ll accomplish a short time from now, I’m more likely to; if I tell teammates, I’m even more likely to.

The point of the standup - and the point of the two questions - is to see how we’re doing against our plan and to re-plan if necessary. Software can be wildly unpredictable - if we’ve hit a rough patch, we want our teammates to know that the work is more than we anticipated, that we’re likely now overcommitted, that maybe we could use help, or perhaps we need to re-think how we’ve divvied up the work, or to re-plan what we can reasonably finish by end-of-sprint.

Standups too easily devolve into status meetings. Just this nuance begins to bring opportunity for teamwork back into the standup routine - opportunity for all of us to consider together how to keep our sprint plan on track.

But even with improved wording, just doing a go-round of the three questions can be all too status-ish and too me-ish - not enough about us, about our plan, about how we’re doing as a team. To bring team and teamwork into focus, for the past few years I’ve been coaching a standup-closing practice I learned from Cathy Simpson, who learned it from Kalpesh Shah at the 2015 Scrum Gathering. Cathy and I both got excited when we realized that just by leveraging a simple fist-to-five, we could shift the conversation from “me” to “us” while getting a daily sense of the team’s confidence in its plan.


If you’ve never used fist-to-five, it’s a quick way to get the sense of a group of people in response to a question. On a given signal, each team member raises a hand and votes their answer with a number of fingers. In this case, the question is “How confident are we that we’ll make our sprint plan?” Five fingers signals total confidence we’ll make the plan. Four fingers signals high confidence. A fist is zero confidence - basically, I don’t think we have any chance of making our plan; one or two fingers not much better. 

If, in response to the question, there are any votes other than 4s and 5s, my practice is to ask the team to discuss what it would take to get all votes to 4s and 5s. Often it’s just one or two developers who are struggling - is there someone on, or even ahead of, schedule with the work they’ve taken on who can come to their aid?

On the other hand, we may face the situation where there’s no recovery possible. How useful to know that at this point, as opposed to discovering end-of-sprint that we didn’t finish one or more stories. Knowing earlier in the sprint lets our product owner be intentional about which story or stories we should put back in the backlog. It won’t be a random story that’s not finished at the end of the sprint, but one that has the least value for the sprint, and for which our product owner now has time to reset expectations with stakeholders that it will not be completed in this sprint. 

Once we’ve adjusted our plan to get it back on track, a fist-to-five should give us 4s and 5s. And we’re back on our way.
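The tally itself is simple enough to sketch in a few lines. Here’s a minimal, hypothetical version (the function name and return shape are mine, not part of any standard tool): any vote below 4 triggers the what-would-it-take-to-get-to-4s-and-5s conversation.

```python
def confidence_check(votes):
    """Tally a fist-to-five confidence vote.

    votes: one integer per team member, 0 (fist: no confidence
    we'll make our sprint plan) through 5 (total confidence).
    Returns (needs_replanning, low_voters): needs_replanning is
    True if any vote is below 4; low_voters lists the positions
    of those votes so the team knows who to start the
    what-would-it-take discussion with.
    """
    low_voters = [i for i, v in enumerate(votes) if v < 4]
    return (len(low_voters) > 0, low_voters)
```

With votes of `[5, 4, 4, 5]` the check returns `(False, [])` - everyone is confident, we’re on our way. With `[5, 2, 4]` it returns `(True, [1])` - one team member is signaling trouble, and that’s the conversation to have before the standup breaks.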

The nuances I’ve called out are the kinds of things we might call “hacks,” but they really ought to be referred to as making our standups more effective - getting standups back to their intended purpose.

If you look up “scrum,” you’ll find that, in rugby, it’s a way “to restart the game.” No surprise, then, that the daily standup is itself sometimes called a daily scrum. It lets us re-start our “game” every day. But the standup only works as a restarting and replanning meeting when we’re all engaged together working as a team, focused together on making our software development hum.

Friday, June 29, 2018

Scaling Teams
by Ron Lichty


Ask any scrum coach about ideal team size and you’ll likely get the same answer: 7 plus or minus 2. That is, 5 to 9 team members doing the actual planning and work of the sprint: developers, testers, and sometimes a designer, writer, or another role - a maximum-9-sized team (actually 11 with the scrum master and product owner).

So what do we do when we grow from maximum-9 to a 10th team member?

Splitting into two (or three!) teams seems fractious, siloing, so why do we want to cap teams at nine? Why would we split them?

First, let’s recognize that suggesting that the ideal team size is “7 +/- 2” is just plain wrong. Even the smallest of those is too many for “ideal.” By a lot.

The ideal team is much smaller. Software development is a team sport. Team sports are gated by collaboration and communication (the daily scrum: gee, a team sport, maybe we should all talk with each other once a day, huh?). So given communication is gating, what’s the ideal team size?

One. By that measure, a team of one is ideal: when the team is a single person, all the communication is internal to a single brain - neuron-to-neuron.

But not much software these days gets built by teams of one! In fact, you can argue that one is not a team. My coauthor Mickey Mantle observes, from his dozens of years managing software development, that the number of programmers on an ideal team is 3-4.  “Assuming the teams are competent, a small team will usually outperform a larger team, hands down, and by a significant amount,” he notes. And former Medium VP Engineering Daniel Pupius, now co-founder & CEO of Range, protests that for team sports, diversity of perspectives is as vital as communication. “A sole genius isn't going to solve problems in the way a group can.” But to reduce the noise and friction while driving toward lossless communication, Daniel, like Mickey, would opt for teams of just 3 if 3 were enough to solve the problem at hand.


So agile’s “7 +/- 2” is a maximum “ideal” team size.

Where’d maximum-9 come from? Team theory. Again, team sports are gated by collaboration and communication, so think about the number of lines of communication required for various-sized teams: two people require only one line of communication; three people, three lines; four people, six lines; five people, ten lines… Lines of communication are triangular: n people require n(n-1)/2 of them. Somewhere around 8 or 9 or 10 people, the lines of communication have exceeded any and all likelihood that necessary communication will take place.
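The counts above follow the “handshake” formula - every person potentially talking with every other person. A one-line sketch (the function name is mine) makes the explosion easy to verify:

```python
def lines_of_communication(people):
    """Pairwise communication lines in a team of n people: n choose 2."""
    return people * (people - 1) // 2

# 2 people -> 1 line, 5 people -> 10 lines, 9 people -> 36 lines
for n in (2, 3, 4, 5, 9, 12):
    print(n, "people:", lines_of_communication(n), "lines")
```

At 9 people that’s 36 lines; at 12, it’s 66 - which is one way to see why a dozen-person cross-functional team is about the practical ceiling.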

Mickey observes, “Rarely have I seen productive cross functional teams that number more than a dozen people.”

Daniel notes that his experience is more aligned “with 3 to 7, or 5 +/- 2 as ideal team size. And I think there are papers that suggest group dynamics shift at 8 people.”

So what do we do when we grow beyond our “maximum 7” or “maximum 8” or “maximum 9” team-size boundary – when we add one more team member?
There are several things that can be done:

1.    Keep the team intact
2.    Split the team
3.    Cell division
4.    Hybrid split

1.  Keep the team intact
First recognize that the “maximum of 9” is really just a guideline – it’s not a law!

In certain cases it may make more sense to simply add an additional team member. When this is done consciously, and recognizing the increased communication burden that an additional team member adds, you can take steps to make communication as effective as possible.

This may not be the best solution, but it is one to consider.

2.  Split the Team
How do we split into two (or three) teams?

There’s an (unfortunate) tendency to split by components. The tendency persists because it seems to make sense to put like-minded, like-tooled, common-best-practices people together, and because it makes for a tidy management model: we can have a team of database developers managed by a former database developer; a team of front-end developers managed by a former front-end developer; a team of business logic developers managed by a former business logic developer. That way, each of those teams gets a manager, mentor, and coach who understands them.

But our goal is not to deliver components. We were (almost every single one of us) hired to deliver customer functionality that delights users. Component groups cannot deliver features or epics or stories without multiple component teams working together to do so. The lines of communication within component teams are optimized for sharing best practices within the specialty; but the teams end up having fundamental dependencies on each other. The communications overhead - the high bandwidth communication required to deliver delight to customers - is between teams. Expensive. Ouch.

The most effective scaling models I’ve seen leverage cross-functional teams. Each cross-functional team has all the skillsets, from interface layer to business logic to data, all on the same team, to deliver customer functionality that delights users. While same-skilled folks are scattered across cross-functional teams, we still need managers who understand and can mentor and coach them, so we assign managers not to teams but to same-skilled folks.

I know several models that leaders and teams have found workable. All typically divide developers into cross-functional teams based on interface or functionality - most easily by how the interface splits functionality to present it to users.


Henrik Kniberg: Splitting teams based on how the interface exposes functionality.

Henrik Kniberg draws a picture of dividing up Spotify’s interface in just this way in his paper on scaling at Spotify.

Henrik Kniberg: Squads and Chapters

Notice in his organizational drawing that teams (which he calls squads) are vertical and don’t have managers; same-skilled folks (whom he organizes horizontally into “chapters” that span squads), on the other hand, are led by a same-skilled manager. So database developers are each assigned to teams, but all the database developers are also members of one of those chapters, and formally report to a database manager for mentoring, HR, best practices sharing, and assignment purposes.

3.  Cell Division
One approach to split a growing team is what my former Razorfish colleague Steve Gray calls cell division; as he describes the typical scenario, when a team has exceeded its effective size, a smaller area of functionality is identified that a smaller part of the team can be spawned off to address.

Former Medium VP Engineering Daniel Pupius notes, “I do feel the 9-12 person range is a really awkward size for an engineering team. I’ve had success with the ‘cell division’ model, where instead of creating even splits at each point in time you peel off more stable islands, while a larger core group deals with a larger and less-defined surface area. In Medium’s case it was a small team peeling off to focus on the needs of the professional publisher. That small team eventually grew to be 25 people and went through its own sequence of cell-division.”

4.  Hybrid Split
When I was interim VP Engineering for a Portland company last fall, I inherited one of those larger teams and we invented a “team/subteams” hybrid model that both kept the larger team intact and split it into smaller ones. The team numbered half again more than the ideal max. Product management had identified three workstreams of customer functionality that needed addressing. The team divided into three sub-teams, each of which was (mostly) cross-functional and could deliver the three feature areas.

But it wasn’t clear that the three feature areas would be long-running streams of features that could support long-running, stable teams. Not only did stability still reside at the larger-team level, but team members were all working within a pretty monolithic code base. And large as it was, the bigger team had good sharing in a highly functional standup it held each morning.

So we kept the large-team standup for the sharing phase (the standup’s three questions, and identifying resolution discussions that needed to occur, particularly cross-subteam ones; this phase of the standup took the larger team of 15-18 typically 11-12 minutes). It was followed immediately by the (re)planning phase of the standup: a few minutes of (re)planning by sub-teams, each in front of its own physical card wall, in which each small team viewed its own backlog and discussed how it was doing on its sprint plan and who needed help, and moved cards across its board. The approach maintained the camaraderie and transparency of the larger team, while accomplishing (re)planning work in smaller teams.

The Take-Away
Given software development is a team sport, and team sports are gated by communication, we should all be constantly observing how our teams are communicating, and we should expect that we’ll need to evolve our team structures as we add (or subtract) people.

Have you seen effective models for splitting teams other than those I called out?

The models I’ve called out subdivide into cross-functional teams that take on lifetime ownership of functions users want to accomplish. By so doing, we give teams end-to-end ability to deliver those functions and avoid dependencies, handoffs and high-bandwidth communication overhead that is characteristic of dividing into component teams.

Regardless of approach, remember that it is communication that prevents siloing. Keeping the larger team intact while breaking out sub-teams - the hybrid model - is one mechanism that worked for one team. It will likely work for them for a while, until they grow their team too large for that model, too. At that point keeping communication flowing becomes a challenge both to product management (translating vision to features to stories) and to engineering management (translating customer wishes to design approach and architecture). Daniel Pupius notes, “The next super awkward phase hits around 30 persons.”

(Many thanks to Daniel Pupius, Rich Mironov, Steve Gray and Mickey Mantle for their insights and thoughts on this stuff!)