The Software Development Performance Index (SDPI) framework codifies a balanced set of outcome measures that, when used within CA Agile Central® Unlimited Edition, can give you feedback on your own teams and organization. This document explains the SDPI and how these metrics are calculated. To learn more, visit www.rallydev.com.
Definitions for Calculations
These definitions describe how metrics are calculated.
Time Buckets
Each metric is calculated for a particular time bucket. The summary SDPI charts are most commonly shown in quarters. The drill-down charts are most commonly shown in months.
Difference Between Real Teams and Projects
The project entity in CA Agile Central is the team container, but its hierarchical nature means that some projects represent other organizational entities (meta-teams, divisions, departments, and so on). Some may even represent actual projects. To determine which project entities are actually teams, we use a Bayesian classifier that looks at how much work is contained in the project, how close to the leaf of the hierarchy it sits, and a number of other characteristics.
Team Size
We heuristically extract team membership by looking at who is working on what items and who is the owner of those work items. We then determine what fraction of the time each person is working on each team. The team size is the sum of these fractions.
Percentile Scoring
The units for each raw metric differ, and for some metrics higher is better while for others lower is better. To make the metrics easier to interpret, and to enable the aggregation of dissimilar units into a single index, each raw metric is converted into a percentile score across the entire distribution of all similar metrics. For percentile scores, higher is always better.
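The document does not spell out the exact percentile formula; a minimal sketch of the idea, assuming a simple rank-based percentile on a 0–99 scale with the direction flipped for lower-is-better metrics:

```python
def percentile_scores(values, higher_is_better=True):
    """Convert raw metric values into 0-99 percentile scores.

    A value's score reflects the fraction of the distribution it beats,
    so higher is always better regardless of the raw metric's direction.
    (Illustrative only; the production formula is not published.)
    """
    n = len(values)
    scores = []
    for v in values:
        if higher_is_better:
            beaten = sum(1 for other in values if other < v)
        else:  # flip lower-is-better metrics such as defect density
            beaten = sum(1 for other in values if other > v)
        scores.append(round(99 * beaten / n))
    return scores

# Defect density: lower raw values earn higher percentile scores.
scores = percentile_scores([0.5, 1.0, 2.0, 4.0], higher_is_better=False)
```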
Calculating the Index
The SDPI is made up of several dimensions. Each raw metric is percentile scored, and one or more of those scores are averaged to make up a particular dimension (for example, the quality dimension is the percentile score of defect density for defects found in production averaged with the percentile score of defect density for defects found in test). To calculate the overall SDPI, we take the average of the contributing dimensions' scores. If there are four dimensions, the maximum contribution of any one dimension to the final SDPI score is 25.
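The averaging described above can be sketched as follows (the dimension names and example scores are illustrative, not real data):

```python
def dimension_score(metric_percentiles):
    """A dimension is the average of its metrics' percentile scores."""
    return sum(metric_percentiles) / len(metric_percentiles)

def sdpi(dimension_scores):
    """Overall SDPI: the average of the contributing dimensions' scores."""
    return sum(dimension_scores) / len(dimension_scores)

# Quality dimension: average of production and in-test defect-density scores.
quality = dimension_score([70, 50])
# With four dimensions, each contributes at most about 25 points to the index.
overall = sdpi([quality, 80, 40, 60])
```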
Responsiveness Score From Time in Process (TiP)
Time in Process (TiP) is the amount of time (in fractional days) that a work item spends in a particular state. Weekends, holidays, and non-work hours are not counted. We take the median TiP of all the work items that completed in a particular time bucket (say, January 2013) and record that as the TiP for that time bucket. While other parameters are possible, we primarily look at the TiP of User Stories and we define In Process as ScheduleState equals In Progress or Completed.
Quality Score From Defect Density
Defect density is the count of defects divided by man days, where man days is team size times the number of workdays in that time bucket. This results in a metric that represents the number of defects per team member per workday.
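As a worked illustration of this formula (the team size, workday count, and defect count below are made up):

```python
def defect_density(defect_count, team_size, workdays):
    """Defects per team member per workday: defects divided by man-days,
    where man-days = team size * workdays in the time bucket."""
    return defect_count / (team_size * workdays)

# A 5-person team over a 60-workday quarter that logged 6 defects:
density = defect_density(6, 5.0, 60)
```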
CA Agile Central looks at both the defects found in production and those found in test and other areas, as indicated by the Environment field in CA Agile Central. We detect whether defects of each type are typically being recorded in CA Agile Central for each team over a time period, and only use a type if it passes this test. We take the reliably recorded type as the quality score, or the average of the two if both are reliably recorded.
Productivity Score From Throughput and Team Size
Throughput is the count of user stories, defects, and features completed in a given time period. The productivity score is the percentile scoring of this throughput normalized by team size. While defects and features are shown in the drill-down charts, currently only user stories contribute to the productivity score of built-in scorecards.
Predictability Score From Throughput Variability
Throughput variability is the standard deviation of throughput for a given team over three monthly periods divided by the average of the throughput for those same three months. This is referred to as the Coefficient of Variation (CoV) of throughput. Only user stories are considered for this predictability score.
Decision Versus Outcome Measurements
The measurements below are generally targeted at characterizing a decision or an outcome. An organization either decides to split people across many projects or dedicates them to one. The Percent Dedicated Work measurement extracts this decision. Defect density is an example of an outcome measurement.
Although not strictly accurate, they can be thought of as input and output variables in a correlation analysis.
Scores
Raw outcome measures are translated into a score so they can be easily interpreted as indicators of performance. Measures closer to 100 are good, measures closer to 0 are bad. The raw measure and the score are both available for analysis.
Timebox Granularity
Unless otherwise specified, each metric specified below is calculated for each of the following timeboxes:
 Month
 Quarter (Calendar)
 3-month (sliding)
 6-month (sliding)
 12-month (sliding)
 Iteration (coming soon)
The sliding window measurements are useful when trying to identify a correlation where the impact of a decision measurement for a given month might correlate with the outcome measurement over the course of several following months. For instance, field-reported defects will come in over time, so we would logically expect a change in this measurement to be evident for several months after the impacting decision. The empirical evidence supports this trailing effect, because bad-decision metrics (non-dedicatedness) correlate best with the six-month trailing defect-density metric.
Snapshots and the Temporal Data Model
We do not directly measure things like Percent Dedicated Work. It and the other measurements specified in this document are built from snapshots of changes representing transactions of users working with artifacts in their project management, source code management, build, or bug tracking systems. A detailed discussion of this data model including its data structures, constraints, and operations can be found here. Many of the details of calculating these metrics cannot be understood without at least a basic understanding of this underlying snapshot data structure and temporal data model.
Real Teams
In addition to being associated with a timebox, every measurement in the data set is also associated with a team. Our data set does not have a strict definition of a team. Rather, it includes the concept of a team or project hierarchy, where higher-level entries might represent divisions or teams of teams and lower-level entries represent the team itself. It is also common for a team to break their work down into project streams. This is a typical team or project tree:
Division ABC
  Team A
    Team A – project 1
    Team A – project 2
  Team B
  Team C
  Team D
Division XYZ
Since the data is non-attributable and large (25,000 projects), we have no way of determining which entries in this tree represent a real team. Instead, we extract this heuristically using a Bayesian classifier. The features that the classifier keys off of include:
 The number of levels from the leaf nodes of the current branch of the project tree. Real teams tend to be at the leaf nodes (level 0) or one level up (level 1).
 The number of work items in progress in the node.
 The full-time equivalent value for the node. Real teams tend to have between 5 and 8 members; outside this range, the probability of being a real team decreases.
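The classifier itself is not published. As a purely illustrative toy keyed on two of the features above (the real implementation is a trained Bayesian classifier, and all weights below are invented):

```python
def real_team_likelihood(levels_from_leaf, fte):
    """Toy heuristic, NOT the production Bayesian classifier: favor
    nodes at or near the leaves with roughly 5-8 FTE members.
    All weights are illustrative assumptions."""
    level_weight = {0: 1.0, 1: 0.6}.get(levels_from_leaf, 0.1)
    size_weight = 1.0 if 5 <= fte <= 8 else 0.3
    return level_weight * size_weight

# A leaf node with 6.5 FTE looks far more team-like than a 40-FTE
# node three levels up (likely a division).
leaf_team = real_team_likelihood(0, 6.5)
division = real_team_likelihood(3, 40)
```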
Measurements
Percent Dedicated Work
This measurement indicates how much of the work for a given team is done by people dedicated to that team.
Type: Decision
Formula
 Find all transactions (snapshots) for stories, defects, and tasks that are in progress, have no children, have an owner (user), and are not blocked.
 Sum all transactions by user U_{total}, project P_{total}, and user contribution to a project U_{project}, where U_{total} > 5 (that is, users with a total transaction count less than or equal to five are not counted towards U_{project} or P_{total}).
 Find the percent of a user’s total work each project represents: U_{percent} = 100 ∙ U_{project} / U_{total}
 Count as dedicated for a given project the users whose U_{percent} is greater than 70% for that project. This threshold was determined by experimentation with a training set of data from teams with known, dedicated members.
 For each project, sum the dedicated user transactions: P_{dedicated} = Σ U_{project}, for all dedicated members.
 Find the percent of dedicated work for each project: P_{percentDedicated} = 100 ∙ P_{dedicated} / P_{total}
Data Cleaning
The transactions of any user with five or fewer transactions in a given timebox and project pair are ignored when calculating U_{project} or P_{total}. This omits data from people who are not true team members (managers, administrators, and so on).
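The steps above can be sketched as follows, assuming the qualifying transactions have already been extracted as (user, project) pairs (the function name and input shape are our own):

```python
from collections import Counter

def percent_dedicated_work(transactions, min_transactions=5,
                           dedication_threshold=0.70):
    """Percent Dedicated Work per project.

    transactions: list of (user, project) pairs, one per qualifying
    snapshot. Users with min_transactions or fewer total transactions
    are dropped (data cleaning); a user counts as dedicated to a
    project when more than 70% of their work lands there.
    """
    user_total = Counter(user for user, _ in transactions)
    # Data cleaning: drop low-activity users (managers, admins, ...).
    kept = [(u, p) for u, p in transactions if user_total[u] > min_transactions]
    user_project = Counter(kept)
    project_total = Counter(p for _, p in kept)

    result = {}
    for project, p_total in project_total.items():
        dedicated = sum(
            count for (user, proj), count in user_project.items()
            if proj == project
            and count / user_total[user] > dedication_threshold
        )
        result[project] = 100.0 * dedicated / p_total
    return result

# User A works only on P1; user B splits 60/40 between P1 and P2.
tx = [("A", "P1")] * 10 + [("B", "P1")] * 6 + [("B", "P2")] * 4
shares = percent_dedicated_work(tx)
```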
Full-Time Equivalent
This measurement is an indicator of team size, including contributions from part-time contributors to the team.
Type: Decision
Formula
 Find all transactions (snapshots) for stories, defects, and tasks that are in progress, have no children, have an owner (user), and are not blocked.
 Sum all transactions by user U _{total}, project P _{total }, and user contribution to a project U_{project} where U_{total }> 5 (for example, users with a total transaction count less than or equal to five are not counted towards U _{project} or P _{total}). If the revision editor and owner are different people, the edit is associated to both people.
 Find the fraction of a user’s total work each project represents: U_{fte} = U_{project} / U_{total}
 Sum the full-time equivalent for each project: P_{fte} = Σ U_{fte}.
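A sketch of this calculation, again assuming transactions arrive as (user, project) pairs (input shape and function name are our own):

```python
from collections import Counter

def full_time_equivalents(transactions, min_transactions=5):
    """P_fte per project: the sum over users of U_project / U_total,
    i.e. the fraction of each user's work done in that project."""
    user_total = Counter(u for u, _ in transactions)
    # Same data cleaning as Percent Dedicated Work: drop low-activity users.
    kept = [(u, p) for u, p in transactions if user_total[u] > min_transactions]
    fte = Counter()
    for (user, project), count in Counter(kept).items():
        fte[project] += count / user_total[user]
    return dict(fte)

# One fully dedicated member plus one contributor split 60/40:
tx = [("A", "P1")] * 10 + [("B", "P1")] * 6 + [("B", "P2")] * 4
fte = full_time_equivalents(tx)   # P1 has 1.6 FTE, P2 has 0.4 FTE
```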
Team Stability
This is an indication of the team's stability. For example, given:
Month n:
George: 90% dedicated
Joe: 50%
Jen: 80%
Month n + 1:
George: 75% (−15% delta)
Jen: 100% (+20%)
Jeff: 25% (new) (+25%)
Joe: missing (−50%)
The TeamGrowth metric for the team would be .2 + .25 = .45, divided by the current team size (2.0), or 22.5%.
The TeamShrinkage metric for the team would be .15 + .5 = .65, divided by the old team size (2.2), or 29.54%.
The total volatility would be the sum of the two prior metrics, roughly 52%, and Team Stability would be 100 − 52/2 = 74.
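The arithmetic in this example can be reproduced with a short sketch (function name and data structures are our own; inputs are per-person dedication fractions for two adjacent months):

```python
def team_stability(prev, curr):
    """Growth %, shrinkage %, and stability between two adjacent months
    of per-person dedication fractions (dict of name -> fraction)."""
    # Growth: fraction increases plus new members, over current size.
    growth = sum(max(curr.get(p, 0.0) - prev.get(p, 0.0), 0.0) for p in curr)
    # Shrinkage: fraction decreases plus departures, over old size.
    shrinkage = sum(max(prev.get(p, 0.0) - curr.get(p, 0.0), 0.0) for p in prev)
    growth_pct = 100 * growth / sum(curr.values())
    shrink_pct = 100 * shrinkage / sum(prev.values())
    stability = 100 - (growth_pct + shrink_pct) / 2
    return growth_pct, shrink_pct, stability

prev = {"George": 0.90, "Joe": 0.50, "Jen": 0.80}
curr = {"George": 0.75, "Jen": 1.00, "Jeff": 0.25}
growth, shrink, stability = team_stability(prev, curr)
# growth ~= 22.5, shrink ~= 29.5, stability ~= 74
```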
Type: Decision
Formula
 Find all transactions (snapshots) for stories, defects, and tasks that are in progress, have no children, have an owner (user), and are not blocked.
 Sum all transactions by user U _{total}, project P _{total}, and user contribution to a project U_{project} where U_{total}> 5 (for example, users with a total transaction count less than or equal to 5 are not counted towards U _{project} or P _{total}).
 Find the fraction of a user’s total work each project represents for all time periods: U_{fte} = U_{project} / U_{total}
 Sum the full-time equivalent for each project for all time periods: P_{fte} = Σ U_{fte}.
 For each project and each pair of adjacent time periods (t and t − 1), compute:
 Team growth: G = Σ max(U_{fte,t} − U_{fte,t−1}, 0) / P_{fte,t}
 Team shrinkage: S = Σ max(U_{fte,t−1} − U_{fte,t}, 0) / P_{fte,t−1}
 Team stability: 100 − 100 ∙ (G + S) / 2
Process Type
This measurement is an indicator of what type of agile process a team is using.
Type: Decision
Formula
 Find all snapshots for stories whose ScheduleState >= InProgress and have no children.
 Sum the total number of unique stories S_{total} for each project in each time period.
 Sum the total number of unique stories that have a non-null value in a field, S_{field}, for each project in each time period, where field is each of c_KanbanState, Iteration, TaskActualTotal, TaskRemainingTotal, TaskEstimateTotal, and PlanEstimate.
 For each project in each time period, divide the sum for each field by the total number of unique stories and multiply by 100 to get the percent of stories with the field: P_{field} = 100 ∙ S_{field} / S_{total}
 After calculating the percent of stories with each field, the project is assigned a value for process type T_{process} as specified in the following table:
T_{process}                           if...
Kanban, ScrumBan                      P_{kanbanState} ≥ 90 ⋀ P_{iterations} ≥ 90
Kanban, No Iterations                 P_{kanbanState} ≥ 90 ⋀ P_{iterations} < 90
Iterative, Scrum, Full                P_{kanbanState} < 90 ⋀ P_{iterations} ≥ 90 ⋀ P_{planEstimate} ≥ 50 ⋀ P_{taskEstimateTotal} ≥ 50
Iterative, Scrum, Story points only   P_{kanbanState} < 90 ⋀ P_{iterations} ≥ 90 ⋀ P_{planEstimate} ≥ 50 ⋀ P_{taskEstimateTotal} < 50
Iterative, Scrum, Tasks only          P_{kanbanState} < 90 ⋀ P_{iterations} ≥ 90 ⋀ P_{planEstimate} < 50 ⋀ P_{taskEstimateTotal} ≥ 50
Iterative, Other                      P_{kanbanState} < 90 ⋀ P_{iterations} ≥ 90 ⋀ P_{planEstimate} < 50 ⋀ P_{taskEstimateTotal} < 50
Other, Estimates                      P_{kanbanState} < 90 ⋀ P_{iterations} < 90 ⋀ (P_{planEstimate} ≥ 50 ⋁ P_{taskEstimateTotal} ≥ 50)
Other, No estimates                   P_{kanbanState} < 90 ⋀ P_{iterations} < 90 ⋀ P_{planEstimate} < 50 ⋀ P_{taskEstimateTotal} < 50
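The table's conditions can be read as a decision cascade; a direct transcription into code (the function name is our own):

```python
def process_type(p_kanban, p_iter, p_plan_est, p_task_est):
    """Assign T_process from the four field-coverage percentages:
    percent of stories with c_KanbanState, Iteration, PlanEstimate,
    and TaskEstimateTotal populated, respectively."""
    if p_kanban >= 90:
        return "Kanban, ScrumBan" if p_iter >= 90 else "Kanban, No Iterations"
    if p_iter >= 90:
        if p_plan_est >= 50 and p_task_est >= 50:
            return "Iterative, Scrum, Full"
        if p_plan_est >= 50:
            return "Iterative, Scrum, Story points only"
        if p_task_est >= 50:
            return "Iterative, Scrum, Tasks only"
        return "Iterative, Other"
    if p_plan_est >= 50 or p_task_est >= 50:
        return "Other, Estimates"
    return "Other, No estimates"
```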
Time in Process (TiP) and Responsiveness Score
Time in process (TiP) is a measure for an individual work item (story, defect, feature) indicating how much workday time (excluding non-work hours, weekends, and holidays) it spent in process. For stories and defects, in process is defined by the ScheduleState field being either In-Progress or Completed (Completed often means in test). For features, in process is when ActualStartDate is set and PercentDoneByStoryCount is less than 100%.
Although not calculated exactly the same way, it is analogous to the common definition of cycle time or lead time. For a given project and timebox pair, an aggregation (the median, or p50) of the TiP of the work items that completed during that timebox for that project is computed. The responsiveness score is based on the percentile of the median. Higher values result in lower scores, and vice versa.
The median (p50) is used rather than the arithmetic mean as the aggregation because the distribution of TiP measurements for individual work items is far from normal and frequently includes outliers. The median deals well with the non-normal distribution and does not allow a single outlier to greatly impact the measurement the way an arithmetic mean would. The data set also includes p75, p85, p95, and p99, representing the 75th, 85th, 95th, and 99th percentile coverage levels for the set of completed work items, but we currently only use the p50 (median) to calculate the score.
Type: Outcome
Variations:
Stories, defects, and features
Formula
 Find all stories, defects, and features that were InProgress, then moved to Completed within the time frame under consideration.
 Stories and defects are considered completed when ScheduleState ≥ Accepted.
 Features are considered completed when PercentDoneByStoryCount → 100% .
 Calculate a TiP value for each of those stories, defects, and features.
 Story and defect TiP is the duration where In-Progress ≤ ScheduleState < Accepted.
 Feature TiP is the duration between ActualStartDate and when PercentDoneByStoryCount → 100%.
 The responsiveness score is the percentile rank of the p50 TiP value for stories.
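A sketch of the aggregation and scoring above, assuming per-team lists of TiP values for completed stories and the same rank-based 0–99 percentile used elsewhere (the exact percentile formula is not published):

```python
import statistics

def responsiveness_scores(completed_tips_by_team):
    """Median (p50) TiP per team, percentile-scored so that a lower
    median TiP earns a higher responsiveness score."""
    medians = {team: statistics.median(tips)
               for team, tips in completed_tips_by_team.items()}
    values = list(medians.values())
    n = len(values)
    # Lower is better: score by the fraction of teams with a higher median.
    return {team: round(99 * sum(1 for v in values if v > m) / n)
            for team, m in medians.items()}

# Fractional-workday TiP values for stories completed in one month:
scores = responsiveness_scores({
    "Team A": [1.5, 2.0, 2.5],   # median 2.0, fastest
    "Team B": [4.0, 5.0, 9.0],   # median 5.0
    "Team C": [8.0, 9.0, 12.0],  # median 9.0, slowest
})
```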
Defect Density and Quality Score
Defect density is the count of defects divided by some normalizing size measurement. In our case, we use the team's man-days (FTE × the number of working days in the period) as a proxy for size.
Type: Outcome
Variations:
All defects (Defect) or just defects found in production (ReleasedDefect)
Formula
 Count all defects D_{all} and defects released to production D_{released} for each project.
 Calculate defect density E for each project: E = D / (P_{fte} ∙ W)
where P_{fte} is the project’s full-time equivalent and W is the number of working days in the time period under consideration.
 For each project, determine if either defects or released defects are being tracked by checking if the defect count is greater than zero for the year granularity that ends at the same time as the granularity under consideration. For example, if the granularity is a quarter ending on 2013-01-01, we check the full year ending on 2013-01-01 to see if the defect count for the year is non-zero.
 Compute defects per 1000 man-days by:
S_{all} = 1000 ∙ E_{all}
S_{released} = 1000 ∙ E_{released}
 For each project where defect data is tracked, compute the quality score. Defect density is scored based on percentiles. If a project has the highest measured value for defect density, it is in the 99th percentile, so its score is 99 − 99 = 0. If a project has the lowest measured value for defect density, it is in the 0th percentile, so its score is 99 − 0 = 99.
Q_{all}= percentile(S_{all})
Q_{released}= percentile(S_{released})
 The total quality score is the quality score for all defects: Q_{total} = Q_{all}. Projects not tracking defects will have no quality score.
Throughput and Productivity Score
Throughput is a measure of how much work is completed in a given time period. Within a single team, throughput can be compared over time. Across teams, however, the size of a work item can vary greatly by context, so throughput is difficult to compare unless work-item size is controlled; for instance, some organizations require that each story be between 0.5 and 3 man-days of work. Because we do not know this, when calculating the score we look at the number of completed stories normalized by team size (FTE). Throughput per team member is scored based on percentiles. Higher values result in higher scores, and vice versa.
Type: Outcome
Variations:
 Defects, stories, or features
 Counts or story points: The formula below describes the computation by counts of these items. However, we also compute throughput (or velocity) for stories and defects using the sum of the story points of all work items that make the appropriate transition. We do not yet have a good mechanism to identify which teams consistently use story points, so counts are the preferred variation at this time. The development of iteration-based measures is underway and includes research to explore better use of story points.
Formula
 For each project, compute throughput T as the sum of:
 The count of all stories and defects that transitioned forward into the accepted state minus the count of all stories that transitioned backwards out of the accepted state.
 The count of all features that transitioned forward to 100% complete by story count minus the count of all features that were 100% complete by story count but transitioned backward into < 100% complete by story count.
 Compute the throughput per team member by dividing throughput by full-time equivalent: T_{fte} = T / P_{fte}
 Score T_{fte} based on its percentile. If a project has the highest measured value for T_{fte}, it is in the 99th percentile, and 99 is its score. If a project has the lowest measured value for T_{fte}, it is in the 0th percentile, and 0 is its score.
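These two steps can be sketched as follows, assuming per-project throughput counts and FTE values are already computed and using the same rank-based 0–99 percentile as above (the exact formula is not published):

```python
def productivity_scores(throughput, fte):
    """T_fte = throughput / FTE per project, then percentile-scored:
    higher throughput per team member earns a higher score."""
    t_fte = {p: throughput[p] / fte[p] for p in throughput}
    values = list(t_fte.values())
    n = len(values)
    return {p: round(99 * sum(1 for v in values if v < x) / n)
            for p, x in t_fte.items()}

# Equal team sizes, so raw throughput drives the ranking here:
scores = productivity_scores(
    {"A": 20, "B": 10, "C": 5},
    {"A": 5.0, "B": 5.0, "C": 5.0},
)
```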
Throughput Variation and Predictability Score
Having a stable throughput can be as important as having a high throughput. The coefficient of variation of throughput across several time periods is calculated and translated into a score.
Formula
 For each project, compute throughput for each month T_{i} as the count of all stories that transitioned forward into the accepted state minus the count of all stories that transitioned backwards out of the accepted state.
 For each group of three and six adjacent months T, compute the:
 average avg(T )
 standard deviation std(T )
 coefficient of variation: CoV = std(T) / avg(T)
 Score CoV based on its percentile. If a project has the highest measured value for CoV, it is in the 99th percentile, so its score is 99 − 99 = 0. If a project has the lowest measured value for CoV, it is in the 0th percentile, so its score is 99 − 0 = 99.
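A minimal sketch of the CoV calculation. The document does not specify whether the sample or population standard deviation is used; this sketch assumes the population form:

```python
import statistics

def throughput_cov(monthly_throughput):
    """Coefficient of Variation of throughput over a window of months:
    standard deviation divided by the mean (population std dev assumed;
    the document does not say which form is used)."""
    return statistics.pstdev(monthly_throughput) / statistics.mean(monthly_throughput)

# A steady team has a much lower CoV (better predictability score)
# than a bursty one with the same average throughput:
steady = throughput_cov([10, 11, 9])
bursty = throughput_cov([2, 25, 3])
```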