Video details

Improving Performance of Agile Teams using Statistical Analysis by Naresh & Sriram

Agile
10.09.2022
English

Large-scale software development generates a lot of data when teams use tools like Jira or Azure DevOps. We can analyze that data with statistical tools and techniques in order to understand how to improve delivery performance. Naresh and Sriram have been attempting this at one of their clients. In this talk, they share the techniques they used and the lessons they have learnt so far.
More details: https://confengine.com/conferences/agile-india-2022/proposal/17462
Conference Link: https://2022.agileindia.org

Transcript

Welcome everyone to today's talk, which is about data-informed transformation: improving performance of Agile teams using statistical analysis, by Naresh and Sriram. Without further delay, over to you, Naresh and Sriram.

Great to see all the familiar names. Unfortunately we can't see your faces, but we can see the names, and it's good to see all of you. Thanks for joining in. This is going to be a 60-minute session. Sriram and I have been partnering at a client, and some of this is based on that; some of it is based on previous experiences we've had. We'll try to hit all the key lessons from the journey we've gone through so that it is useful for you. It's still an ongoing journey; we've not arrived at the destination, and I don't think there is such a thing as a destination in this evolving world. Let me quickly share my slides and get started. We also want to thank Rakesh, who I do see in the audience; Rakesh has been an instrumental part of this work, so it is really a joint presentation between Sriram, Rakesh and myself. Sriram, do you want to say a couple of words about yourself?

Sure. Am I audible? Okay. Hello everybody. For the last year or so I have been an independent consultant. Prior to that I was with ThoughtWorks for a long time, and when I left ThoughtWorks I was a VP of Transformation Advisory, advising clients on improving the performance of their organizations in the context of digital transformation, along topics like moving from projects to products, organization design, and metrics. So there's quite a bit of variety in the kind of consulting I do. On this particular engagement I've been working with Naresh and Rakesh for the last several months, and the work has evolved. Here the context is software delivery: a data-informed look at software delivery, trying to extract patterns out of that data and then using those patterns to inform the future of the transformation efforts. I've enjoyed this journey so far and I'm excited to share our experience with all of you. Thank you.

Thanks, Sriram, for that introduction. Let's dive straight into the topic. We've broken this broadly into three sections. First, some background on the context of what we're trying to do. Then we'll look at the analysis that was actually done and the challenges we faced. And finally we'll get into lessons learned: when we applied the analysis, what did we learn from it, and which lessons other folks can apply in their own context. Generally, when we talk about improving the performance of organizations from a software delivery perspective, CXOs will typically say: I want faster delivery, I want more reliable delivery, I want frictionless delivery, meaning I want things to flow seamlessly through the organization and not keep getting stuck. So the common performance improvement objectives for a lot of organizations may be these three at the top level. In this presentation we'll break that down a little and get into the next few levels beneath it: how do we then use data to make informed decisions about introducing interventions as part of the transformation?
Again, just a disclaimer here: this is for folks involved in large-scale software delivery. In our case we are looking at upwards of 40,000 engineers working on things, so it's a fairly large-scale software delivery effort. Sriram had a talk earlier today where he talked about the impact on business outcomes; in this particular talk we're intentionally not going to focus on the business outcome side of things, but purely on the software engineering and delivery side, which I think is also a very important area that needs to be delved into. If you missed Sriram's earlier talk, the recording will be available and you can have a look at it.

With that, let's quickly jump in. Most of you are probably either driving some kind of transformation or involved in one; that is why I'm hoping you've joined this session. The typical interventions one may make when going into an organization, or when trying to transform a team, are introducing certain practices: Scrum of Scrums, PI planning, and several other technical practices listed here. So you have a menu full of practices at your disposal that you could use as ways to influence teams and introduce interventions into their ways of working. The question we asked ourselves is: how do we know which teams could benefit from which of these interventions? How do we figure that out? Usually we just expect that everyone will adopt everything; we assign some kind of fluency or maturity rating to the teams and get them to check off the boxes, go through a series of trainings, and so on. That, unfortunately, is the state of transformation in a lot of places, and I think it's a bit of a disservice, because if we don't have data to back what we are doing, we end up being very prescriptive and take a one-size-fits-all attitude. We all know that the very essence of Agile is that one size does not fit all: each team has its context, and things have to fit that specific context. I think we heard this from Andy Stock yesterday as well.

So what typically happens is that people start off, and at some point the CXO wakes up and asks: we originally started with faster delivery, more reliable delivery, frictionless delivery; has this transformation helped us achieve that? People feel caught off guard when such questions are asked, or they even wonder how you would quantify something like this, because it's a transformation, and so forth. So we want to deep-dive a little into how teams could actually approach this, and how they could show the CXOs that the transformation is in fact helping achieve the objectives they are trying to drive from a software delivery perspective. Of course, you need analysis and you need to dig deep to be able to do this. But before we jump in, let me quickly step back and look at the overall picture, from idea to cash, or idea to go-live.
If you look at the overall flow, that is what we call the lead time: from idea to cash there are various stages, some with activity and work centers, and some wait stages in between. You could visualize it that way, but the part we are specifically interested in for this talk is this little blue development box, so let's zoom into that and see what it is. I know a lot of you might be thinking: Agile software development is not a linear, waterfall kind of model; it should be iterative, these stages are intermixed, and there are feedback loops going back and forth. That is absolutely right, and that's how it should be. But anyone who has worked in large software delivery organizations will realize that, unfortunately, it effectively ends up being fairly linear, not as iterative as you would like.

So let's zoom in and define a couple more terms, just so we are all on the same page. If you zoom into development, you will see that some amount of discovery needs to happen, then solutioning, then planning, then the actual development and in-sprint automation, and then things like integration testing and other kinds of chaos and reliability testing, and finally you wait to get into production. That's the area we're going to double-click on today. This entire cycle, from discovery up to actually going live, is what we call the delivery lead time. Within that, if you zoom in further, the time from when the developers are done until the change actually goes live is what we call the change lead time. Folks who read the DORA reports will be familiar with this term, CLT or change lead time, and that's the area we want to zoom into today.

The question we have is: have these times improved because of our interventions? And if they have improved, can we quantify by how much? That's the question we want to be able to answer for the CXOs, so they understand that the investment they are making in the transformation is actually giving them a reasonable ROI. So let's look at question number one: have these times improved, and by how much? To answer this, we have two prerequisites. The first is that we need to establish a baseline, some historical data against which we can compare and say whether things have improved, and if so by how much. The second is that we need a like-for-like comparison. What do I mean by that? Take CLT, the change lead time. If you measured the CLT for a certain feature, you could say the CLT depends on the size of the feature, because features of different sizes may need different amounts of time. It may also depend on which release train or release bundle the feature actually ships in, how many bugs were found, how much testing time went in, and, once bugs were found, how much effort went in from the dev side to fix them. So you might say: here are a few of the factors that influence the CLT. Now, if I want to compare two features delivered maybe three months apart, and I compare them as-is without accounting for factors like size and bug count, it may not be a like-for-like comparison. So you probably need to normalize the CLT. I've put a simple formula on the slide, but it could be something more complicated. The point is that these are the prerequisites, the things one needs to think about while trying to answer the first question of whether the CLT and delivery lead times have actually improved: you need to establish a baseline, and you need a like-for-like comparison.
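As an illustration of that like-for-like normalization (the slide's actual formula is not captured in the transcript), here is a minimal sketch in Python that simply divides CLT by feature size as a crude size adjustment; the column names and numbers are made up.

```python
import pandas as pd

# Hypothetical per-feature records; column names and values are illustrative only.
features = pd.DataFrame({
    "feature_id":   ["F-101", "F-202"],
    "clt_days":     [30, 18],   # change lead time: dev complete -> go live, in days
    "feature_size": [20, 8],    # total story points of the feature
})

# Crude like-for-like adjustment: CLT per story point. The real formula could also
# account for bundle size, bug count, testing effort and so on.
features["normalized_clt"] = features["clt_days"] / features["feature_size"]
print(features)
```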
Let me jump ahead a little and talk about the second question, which I think is even more interesting as you get into it: how much of the reduction is due to the intervention we are making? You might introduce a set of practices and want to say: by introducing these practices, how much have I reduced the change lead time? Sometimes that number does not move as quickly as you would like; it takes time. So you might decompose it a little and try to understand what actually makes the CLT a certain number. You might say: the time it takes for integration testing is one factor, plus the time it takes to fix the bugs, the time it takes to actually deploy your changes, and the amount of waiting throughout the process. This is where the friction piece we talked about earlier comes in. So you need to look at all of these areas to understand how the CLT is affected.

As you start breaking this down, you realize that these items are in turn dependent on a few more attributes or factors, and you slowly start to see a contribution tree building out: change lead time depends on some things, and those things depend on others in turn. If you take integration testing time, some of the factors that influence it are the size of the feature, the extent of test automation you currently have, and how available and productive the testers are to work on that particular feature's testing, and so forth. This is not an exhaustive list; in fact we have a much larger list of factors, and the contribution tree shown here is just a subset so you understand the concept.

When you start looking at each of these, you can think about what you could do to improve them. You might say: I need an initiative to shift left, or I need to invest more in test automation, I need to hire and upskill people, I need to reduce my batch size, or invest in better test environments, fully automated ephemeral environments, and so on. So you can introduce these kinds of initiatives into the organization to improve the factors that influence the CLT, and you can then think of specific practices or techniques that could help with each of them.
For example, if feature size is a factor, we would want to introduce feature slicing. From a shift-left point of view, continuous integration is important, but you might also want to do contract testing, and other things that reduce the batch size. You may also want to introduce practices that enable independent deployment; I spoke earlier today about how we're using FeatureHub as one way to enable independent deployment, and so forth. So you can add the last layer of this contribution tree: the practices you could introduce to influence these factors. So far, I hope everyone is with me on how you start thinking about any metric and breaking it down into its contribution tree.

Then, of course, the question is how much of the improvement is due to our intervention. Understanding the historical data, understanding what has been influencing what, what the relationships between these things are, and which factors carry more weight than others for this particular organization and this particular team, is important. But I would argue something is even more important: the whole reason you're doing this is to figure out where you should invest and focus in the future, what is going to give you the biggest bang for the buck. You want to know your future focus areas based on this data and this analysis. That is where we were headed, and those are the questions we were trying to answer. Unfortunately, there's no silver-bullet answer here, no simple follow-the-book recipe in this space. This is where we turned to statistical measures and methods, and I'd now request Sriram to take over and walk us through this part of the journey.

Thank you, Naresh. Let me just bring up my screen and turn off the floating controls. Is this visible, Naresh? Yes, all good. Okay. So, what we saw is that there are multiple factors that contribute to the metrics that matter at the top of the tree. Because there are multiple factors, there is potential to use statistics to understand the contribution of those factors. Of course, that requires data, and most likely, if you are a large-scale software setup, you're using something like Jira or Azure DevOps, so that becomes the source of data. When you're doing statistical analysis you need enough data points: per team or per portfolio, I would say at least 100. Let's say you have that data, or you can obtain it, because this kind of data is reasonable to expect from your systems. How long did it take from when a feature was marked development complete until it went live? What was the feature size, that is, the total story points of all the stories in that feature? What was the size of the bundle it was released in? If a feature was released independently, that's a bundle size of one; if multiple features were released together in one bundle, the bundle size is greater than one. How many bugs were reported after the development-complete stage, during integration testing or the later stages of testing? How many testing days were required to perform that testing, and how many developer days were required to fix the bugs that were reported? This might not be readily available from your systems, but with some amount of custom reporting and data manipulation you should be able to arrive at this picture.
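As a sketch of how such a per-feature table might be assembled, here is a hypothetical example in Python with pandas. The field names are assumptions, not the actual Jira or Azure DevOps schema, and test days and developer days would need additional sources, as discussed later.

```python
import pandas as pd

# Hypothetical export of features from a work tracker; field names are assumptions.
features = pd.DataFrame({
    "feature_id":   ["F-1", "F-2", "F-3", "F-4"],
    "dev_complete": pd.to_datetime(["2022-03-01", "2022-03-05", "2022-03-07", "2022-03-20"]),
    "go_live":      pd.to_datetime(["2022-03-25", "2022-03-25", "2022-04-10", "2022-04-10"]),
    "release_id":   ["R-10", "R-10", "R-11", "R-11"],
    "story_points": [13, 5, 21, 8],   # proxy for feature size
    "bug_count":    [4, 1, 9, 2],     # bugs reported after dev complete
})

# Change lead time in days: development complete -> go live.
features["clt_days"] = (features["go_live"] - features["dev_complete"]).dt.days

# Bundle size: how many features shipped together in the same release bundle.
features["bundle_size"] = features.groupby("release_id")["feature_id"].transform("count")

print(features[["feature_id", "clt_days", "story_points", "bundle_size", "bug_count"]])
```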
So once you have this picture, what's next? We can run a statistical analysis, in particular what is called a multiple regression analysis. Given the nature of this talk, we do expect you to have some knowledge of statistics; we don't have time to explain what all these analyses mean, and this is not a statistics tutorial, so I'm hoping at least some of you have some familiarity with this kind of analysis. Basically, we identify the dependent variable, which is CLT, the variable that depends on one or more of these factors. We don't know exactly how it depends on them in the case of our teams, but logically, using our experience and our knowledge of software delivery, we know that the time from development complete to go-live will depend on the size of the feature, the size of the bundle, the number of bugs found, and the effort it took to test and to fix. There are some relationships between these variables themselves, and we will come to that later; as long as those relationships are not too strong, we can still model them as independent variables.

Before we get to the result of the analysis, what do we need to ensure in order to do a proper analysis? First, we have to normalize all the variables: all the data ranges have to be normalized to the same range, so that when you get the output of the regression the coefficients are comparable. And we have to make sure there is no multicollinearity between the independent variables we are modeling. That is the preparation stage. Once you run the analysis, before we can interpret the results we have to validate that they are meaningful, and for that we use two measures: the p-values and, in the case of multiple regression, the adjusted R-squared. These are also referred to as statistical significance and explanatory power. That is one part. Secondly, we run some predictions: we split our data set into a training set and a test set, develop the model on the training set, run it in prediction mode on the training set and observe the prediction errors, then run the prediction afresh on the test set and observe the errors again, making sure the errors are small and similar in both cases. If you do all this, you can be reasonably sure the analysis is on firm ground and worth interpreting.
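Here is a minimal sketch of that procedure in Python with pandas and statsmodels. The data is synthetic, generated only to make the example runnable, and the column names are assumptions; it illustrates the steps just described, not the model the team actually ran.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in for ~120 per-feature records (all made up); in practice this
# would be the table extracted from the work tracker.
rng = np.random.default_rng(7)
n = 120
features = pd.DataFrame({
    "story_points": rng.integers(1, 34, n),
    "bundle_size":  rng.integers(1, 8, n),
    "bug_count":    rng.integers(0, 12, n),
    "test_days":    rng.integers(1, 15, n),
    "dev_days":     rng.integers(1, 20, n),
})
features["clt_days"] = (0.5 * features["story_points"] + 2.0 * features["bundle_size"]
                        + 0.8 * features["bug_count"] + 0.3 * features["test_days"]
                        + 0.4 * features["dev_days"] + 5 + rng.normal(0, 3, n))

predictors = ["story_points", "bundle_size", "bug_count", "test_days", "dev_days"]

# 1. Normalize every independent variable to the same 0-1 range so that the
#    fitted coefficients are directly comparable.
X = (features[predictors] - features[predictors].min()) / (
    features[predictors].max() - features[predictors].min())
y = features["clt_days"]

# 2. Check for multicollinearity between the independent variables
#    (a VIF much above ~5 is a warning sign).
X_const = sm.add_constant(X)
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(col, variance_inflation_factor(X_const.values, i))

# 3. Split into training and test sets; fit on the training data only.
train = X_const.sample(frac=0.8, random_state=42)
test = X_const.drop(train.index)
model = sm.OLS(y.loc[train.index], train).fit()

# 4. Validate before interpreting: statistical significance and explanatory power.
print(model.pvalues)       # small p-values => statistically significant predictors
print(model.rsquared_adj)  # adjusted R-squared => explanatory power

# 5. Prediction errors on training vs. test data should be small and similar.
print((model.predict(train) - y.loc[train.index]).abs().mean(),
      (model.predict(test) - y.loc[test.index]).abs().mean())
```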
When you do all this, you might sometimes find that the statistical significance itself is not there, the p-values are not great, and that throws the whole analysis into question. But because we are using our expertise, and not just trying to correlate random variables (we know these variables influence CLT), it will often happen that you get good scores for statistical significance but the model has poor explanatory power: the adjusted R-squared is not great. Something like 0.85 is considered good, but you might get much lower than that. If that happens, it indicates that there are probably other factors influencing CLT that we have not taken into account. A typical example is wait time: in most organizations, the way Jira or Azure DevOps workflows are set up does not allow you to figure out what the wait time was, so unless you change your workflows, wait time is often not available, even though it might be a significant factor influencing the CLT. Another example: so far we are assuming all features are more or less the same except for their size, number of bugs and so on, but it could be that features in one domain take much longer to release than features in another domain. We are also factoring in the development and test time spent on a feature, but all developers and testers are not the same; some are experienced, some inexperienced, some more skilled, some less skilled, and we are not factoring their competence into the model, because that data is not readily available either. So if you end up with a result with poor explanatory power, you might want to rerun the analysis after incorporating these additional variables into the model.

Let's say we did all that and got a meaningful result. Multiple linear regression will throw up results like this: CLT equals, well, I've missed the intercept here, so there will be some constant, plus a few variables, minus a few variables. Plus means that as feature size goes up, change lead time goes up; as bundle size goes up, change lead time goes up, and so on. Minus means that as the amount of testing capacity increases, CLT goes down, or as the number of developers increases, CLT goes down. Actually, I've made a mistake on this slide: this should not be test days. If it's test days, it's a plus; it should be the number of testers. If you have more testers, the CLT will go down. So that is roughly how you interpret the positive and negative signs. As for the coefficients, because you normalized the data ranges to begin with, the coefficients are now comparable, and what this equation says is that, for the data ranges in the analysis, the numerically highest coefficients (ignoring the sign) have the greatest influence on the CLT. In this case the three greatest influencers are bundle size, number of developers and feature size, with coefficients of 8.6, 5.5 and 4.6. What that means is that if you want to reduce CLT, maybe you should focus on these three factors, because they have the greatest influence on it. Now, in this particular case, the first is bundle size, the second is number of developers, and the third is feature size. Out of these, not everything is equally in our hands: increasing the number of developers is essentially a management or staffing decision, whereas introducing practices that reduce feature size or bundle size is not necessarily a management decision. The principal engineers on the team, or senior technical people, can take that call and try to do something about it.
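Continuing from the fitted `model` in the sketch above, ranking the coefficients by absolute value is one way to surface those top influencers; in the talk's example, the top three were bundle size, number of developers and feature size.

```python
# Ignore the intercept and rank the remaining coefficients by absolute value:
# for normalized inputs, the numerically largest coefficients have the greatest
# influence on CLT over the data ranges used in the fit.
influence = model.params.drop("const").abs().sort_values(ascending=False)
print(influence.head(3))
```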
And that gives us the answer. If the factors most in our control are bundle size and feature size, then you can go back to the contribution tree and ask: which practice influences bundle size? Independent deployment. And which practice influences feature size? Something like feature slicing. That is how we come to know that, if the data we investigated belongs to a particular team or a particular portfolio, then that team or portfolio will most likely get the greatest benefit from those kinds of interventions. That is how the statistical analysis helps us answer what we should focus on in the future. In a way it's a prediction, a prediction based on past data; you run the model on past data, so at least that is what the past is telling you. But we can verify it: once we adopt this recommendation and actually introduce these practices into the teams, if those practices are really having an effect, they should have a trickle-up effect towards the top of the tree. If feature slicing is having an effect, then, everything else constant, integration test time should go down; similarly, if independent deployment is having an effect, then, everything else constant, wait time should go down. So after a few months we can check: did integration test time and wait time decrease for comparable features? When I say comparable features, just as Naresh said earlier, you'll need some sort of normalization to make them comparable. Once you do that, we can answer the question: did they reduce? And if they reduced, what was the effect further up: how much did that help improve CLT? That is how we verify the result of our actions, and our actions themselves were based on the result of the regression analysis.

So what we've seen, essentially, is a data-informed transformation loop. You might have come across Build-Measure-Learn as the loop for building products: you build something, you measure its effect with users or in the market, you learn, and that informs your next round of building. When you're talking about transformation initiatives you're not building something; what you're doing is designing interventions. You say: maybe we should adopt this practice, maybe we should use this technique, and so on. But you can use the same data-informed approach to transformation: you intervene a little, you measure the results of the intervention, you learn from that, and that helps you decide your next round of interventions. Ideally, before you begin this whole process, you baseline the metrics you are interested in, such as CLT, delivery lead time and reliability, and then you start designing interventions and keep executing the loop, maybe in quarterly or six-monthly iterations. For transformation efforts, weekly iterations are typically unrealistic, because you need time for the data to accumulate before you can make inferences from it.
So that was the sample analysis. Now we'll get into the actual analysis, what we actually did for a client, and for the next few slides I'm going to ask Naresh to come in and talk about it. We did not begin with statistics; we began with an Excel-based analysis, and I'll let Naresh speak about that first part.

Cool, thanks, Sriram. Just before we jump in, I see one question from Shravan: we say we logically know the CLT contributing factors, but how do we know whether the identified contributors really contribute to CLT or not? I think that is essentially what Sriram just went through and explained, so hopefully, Shravan, that is covered; if not, please let us know and we will circle back. Please confirm; unfortunately we can't see you, but we can still get that feedback. All right, perfect.

So, the frugal innovation thought process: you want to start with the simplest possible thing. A lot of people will look down on you for using Excel, but it's actually a pretty powerful tool; it can give you a lot of insight, and you can iterate on it quite a lot before you decide what to deep-dive into. In our case, we wanted to understand the CLT month on month: how is the CLT doing, and how does it compare to the feature counts, the feature velocity being completed, if you will. So we started plotting this data in Excel, looking month on month at what our CLT looks like, how it compares to the number of features, whether there is any correlation between the two, and what the trends look like. Unfortunately, as you can see from the graph, we couldn't see a direct pattern. One hypothesis was that as the feature count increases, the CLT will also increase, but the data did not fully bear that out; a lot of the data points didn't match. So we wanted to dig deeper and understand what was going on.

On the next slide, we tried two things. One, for each month, instead of looking at the CLT as a single number, we broke it down into the various components that contribute to it, such as the dev time, the waiting time and the SIT time, and we looked for correlations between them. Two, instead of looking at absolute numbers, we looked at percentages, to see whether, relatively, a certain phase was contributing more. What we quickly realized is that when you do this kind of month-on-month analysis, especially when the CLT itself is much longer than any given month, you get carry-over effects, and things show up with a delayed effect at a later point in time, and vice versa. So you can't really draw any clear conclusions on that basis, and we decided that this month-on-month view of the data was not the right way to do it.
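For illustration, here is a small pandas sketch of that kind of month-on-month Excel view: average CLT against feature count, plus each phase's percentage contribution to CLT. All column names and numbers are made up.

```python
import pandas as pd

# Hypothetical per-feature data: go-live month and a breakdown of CLT components in days.
df = pd.DataFrame({
    "go_live_month": ["2022-01", "2022-01", "2022-02", "2022-02", "2022-03"],
    "dev_days":      [5, 8, 4, 10, 6],
    "wait_days":     [3, 6, 2, 9, 4],
    "sit_days":      [4, 5, 3, 7, 5],
    "replica_days":  [2, 3, 1, 4, 2],
})
phases = ["dev_days", "wait_days", "sit_days", "replica_days"]
df["clt_days"] = df[phases].sum(axis=1)

# Month-on-month view: average CLT vs. number of features delivered that month.
monthly = df.groupby("go_live_month").agg(feature_count=("clt_days", "size"),
                                          avg_clt=("clt_days", "mean"))

# Relative view: each phase's percentage contribution to CLT per month.
totals = df.groupby("go_live_month")[phases].sum()
phase_pct = totals.div(totals.sum(axis=1), axis=0) * 100

print(monthly)
print(phase_pct.round(1))
```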
So we pivoted a little. On the next slide, what you see at the bottom are feature IDs: for a given feature, what are the contributors to its CLT? We then started overlaying a couple of the influences we suspected, for example feature size and bundle size. The orange is the time spent testing in SIT, and the yellow is the time spent testing in the replica, or staging, environment. In a few cases at least, we could see that if the feature was big, then the testing time, the SIT time and the replica time put together, was a big portion of the CLT. That sounded very promising. But we also knew that feature size is not the only contributor, so we started overlaying the other contributors as well, hoping to easily find some patterns and, hooray, have the answer. Unfortunately, when we did that, things became very fuzzy. It was no longer as simple as saying the feature is big, therefore the testing time is more. As experts we could relate to the simple version and say, yes, this makes sense, our data matches our mental model; but as we layered in more influences, those correlations didn't really hold up, and it became too complicated for us to drive this from Excel. This is the point where Sriram and Rakesh said: well, we have outgrown what we can do with Excel, and we need to turn to more statistical tools like R. I'll quickly pass it back to Sriram to narrate the story from here.

Yes, thank you, Naresh. So in the next iteration we started with statistical analysis. We had, relatively speaking, good-quality data for three variables. We did not directly have feature size in points, but we had a proxy metric for it, and we had bug count and bundle size. So we said: CLT is a function of these three variables, let's do the analysis and see how it holds up. We split the data into three portfolios because, as I said, it's a large-scale setup with different lines of business, so each portfolio corresponds to one line of business. When we split it up, in two out of three portfolios we found that the regression could explain about 60% of the variation in CLT; in other words, the adjusted R-squared was about 0.6. That's not great, but it's not disappointing either. It just means there are still more variables that influence the CLT, and we need that data.
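A sketch of that per-portfolio fit, again on the synthetic table from the earlier regression sketch and assuming a `portfolio` column identifying the line of business; the 0.6 figure from the talk is what `rsquared_adj` would report on the real data, not something this toy data reproduces.

```python
import numpy as np
import statsmodels.api as sm

# Assume each feature record also carries the line of business it belongs to;
# here we just tag the synthetic records from the earlier sketch at random.
features["portfolio"] = np.random.default_rng(1).choice(["P1", "P2", "P3"], len(features))

# Fit the same model separately per portfolio and compare explanatory power.
predictors = ["story_points", "bundle_size", "bug_count"]  # the three variables with usable data
for portfolio, group in features.groupby("portfolio"):
    X = sm.add_constant((group[predictors] - group[predictors].min())
                        / (group[predictors].max() - group[predictors].min()))
    fit = sm.OLS(group["clt_days"], X).fit()
    # an adjusted R-squared of ~0.6 would mean ~60% of the CLT variation is explained
    print(portfolio, round(fit.rsquared_adj, 2))
```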
So for our next iteration we want to have data on the actual development days and test days, and for that we are in talks with the team that manages the processes and tools, to use the Capacity Management module in Azure DevOps and maybe also do some lightweight time logging of the actual time spent on different features. I know that's not great. Ideally we want to minimize manual data entry, but on the other hand, if you want sustained budget for your transformation efforts, then at some point you're going to have to demonstrate results, and if you want to demonstrate results in a somewhat rigorous manner, you have to do all this, and for that you need the data. Where will the data come from? Sometimes the data is generated simply through the normal actions of people, but at other times you have to ask for a little bit of data entry discipline. So that's what we are going to do as the next iteration.

Even so far, in whatever we have done, we've faced quite a few challenges and learned lessons in the areas of data extraction, data availability and data quality. I've been thinking and writing about this topic in other contexts as well. Jointly, all these challenges are in some ways measurement challenges, challenges in the ability to measure things, and they contribute to measurement debt, which is similar to technical debt, or tech debt, if you've heard of that. You know that tech debt slows things down, reduces the rate of change and makes code less maintainable; it has all those negative effects. Measurement debt similarly has negative effects: it means you can't measure the result of what you're doing, and therefore you can't learn from it. And it is very common, probably even more common than tech debt in most organizations. If I want to define it a little formally, I would say that an organization takes on measurement debt when it implements initiatives, any kind of initiative, a change initiative, a new product, a new set of features for a product, all of which represent investments, without investing in the measurement infrastructure required to validate the benefits those initiatives are supposed to deliver. If that happens, you're taking on measurement debt. In our case, what is the initiative we're talking about? I'll give you a second to think about it. Whether it's a technology initiative or, in this context, a transformation initiative, you are still investing in it: you may be hiring coaches, investing in tooling, and so on. Those are investments. But if you don't have the corresponding measurement infrastructure to validate whether it is making a difference, then you're basically shooting blind. And that is the state in many organizations: transformation becomes an article of faith. They say, oh, we are doing stand-ups, we are doing CI/CD, we are doing all of this, but we don't know if it's really making a difference, because we don't have rigorous measurement practices in place. So measurement debt breaks these loops. Earlier we talked about the intervene-measure-learn loop; if that loop is active, it accelerates learning and your transformation will hopefully be more fruitful. But if you have measurement debt, you can't measure things, so it breaks the loop, and you end up doing one thing after another in the hope that it will make a difference, without any means of verifying it.
So that was a little bit of reflection on the whole process. Now I want to talk about the specific challenges. Data extraction is something Naresh and Rakesh spent a lot of time on, and without that effort none of this would have been possible. They are most closely familiar with it, so I'll again request Naresh to talk about it.

Cool. I mean, data extraction sounds awesome, right? But it comes with its own set of challenges. The very first one you can imagine: if you're trying to pull lots of these different kinds of data, unfortunately most tools don't give you one ready query or API that you can just call to get all of it. You end up making a lot of calls; in our case we started at around 10,000 API calls, and I think we are now close to 100,000 API calls, stitching this data together. And it's not as simple as just stitching the data together; if you click to the next point, Sriram: in many cases we also have to perform complex data transformation steps to aggregate and reshape the data so it can be presented and analyzed. And once you have that, you might think you've got it, only to realize that you have a very low signal-to-noise ratio, and you need to discard a lot of data and pick out only a few important parameters from the gigabytes you pulled through those API calls. Next: another challenge we ran into is that, because no one was originally analyzing this data through this lens, different teams ended up doing things differently across projects, both in their workflows and in the custom fields they were using. So we had to really dig in to pull out this data, and then put a layer of logic on top of it to interpret the data specifically for each project. You can imagine configuration sitting there saying: for this team, which data counts as end of deployment to a certain stage, and so forth. One other thing: as this was happening, and as Sriram explained, you start introducing interventions that influence the data, so the data itself keeps evolving. Your extraction then has to have custom logic that is sensitive to which time period the data belongs to, and massage the data appropriately so you can still make sense of it. There are lots of other things, but these are the ones top of mind in terms of improving our ability to extract the data and get it into a form we could analyze. Back to you, Sriram.
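To make the extraction pattern concrete, here is a minimal sketch of paginated collection plus a per-team interpretation layer. The endpoint, parameters and team-to-state mapping are entirely hypothetical; the real Jira and Azure DevOps REST APIs, authentication and query syntax differ, so treat this only as the shape of the "many calls, then stitch and interpret" approach described above.

```python
import requests

# Hypothetical paginated work-item endpoint; not a real Jira/Azure DevOps URL.
BASE_URL = "https://tracker.example.com/api/workitems"

def fetch_all(session, params):
    """Pull every page of results for one query and return the combined list."""
    items, page = [], 1
    while True:
        resp = session.get(BASE_URL, params={**params, "page": page})
        resp.raise_for_status()
        batch = resp.json().get("items", [])
        if not batch:
            return items
        items.extend(batch)
        page += 1

session = requests.Session()
raw = fetch_all(session, {"project": "PORTFOLIO-A", "type": "Feature"})

# Per-team interpretation layer: which workflow state counts as "deployed"
# differs by team, so that mapping lives in configuration, not in code.
DEPLOYED_STATE = {"team-alpha": "Released", "team-beta": "Done - Prod"}
deployed = [w for w in raw
            if w.get("state") == DEPLOYED_STATE.get(w.get("team"), "Released")]
```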
Yes, thanks. Then, moving on to data availability: people do capacity planning, and from capacity planning you can figure out the expected number of developer days and tester days to be spent on a feature, but that might be different from the actual number of days spent. That is where you have to come up with additional mechanisms to obtain that kind of data. Another one is time spent in various queues; we talked about this. Most workflows don't model wait times, and therefore, if you want to start getting that data, you have to introduce those waiting states into the workflows. And in teams where people multitask across features, it's hard to plan and harder to understand what actually happened. That is where a bit of time logging might help. Like in this table: we are saying that on day six, developers spent 0.5 days on a task, or maybe two developers spent a quarter of a day each on that particular story. You might need data collection of that nature to get enough data that you can start interpreting it meaningfully with statistics.

There were quality challenges too. For example, some of what we relied on were state-change dates: if you're calculating CLT, the start is development complete and the end is go-live, with some other states in between. Sometimes we found that the state-change dates were missing. How can they be missing? If you have a workflow, you should not be missing state-change dates. Then we realized there is an anti-pattern: they are not using dates based on the state transitions; instead they have a whole bunch of custom date fields that people have to populate, and if they forget or omit to populate them, you get this kind of problem. In other cases we found bugs that were never closed after fixing. And in some cases, which initially puzzled us, we found bugs whose close date was earlier than their creation date. Then we realized it's because people are cloning bugs: when they create a new bug report, they clone an earlier bug and just change the description. If you clone it like that, and you are using custom fields for your dates, then the close date also gets cloned, and now the creation date of the new bug is later than its close date. All these kinds of things we had to figure out slowly as we went.

And this is something I already referred to: in some cases you need some kind of data entry discipline. Yes, Naresh and I have both been developers in the past, and we know that developers don't like doing manual data entry. But if you explain the context, that in order to continue these efforts we need budget, that in order to get budget we have to make the case that this is actually resulting in a benefit, and that otherwise the delivery pressures will still be there and we will just have to soak up all that pressure without being able to invest in these interventions; if you explain it like that, as part of a change management process, then I think you can get more buy-in for the data entry discipline. In a way it's not necessarily a bad thing; you can look at it as an additional benefit that, along the way, you are improving data quality and the way work gets tracked. So instead of the textbook loop, where you just intervene, measure and learn, in practice what we do is: we try to measure, we find a whole set of challenges, we improve the processes or the tooling in order to improve the data quality, and then we are in a position to measure, we learn from that, and we design our next set of interventions. That becomes the transformation loop in practice.
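Going back to those quality issues, simple automated checks can flag them before the data feeds the analysis; a minimal sketch, with made-up column names:

```python
import pandas as pd

# Hypothetical bug export; column names are assumptions.
bugs = pd.DataFrame({
    "bug_id":       ["B-1", "B-2", "B-3"],
    "created_date": pd.to_datetime(["2022-05-01", "2022-05-10", "2022-05-12"]),
    "closed_date":  pd.to_datetime(["2022-05-08", None, "2022-05-03"]),
})

# Check 1: bugs never closed after fixing (missing close date).
never_closed = bugs[bugs["closed_date"].isna()]

# Check 2: close date earlier than creation date, the cloned-bug anti-pattern
# where custom date fields get copied along with the rest of the bug.
cloned_suspects = bugs[bugs["closed_date"] < bugs["created_date"]]

print(never_closed["bug_id"].tolist(), cloned_suspects["bug_id"].tolist())
```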
That's pretty much what we wanted to cover. To quickly summarize our key points in conclusion: measurement is necessary. Without measurement, transformation efforts may lose credibility; they might get investment for a year or two, and after that you might not get any further investment. Even otherwise, measurement is essential if you want to execute data-informed transformation loops, intervene-measure-learn kinds of loops. But even with measurements in place, it's not straightforward; it's not a simple A-causes-B kind of inference, so it's not straightforward to demonstrate the impact of our interventions, and that is where statistical methods can help. With the right statistical methods, we can answer which factors are most influential for the metrics that matter, and then we can choose to focus on the interventions that improve that set of factors. And if you have sufficient data, this does not have to be done only at the whole-organization level; you can do it on a per-team basis, or at least per portfolio or per line of business. When you do all of this, you will usually uncover gaps in data quality and availability, and those gaps can be addressed through continuous improvement of the processes, the workflows and the tooling. That brings us to the end of everything we wanted to share with you on this topic. We welcome your comments and questions.

Cool, two minutes before time, that's pretty good. I've been trying to address questions along the way, so I have already typed out responses to a bunch of them. If there are any others, happy to take them, but hopefully I've answered most of the questions through chat. I think there was some confusion around cycle time and lead time, which I think I clarified. I see Pradeep saying he likes the last chart, where we show how we actually try to measure. Other than that, I think we've addressed most of these. One of the questions was whether we are going to share specific insights about the interventions we did. Our objective was not to talk about one particular team and one particular intervention, because that's going to be different for different teams and different organizations; our objective here was a bit more meta-level, talking about the approach you might want to take in this context. I see one more question has popped in, from Tom. Thanks for asking the question, Tom; Tom is actually the guy who invented software metrics, so it's great to have him here. Tom is asking: what about other critical measures like security and usability? Sriram, do you want to take a stab at this?

What about other measures like security and usability? Well, I guess you could use the same process. What is your measure at the top level? What is the measure of security? Somebody might say it's based on the number of incidents, and you might have some sort of weighted score for your incidents; you come up with some weighting logic and say, okay, in the last quarter our security score was this much. So you have a measure at the top, then you try to figure out the contributing factors, build a contribution tree for security like we did for CLT, and figure out which low-level interventions ultimately bubble up and make a difference at the top.
So basically, I think the same kind of method can be applied to other measures; we just need to figure out what the metric that matters is, how it breaks down into a contribution tree, and we need the data to do the regression analysis. Yes, and I don't think we are saying these things can just be bolted on at the end. Of course, the objective is to build this into the whole thought process and improve it. But given that you are starting with an organization at a certain point in time, the question is what kinds of interventions you introduce so that eventually these things, whether security, usability, or other critical aspects that influence the product, are actually baked in, woven into what people are doing. It could start right from what I think you, Tom, have talked a lot about: verifying the quality of the requirement itself. Is the requirement itself of good quality or not? Certainly some of that thinking can be built in. But where do you start, and where do you try to move the needle? That's how we were approaching this: identify the metric, build the contribution tree, then gradually introduce the interventions and keep measuring whether what you're doing is moving you in the right direction, because none of these things will change overnight in any organization. Every intervention will also have side effects, and if you're not measuring holistically, you may be driving off a cliff: you might be going really fast, but off a cliff or in the wrong direction. So the point is to establish this kind of thought process, where you use the continuous learning that comes from measuring the data and being data-informed, if not data-driven. I hope that answers your question, Tom. I know we are out of time, but we're happy to pop into the hangout area and answer more questions, and maybe even show some of the other stuff we've been doing, if anyone is interested.