How to know if your software quality is being measured correctly.
Quality Over Quantity
ISO standards, procedures and practices adopted with the intent of improving code quality tend to drive staff bloat and bureaucracy instead. Audits, consultants, process and large quality assurance teams don’t actually make software products better. At best they document some of what is wrong, and usually they establish meaningless metrics and meatless improvement plans. Most of these processes only pacify an organization into accepting code products as being “good enough”. Good code comes from well-run organizations that enable good developers to do what they do best. PERIOD
Good software is made from good code, which is written by good developers. It is that simple, which is why it is important to know what makes a good developer, too.
It is common today to measure a developer’s skills on things like “time to market”, total number of code commits or the fewest bug reports coming out of your testing framework. Most of these measurements are misleading and oversimplified. Looking a bit deeper into the practice of writing code will produce better metrics. Getting to know your developers is more important than the metrics you use to “measure” them. Simply understanding their strengths and weaknesses will help you build better cross-functional teams. A manager must understand how to write code in order to understand both the developers and the tasks those developers must perform.
Metrics
Common misconceptions:
< total lines of code > != < total amount of system resources used > (the inverse does not hold either)
< fewer lines of code > != < better code >
< total commits and pushes > != < amount of work done >
The problem with metrics really boils down to this: it is common for mediocre people to manage themselves to their perception of the metrics their organization uses to measure value. Highly skilled and confident developers just want to produce the best product possible and couldn’t care less about metrics. These two mindsets rarely clash openly, but the situation will lead to issues at some point. Hard workers get tired of watching the mediocre get recognized and rewarded for projects the hard workers ended up having to fix. What you should strive for are metrics applied with a meaningful goal, meant for the team rather than the outside world.
My team once shipped a major architectural and functional release of software that required “all hands on deck” in the event of a problem. We were assigned the “best” Unix system administrator to help with the changes that needed to be implemented. Part of this change involved a method of procedure (MOP) to be executed on several Solaris-based systems. These changes were trivial but required “root” privileges on the servers. We were told we had the “best” person on the job, so we spent little time worrying about this effort. Our assigned system administrator ended up not being capable of performing the tasks and simply gave my team root access. My team performed the steps while the Unix admin just listened in on the phone.
I was a bit confused and concerned, so I reached out to the manager of the systems administration team and got the scoop. This manager stack-ranked their employees based on the number of trouble tickets each individual resolved. The manager also pointed out that this systems administrator came into the office every morning at 6 AM, a full two hours before they were required to, and was very impressed with this individual’s work ethic and dedication. Come to find out, many of the tickets this person closed were automated tickets that required little effort, and because they came in earlier than everyone else, they got first pick of the tickets (early bird gets the worm). Meanwhile the others were left to sort out the more difficult tickets that required more knowledge and skill. That was how this person earned the badge of “The Best” on the team. The same system was used to determine pay raises and bonuses, so there were some pretty significant morale issues on that team.
Metrics alone can reward people for gaming the system and punish those who work hard. Using a peer review system alone can be almost as detrimental; although peer reviews are superior to metrics, they are susceptible to bias, politics and personality tendencies. A good team starts with good leadership, good culture and integrity. Allowing a team to think for itself and feel empowered, appreciated and respected is fundamental. Done correctly, self-managed teams run the best. Done incorrectly, you could end up with a “Lord of the Flies” situation.
Taking the time to get to know the details of your team, along with metrics and a peer review system, is what seems to work. You have to know that developer X is a bit insecure and speaks poorly of those they feel threatened by. Developer C is best friends with developer X and has a tendency to be a follower. Developer A, who is quiet, soft-spoken and spends their spare time contributing to opensource projects, is probably the best developer of the bunch but flies under the radar. Developers alone don’t make for good or bad code. Good developers, good management, good culture and good process all have a part.
Establishing Meaningful Metrics
Metrics in a development team or pod can drive behavior in many directions. Typical management styles are built around the metrics those managers feel they are being tracked to. A manager generally wants to provide metrics that put their team in the most favorable light possible. If a company tends not to trust its employees and worries about them not working hard enough, you have much bigger issues than code quality, and those should be addressed first. The metrics used on a team can set the foundation of culture for that team. Some basic areas that can be looked at are:
- Quantity and make-up of code.
- Quality of code methods and framework.
- Efficiency of CICD pipelines. (Bugs identified and remediated, releases per week/day/hour)
- Delivery of features.
Quantity of Code
Total number of commits can be a handy metric when paired with “what” is being committed and by whom. If you dig into each commit and see what the changes were, there is some valuable data. Keep in mind that adding comments to code looks the same as adding new code in the raw counts. Adding, changing and deleting files means something, but there is usually no direct correlation to level of effort. Code pushes, merges and forks also tell a story. It comes down to knowing “how” your team uses git or whatever code/version management software is in practice.
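If you want to go beyond raw commit counts, a short script can pull the “what” and “who” out of the history for you. This is a minimal sketch, assuming the code lives in git and that `git log --numstat` reflects how your team actually commits work; it only tallies added and removed lines per author, which is a starting point for digging, not a scorecard.

```python
# Sketch: summarize added/removed lines per author from git history.
# Assumes a local git repository and that `git log --numstat` output is
# representative of how your team actually commits work.
import subprocess
from collections import defaultdict

def lines_by_author(repo_path="."):
    """Return {author: [added, removed]} totals across the repo history."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format=--%an"],
        capture_output=True, text=True, check=True,
    ).stdout

    totals = defaultdict(lambda: [0, 0])
    author = None
    for line in log.splitlines():
        if line.startswith("--"):
            author = line[2:]            # format line: "--<author name>"
            continue
        parts = line.split("\t")
        if author and len(parts) == 3:
            added, removed, _path = parts
            if added.isdigit() and removed.isdigit():  # binary files show "-"
                totals[author][0] += int(added)
                totals[author][1] += int(removed)
    return dict(totals)

if __name__ == "__main__":
    for who, (added, removed) in lines_by_author().items():
        print(f"{who}: +{added} / -{removed}")
```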
This is where it can get confusing, because not everything is binary in software development, even though everything eventually becomes binary. Total lines of code or total commits can be used to drive behavior, and that can be both good and bad. If you want your developers to document their work better, tracking total lines can help: you might see 10 lines of comments for every 3 lines of code, which is fine if the comments are meaningful. Developers generally don’t like to document their code, so this can actually be useful. I liked to maintain a ratio of comments to code to drive behavior around documenting code; the ratio would vary based on the problem we were addressing.
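Here is a rough sketch of how that ratio can be tracked, assuming a Python code base and a deliberately naive definition of a “comment” (docstrings are not counted). The target ratio itself is something you would tune per problem, as noted above.

```python
# Sketch: a rough comment-to-code ratio for a Python source tree.
# Deliberately naive: docstrings are not counted as comments and the
# "right" ratio is something you tune per problem, not a universal rule.
from pathlib import Path

def comment_ratio(root="."):
    comments = code = 0
    for path in Path(root).rglob("*.py"):
        for raw in path.read_text(errors="ignore").splitlines():
            line = raw.strip()
            if not line:
                continue
            if line.startswith("#"):
                comments += 1
            else:
                code += 1
    return comments, code

if __name__ == "__main__":
    comments, code = comment_ratio()
    print(f"{comments} comment lines / {code} code lines = "
          f"{comments / max(code, 1):.2f}")
```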
Quality of Practice
Quality is tricky and seems intangible and hard to measure. It isn’t at all, if you really think about what you are measuring, what it means and the impact of using it as a metric to drive behavior. If that is well understood, then quality is well understood.
So far we have only established the quantity and make-up of code, which by itself is not a meaningful set of metrics without looking at quality. Code quality is very debatable and can be driven by opinion and egos. There can be many ways to solve the same problem, so debating how a problem gets solved, rather than determining whether the code has actually solved the problem, is an exercise in ego, not quality. Determining what good code practice looks like comes down to answering these questions:
- Does the code complete the “work” that the code was intended to do?
- Are all of the inputs and outputs well documented and well understood?
- Does the code follow a standard dictionary of terminology and a disciplined use of attributes/framework?
- Is the code repeatable, reusable and sharable?
- Does the code use systems resources efficiently?
- Is the code portable?
- Is the code secure or can it be made to be secure?
- Is the code well documented in the code itself?
- Is proper error handling being used?
- Is your CICD process based on constantly improving your CICD process?
So if you can answer yes to these 10 questions then you are looking at good code that will continue to get better.
Completing the “work” the code was intended to do is a bigger challenge than most assume. There is an assumption that communication of requirements is always clear and understood. This is actually where software fails the most, but it is more of a process issue than a development issue. Correcting it will allow you to focus on the code itself. Check that the inputs and outputs of the code are well documented for others. An output could be a class/object definition or a simple attribute-value pair, but this is an area that causes issues with other software modules up and down the code tree. The problem is common in opensource projects, especially large opensource code bases. I will use OpenStack as an example: it is a fantastic product where several opensource APIs have come together to provide something really awesome. The problem? Every API works differently and uses different terminology, methods, syntax and nomenclature. And Horizon still sucks, so OpenStack users are forced to build their own UIs against a wild west of APIs. Following a strict dictionary that establishes common terminology, naming and framework is important. This dictionary should be fed by Ops and by CICD testing: as issues are resolved they should be documented, coded and integrated into your overarching processes.
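To make the input/output point concrete, here is a small, hypothetical sketch of what “well documented and well understood” can look like in code. The field names and the TERMS dictionary are invented for illustration, not taken from any real standard.

```python
# Sketch: inputs and outputs documented in the code itself, with attribute
# names drawn from a shared terminology dictionary. The TERMS entries and
# field names are hypothetical examples, not taken from any real standard.
TERMS = {
    "instance_id": "Unique identifier for a compute instance (string, UUID)",
    "state": "Lifecycle state: one of 'building', 'active', 'error'",
    "vcpus": "Number of virtual CPUs allocated (integer)",
}

def describe_instance(instance_id: str) -> dict:
    """Return a summary of one compute instance.

    Input:
        instance_id: see TERMS["instance_id"].
    Output:
        dict with exactly the keys 'instance_id', 'state' and 'vcpus',
        named and typed as documented in TERMS.
    """
    # A real implementation would query an inventory service here.
    return {"instance_id": instance_id, "state": "active", "vcpus": 4}
```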
Reusable, repeatable and sharable code should be embraced by every developer. The practice of coders maintaining private code libraries that simplify their job, and their job alone, should be over. Once you have invented the wheel, share it with the world. This makes every developer more efficient and makes the overall code base more supportable. The first time a bug only has to be remediated in one place, you will understand this better. Doing regex searches across a vast code base looking for “cut and paste” implementations of bad code is neither pleasant nor efficient.
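A minimal sketch of what “share the wheel” looks like in practice. The `teamlib` module name and the helper are hypothetical, but the idea is that a bug fixed in the shared copy is fixed for every caller at once.

```python
# Sketch: one shared implementation instead of cut-and-paste copies.
# The "teamlib" module name and this helper are hypothetical; the point is
# that a bug fixed here is fixed for every caller at once.

# teamlib/netutil.py
import ipaddress

def is_private_address(addr: str) -> bool:
    """True if addr is a syntactically valid private IPv4/IPv6 address."""
    try:
        return ipaddress.ip_address(addr).is_private
    except ValueError:
        return False

# elsewhere in the code base:
# from teamlib.netutil import is_private_address   # reuse, don't re-implement
```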
Resource Consumption
Use of system resources is commonly abused, mostly because consuming resources is easier than doing things the right way. Because CPU, RAM and disks are so cheap, it is perceived as not being an issue worth tracking. But if every developer does this, you end up with bloated systems with ridiculous resource requirements and sluggish performance. It can be as simple as using 30 lines of code to solve a problem that a Python module could have solved in two. Importing modules can introduce security issues as well as expanding demands on memory and CPU, so weighing this trade-off is important. If you do any artifact management where libraries, modules and functions are treated as artifacts, then each one must be tracked and scanned for vulnerabilities. Is that worth the use of the module? Exploding data models in memory to make code easier to write is also common and usually unnecessary. These practices are getting more attention as the desire to make workloads lighter for microservices gains popularity.
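As a small illustration of the memory point, here is a sketch comparing a streaming approach with the “explode it all into RAM” approach; the file name and record format are made up.

```python
# Sketch: counting matching records without exploding the data model in RAM.
# The file name and record format are made up; the point is that a generator
# keeps memory flat no matter how large the input grows.
import csv

def count_errors_streaming(path="events.csv"):
    """Stream the file row by row instead of loading it all into a list."""
    with open(path, newline="") as fh:
        return sum(1 for row in csv.DictReader(fh) if row.get("level") == "ERROR")

# The tempting alternative reads everything first and then filters:
#   rows = list(csv.DictReader(open(path)))             # whole file in memory
#   errors = [r for r in rows if r["level"] == "ERROR"]
# Both give the same answer; only one scales with the size of the input.
```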
Code portability is much easier today than in years past. Declarations, prerequisites and backwards compatibility are really all that have to be looked at. Physical hardware is abstracted and generalized to the point where good “Infrastructure as Code” methodologies go a long way. Keep in mind that if your code base requires more than one version of an interpreter or compiler, you are doing it wrong.
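A tiny, hedged example of declaring your prerequisite up front rather than discovering the interpreter mismatch in production; the required version here is arbitrary.

```python
# Sketch: declare the interpreter you expect instead of discovering the
# mismatch in production. The required version here is an arbitrary example.
import sys

REQUIRED = (3, 10)

if sys.version_info < REQUIRED:
    sys.exit(
        f"This tool requires Python {'.'.join(map(str, REQUIRED))}+, "
        f"found {sys.version.split()[0]}"
    )
```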
Security
Security is a pain and usually puts a huge burden on a development team, mostly because it is usually implemented wrong. Much of this burden can be addressed through automation and artifact management. It can also be addressed by how you manage Infrastructure as Code. Establishing good policy around things like AppArmor and SELinux can prevent most security issues. Ensuring that web services and API endpoints are secure and have basic RBAC functionality is an indicator that developers are thinking about security. If code pushes have dependencies that require changes to AppArmor, SELinux or distribution packaging, that can be an indicator of sloppy security practices.
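To show what “basic RBAC functionality” might look like at the code level, here is a framework-agnostic sketch. The roles, the handler and the request shape are hypothetical, and a real service would hook this into its web framework’s auth layer instead.

```python
# Sketch: a minimal role check in front of an API handler. The roles, the
# handler and the request shape are hypothetical; a real service would map
# this onto its web framework's auth hooks instead.
from functools import wraps

class Forbidden(Exception):
    pass

def require_role(role):
    def decorator(handler):
        @wraps(handler)
        def wrapper(request, *args, **kwargs):
            if role not in request.get("roles", []):
                raise Forbidden(f"'{role}' role required")
            return handler(request, *args, **kwargs)
        return wrapper
    return decorator

@require_role("admin")
def delete_tenant(request, tenant_id):
    return {"deleted": tenant_id}

# delete_tenant({"roles": ["admin"]}, "t-42")   -> {"deleted": "t-42"}
# delete_tenant({"roles": ["viewer"]}, "t-42")  -> raises Forbidden
```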
Documentation
Code documentation impacts every aspect of the Ops side of DevOps. Making the intent of the coder visible when code is being reviewed helps identify issues much faster. At a minimum, any deviation from common practice should be well documented, but really every section of code should carry the developer’s comments documenting their intent. The other challenge poor documentation poses is developer churn, when new developers pick up support for a code base. Well documented code is usually accepted and at times complimented; poorly documented code tends to be rewritten. It is sometimes easier to rewrite it than to sort through the “What were they thinking when they wrote this?” part of reviewing code. I find documenting helps me with the “What was I thinking when I wrote this?” part of reviewing my own past code. Done correctly, comments in code can feed a wiki that “self” documents the software. This is great for Ops teams, project managers, or just developing traditional manuals for a product. Noting that a section of code was changed to address a bug fix is nice as well. Putting too much rigor into documentation can also be a problem; finding the right balance is key. Documentation should live in the code, not in a Word document on a shared drive.
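Here is one hedged way that comments and docstrings can feed a “self” documenting wiki, assuming a Python code base; the Markdown output format is just one convenient choice.

```python
# Sketch: harvesting in-code documentation into a simple wiki page.
# Assumes the docstrings actually carry the developer's intent; Markdown
# as the output format is just one convenient choice.
import ast
from pathlib import Path

def docstrings_to_markdown(root="."):
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(errors="ignore"))
        lines.append(f"## {path}")
        module_doc = ast.get_docstring(tree)
        if module_doc:
            lines.append(module_doc)
        for node in ast.walk(tree):
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                doc = ast.get_docstring(node)
                if doc:
                    lines.append(f"### {node.name}\n{doc}")
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    Path("CODE_WIKI.md").write_text(docstrings_to_markdown())
```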
Forests are Made of Trees
Looking at a system end to end can help a developer understand what error handling is important and needs to be addressed. In many cases 90% of the time required to develop an application is spent on error handling. Oddly, much of that error handling is not necessary if application inputs and outputs are well managed. Error handling should focus on dependencies outside the realm of control of your CICD process and on the logic in the code you have written. A “while” loop that reaches a state like “while PIDs are available” will keep running until the system’s PIDs are exhausted, and that kind of bug is hard to find. Testing for loops that could be infinite is difficult without knowing the intent of the developer (see documentation). Testing an attribute for an expected value or type is good. Assuming you will always get an integer from another module really sucks when you start getting a float and your code quietly miscalculates something. A simple test that logs “I was expecting an integer but got a float” and then exits with an error is usually good enough. Maybe the float was intended to express a subclass? Or maybe the float is an indicator that something upstream is broken? Testing, logging and exiting should always be the first practice. Trying to correct errors through assumptions is dangerous unless you can establish the “intent” behind the change of data format. If you have ever chased a software bug through multiple levels of proactive “error handling”, you know exactly what problem this presents. It is better to fail fast than to fail at some inopportune moment in the future. Finding which new feature broke the code is challenging when developers try to “un-break” it through error handling. This can also cause errors to go undetected for a long period of time, potentially corrupting downstream data. Focusing on “no errors” because “I don’t want my code to be the problem” is a bad but very common practice. This is why measuring the number of bugs as an output of testing is a bad metric. Bugs are most common at software touch points, and exposing those bugs in testing is good… not bad! Right?
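A small sketch of the “test, log and exit” pattern described above; the function, the field name and the rate are hypothetical, but the behavior is the point: fail fast instead of guessing at intent.

```python
# Sketch: test, log and exit rather than silently "fixing" bad input.
# The function, the field name and the rate are hypothetical; the pattern
# is the point: fail fast instead of guessing at intent.
import logging
import sys

log = logging.getLogger("billing")

def charge_minutes(minutes):
    if not isinstance(minutes, int):
        log.error("Expected an integer for 'minutes' but got %s (%r)",
                  type(minutes).__name__, minutes)
        sys.exit(1)  # fail fast; do not round, truncate or guess
    return minutes * 3  # pretend rate calculation

# charge_minutes(10)    -> 30
# charge_minutes(10.5)  -> logs the mismatch and exits with an error
```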
Automating Automation
So everyone thinks their CICD process is awesome because it is an improvement on what they did before. Automating build, unit, integration and deployment testing is nice. Tracking bugs through automation and creating scores that trigger a production release is also awesome. But many forget that the intent of a CICD pipeline is not just to provide automation but also to continuously improve. Managing your CICD with CICD sounds silly, but it really isn’t, and everyone should do it. As software bugs are identified, they should be integrated into your testing framework by defining new tests. Any failure of the CICD chain should be addressed just like a failure of any product that chain is meant to support. Bugs from the field that are captured by humans should be managed just like bugs captured by automated testing, only a bit slower. The first question that should be asked is: why did our automated testing miss this? What changes need to be instituted in our CICD process to keep it from happening again in future releases? Organizations don’t like to do this, because the new testing causes fallout in software modules that passed testing previously. Correcting these issues sometimes requires a better application of the test case, but it can also require past code to be remediated. This is no different from the discovery of a security vulnerability in a third-party software module: new tests will be identified and remediation will have to be made to code that passed testing previously. This remediation is just as important as new features and should be prioritized the same.
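As an illustration of feeding field bugs back into the pipeline, here is a hypothetical regression test (pytest) that reuses the `charge_minutes` sketch from the error-handling section above; the ticket number and module name are invented.

```python
# Sketch: every escaped bug becomes a permanent test. The ticket number and
# the module under test are hypothetical (reusing the charge_minutes sketch
# from the error-handling section); the habit is what matters: the pipeline
# should never be able to miss the same bug twice.
import pytest

from billing import charge_minutes  # hypothetical module under test

def test_rejects_float_minutes_bug_1234():
    """Field bug #1234: a float slipped through and corrupted invoices.

    Added the day the bug was confirmed; it should fail on the old code
    and pass on the fixed, fail-fast version.
    """
    with pytest.raises(SystemExit):
        charge_minutes(10.5)
```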
So, in summary, if you can answer all 10 questions with an honest yes, then you have a good product that will get better. No code is perfect and all code requires improvement.