Software Quality Metrics: Code Coverage, Complexity, and Defect Tracking
Software quality metrics are quantifiable measures used to evaluate the quality of code, processes, and deliverables — providing objective data to guide decisions about refactoring, testing, and resource allocation.
What You’ll Learn
- The most important code quality metrics and how to measure them
- How cyclomatic complexity affects testability and defect rates
- Why code coverage targets can be misleading
- How defect density, MTTR, and MTBF track quality over time
- How to build a quality metrics dashboard
Why Quality Metrics Matter
What gets measured gets managed. Without quality metrics, teams decide by intuition and anecdote, and cannot reliably answer basic questions: Which modules need refactoring? Is quality improving or declining? Are we testing the right things? Metrics provide objective answers. Microsoft reportedly found that modules with cyclomatic complexity above 15 had twice as many defects as simpler modules. Measuring quality helps teams focus improvement efforts where they have the most impact.
DodaZIP tracks code coverage and cyclomatic complexity for every compression module — a complex algorithm with insufficient coverage is a risk that must be addressed before release.
Learning Path
flowchart LR
A[Code Review] --> B[Static Analysis]
B --> C[Quality Metrics<br/>You are here]
C --> D[Defect Management]
D --> E[Continuous Improvement]
style C fill:#f90,color:#fff
Code Coverage
Code coverage measures what percentage of your code is executed during testing.
Types of Coverage
| Type | What It Measures | Formula |
|---|---|---|
| Line coverage | Lines of code executed | Lines executed / total lines |
| Branch coverage | Decision points (if/else) tested | Branches exercised / total branches |
| Function coverage | Functions called | Functions called / total functions |
| Path coverage | Unique execution paths | Paths tested / total paths |
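Path coverage is rarely a practical target because paths multiply much faster than branches. The sketch below illustrates this for the simplest case, a function with n independent `if` statements in sequence (no nesting, no loops). The helper names `count_paths` and `min_tests_for_branch_coverage` are hypothetical and exist only for this illustration.

```python
def count_paths(independent_ifs: int) -> int:
    """Distinct execution paths through `independent_ifs` sequential if statements."""
    return 2 ** independent_ifs


def min_tests_for_branch_coverage(independent_ifs: int) -> int:
    """Two tests suffice: one taking every true branch, one taking every false branch."""
    return 2 if independent_ifs > 0 else 1


for n in (1, 5, 10, 20):
    print(
        f"{n:>2} ifs: {min_tests_for_branch_coverage(n)} tests for full branch coverage, "
        f"{count_paths(n):,} paths for full path coverage"
    )
```

Under these assumptions, ten sequential decisions already produce 1,024 paths, which is why most teams treat branch coverage as the working target.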
# Measure with pytest (requires the pytest-cov plugin)
pytest --cov=myapp --cov-report=html
# Generates htmlcov/index.html with a visual coverage report
# Add --cov-branch to report branch coverage as well

# Example: line vs branch coverage
def process_order(amount, is_member):
    if is_member:  # decision with two branches
        amount = amount * 0.9  # Branch 1: discount applied
    return amount  # Branch 2 skips the discount and falls through to here

# Test covers only the member path
def test_member_discount():
    assert process_order(100, True) == 90

# Line coverage: 100% (every line executes)
# Branch coverage: 50% (the non-member branch is never taken)

Key insight: 100% line coverage with 50% branch coverage is dangerously incomplete. Always track branch coverage for conditional logic.
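One way to close that gap is to add a test for the path the first test skips. The sketch below uses a single-return version of `process_order` (an assumption, chosen so that one test really does execute every line) and adds a hypothetical `test_non_member_pays_full_price`.

```python
def process_order(amount, is_member):
    if is_member:
        amount = amount * 0.9  # members get 10% off
    return amount


def test_member_discount():
    # Takes the True branch; this test alone already executes every line.
    assert process_order(100, True) == 90


def test_non_member_pays_full_price():
    # Takes the False branch, which line coverage never required.
    assert process_order(100, False) == 100


if __name__ == "__main__":
    test_member_discount()
    test_non_member_pays_full_price()
    print("both branches exercised")
```

With pytest-cov, passing `--cov-branch` alongside `--cov=myapp` makes the report show whether both branches were taken.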
Cyclomatic Complexity
Cyclomatic complexity measures the number of linearly independent paths through a function’s source code. It’s calculated from the control flow graph:
M = E - N + 2P
Where:
M = cyclomatic complexity
E = edges in control flow graph
N = nodes in control flow graph
P = connected components (1 for a single function)
Interpreting Complexity
| Score | Risk | Meaning |
|---|---|---|
| 1-5 | Low | Simple, easy to test |
| 6-10 | Moderate | Manageable, needs care |
| 11-20 | High | Hard to test, high defect risk |
| 21+ | Very high | Untestable, must refactor |
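As a worked instance of M = E - N + 2P, consider a function made of three sequential early-return checks followed by a default return. The control flow graph below is written out by hand. The node names and the `cyclomatic_complexity` helper are illustrative assumptions, not output from any tool.

```python
# Hand-built control flow graph: each key is a node, each value lists its successors.
cfg = {
    "check_1": ["return_a", "check_2"],
    "check_2": ["return_b", "check_3"],
    "check_3": ["return_c", "return_default"],
    "return_a": ["exit"],
    "return_b": ["exit"],
    "return_c": ["exit"],
    "return_default": ["exit"],
    "exit": [],
}


def cyclomatic_complexity(graph, components=1):
    """Apply M = E - N + 2P to an adjacency-list graph."""
    nodes = len(graph)
    edges = sum(len(successors) for successors in graph.values())
    return edges - nodes + 2 * components


print(cyclomatic_complexity(cfg))  # 10 edges - 8 nodes + 2 = 4
```

The result matches the usual shortcut of counting decision points and adding one: three checks plus one.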
# Complexity: 1 (single path)
def add(a, b):
    return a + b

# Complexity: 4 (three decisions + 1)
def categorize(age):
    if age < 0:  # +1
        return "invalid"
    if age < 13:  # +1
        return "child"
    if age < 20:  # +1
        return "teen"
    return "adult"

# Complexity: 7 when counting branching statements (six decisions + 1).
# Tools that also count each boolean operator, as radon does, report 9
# because of the two `or` conditions.
def validate_user(user, order, payment):
    if not user or not user.is_active:  # +1
        return False
    if not order or not order.items:  # +1
        return False
    if payment.amount <= 0:  # +1
        return False
    for item in order.items:  # +1
        if item.price <= 0:  # +1
            return False
        if item.quantity <= 0:  # +1
            return False
    return True

# Measure with radon (Python)
radon cc myapp/ -s
# Output (abridged):
# myapp/module.py
#   F categorize     4  A
#   F validate_user  9  B
#   F add            1  A

Defect Density
Defect density measures the number of known defects per unit of code size, usually per thousand lines of code (KLOC):
Defect Density = Total Defects / (Total Lines of Code / 1000)
Industry benchmarks:
| Quality Level | Defects per KLOC (thousand lines) |
|---|---|
| Excellent | < 1 |
| Good | 1-4 |
| Average | 5-10 |
| Poor | 11-20 |
| Unacceptable | > 20 |
Critical note: Compare defect density within the same project over time, not across projects. Different languages, domains, and complexity levels make cross-project comparisons misleading.
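The calculation and the benchmark table can be sketched as follows. The release figures are made up for illustration, the function names are hypothetical, and the handling of values that fall exactly on a boundary (or between the table's integer ranges) is an assumption.

```python
def defect_density_per_kloc(defects: int, lines_of_code: int) -> float:
    """Known defects per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


def quality_level(density: float) -> str:
    """Map a density to the benchmark table; boundary handling is an assumption."""
    if density < 1:
        return "Excellent"
    if density < 5:
        return "Good"
    if density <= 10:
        return "Average"
    if density <= 20:
        return "Poor"
    return "Unacceptable"


# Hypothetical release history for ONE project: (label, defects, lines of code)
releases = [("v1.0", 42, 12_000), ("v1.1", 35, 14_500), ("v1.2", 21, 15_200)]
for label, defects, loc in releases:
    density = defect_density_per_kloc(defects, loc)
    print(f"{label}: {density:.2f} defects/KLOC ({quality_level(density)})")
```

Tracking the same project release over release, as here, follows the advice above: the direction of the number matters more than how it compares with another codebase.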
MTTR and MTBF
MTTR (Mean Time to Recover)
Average time to restore service after a failure:
MTTR = Total downtime / Number of incidents
Target: Minutes, not hours. Teams with good observability and deployment pipelines achieve MTTR under 30 minutes.
MTBF (Mean Time Between Failures)
Average time between system failures:
MTBF = Total uptime / Number of failures
Target: Weeks or months. High MTBF indicates stable, well-tested software.
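The two metrics are often combined into a steady-state availability estimate, MTBF / (MTBF + MTTR). A minimal sketch, assuming both values are expressed in hours; the function name and the sample figures are illustrative.

```python
def availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the share of time the service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


# One failure every 30 days (720 hours) on average, recovered in 30 minutes
print(f"{availability_percent(720, 0.5):.2f}%")  # 99.93%
# Same failure rate, but recovery takes 4 hours
print(f"{availability_percent(720, 4):.2f}%")  # 99.45%
```

The comparison shows why recovery time deserves as much attention as failure frequency: with failures equally rare, slower recovery alone lowers availability.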
# MTTR/MTBF calculator
# Assumptions: timestamps are seconds since the start of the observation
# window, and the window ends when the last incident is resolved.
# downtime_minutes is the measured outage for each incident.
def calculate_reliability(incidents):
    period_minutes = max(i["resolved_at"] for i in incidents) / 60
    total_downtime = sum(i["downtime_minutes"] for i in incidents)
    total_uptime = period_minutes - total_downtime
    return {
        "mttr_minutes": round(total_downtime / len(incidents), 2),
        "mtbf_hours": round(total_uptime / len(incidents) / 60, 2),
        "availability": round(total_uptime / period_minutes * 100, 2),
    }

incidents = [
    {"occurred_at": 100, "resolved_at": 10000, "downtime_minutes": 45},
    {"occurred_at": 200000, "resolved_at": 210000, "downtime_minutes": 120},
]
print(calculate_reliability(incidents))

Expected output:
{'mttr_minutes': 82.5, 'mtbf_hours': 27.79, 'availability': 95.29}

Building a Quality Dashboard
# quality_dashboard.py
class QualityDashboard:
    def __init__(self):
        self.metrics = {}

    def add_coverage(self, module, line_pct, branch_pct):
        self.metrics[f"{module}_line_coverage"] = line_pct
        self.metrics[f"{module}_branch_coverage"] = branch_pct

    def add_complexity(self, module, avg_complexity, max_complexity):
        self.metrics[f"{module}_avg_complexity"] = avg_complexity
        self.metrics[f"{module}_max_complexity"] = max_complexity

    def add_defects(self, module, count, severity_breakdown):
        self.metrics[f"{module}_defect_count"] = count
        self.metrics[f"{module}_critical_defects"] = severity_breakdown.get("critical", 0)

    def health_score(self, module):
        # Start at 100 and subtract a penalty for each threshold that is missed
        score = 100
        if self.metrics.get(f"{module}_line_coverage", 100) < 80:
            score -= 20
        if self.metrics.get(f"{module}_branch_coverage", 100) < 70:
            score -= 15
        if self.metrics.get(f"{module}_max_complexity", 0) > 15:
            score -= 15
        if self.metrics.get(f"{module}_critical_defects", 0) > 0:
            score -= 25
        return max(0, score)

dashboard = QualityDashboard()
dashboard.add_coverage("auth", line_pct=92, branch_pct=85)
dashboard.add_complexity("auth", avg_complexity=3, max_complexity=12)
dashboard.add_defects("auth", count=2, severity_breakdown={"critical": 1})
print(f"Auth module health: {dashboard.health_score('auth')}/100")

Expected output:
Auth module health: 75/100

Common Quality Metrics Mistakes
1. Vanity Metrics
A coverage target becomes a vanity metric when teams game it by writing trivial tests. Branch coverage and mutation score are harder to game.
Fix: Track multiple metrics and look for gaming patterns.
2. Cross-Project Comparisons
Comparing defect density or coverage between Python and JavaScript projects is meaningless.
Fix: Compare the same project over time, or projects in the same language and domain.
3. Ignoring Trend Direction
A single measurement is noise. The trend over time is signal.
Fix: Chart metrics weekly or monthly and watch for trends.
4. Measuring Everything
Too many metrics create noise. Focus on the 3-5 that drive decisions.
Fix: Start with line coverage, branch coverage, cyclomatic complexity, and defect count on critical modules.
5. Not Acting on Metrics
Collecting metrics without acting on them wastes everyone’s time.
Fix: Define thresholds that trigger action — refactor when complexity exceeds 15, improve testing when branch coverage drops below 70%.
6. Only Measuring Output, Not Outcome
Lines of code is an output. Defect reduction and deployment frequency are outcomes.
Fix: Track both activity metrics (tests written, coverage) and outcome metrics (defect rate, MTTR).
7. No Context in Metrics
A module with 90% coverage but 2000 lines is different from 90% coverage with 50 lines.
Fix: Present metrics with context — module size, complexity, and change frequency.
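Mistakes 3 and 5 lend themselves to automation: compute the trend instead of eyeballing single readings, and turn thresholds into explicit follow-up actions. The sketch below uses the thresholds named above (complexity above 15, branch coverage below 70%). The function names, the module name, and the weekly readings are hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their index (change per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements to see a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def actions_for(module, max_complexity, branch_coverage_history):
    """Return follow-up actions based on thresholds and trend direction."""
    actions = []
    if max_complexity > 15:
        actions.append(f"{module}: refactor (max complexity {max_complexity} > 15)")
    if branch_coverage_history[-1] < 70:
        actions.append(f"{module}: improve tests (branch coverage below 70%)")
    if slope(branch_coverage_history) < 0:
        actions.append(f"{module}: branch coverage is trending down")
    return actions


# Hypothetical weekly branch-coverage readings for one module
weekly_branch_coverage = [82, 80, 79, 75, 73]
for action in actions_for("parser", max_complexity=18,
                          branch_coverage_history=weekly_branch_coverage):
    print(action)
```

In this sample the latest reading (73%) is still above the threshold, so only the trend check flags the decline. That is the case a single measurement would miss.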
Practice Questions
1. What is cyclomatic complexity and what score indicates high risk?
The number of linearly independent paths through a function. Per the table above, scores of 11-20 indicate high risk, and scores of 21 or more call for refactoring.
2. Why can 100% line coverage be misleading?
It doesn’t mean all branches or paths are tested. You can have 100% line coverage with 50% branch coverage.
3. What is the difference between MTTR and MTBF?
MTTR (Mean Time to Recover) measures how fast you recover from failures. MTBF (Mean Time Between Failures) measures how long the system runs between failures.
4. How is defect density calculated and what is a good target?
Defects per thousand lines of code. Under 1 defect/KLOC is excellent; under 5 is good.
5. What should you do when cyclomatic complexity exceeds 20?
Refactor the function into smaller, focused functions. Each function should have complexity under 10.
Challenge: Run radon or a complexity tool on your project. Identify the top 5 most complex functions. Refactor each to reduce complexity below 10. Measure the impact on test coverage and readability.
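If radon is not available, a rough ranking can be sketched with Python's standard `ast` module. This is an approximation, not radon's algorithm: the set of counted node types is an assumption, nested functions are folded into their parent, and `approximate_complexity` and `rank_functions` are hypothetical names.

```python
import ast

# Node types treated as one decision point each (an approximation).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.IfExp, ast.comprehension)


def approximate_complexity(func_node):
    """1 + decision points found inside one function definition."""
    score = 1
    for node in ast.walk(func_node):
        if isinstance(node, DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a or b or c` has two operators, hence two extra decisions
            score += len(node.values) - 1
    return score


def rank_functions(source, top=5):
    """Return (name, score) pairs for the most complex functions in `source`."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree)
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    scored = [(f.name, approximate_complexity(f)) for f in functions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


sample = '''
def add(a, b):
    return a + b

def categorize(age):
    if age < 0:
        return "invalid"
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    return "adult"
'''
print(rank_functions(sample))  # [('categorize', 4), ('add', 1)]
```

To rank a real file, pass its text instead of `sample`, for example the result of `pathlib.Path(...).read_text()` on one of your own modules.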
What’s Next
| Tutorial | What You’ll Learn |
|---|---|
| Defect Management Process | Bug lifecycle and triage workflows |
| Code Quality Tools Guide | Automated tools for measuring quality |
| Static Code Analysis Tools | Deeper look at analysis tools |
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.