Software Quality Metrics: Code Coverage, Complexity, and Defect Tracking

DodaTech Updated Jun 20, 2026 7 min read

Software quality metrics are quantifiable measures used to evaluate the quality of code, processes, and deliverables — providing objective data to guide decisions about refactoring, testing, and resource allocation.

What You’ll Learn

  • The most important code quality metrics and how to measure them
  • How cyclomatic complexity affects testability and defect rates
  • Why code coverage targets can be misleading
  • How defect density, MTTR, and MTBF track quality over time
  • How to build a quality metrics dashboard

Why Quality Metrics Matter

What gets measured gets managed. Without quality metrics, teams decide by intuition and anecdote. Which modules need refactoring? Is quality improving or declining? Are we testing the right things? Metrics give objective answers to these questions. For example, Microsoft reportedly found that modules with cyclomatic complexity above 15 had twice as many defects as simpler modules. Measuring quality helps teams focus improvement efforts where they have the most impact.

DodaZIP tracks code coverage and cyclomatic complexity for every compression module — a complex algorithm with insufficient coverage is a risk that must be addressed before release.

Learning Path

    flowchart LR
  A[Code Review] --> B[Static Analysis]
  B --> C[Quality Metrics<br/>You are here]
  C --> D[Defect Management]
  D --> E[Continuous Improvement]
  style C fill:#f90,color:#fff
  

Code Coverage

Code coverage measures what percentage of your code is executed during testing.

Types of Coverage

| Type | What It Measures | Formula |
| --- | --- | --- |
| Line coverage | Lines of code executed | Lines executed / total lines |
| Branch coverage | Decision points (if/else) tested | Branches exercised / total branches |
| Function coverage | Functions called | Functions called / total functions |
| Path coverage | Unique execution paths | Paths tested / total paths |

# Measure with pytest (requires the pytest-cov plugin)
# --cov-branch records branch coverage in addition to line coverage
pytest --cov=myapp --cov-branch --cov-report=html
# Generates htmlcov/index.html with a visual coverage report

# Example: line vs branch coverage
def process_order(amount, is_member):
    if is_member:                # Decision point: True and False branches
        amount = amount * 0.9
    return amount

# Test covers only the member path
def test_member_discount():
    assert process_order(100, True) == 90

# Line coverage: 100% (every line executes)
# Branch coverage: 50% (the is_member == False branch is never taken)

Key insight: 100% line coverage with 50% branch coverage is dangerously incomplete. Always track branch coverage for conditional logic.
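As a minimal sketch of closing that gap, the example below uses a single-return variant of process_order and adds a second test for the non-member path. The test name test_non_member_pays_full_price is an illustrative assumption, not part of any existing suite.

```python
def process_order(amount, is_member):
    # Members get 10% off; everyone else pays the full amount.
    if is_member:
        amount = amount * 0.9
    return amount


def test_member_discount():
    # Exercises the True branch of the if statement.
    assert process_order(100, True) == 90


def test_non_member_pays_full_price():
    # Exercises the False branch, which the first test never reaches.
    assert process_order(100, False) == 100
```

With both tests in place, each outcome of the decision point is exercised at least once, so a branch-aware coverage run should no longer flag the conditional as partially covered.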

Cyclomatic Complexity

Cyclomatic complexity measures the number of linearly independent paths through a function’s source code. It’s calculated from the control flow graph:

M = E - N + 2P

Where:
  M = cyclomatic complexity
  E = edges in control flow graph
  N = nodes in control flow graph
  P = connected components
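
To make the formula concrete, here is a small worked sketch. The control flow graph is hand-built and hypothetical: it models a function with three sequential if-checks that each return early, plus a final return, all leading to a single exit node.

```python
def cyclomatic_complexity(edges, nodes, components=1):
    """Apply M = E - N + 2P to a control flow graph."""
    return len(edges) - len(nodes) + 2 * components


# Hypothetical graph: three if-checks with early returns, then a final return.
nodes = ["if1", "ret1", "if2", "ret2", "if3", "ret3", "ret4", "exit"]
edges = [
    ("if1", "ret1"), ("if1", "if2"),   # first check: return or fall through
    ("if2", "ret2"), ("if2", "if3"),   # second check
    ("if3", "ret3"), ("if3", "ret4"),  # third check, else the final return
    ("ret1", "exit"), ("ret2", "exit"), ("ret3", "exit"), ("ret4", "exit"),
]

print(cyclomatic_complexity(edges, nodes))  # 10 - 8 + 2*1 = 4
```

For a single function P is 1, so the result equals the number of decision points plus one: three if-checks give a complexity of 4.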

Interpreting Complexity

| Score | Risk | Meaning |
| --- | --- | --- |
| 1-5 | Low | Simple, easy to test |
| 6-10 | Moderate | Manageable, needs care |
| 11-20 | High | Hard to test, high defect risk |
| 21+ | Very high | Untestable, must refactor |

# Complexity: 1 — single path
def add(a, b):
    return a + b

# Complexity: 4 — multiple branches
def categorize(age):
    if age < 0:                          # +1
        return "invalid"
    if age < 13:                         # +1
        return "child"
    if age < 20:                         # +1
        return "teen"
    return "adult"

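# Complexity: 7 counting only if/for statements; 9 if each "or" also counts
# (radon and similar tools treat boolean operators as decision points)
def validate_user(user, order, payment):
    if not user or not user.is_active:   # +1 (if), +1 (or)
        return False
    if not order or not order.items:     # +1 (if), +1 (or)
        return False
    if payment.amount <= 0:              # +1
        return False
    for item in order.items:             # +1
        if item.price <= 0:              # +1
            return False
        if item.quantity <= 0:           # +1
            return False
    return True

# Measure with radon (Python); -s prints the numeric score next to the rank
radon cc myapp/ -s
# Illustrative output (line:column positions omitted; they depend on your files):
# myapp/module.py
#     F validate_user - B (9)
#     F categorize - A (4)
#     F add - A (1)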
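
If installing a tool is not an option, the counting rule can be approximated with the standard library. The sketch below is a simplified assumption of how such tools count, not radon's implementation: it adds one per if/for/while/conditional expression/except handler and one per extra boolean operand, and it ignores other constructs such as comprehensions. The name approx_complexity is hypothetical.

```python
import ast

# Node types that open an extra path through the code (deliberately simplified).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_complexity(source):
    """Return {function_name: approximate cyclomatic complexity}."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        score = 1  # one path exists even with no branches
        for child in ast.walk(node):
            if isinstance(child, _BRANCH_NODES):
                score += 1
            elif isinstance(child, ast.BoolOp):
                # "a or b or c" has two operators, hence two extra paths
                score += len(child.values) - 1
        scores[node.name] = score
    return scores


SAMPLE = '''
def add(a, b):
    return a + b

def categorize(age):
    if age < 0:
        return "invalid"
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    return "adult"
'''

print(approx_complexity(SAMPLE))  # {'add': 1, 'categorize': 4}
```

Note that a nested function's branches are also counted toward its enclosing function here, which a dedicated tool may handle differently.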

Defect Density

Defect density measures the number of known defects per unit of code size:

Defect Density = Total Defects / KLOC, where KLOC = Total Lines of Code / 1,000

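Commonly cited rule-of-thumb ranges (these vary by source and domain):

| Quality Level | Defects per KLOC (thousand lines) |
| --- | --- |
| Excellent | < 1 |
| Good | 1-4 |
| Average | 5-10 |
| Poor | 10-20 |
| Unacceptable | > 20 |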

Critical note: Compare defect density within the same project over time, not across projects. Different languages, domains, and complexity levels make cross-project comparisons misleading.
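
A minimal sketch of tracking defect density for one project across releases follows. The release names and counts are made-up illustration data, and defect_density_per_kloc is a hypothetical helper name.

```python
def defect_density_per_kloc(defects, lines_of_code):
    """Known defects per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Hypothetical history for a single project: (release, defects, lines of code)
releases = [
    ("v1.0", 48, 12_000),
    ("v1.1", 45, 15_000),
    ("v1.2", 36, 18_000),
]

for name, defects, loc in releases:
    print(name, defect_density_per_kloc(defects, loc))
# v1.0 4.0
# v1.1 3.0
# v1.2 2.0
```

In this made-up history the code base grows while density falls, which is the kind of within-project trend the note above recommends watching.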

MTTR and MTBF

MTTR (Mean Time to Recover)

Average time to restore service after a failure:

MTTR = Total downtime / Number of incidents

Target: Minutes, not hours. Teams with good observability and deployment pipelines achieve MTTR under 30 minutes.

MTBF (Mean Time Between Failures)

Average time between system failures:

MTBF = Total uptime / Number of failures

Target: Weeks or months. High MTBF indicates stable, well-tested software.

# MTTR/MTBF calculator
# Assumes timestamps are seconds since the start of the observation window.
def calculate_reliability(incidents, window_seconds):
    # Downtime is the time between each failure and its resolution
    total_downtime = sum(
        i["resolved_at"] - i["occurred_at"]
        for i in incidents
    )
    # Uptime is whatever remains of the observation window
    total_uptime = window_seconds - total_downtime
    count = len(incidents)
    return {
        "mttr_minutes": round(total_downtime / count / 60, 1),
        "mtbf_hours": round(total_uptime / count / 3600, 1),
        "availability": round(total_uptime / window_seconds * 100, 1),
    }

incidents = [
    {"occurred_at": 100_000, "resolved_at": 102_700},   # 45 minutes down
    {"occurred_at": 400_000, "resolved_at": 407_200},   # 120 minutes down
]

# Observe a 30-day window
print(calculate_reliability(incidents, window_seconds=30 * 24 * 3600))

Expected output:

{'mttr_minutes': 82.5, 'mtbf_hours': 358.6, 'availability': 99.6}

Building a Quality Dashboard

# quality_dashboard.py
class QualityDashboard:
    def __init__(self):
        self.metrics = {}

    def add_coverage(self, module, line_pct, branch_pct):
        self.metrics[f"{module}_line_coverage"] = line_pct
        self.metrics[f"{module}_branch_coverage"] = branch_pct

    def add_complexity(self, module, avg_complexity, max_complexity):
        self.metrics[f"{module}_avg_complexity"] = avg_complexity
        self.metrics[f"{module}_max_complexity"] = max_complexity

    def add_defects(self, module, count, severity_breakdown):
        self.metrics[f"{module}_defect_count"] = count
        self.metrics[f"{module}_critical_defects"] = severity_breakdown.get("critical", 0)

    def health_score(self, module):
        score = 100
        if self.metrics.get(f"{module}_line_coverage", 100) < 80:
            score -= 20
        if self.metrics.get(f"{module}_branch_coverage", 100) < 70:
            score -= 15
        if self.metrics.get(f"{module}_max_complexity", 0) > 15:
            score -= 15
        if self.metrics.get(f"{module}_critical_defects", 0) > 0:
            score -= 25
        return max(0, score)

dashboard = QualityDashboard()
dashboard.add_coverage("auth", line_pct=92, branch_pct=85)
dashboard.add_complexity("auth", avg_complexity=3, max_complexity=12)
dashboard.add_defects("auth", count=2, severity_breakdown={"critical": 1})

print(f"Auth module health: {dashboard.health_score('auth')}/100")

Expected output:

Auth module health: 75/100

Common Quality Metrics Mistakes

1. Vanity Metrics

Coverage targets become vanity metrics when teams game them with trivial tests that execute code without asserting anything meaningful. Branch coverage and mutation score are harder to game.

Fix: Track multiple metrics and look for gaming patterns.

2. Cross-Project Comparisons

Comparing defect density or coverage between Python and JavaScript projects is meaningless.

Fix: Compare the same project over time, or projects in the same language and domain.

3. Ignoring Trend Direction

A single measurement is noise. The trend over time is signal.

Fix: Chart metrics weekly or monthly and watch for trends.
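
One simple way to turn a series of measurements into a trend is a least-squares slope. The sketch below is one option among many; the weekly figures are hypothetical and trend_slope is an illustrative name.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (change per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical branch coverage (%) measured once a week
weekly_branch_coverage = [78, 77, 75, 74, 72]

slope = trend_slope(weekly_branch_coverage)
print(f"{slope:+.2f} percentage points per week")  # -1.50 percentage points per week
```

A negative slope on a coverage series, or a positive one on complexity or defect counts, is a prompt to investigate even when each individual reading still looks acceptable.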

4. Measuring Everything

Too many metrics create noise. Focus on the 3-5 that drive decisions.

Fix: Start with line coverage, branch coverage, cyclomatic complexity, and defect count on critical modules.

5. Not Acting on Metrics

Collecting metrics without acting on them wastes everyone’s time.

Fix: Define thresholds that trigger action — refactor when complexity exceeds 15, improve testing when branch coverage drops below 70%.
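
A threshold check like this can be sketched in a few lines. The limits come from the fix above; the function name required_actions and the "billing" module with its figures are hypothetical.

```python
# Limits taken from the text: complexity above 15, branch coverage below 70%.
MAX_COMPLEXITY = 15
MIN_BRANCH_COVERAGE = 70


def required_actions(module, max_complexity, branch_coverage):
    """Return the follow-up actions a module's metrics call for."""
    actions = []
    if max_complexity > MAX_COMPLEXITY:
        actions.append(
            f"{module}: refactor (max complexity {max_complexity} > {MAX_COMPLEXITY})"
        )
    if branch_coverage < MIN_BRANCH_COVERAGE:
        actions.append(
            f"{module}: add tests (branch coverage {branch_coverage}% < {MIN_BRANCH_COVERAGE}%)"
        )
    return actions


# Hypothetical module that breaches both thresholds
for action in required_actions("billing", max_complexity=18, branch_coverage=64):
    print(action)
```

Running a check of this kind in the build lets a threshold breach produce a concrete task rather than a number nobody looks at.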

6. Only Measuring Output, Not Outcome

Lines of code is an output. Defect reduction and deployment frequency are outcomes.

Fix: Track both activity metrics (tests written, coverage) and outcome metrics (defect rate, MTTR).

7. No Context in Metrics

A module with 90% coverage but 2000 lines is different from 90% coverage with 50 lines.

Fix: Present metrics with context — module size, complexity, and change frequency.

Practice Questions

1. What is cyclomatic complexity and what score indicates high risk?

The number of linearly independent paths through a function. Scores of 11-20 indicate high risk; scores of 21 and above call for refactoring.

2. Why can 100% line coverage be misleading?

It doesn’t mean all branches or paths are tested. You can have 100% line coverage with 50% branch coverage.

3. What is the difference between MTTR and MTBF?

MTTR (Mean Time to Recover) measures how fast you recover from failures. MTBF (Mean Time Between Failures) measures how long the system runs between failures.

4. How is defect density calculated and what is a good target?

Defects per thousand lines of code. Under 1 defect/KLOC is excellent; under 5 is good.

5. What should you do when cyclomatic complexity exceeds 20?

Refactor the function into smaller, focused functions. Each function should have complexity under 10.

Challenge: Run radon or a complexity tool on your project. Identify the top 5 most complex functions. Refactor each to reduce complexity below 10. Measure the impact on test coverage and readability.

FAQ

Is 100% code coverage necessary?

No. 100% coverage does not mean 100% bug-free. It’s better to have 85% coverage with meaningful assertions than 100% with trivial tests.

How often should I measure quality metrics?

Run automated measurements on every build. Review trends weekly or monthly.

What is a good cyclomatic complexity score?

Aim for average complexity under 5 and no function above 15. Higher scores correlate with more defects.

Can metrics replace code review?

No. Metrics find patterns and trends. Code review catches logic errors, design issues, and knowledge gaps that metrics miss.

What is the single most important quality metric?

For most teams, branch coverage combined with defect density on critical modules gives the best signal.

What’s Next

| Tutorial | What You’ll Learn |
| --- | --- |
| Defect Management Process | Bug lifecycle and triage workflows |
| Code Quality Tools Guide | Automated tools for measuring quality |
| Static Code Analysis Tools | Deeper look at analysis tools |

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.
