Learn Software Quality Metrics: Code Coverage, Complexity, and Defect Tracking

DodaTech Updated Jun 20, 2026 7 min read

Software quality metrics are quantifiable measures used to evaluate the quality of code, processes, and deliverables — providing objective data to guide decisions about refactoring, testing, and resource allocation.

What You’ll Learn

The most important code quality metrics and how to measure them
How cyclomatic complexity affects testability and defect rates
Why code coverage targets can be misleading
How defect density, MTTR, and MTBF track quality over time
How to build a quality metrics dashboard

Why Quality Metrics Matter

What gets measured gets managed. Without quality metrics, teams fall back on intuition and anecdote to answer basic questions: Which modules need refactoring? Is quality improving or declining? Are we testing the right things? Metrics provide objective answers. Microsoft found that modules with cyclomatic complexity above 15 had 2x more defects than simpler modules. Measuring quality helps teams focus improvement efforts where they have the most impact.

DodaZIP tracks code coverage and cyclomatic complexity for every compression module — a complex algorithm with insufficient coverage is a risk that must be addressed before release.

Learning Path

flowchart LR
  A[Code Review] --> B[Static Analysis]
  B --> C[Quality Metrics<br/>You are here]
  C --> D[Defect Management]
  D --> E[Continuous Improvement]
  style C fill:#f90,color:#fff

Code Coverage

Code coverage measures what percentage of your code is executed during testing.

Types of Coverage

Type	What It Measures	Formula
Line coverage	Lines of code executed	Lines executed / total lines
Branch coverage	Decision points (if/else) tested	Branches exercised / total branches
Function coverage	Functions called	Functions called / total functions
Path coverage	Unique execution paths	Paths tested / total paths

# Measure with pytest (requires the pytest-cov plugin)
# --cov-branch adds branch coverage to the report
pytest --cov=myapp --cov-branch --cov-report=html
# Generates htmlcov/index.html with visual coverage report

# Example: line vs branch coverage
def process_order(amount, is_member):
    if is_member:                # Decision point: True and False branches
        amount = amount * 0.9
    return amount

# Test covers only the member path
def test_member_discount():
    assert process_order(100, True) == 90

# Line coverage: 100% (every line executes)
# Branch coverage: 50% (the is_member == False branch is never taken)

Key insight: 100% line coverage with 50% branch coverage is dangerously incomplete. Always track branch coverage for conditional logic.
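
Branch coverage has a blind spot of its own, which is why the table above also lists path coverage. The sketch below uses a hypothetical `shipping_cost` function (an assumption for illustration, not part of the example above) with two independent conditions: two tests are enough to take every branch, yet they walk only two of the four possible paths.

```python
from itertools import product

def shipping_cost(is_member, is_express):
    """Hypothetical example with two independent decision points."""
    cost = 5.0
    if is_member:        # branches: member / non-member
        cost -= 2.0
    if is_express:       # branches: express / standard
        cost += 10.0
    return cost

# Two test inputs are enough to take every branch at least once.
tested_inputs = [(True, True), (False, False)]

# Every combination of the two conditions is a distinct execution path.
all_paths = list(product([True, False], repeat=2))

path_coverage = len(set(tested_inputs)) / len(all_paths) * 100
print(f"Paths tested: {len(tested_inputs)} of {len(all_paths)} ({path_coverage:.0f}%)")
# Prints: Paths tested: 2 of 4 (50%)
```

The number of paths doubles with each independent condition, so full path coverage is rarely practical; it is most useful as a reminder of what branch coverage does not show.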

Cyclomatic Complexity

Cyclomatic complexity measures the number of linearly independent paths through a function’s source code. It’s calculated from the control flow graph:

M = E - N + 2P

Where:
  M = cyclomatic complexity
  E = edges in control flow graph
  N = nodes in control flow graph
  P = connected components
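
As a worked example of the formula, the sketch below counts edges and nodes for a function made of three sequential `if` checks that each return early, followed by a final return. The graph layout and node names are assumptions chosen for illustration; real tools derive the control flow graph from the parsed source.

```python
# Control flow graph as an edge list. Node names are illustrative.
edges = [
    ("if_1", "return_a"), ("if_1", "if_2"),
    ("if_2", "return_b"), ("if_2", "if_3"),
    ("if_3", "return_c"), ("if_3", "return_d"),
    ("return_a", "exit"), ("return_b", "exit"),
    ("return_c", "exit"), ("return_d", "exit"),
]

def cyclomatic_complexity(edges, components=1):
    """Apply M = E - N + 2P to an edge list."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components

m = cyclomatic_complexity(edges)
print(f"M = {m}")
# Prints: M = 4  (10 edges - 8 nodes + 2 * 1 component)
```

The result equals the number of decision points (3) plus 1, which is the shortcut most people use when counting by hand.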

Interpreting Complexity

Score	Risk	Meaning
1-5	Low	Simple, easy to test
6-10	Moderate	Manageable, needs care
11-20	High	Hard to test, high defect risk
21+	Very high	Untestable, must refactor

# Complexity: 1 — single path
def add(a, b):
    return a + b

# Complexity: 4 — multiple branches
def categorize(age):
    if age < 0:                          # +1
        return "invalid"
    if age < 13:                         # +1
        return "child"
    if age < 20:                         # +1
        return "teen"
    return "adult"

# Complexity: 9 as radon counts it (each if/for adds 1, each `or` adds 1 more)
def validate_user(user, order, payment):
    if not user or not user.is_active:   # +2 (if, or)
        return False
    if not order or not order.items:     # +2 (if, or)
        return False
    if payment.amount <= 0:              # +1
        return False
    for item in order.items:             # +1
        if item.price <= 0:              # +1
            return False
        if item.quantity <= 0:           # +1
            return False
    return True

# Measure with radon (Python); -s prints the numeric score next to the rank
radon cc myapp/ -s
# Illustrative output (line:column positions depend on your file):
# myapp/module.py
#     F 16:0 validate_user - B (9)
#     F 6:0 categorize - A (4)
#     F 2:0 add - A (1)
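
To make the counting rule concrete, here is a minimal sketch that approximates cyclomatic complexity with Python's standard `ast` module. It is not how radon is implemented and it ignores several constructs (comprehensions, `match`, nested functions are counted into their parent), so treat it as an illustration of "decision points plus one" and nothing more. The `SAMPLE` source and the `is_valid` function in it are hypothetical.

```python
import ast

# Node types treated as one decision point each (a simplifying assumption).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def approximate_complexity(source):
    """Return {function_name: approximate cyclomatic complexity}."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            score = 1  # one path exists even with no decisions
            for child in ast.walk(node):
                if isinstance(child, DECISION_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # "a or b or c" has two operators, so it adds 2
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores

SAMPLE = '''
def add(a, b):
    return a + b

def categorize(age):
    if age < 0:
        return "invalid"
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    return "adult"

def is_valid(user):
    if not user or not user.is_active:
        return False
    return True
'''

scores = approximate_complexity(SAMPLE)
print(scores)
# Prints: {'add': 1, 'categorize': 4, 'is_valid': 3}
```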

Defect Density

Defect density measures the number of known defects per unit of code size:

Defect Density (per KLOC) = Total Defects / (Total Lines of Code / 1000)

Commonly cited rules of thumb:

Quality Level	Defects per KLOC (thousand lines)
Excellent	< 1
Good	1-4
Average	5-10
Poor	11-20
Unacceptable	> 20

Critical note: Compare defect density within the same project over time, not across projects. Different languages, domains, and complexity levels make cross-project comparisons misleading.
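
A minimal sketch of that within-project comparison, expressed per KLOC. The quarterly figures are hypothetical and exist only to show the arithmetic.

```python
def defect_density(defects, lines_of_code):
    """Return defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)

# Hypothetical snapshots of one project: (label, known defects, lines of code)
snapshots = [
    ("Q1", 42, 12_000),
    ("Q2", 39, 15_000),
    ("Q3", 36, 18_000),
]

for label, defects, loc in snapshots:
    print(f"{label}: {defect_density(defects, loc):.2f} defects/KLOC")
# Prints:
# Q1: 3.50 defects/KLOC
# Q2: 2.60 defects/KLOC
# Q3: 2.00 defects/KLOC
```

The raw defect count barely moves across the three snapshots, but density falls because the codebase grew, which is the kind of trend a single project can track meaningfully.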

MTTR and MTBF

MTTR (Mean Time to Recover)

Average time to restore service after a failure:

MTTR = Total downtime / Number of incidents

Target: Minutes, not hours. Teams with good observability and deployment pipelines achieve MTTR under 30 minutes.

MTBF (Mean Time Between Failures)

Average time between system failures:

MTBF = Total uptime / Number of failures

Target: Weeks or months. High MTBF indicates stable, well-tested software.

# MTTR/MTBF calculator
# Timestamps are seconds since the start of the observation window.
def calculate_reliability(incidents, window_seconds):
    # Downtime is the time from failure to recovery, summed over incidents
    total_downtime = sum(
        i["resolved_at"] - i["occurred_at"]
        for i in incidents
    )
    # Uptime is whatever remains of the observation window
    total_uptime = window_seconds - total_downtime
    count = len(incidents)
    return {
        "mttr_minutes": round(total_downtime / count / 60, 2),
        "mtbf_hours": round(total_uptime / count / 3600, 2),
        "availability": round(total_uptime / window_seconds * 100, 2),
    }

incidents = [
    {"occurred_at": 100, "resolved_at": 2800},       # 45-minute outage
    {"occurred_at": 200000, "resolved_at": 207200},  # 120-minute outage
]

# Observation window of 210,000 seconds (about 58 hours)
print(calculate_reliability(incidents, window_seconds=210_000))

Expected output:

{'mttr_minutes': 82.5, 'mtbf_hours': 27.79, 'availability': 95.29}
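
Availability can also be derived directly from the two means, since MTBF is the average uptime per failure and MTTR is the average downtime per failure. A minimal sketch, assuming both values are expressed in the same unit; the sample figures are hypothetical.

```python
def availability_percent(mtbf_hours, mttr_hours):
    """Steady-state availability = MTBF / (MTBF + MTTR), as a percentage."""
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100

# Hypothetical service: fails about once every 200 hours, recovers in 30 minutes
print(f"{availability_percent(200, 0.5):.2f}%")
# Prints: 99.75%
```

The formula shows why both levers matter: halving MTTR improves availability about as much as doubling MTBF.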

Building a Quality Dashboard

# quality_dashboard.py
class QualityDashboard:
    def __init__(self):
        self.metrics = {}

    def add_coverage(self, module, line_pct, branch_pct):
        self.metrics[f"{module}_line_coverage"] = line_pct
        self.metrics[f"{module}_branch_coverage"] = branch_pct

    def add_complexity(self, module, avg_complexity, max_complexity):
        self.metrics[f"{module}_avg_complexity"] = avg_complexity
        self.metrics[f"{module}_max_complexity"] = max_complexity

    def add_defects(self, module, count, severity_breakdown):
        self.metrics[f"{module}_defect_count"] = count
        self.metrics[f"{module}_critical_defects"] = severity_breakdown.get("critical", 0)

    def health_score(self, module):
        score = 100
        if self.metrics.get(f"{module}_line_coverage", 100) < 80:
            score -= 20
        if self.metrics.get(f"{module}_branch_coverage", 100) < 70:
            score -= 15
        if self.metrics.get(f"{module}_max_complexity", 0) > 15:
            score -= 15
        if self.metrics.get(f"{module}_critical_defects", 0) > 0:
            score -= 25
        return max(0, score)

dashboard = QualityDashboard()
dashboard.add_coverage("auth", line_pct=92, branch_pct=85)
dashboard.add_complexity("auth", avg_complexity=3, max_complexity=12)
dashboard.add_defects("auth", count=2, severity_breakdown={"critical": 1})

print(f"Auth module health: {dashboard.health_score('auth')}/100")

Expected output:

Auth module health: 75/100

Common Quality Metrics Mistakes

1. Vanity Metrics

Coverage targets become vanity metrics when teams game them by writing trivial tests. Branch coverage and mutation score are harder to game.

Fix: Track multiple metrics and look for gaming patterns.

2. Cross-Project Comparisons

Comparing defect density or coverage between Python and JavaScript projects is meaningless.

Fix: Compare the same project over time, or projects in the same language and domain.

3. Ignoring Trend Direction

A single measurement is noise. The trend over time is signal.

Fix: Chart metrics weekly or monthly and watch for trends.
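
One simple way to turn a series of measurements into a trend is a least-squares slope. The sketch below assumes measurements taken at equal intervals; the weekly coverage numbers are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values measured at equal intervals."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator

# Hypothetical weekly branch coverage percentages
weekly_branch_coverage = [78, 77, 79, 76, 75, 74]

slope = trend_slope(weekly_branch_coverage)
direction = "declining" if slope < 0 else "improving or flat"
print(f"{slope:+.2f} points/week ({direction})")
# Prints: -0.83 points/week (declining)
```

Week three on its own looks like an improvement; the slope over all six weeks shows coverage is drifting down.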

4. Measuring Everything

Too many metrics create noise. Focus on the 3-5 that drive decisions.

Fix: Start with line coverage, branch coverage, cyclomatic complexity, and defect count on critical modules.

5. Not Acting on Metrics

Collecting metrics without acting on them wastes everyone’s time.

Fix: Define thresholds that trigger action — refactor when complexity exceeds 15, improve testing when branch coverage drops below 70%.
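
A minimal sketch of such a threshold check, suitable for a build step or a weekly report. The function name and the module figures are hypothetical; the two thresholds come from the fix above.

```python
# Thresholds from the fix above; adjust them to your project.
MAX_COMPLEXITY = 15
MIN_BRANCH_COVERAGE = 70

def required_actions(module, max_complexity, branch_coverage):
    """Return the follow-up actions a module's metrics call for."""
    actions = []
    if max_complexity > MAX_COMPLEXITY:
        actions.append(
            f"{module}: refactor (complexity {max_complexity} > {MAX_COMPLEXITY})"
        )
    if branch_coverage < MIN_BRANCH_COVERAGE:
        actions.append(
            f"{module}: add tests (branch coverage {branch_coverage}% "
            f"< {MIN_BRANCH_COVERAGE}%)"
        )
    return actions

# Hypothetical module metrics: (name, max complexity, branch coverage %)
for name, complexity, coverage in [("auth", 12, 85), ("billing", 22, 64)]:
    for action in required_actions(name, complexity, coverage):
        print(action)
# Prints:
# billing: refactor (complexity 22 > 15)
# billing: add tests (branch coverage 64% < 70%)
```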

6. Only Measuring Output, Not Outcome

Lines of code is an output. Defect reduction and deployment frequency are outcomes.

Fix: Track both activity metrics (tests written, coverage) and outcome metrics (defect rate, MTTR).

7. No Context in Metrics

A module with 90% coverage and 2000 lines is different from one with 90% coverage and 50 lines: the first leaves roughly 200 lines unexercised, the second only 5.

Fix: Present metrics with context — module size, complexity, and change frequency.

Practice Questions

1. What is cyclomatic complexity and what score indicates high risk?

The number of linearly independent paths through code. Scores of 11-20 are high risk; scores of 21 and above call for refactoring.

2. Why can 100% line coverage be misleading?

It doesn’t mean all branches or paths are tested. You can have 100% line coverage with 50% branch coverage.

3. What is the difference between MTTR and MTBF?

MTTR (Mean Time to Recover) measures how fast you recover from failures. MTBF (Mean Time Between Failures) measures how long the system runs between failures.

4. How is defect density calculated and what is a good target?

Defects per thousand lines of code. Under 1 defect/KLOC is excellent; under 5 is good.

5. What should you do when cyclomatic complexity exceeds 20?

Refactor the function into smaller, focused functions. Each function should have complexity under 10.

Challenge: Run radon or a complexity tool on your project. Identify the top 5 most complex functions. Refactor each to reduce complexity below 10. Measure the impact on test coverage and readability.

FAQ

Is 100% code coverage necessary?

No. 100% coverage does not mean 100% bug-free. It’s better to have 85% coverage with meaningful assertions than 100% with trivial tests.

How often should I measure quality metrics?

Run automated measurements on every build. Review trends weekly or monthly.

What is a good cyclomatic complexity score?

Aim for average complexity under 5 and no function above 15. Higher scores correlate with more defects.

Can metrics replace code review?

No. Metrics find patterns and trends. Code review catches logic errors, design issues, and knowledge gaps that metrics miss.

What is the single most important quality metric?

For most teams, branch coverage combined with defect density on critical modules gives the best signal.

What’s Next

Tutorial	What You’ll Learn
Defect Management Process	Bug lifecycle and triage workflows
Code Quality Tools Guide	Automated tools for measuring quality
Static Code Analysis Tools	Deeper look at analysis tools

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.
