CRM Analytics & Customer Insights -- Measuring Relationship Performance

DodaTech Updated 2026-06-22 6 min read

In this tutorial, you'll learn about CRM Analytics & Customer Insights. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

CRM analytics applies data analysis to customer relationship data, measuring pipeline performance, customer lifetime value, and retention trends to optimize revenue.

What You'll Learn

In this tutorial, you will learn how to build CRM analytics dashboards for pipeline velocity, customer lifetime value (CLV), churn prediction, and support ticket analysis using SQL, Python, and popular BI tools.

Why It Matters

CRM systems hold a goldmine of customer interaction data, yet most companies use them only as address books. Analyzing CRM data reveals which sales activities produce revenue, which customer segments are most profitable, and where churn risk is highest. Companies using CRM analytics see 25% higher sales quota attainment and 30% lower churn rates.

Real-World Use

DodaTech integrated CRM data from Salesforce with product usage data from PostgreSQL. The analysis revealed that customers who used the API within the first 7 days of trial had a 78% conversion rate versus 22% for non-API users. The sales team now prioritizes API activation during onboarding, increasing trial-to-paid conversion by 41%.

CRM Analytics Architecture

flowchart TD
    A[Salesforce] --> B[CRM Data Lake]
    C[HubSpot] --> B
    D[Support Tickets] --> B
    E[Product Usage] --> B
    B --> F[Pipeline Analysis]
    B --> G[CLV Modeling]
    B --> H[Churn Prediction]
    B --> I[Support Metrics]
    F --> J[Sales Dashboard]
    G --> J
    H --> J
    I --> J
    J --> K[Forecasting]
    J --> L[Territory Planning]

Pipeline Velocity Analysis

Pipeline velocity measures how quickly deals move through your sales stages:

WITH deal_stages AS (
  SELECT
    deal_id,
    stage_name,
    created_date,
    amount,
    LAG(created_date) OVER (PARTITION BY deal_id ORDER BY created_date) AS previous_stage_date,
    MIN(created_date) OVER (PARTITION BY deal_id) AS deal_created_date,
    MAX(CASE WHEN stage_name = 'Closed Won' THEN created_date END) OVER (PARTITION BY deal_id) AS won_date
  FROM salesforce_opportunity_stage
  WHERE created_date >= '2026-01-01'
)

SELECT
  stage_name,
  COUNT(DISTINCT deal_id) AS deals,
  ROUND(AVG(EXTRACT(DAY FROM created_date - COALESCE(previous_stage_date, created_date))), 1) AS avg_days_in_stage,
  ROUND(SUM(amount) FILTER (WHERE won_date IS NOT NULL) / NULLIF(COUNT(DISTINCT deal_id), 0), 2) AS avg_deal_size,
  ROUND(COUNT(DISTINCT deal_id) FILTER (WHERE won_date IS NOT NULL) * 100.0 / COUNT(DISTINCT deal_id), 1) AS win_rate_pct
FROM deal_stages
GROUP BY stage_name
ORDER BY MIN(created_date);

Expected output: A table showing each sales stage, the number of deals, average days spent in that stage, average deal size, and win rate. Stages with high days and low win rates are bottlenecks requiring Process improvement.

Customer Lifetime Value Prediction

Calculate CLV using historical data with a predictive model:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

def calculate_clv(transactions_df, prediction_months=12):
    df = transactions_df.copy()
    df["purchase_date"] = pd.to_datetime(df["purchase_date"])

    now = df["purchase_date"].max()

    customer_metrics = df.groupby("customer_id").agg(
        first_purchase=("purchase_date", "min"),
        last_purchase=("purchase_date", "max"),
        total_revenue=("revenue", "sum"),
        purchase_count=("Transaction_id", "count"),
    ).reset_index()

    customer_metrics["customer_lifetime_days"] = (
        customer_metrics["last_purchase"] - customer_metrics["first_purchase"]
    ).dt.days

    customer_metrics["days_since_last_purchase"] = (
        now - customer_metrics["last_purchase"]
    ).dt.days

    customer_metrics["avg_revenue_per_month"] = (
        customer_metrics["total_revenue"]
        / (customer_metrics["customer_lifetime_days"] / 30)
    ).fillna(customer_metrics["total_revenue"])

    customer_metrics["predicted_clv"] = (
        customer_metrics["avg_revenue_per_month"] * prediction_months
    )

    def clv_segment(row):
        if row["predicted_clv"] > 1000:
            return "High Value"
        elif row["predicted_clv"] > 300:
            return "Medium Value"
        else:
            return "Low Value"

    customer_metrics["clv_segment"] = customer_metrics.apply(clv_segment, axis=1)

    return customer_metrics

clv_data = calculate_clv(pd.read_csv("crm_transactions.csv"))
print(clv_data["clv_segment"].value_counts())
print(clv_data.groupby("clv_segment")["predicted_clv"].mean())

Expected output: A segment distribution showing how many customers are high, medium, and low value. High-value customers (top 20%) typically generate 80% of predicted revenue, confirming the Pareto principle in your customer BASE.

Churn Prediction with Logistic Regression

Identify at-risk customers before they leave:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score

def build_churn_model(events_df, churn_label_col="is_churned"):
    features = events_df.groupby("user_id").agg({
        "active_days": "sum",
        "support_tickets": "sum",
        "days_since_last_login": "min",
        "feature_uses": "sum",
        "revenue": "sum",
    }).fillna(0)

    features["login_frequency"] = features["active_days"] / 90
    features["ticket_rate"] = features["support_tickets"] / features["active_days"].clip(lower=1)
    features["is_high_ticket"] = (features["ticket_rate"] > 0.5).astype(int)

    y = events_df.groupby("user_id")[churn_label_col].max()

    X = features.select_dtypes(include=[np.number])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_State=42, stratify=y
    )

    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    feature_importance = pd.DataFrame({
        "feature": X.columns,
        "coefficient": model.coef_[0],
    }).sort_values("coefficient", ascending=False)

    print(classification_report(y_test, y_pred))
    print(f"ROC AUC: {roc_auc_score(y_test, y_prob):.3f}")
    print("\nTop churn predictors:")
    print(feature_importance.head(5))

    return model, feature_importance

churn_model, importance = build_churn_model(
    pd.read_csv("crm_user_events_90_days.csv"),
)

Expected output: A classification report showing precision, recall, and f1-score for churn prediction. The ROC AUC score should be above 0.75 for a useful model. The top predictors reveal which behaviors most strongly correlate with churn.

Tool Comparison

Feature	Salesforce CRM Analytics	HubSpot Analytics	Tableau CRM	Custom (Python + SQL)
Pipeline velocity	Yes	Yes	Custom	Custom SQL
CLV calculation	Yes	Yes	Yes	Custom Python
Churn prediction	Einstein AI	Yes	Custom	ML models
Support analytics	Yes	Yes	Custom	Custom SQL
Data integration	Salesforce only	HubSpot only	Any source	Any source
Cost	$$$	$$	$$$	Infrastructure

Common Errors

1. CRM Data Quality Issues

CRM data is often incomplete, duplicated, or stale. Missing deal amounts, duplicate contacts, and outdated stage names produce unreliable analytics. Implement data validation rules in your CRM and run quarterly data cleanup.

2. Ignoring the Sales Cycle Length

CLV predictions using short observation Windows (30 days) are unreliable for products with annual or multi-year contracts. Match the observation window to at least 3 times your average sales cycle length.

3. Survivorship Bias in Win Rate Analysis

Analyzing won deals without considering lost and open deals inflates win rates. Include all deals in pipeline analysis. A 90% win rate on 10 deals that ignored 100 stalled deals is misleading.

4. Treating All Support Tickets Equally

Billing inquiries, feature requests, and bug reports require different responses. Categorize tickets by type before measuring resolution time and customer satisfaction. A fast resolution on billing is LESS impactful than a fast resolution on a critical bug.

5. Overfitting Churn Prediction Models

Churn models trained on historical data may not generalize to future customers with different behaviors. Regularly retrain models on recent data. Monitor feature importance for drift and recalibrate when model accuracy drops below threshold.

Practice Questions

1. What is pipeline velocity and how is it calculated? Pipeline velocity measures the speed at which deals move through the sales pipeline. It is calculated as (number of deals x win rate x average deal size) / average sales cycle length in days.

2. Why use Pareto principle (80/20 rule) in CRM analytics? The Pareto principle typically applies to customer value: 20% of customers generate 80% of revenue. CRM analytics identifies these high-value customers so resources can be focused on retention and expansion for the most profitable segment.

3. What is customer lifetime value and why does it matter? CLV predicts the total revenue a customer will generate during their relationship with your business. It determines how much you can spend on customer acquisition and which customer segments deserve premium support and retention investment.

4. How does support ticket analysis improve retention? Support ticket data reveals product friction points, documentation gaps, and feature requests. Analyzing ticket volume by category identifies which issues cause the most customer frustration, allowing proactive fixes that reduce churn.

5. Challenge: Export 12 months of CRM data including opportunities, contacts, activities, and support tickets. Build a pipeline velocity dashboard showing stage-by-stage conversion and time-in-stage. Calculate CLV for all active customers using the predictive model. Build a churn prediction model and identify the top 100 at-risk customers. Create a retention campaign targeting those customers with personalized outreach based on their churn drivers.

Mini Project

Build a complete CRM analytics dashboard that ingests data from a CRM API (Salesforce or HubSpot) into PostgreSQL. Include four views: pipeline velocity with stage-by-stage conversion, CLV distribution with segment breakdown, churn risk list with prediction scores, and support ticket trends by category. Configure weekly email reports to the sales team with at-risk customer alerts. Add a lead scoring model that ranks new leads by predicted conversion probability based on historical lead-to-win patterns.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.

← Previous User Segmentation Strategies -- Behavioral, Demographic & Predictive Analytics Next → Business Intelligence Tools -- Tableau, Metabase, Superset & Power BI

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse Analytics