CRM Analytics & Customer Insights -- Measuring Relationship Performance
In this tutorial, you'll learn about CRM Analytics & Customer Insights. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
CRM analytics applies data analysis to customer relationship data, measuring pipeline performance, customer lifetime value, and retention trends to optimize revenue.
What You'll Learn
In this tutorial, you will learn how to build CRM analytics dashboards for pipeline velocity, customer lifetime value (CLV), churn prediction, and support ticket analysis using SQL, Python, and popular BI tools.
Why It Matters
CRM systems hold a goldmine of customer interaction data, yet most companies use them only as address books. Analyzing CRM data reveals which sales activities produce revenue, which customer segments are most profitable, and where churn risk is highest. Companies using CRM analytics see 25% higher sales quota attainment and 30% lower churn rates.
Real-World Use
DodaTech integrated CRM data from Salesforce with product usage data from PostgreSQL. The analysis revealed that customers who used the API within the first 7 days of trial had a 78% conversion rate versus 22% for non-API users. The sales team now prioritizes API activation during onboarding, increasing trial-to-paid conversion by 41%.
CRM Analytics Architecture
flowchart TD
A[Salesforce] --> B[CRM Data Lake]
C[HubSpot] --> B
D[Support Tickets] --> B
E[Product Usage] --> B
B --> F[Pipeline Analysis]
B --> G[CLV Modeling]
B --> H[Churn Prediction]
B --> I[Support Metrics]
F --> J[Sales Dashboard]
G --> J
H --> J
I --> J
J --> K[Forecasting]
J --> L[Territory Planning]
Pipeline Velocity Analysis
Pipeline velocity measures how quickly deals move through your sales stages:
WITH deal_stages AS (
SELECT
deal_id,
stage_name,
created_date,
amount,
LAG(created_date) OVER (PARTITION BY deal_id ORDER BY created_date) AS previous_stage_date,
MIN(created_date) OVER (PARTITION BY deal_id) AS deal_created_date,
MAX(CASE WHEN stage_name = 'Closed Won' THEN created_date END) OVER (PARTITION BY deal_id) AS won_date
FROM salesforce_opportunity_stage
WHERE created_date >= '2026-01-01'
)
SELECT
stage_name,
COUNT(DISTINCT deal_id) AS deals,
ROUND(AVG(EXTRACT(DAY FROM created_date - COALESCE(previous_stage_date, created_date))), 1) AS avg_days_in_stage,
ROUND(SUM(amount) FILTER (WHERE won_date IS NOT NULL) / NULLIF(COUNT(DISTINCT deal_id), 0), 2) AS avg_deal_size,
ROUND(COUNT(DISTINCT deal_id) FILTER (WHERE won_date IS NOT NULL) * 100.0 / COUNT(DISTINCT deal_id), 1) AS win_rate_pct
FROM deal_stages
GROUP BY stage_name
ORDER BY MIN(created_date);
Expected output: A table showing each sales stage, the number of deals, average days spent in that stage, average deal size, and win rate. Stages with high days and low win rates are bottlenecks requiring Process improvement.
Customer Lifetime Value Prediction
Calculate CLV using historical data with a predictive model:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
def calculate_clv(transactions_df, prediction_months=12):
df = transactions_df.copy()
df["purchase_date"] = pd.to_datetime(df["purchase_date"])
now = df["purchase_date"].max()
customer_metrics = df.groupby("customer_id").agg(
first_purchase=("purchase_date", "min"),
last_purchase=("purchase_date", "max"),
total_revenue=("revenue", "sum"),
purchase_count=("Transaction_id", "count"),
).reset_index()
customer_metrics["customer_lifetime_days"] = (
customer_metrics["last_purchase"] - customer_metrics["first_purchase"]
).dt.days
customer_metrics["days_since_last_purchase"] = (
now - customer_metrics["last_purchase"]
).dt.days
customer_metrics["avg_revenue_per_month"] = (
customer_metrics["total_revenue"]
/ (customer_metrics["customer_lifetime_days"] / 30)
).fillna(customer_metrics["total_revenue"])
customer_metrics["predicted_clv"] = (
customer_metrics["avg_revenue_per_month"] * prediction_months
)
def clv_segment(row):
if row["predicted_clv"] > 1000:
return "High Value"
elif row["predicted_clv"] > 300:
return "Medium Value"
else:
return "Low Value"
customer_metrics["clv_segment"] = customer_metrics.apply(clv_segment, axis=1)
return customer_metrics
clv_data = calculate_clv(pd.read_csv("crm_transactions.csv"))
print(clv_data["clv_segment"].value_counts())
print(clv_data.groupby("clv_segment")["predicted_clv"].mean())
Expected output: A segment distribution showing how many customers are high, medium, and low value. High-value customers (top 20%) typically generate 80% of predicted revenue, confirming the Pareto principle in your customer BASE.
Churn Prediction with Logistic Regression
Identify at-risk customers before they leave:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
def build_churn_model(events_df, churn_label_col="is_churned"):
features = events_df.groupby("user_id").agg({
"active_days": "sum",
"support_tickets": "sum",
"days_since_last_login": "min",
"feature_uses": "sum",
"revenue": "sum",
}).fillna(0)
features["login_frequency"] = features["active_days"] / 90
features["ticket_rate"] = features["support_tickets"] / features["active_days"].clip(lower=1)
features["is_high_ticket"] = (features["ticket_rate"] > 0.5).astype(int)
y = events_df.groupby("user_id")[churn_label_col].max()
X = features.select_dtypes(include=[np.number])
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_State=42, stratify=y
)
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
feature_importance = pd.DataFrame({
"feature": X.columns,
"coefficient": model.coef_[0],
}).sort_values("coefficient", ascending=False)
print(classification_report(y_test, y_pred))
print(f"ROC AUC: {roc_auc_score(y_test, y_prob):.3f}")
print("\nTop churn predictors:")
print(feature_importance.head(5))
return model, feature_importance
churn_model, importance = build_churn_model(
pd.read_csv("crm_user_events_90_days.csv"),
)
Expected output: A classification report showing precision, recall, and f1-score for churn prediction. The ROC AUC score should be above 0.75 for a useful model. The top predictors reveal which behaviors most strongly correlate with churn.
Tool Comparison
| Feature | Salesforce CRM Analytics | HubSpot Analytics | Tableau CRM | Custom (Python + SQL) |
|---|---|---|---|---|
| Pipeline velocity | Yes | Yes | Custom | Custom SQL |
| CLV calculation | Yes | Yes | Yes | Custom Python |
| Churn prediction | Einstein AI | Yes | Custom | ML models |
| Support analytics | Yes | Yes | Custom | Custom SQL |
| Data integration | Salesforce only | HubSpot only | Any source | Any source |
| Cost | $$$ | $$ | $$$ | Infrastructure |
Common Errors
1. CRM Data Quality Issues
CRM data is often incomplete, duplicated, or stale. Missing deal amounts, duplicate contacts, and outdated stage names produce unreliable analytics. Implement data validation rules in your CRM and run quarterly data cleanup.
2. Ignoring the Sales Cycle Length
CLV predictions using short observation Windows (30 days) are unreliable for products with annual or multi-year contracts. Match the observation window to at least 3 times your average sales cycle length.
3. Survivorship Bias in Win Rate Analysis
Analyzing won deals without considering lost and open deals inflates win rates. Include all deals in pipeline analysis. A 90% win rate on 10 deals that ignored 100 stalled deals is misleading.
4. Treating All Support Tickets Equally
Billing inquiries, feature requests, and bug reports require different responses. Categorize tickets by type before measuring resolution time and customer satisfaction. A fast resolution on billing is LESS impactful than a fast resolution on a critical bug.
5. Overfitting Churn Prediction Models
Churn models trained on historical data may not generalize to future customers with different behaviors. Regularly retrain models on recent data. Monitor feature importance for drift and recalibrate when model accuracy drops below threshold.
Practice Questions
1. What is pipeline velocity and how is it calculated? Pipeline velocity measures the speed at which deals move through the sales pipeline. It is calculated as (number of deals x win rate x average deal size) / average sales cycle length in days.
2. Why use Pareto principle (80/20 rule) in CRM analytics? The Pareto principle typically applies to customer value: 20% of customers generate 80% of revenue. CRM analytics identifies these high-value customers so resources can be focused on retention and expansion for the most profitable segment.
3. What is customer lifetime value and why does it matter? CLV predicts the total revenue a customer will generate during their relationship with your business. It determines how much you can spend on customer acquisition and which customer segments deserve premium support and retention investment.
4. How does support ticket analysis improve retention? Support ticket data reveals product friction points, documentation gaps, and feature requests. Analyzing ticket volume by category identifies which issues cause the most customer frustration, allowing proactive fixes that reduce churn.
5. Challenge: Export 12 months of CRM data including opportunities, contacts, activities, and support tickets. Build a pipeline velocity dashboard showing stage-by-stage conversion and time-in-stage. Calculate CLV for all active customers using the predictive model. Build a churn prediction model and identify the top 100 at-risk customers. Create a retention campaign targeting those customers with personalized outreach based on their churn drivers.
Mini Project
Build a complete CRM analytics dashboard that ingests data from a CRM API (Salesforce or HubSpot) into PostgreSQL. Include four views: pipeline velocity with stage-by-stage conversion, CLV distribution with segment breakdown, churn risk list with prediction scores, and support ticket trends by category. Configure weekly email reports to the sales team with at-risk customer alerts. Add a lead scoring model that ranks new leads by predicted conversion probability based on historical lead-to-win patterns.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro