Causal Models vs Predictive Models: Why It Matters


Tags: Causality, Python, Statistics
Published: July 20, 2025
Author: Aman Abdullayev

Introduction

In today’s data-driven world, businesses are waking up to a crucial reality: correlation is not causation. Just because two metrics move together doesn’t mean one causes the other. And when decisions involve money, time, and customer experience, guessing is not an option.
Enter causal models.

Causality

Causal models aim to answer a powerful question: "What would have happened if we had done something differently?"
They go beyond simple forecasting. While predictive models tell you what might happen, causal models explain why it happens and what would change under a different action.
In short, causal models help you understand which levers actually drive outcomes, allowing you to make decisions that move the needle.
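In the standard potential-outcomes notation (a common formalization, not spelled out in this post), the causal quantity of interest is the gap between the outcome with and without the action:

$$\tau_i = Y_i(1) - Y_i(0), \qquad \text{ATE} = \mathbb{E}[Y(1) - Y(0)]$$

Prediction estimates $\mathbb{E}[Y \mid X]$ from observed data; causal inference targets these counterfactual differences, which is exactly what the example below estimates.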

Where Causal Models Are Used

  • Marketing → Campaign effectiveness, spend optimization
  • Product → Drivers of feature adoption or churn
  • Pricing → Impact of price changes
  • Operations → Effects of delivery time, staffing, etc.
Businesses don’t just want to know what’s happening—they want to influence it. Causal models enable confident, action-driven decisions. They’re no longer academic nice-to-haves—they're a competitive edge.

Predictive Models

Every time you get a Netflix recommendation, see a revenue forecast, or receive a fraud alert, a predictive model is likely behind it.
These models use historical data to forecast future outcomes. They learn patterns between features (X) and a target (Y), such as:
  • Will this user churn?
  • What will we sell next month?
  • How many orders will this customer make?
But here’s the key: predictive models don’t explain why something happens. That’s where causality comes in. If you want to know whether a discount caused a behavior change, you need a causal model.

Causality vs. Prediction in Action

Let’s illustrate the difference. Imagine you run an e-commerce business and want to estimate Customer Lifetime Value (CLV) over 365 days.

CAC Predicts CLV

You have data on how much you spent to acquire customers (CAC) and their CLVs:
```python
df_cac.head()
```

| customer_id | cac   | clv_365 |
|------------:|------:|--------:|
| 0           | 15.55 | 32.48   |
| 1           | 52.35 | 104.17  |
| 2           | 90.33 | 178.90  |
| 3           | 66.81 | 135.26  |
| 4           | 61.42 | 123.14  |
Let’s say CAC is highly correlated with CLV:
 
[Figure: CAC vs. CLV correlation]
You train a Linear Regression model on this and get near-perfect results.
```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X_train, X_test, y_train, y_test = prep_data_for_regression(df_cac, "clv_365")

# Train a linear regression model
model_1 = LinearRegression()
model_1.fit(X_train, y_train)

y_pred = model_1.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
```

```
>>> output
Mean Squared Error: 9.94
R-squared: 0.99
```
 
Great, right? Not so fast. This is synthetic data where the correlation was intentionally built in. In real life, such a correlation often doesn’t exist. More importantly, a predictive model like this can’t answer: "Does higher CAC cause higher CLV?"
Should you spend more on acquisition just because the model predicts higher CLV with higher CAC? Definitely not.
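An aside on the code: the prep_data_for_regression helper used throughout isn't shown in the post. Here is a minimal sketch of what it might look like, assuming scikit-learn's train_test_split and the single numeric gender feature that appears in the coefficient table later (both are guesses, not the author's actual implementation):

```python
from sklearn.model_selection import train_test_split

def prep_data_for_regression(df, target, test_size=0.2, random_state=42):
    # Hypothetical reconstruction: drop identifiers, encode gender, split.
    X = df.drop(columns=[target, "customer_id"], errors="ignore").copy()
    if "gender" in X.columns:
        # Encode as a single 0/1 column, matching the lone "gender"
        # coefficient shown in the feature-importance table below
        X["gender"] = (X["gender"] == "male").astype(int)
    y = df[target]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)
```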

More Features: Age, Gender, Location, App Usage

You get more customer data: age, gender, urban_loc, and app_user. The dataset now looks like this:
```python
df_customers.head()
```

| customer_id | age | gender | urban_loc | app_user | clv_365 |
|------------:|----:|--------|----------:|---------:|--------:|
| 0           | 46  | male   | 0         | 0        | 32.48   |
| 1           | 32  | female | 0         | 0        | 104.17  |
| 2           | 25  | female | 1         | 1        | 178.90  |
| 3           | 38  | female | 0         | 1        | 135.26  |
| 4           | 36  | female | 1         | 0        | 123.14  |
You retrain your model and achieve a great predictive score (R2 = 1.00, MSE = 4.62). Errors are normally distributed.
```python
X_train, X_test, y_train, y_test = prep_data_for_regression(df_customers, "clv_365")

# Train a linear regression model
model_2 = LinearRegression()
model_2.fit(X_train, y_train)

y_pred = model_2.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
```

```
>>> output
Mean Squared Error: 4.62
R-squared: 1.00
```
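The post illustrates the error distribution with a plot. A quick sketch to reproduce that check, reusing y_test and y_pred from the block above (matplotlib assumed):

```python
import matplotlib.pyplot as plt

# Residuals should look roughly bell-shaped and centered at zero
residuals = y_test - y_pred
plt.hist(residuals, bins=30)
plt.xlabel("y_test - y_pred")
plt.title("Residual distribution for model_2")
plt.show()
```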
If prediction is your only goal, you’re done. But if you want to know what drives CLV, you’re not.

Does App Usage Drive CLV?

Let’s check the coefficient for app_user. It shows that app users bring $19.6 more. Is that a causal effect? Or just a correlation?
```python
import pandas as pd

# Inspect the regression coefficients
importance = pd.DataFrame(
    {
        "Feature": model_2.feature_names_in_,
        "Importance": model_2.coef_,
    }
).sort_values(by="Importance", ascending=False)
importance
```
| Feature   | Importance |
|-----------|-----------:|
| app_user  | 19.6       |
| urban_loc | 12.4       |
| age       | -14.8      |
| gender    | -23.8      |

Avg CLV: App Users vs. Non-App Users

Now, compare the average CLV of app users vs non-app users. The difference is $46.5.
```python
app_user_mean = df_customers.query("app_user == 1")["clv_365"].mean()
non_app_user_mean = df_customers.query("app_user == 0")["clv_365"].mean()
difference = app_user_mean - non_app_user_mean
```
[Figure: average CLV, app users vs. non-app users]
So, what’s the true effect—$19.6 or $46.5? Why did the estimated effect of app usage vary so much?

The Problem: Confounding

Let’s look at the age distribution: younger users are more likely to use the app and have higher CLV. That’s confounding: a third variable (age) influences both the treatment (app_user) and the outcome (CLV).
[Figures: age distribution and CLV by app usage]
Specifically, age is a confounding variable:
  • Younger users are more likely to use the app
  • Younger users also tend to have higher CLV
This means that part of the observed effect of app usage is actually due to age, not the app itself.
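You can see the confounding directly in the data. A quick check, splitting customers at the median age (a sketch, using the df_customers frame from above):

```python
# Younger customers adopt the app more AND have higher CLV
median_age = df_customers["age"].median()
young = df_customers[df_customers["age"] <= median_age]
old = df_customers[df_customers["age"] > median_age]

print(f"App usage rate - young: {young['app_user'].mean():.2f}, old: {old['app_user'].mean():.2f}")
print(f"Average CLV    - young: {young['clv_365'].mean():.2f}, old: {old['clv_365'].mean():.2f}")
```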

Causal Models to the Rescue

There are many ways to estimate causal effects. Here we use a simple method: the S-learner, a type of meta-learner.

What is an S-Learner?

An S-learner uses a single standard ML model to estimate the Conditional Average Treatment Effect (CATE) as follows:
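In the usual meta-learner notation (standard formulation, not spelled out in the post), a single model $\hat{\mu}(x, t)$ is fit with the treatment indicator $t$ included as an ordinary feature, and the CATE is the difference of its two counterfactual predictions:

$$\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$$

The four steps below implement exactly this with a random forest.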
1. Train an ML model on the original data (here, a tree-based estimator):
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X_train, X_test, y_train, y_test = prep_data_for_regression(df_customers, "clv_365")

model_3 = RandomForestRegressor(n_estimators=500, random_state=42)
model_3.fit(X_train, y_train)

y_pred = model_3.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"Root Mean Squared Error: {rmse:.2f}")
print(f"R-squared: {r2:.2f}")
```

```
>>> output
Root Mean Squared Error: 2.22
R-squared: 1.00
```
2. Set app_user = 1 for all users and predict CLV:
```python
# Refit on the full dataset before generating counterfactual predictions
X_all = pd.concat([X_train, X_test], axis=0)
y_all = pd.concat([y_train, y_test], axis=0)
model_3.fit(X_all, y_all)

# Treated scenario: set all customers to treatment (app user)
X_all["app_user"] = 1
y_pred_treated = model_3.predict(X_all)
```
3. Set app_user = 0 for all users and predict again:
```python
# Control scenario: set all customers to control (non-app user)
X_all["app_user"] = 0
y_pred_control = model_3.predict(X_all)
```
4. The difference between the two predictions gives the individual treatment effects (CATEs):
```python
# Calculate the treatment effect per customer
df_customers_analysis = df_customers.copy()
df_customers_analysis["clv_if_all_app_users"] = y_pred_treated.round(2)
df_customers_analysis["clv_if_all_non_app_users"] = y_pred_control.round(2)
df_customers_analysis["treatment_effect"] = (
    df_customers_analysis["clv_if_all_app_users"]
    - df_customers_analysis["clv_if_all_non_app_users"]
)
df_customers_analysis.head()
```
And if we add the new prediction columns to our original dataset, it will look like this:
| customer_id | age | gender | urban_loc | app_user | clv_365 | clv_if_all_app_users | clv_if_all_non_app_users | treatment_effect |
|------------:|----:|--------|----------:|---------:|--------:|---------------------:|-------------------------:|-----------------:|
| 0           | 46  | male   | 0         | 0        | 32.48   | 137.27               | 96.48                    | 40.79            |
| 1           | 32  | female | 0         | 0        | 104.17  | 176.62               | 135.78                   | 40.84            |
| 2           | 25  | female | 1         | 1        | 178.90  | 179.02               | 138.48                   | 40.54            |
| 3           | 38  | female | 0         | 1        | 135.26  | 66.96                | 26.41                    | 40.55            |
| 4           | 36  | female | 1         | 0        | 123.14  | 154.23               | 113.94                   | 40.29            |
On the left chart, you see the distribution of treatment effects centered around $40. On the right, each customer’s actual CLV is shown alongside their hypothetical CLV if their app usage status were reversed—red dots represent CLV if they were not app users, and green dots if they were app users.
[Figure: distribution of treatment effects (left) and actual vs. counterfactual CLV per customer (right)]
The average of all CATEs gives the Average Treatment Effect (ATE)—the expected CLV uplift if everyone used the app.
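Computing it is a one-liner on the frame built in step 4:

```python
# Average Treatment Effect = mean of the individual effects
ate = df_customers_analysis["treatment_effect"].mean()
print(f"Estimated ATE of app usage on CLV: ${ate:.2f}")  # ~$40 on this data
```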

Conclusion

Using the S-learner approach, we estimate the average treatment effect of app usage on CLV to be $40. Compare that to:
  • Linear regression coefficient: $19.6 (underestimated)
  • Raw group difference: $46.5 (overestimated)
Since we generated the synthetic data with a true effect of $40, this confirms the S-learner result is accurate.
This example shows the power of causal models: if you want to understand and influence outcomes—not just predict them—they’re essential.
 

Appendix

Function used to generate the synthetic data:
```python
import numpy as np
import pandas as pd

def generate_customer_data(n):
    np.random.seed(42)
    customer_ids = np.arange(n)
    age = np.random.randint(18, 51, size=n)
    gender = np.random.choice(["male", "female", "female"], size=n)
    urban_loc = np.random.choice([0, 1], size=n)

    # Younger customers are more likely to be app users (the confounder)
    min_age, max_age = age.min(), age.max()
    normalized_age = (age - min_age) / (max_age - min_age)
    base_prob = 0.4
    age_factor = 1.5 - normalized_age  # 1.5 for youngest, 0.5 for oldest
    probabilities = base_prob * age_factor
    probabilities = np.clip(probabilities, 0, 1)
    app_user = np.random.binomial(1, probabilities, size=n)

    # True CLV: app usage adds exactly $40
    base_clv = (
        age_factor * 50
        + (gender == "female") * 50
        + urban_loc * 25
        + app_user * 40
    )
    cac = base_clv * 0.5 + np.random.normal(0, base_clv.mean() * 0.01, size=n)
    noise = np.random.normal(0, base_clv.mean() * 0.02, size=n)  # 2% noise
    clv_365 = base_clv + noise

    # Create DataFrame
    df = pd.DataFrame(
        {
            "customer_id": customer_ids,
            "age": age,
            "gender": gender,
            "urban_loc": urban_loc,
            "app_user": app_user,
            "cac": np.round(cac, 2),
            "clv_365": np.round(clv_365, 2),
        }
    )
    return df
```
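To reproduce the frames used in the post (the sample size isn't stated, so n=10_000 below is an assumption, as is the exact column split):

```python
df = generate_customer_data(10_000)

# Hypothetical split into the two frames used in the walkthrough
df_cac = df[["customer_id", "cac", "clv_365"]]
df_customers = df.drop(columns=["cac"])
```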
 
"God is the only true cause. All other causes are instruments through which His will is realized." (Fakhr al-Din al-Razi)