Example 2: Propensity Score Matching

Propensity score matching is a technique that aims to reduce the bias due to confounding variables by matching units that have similar probabilities of receiving the treatment, based on their observed characteristics. For instance, to evaluate the influence of smoking on lung cancer, we can match smokers and non-smokers with comparable ages, genders, and health statuses, and then compare outcomes within the matched pairs.

To demonstrate this technique, we will use a synthetic dataset that mimics the impact of a training program on employee performance. The dataset has four variables: id, treatment, covariate, and outcome. The id is a unique identifier for each employee. The treatment is a binary indicator of whether or not the employee took part in the training program. The outcome is a continuous measure of employee performance. The covariate is a continuous variable that stands in for a potential confounder, that is, a factor that in real data could influence both the treatment and the outcome.

We will use the scikit-learn (sklearn) and causalinference libraries to generate and analyze the data. The code and the output are shown below.

Python3




# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from causalinference import CausalModel
 
# Set random seed for reproducibility
np.random.seed(42)
 
# Generate synthetic data
n = 1000 # number of observations
X, y = make_regression(n_samples=n, n_features=1, n_informative=1, noise=10, random_state=42) # generate covariate and outcome
treatment = np.random.binomial(1, p=0.5, size=n) # generate treatment indicator (fair coin flip, independent of the covariate)
y[treatment==1] += 5 # add treatment effect
data = pd.DataFrame({'id': np.arange(n), 'treatment': treatment, 'covariate': X.flatten(), 'outcome': y}) # create dataframe
data.head()


Output:

[Table: first five rows of the dataframe, with columns id, treatment, covariate, and outcome]

Python3




# Plot the data
plt.figure(figsize=(8,6))
plt.scatter(data['covariate'], data['outcome'], c=data['treatment'], cmap='bwr', alpha=0.5)
plt.xlabel('Covariate')
plt.ylabel('Outcome')
plt.title('Synthetic Data')
plt.show()


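Output:

[Figure: scatter plot titled "Synthetic Data", with the covariate on the x-axis, the outcome on the y-axis, and points colored by treatment]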

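The plot suggests a positive relationship between the covariate and the outcome, and between the treatment and the outcome. In observational data, treatment assignment often depends on the covariate as well, which makes the covariate a confounding factor: a raw comparison of treated and untreated outcomes then mixes the effect of the treatment with the effect of the covariate, so we cannot read the causal effect directly off this relationship. (In this synthetic dataset the treatment is assigned by a fair coin flip, so the covariate is not an actual confounder here; the workflow is the same one you would apply to confounded data.) Propensity score matching is one technique for adjusting for the covariate when assessing the causal effect.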
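To make the role of confounding concrete, the following minimal sketch contrasts randomized and confounded treatment assignment on a small simulated dataset. The data-generating process (a covariate coefficient of 10, a logistic assignment rule with slope 1.5, a noise scale of 2) and all variable names are assumptions chosen for illustration, not part of the dataset used in this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_effect = 5.0

# Hypothetical covariate and outcome noise (illustrative values only)
covariate = rng.normal(size=n)
noise = rng.normal(scale=2.0, size=n)

# Randomized assignment: a fair coin flip that ignores the covariate
d_random = rng.binomial(1, 0.5, size=n)
y_random = 10.0 * covariate + true_effect * d_random + noise

# Confounded assignment: higher covariate values make treatment more likely
p_treat = 1.0 / (1.0 + np.exp(-1.5 * covariate))
d_confounded = rng.binomial(1, p_treat)
y_confounded = 10.0 * covariate + true_effect * d_confounded + noise


def naive_difference(y, d):
    """Difference in mean outcome between treated and untreated units."""
    return y[d == 1].mean() - y[d == 0].mean()


naive_random = naive_difference(y_random, d_random)
naive_confounded = naive_difference(y_confounded, d_confounded)

print(f"true effect:                      {true_effect:.2f}")
print(f"naive difference (randomized):    {naive_random:.2f}")
print(f"naive difference (confounded):    {naive_confounded:.2f}")
```

Under randomized assignment the naive difference in means should land near the true effect, while under confounded assignment it should be inflated, because treated units also tend to have higher covariate values.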

The propensity score is the probability of receiving the treatment given the observed covariates. A logistic regression model can be used to estimate it. Each treated unit is then matched with one or more untreated units that have a similar propensity score. This way, we can create a balanced sample that has similar distributions of the covariates across the treatment groups, and then compare the average outcomes of the matched pairs.
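The two steps described above can be sketched by hand with NumPy alone. This is a minimal illustration under stated assumptions, not the method used by any particular library: the simulated data, the helper `fit_logistic`, and the choice of 1-nearest-neighbor matching with replacement are all hypothetical choices made for the sketch. In practice you would more likely fit the logistic regression with a library such as scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confounded data: the covariate drives both treatment and outcome
n = 2000
x = rng.normal(size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * x)))
y = 10.0 * x + 5.0 * d + rng.normal(scale=2.0, size=n)  # true effect = 5


def fit_logistic(x, d, n_iter=25):
    """Fit P(d=1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    design = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (d - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Step 1: estimate the propensity score with logistic regression
beta = fit_logistic(x, d)
score = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))

# Step 2: for each treated unit, find the control unit with the closest score
# (1-nearest-neighbor matching with replacement)
treated = np.flatnonzero(d == 1)
control = np.flatnonzero(d == 0)
gaps = np.abs(score[treated][:, None] - score[control][None, :])
matched_control = control[gaps.argmin(axis=1)]

# Compare the naive difference in means with the matched estimate of the ATT
naive_diff = y[treated].mean() - y[control].mean()
att = (y[treated] - y[matched_control]).mean()

print(f"naive difference in means: {naive_diff:.2f}")
print(f"matched ATT estimate:      {att:.2f}")
```

Because each treated unit is compared with a control unit that had a similar chance of being treated, the matched estimate should sit much closer to the simulated effect of 5 than the naive difference does.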

We will use the CausalModel class from the causalinference library to estimate the propensity score and perform the matching. The code and the output are shown below.

Python3




# Create a causal model
cm = CausalModel(
    Y=data['outcome'].values, # outcome variable
    D=data['treatment'].values, # treatment variable
    X=data['covariate'].values # covariate variable
)
 
# Estimate the propensity score
cm.est_propensity_s()
cm.propensity
 
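# Trim on the propensity score, then estimate the effect via matching
cm.trim_s() # trim units with extreme propensity scores
cm.stratify_s() # stratify units into propensity score bins (needed by est_via_blocking; the matching step does not use the strata)
cm.est_via_matching() # nearest-neighbor matching; note that causalinference matches on the covariates themselves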
cm.estimates


Output:

{'matching': {'atc': 5.435467575470179, 'att': 5.660317763899948, 'ate': 5.5472181191197745, 'atc_se': 1.1868216799057065, 'att_se': 1.2189135556978998, 'ate_se': 1.0618794080326954}}

The output shows the estimated average treatment effect (ATE), the average treatment effect on the controls (ATC), and the average treatment effect on the treated (ATT), along with their standard errors. The estimated ATE of about 5.55 is close to the true effect of 5 that we added to the data, within roughly half a standard error. Confidence intervals are not part of this output, but an approximate 95% interval is the estimate plus or minus 1.96 standard errors, which for the ATE gives roughly 3.47 to 7.63 and therefore contains the true value. In this example the matching estimate recovers the treatment effect well; on observational data, where treatment assignment depends on the covariates, the same workflow is what reduces the bias due to confounding.
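As a small worked example, approximate 95% confidence intervals can be computed from the estimates and standard errors reported above. This sketch assumes a normal approximation (critical value 1.96); the variable names are illustrative.

```python
# Estimates and standard errors copied from the output above
matching = {
    'atc': 5.435467575470179,
    'att': 5.660317763899948,
    'ate': 5.5472181191197745,
    'atc_se': 1.1868216799057065,
    'att_se': 1.2189135556978998,
    'ate_se': 1.0618794080326954,
}

z = 1.96  # normal critical value for a two-sided 95% interval

intervals = {}
for name in ('ate', 'att', 'atc'):
    estimate = matching[name]
    std_err = matching[name + '_se']
    intervals[name] = (estimate - z * std_err, estimate + z * std_err)
    low, high = intervals[name]
    print(f"{name.upper()}: {estimate:.2f}, approx. 95% CI [{low:.2f}, {high:.2f}]")
```

All three intervals contain the true effect of 5, and each is several units wide, so the estimates are consistent with the true effect without pinning it down very precisely.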
