Example 3: Using CausalPy (Public)

Python3

# Import libraries
import causalpy as cp
import matplotlib.pyplot as plt
import seaborn as sns
 
# Import and process data
df = (cp.load_data("drinking") # Load CausalPy's bundled drinking-age example dataset
      .rename(columns={"agecell": "age"}) # Rename the age-cell column to "age"
      .assign(treated=lambda df_: df_.age > 21) # Flag observations above the legal drinking age as treated
      .dropna()) # Drop rows with missing values
 
# State the assumptions
# 1. Absent the treatment, the outcome variable (all) would vary smoothly with age across the cutoff point (21)
# 2. There is no manipulation or sorting of the running variable (age) around the cutoff point
# 3. Treatment status (treated) is determined by the cutoff alone, and nothing else that affects the outcome changes abruptly at 21
 
# Model the counterfactual
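# We fit a Bayesian linear regression with an intercept, the running variable, and the treatment indicator as predictors
# We pass the running variable name, the treatment threshold, and the model object
# Note: in recent CausalPy releases the same class may also be exposed as cp.RegressionDiscontinuity; check your installed version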
result = cp.pymc_experiments.RegressionDiscontinuity(df, 
                                                     formula="all ~ 1 + age + treated", 
                                                     running_variable_name="age", 
                                                     model=cp.pymc_models.LinearRegression(), 
                                                     treatment_threshold=21)
 
# Estimate the causal effect
# The model is Bayesian, so summary() reports the estimated discontinuity at the threshold and the posterior
# coefficient estimates with credible intervals, rather than a standard error and a confidence interval
result.summary()
# The outcome (all) is an all-cause mortality rate, so the discontinuity measures the change in mortality at age 21
# Read the estimate and its interval from the printed output; exact values vary slightly between runs because the model is fit by MCMC sampling
 
# Visualize the results
# We use the plot method to get a scatter plot of the data and the fitted model, with the discontinuity at the cutoff point
fig, ax = result.plot()
plt.show()
# The vertical gap between the fitted lines at the cutoff point (21) is the estimated effect of reaching the legal drinking age on the outcome (all)
# We can also plot the distribution of the running variable (age) and the outcome variable (all), and check for any anomalies or outliers
sns.histplot(data=df, x="age", hue="treated", bins=20)
plt.show()
# Check that the running variable (age) is roughly balanced on both sides of the cutoff point
# Because age is recorded in cells, this histogram is only a rough check and cannot rule out manipulation or sorting on its own
sns.histplot(data=df, x="all", hue="treated", bins=20)
plt.show()
# Use this histogram to check the outcome variable (all) for skew or outliers that could distort the linear fit
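
To make concrete what the formula "all ~ 1 + age + treated" estimates, here is a minimal sketch that does not use CausalPy. It fits the same linear specification by ordinary least squares on synthetic data and reads off the coefficient on the treatment indicator, which is the jump at the cutoff. This is a frequentist point estimate, not the Bayesian posterior that CausalPy returns, and the data, the true jump of 8, and the helper names (fit_line, residuals, rd_jump) are all made up for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def rd_jump(running, outcome, threshold):
    """Coefficient on the treatment indicator in outcome ~ 1 + running + treated."""
    treated = [1.0 if r > threshold else 0.0 for r in running]
    # Frisch-Waugh-Lovell: partial the running variable out of both the
    # outcome and the indicator, then regress residual on residual.
    res_y = residuals(running, outcome)
    res_d = residuals(running, treated)
    return sum(a * b for a, b in zip(res_y, res_d)) / sum(b * b for b in res_d)


# Synthetic data (hypothetical, not the drinking dataset): a linear trend in
# age plus a true jump of 8 units just above age 21, with Gaussian noise.
rng = random.Random(0)
TRUE_JUMP = 8.0
ages = [19 + 4 * (i + 0.5) / 200 for i in range(200)]
outcomes = [90 + 2 * (a - 21) + TRUE_JUMP * (a > 21) + rng.gauss(0, 0.5)
            for a in ages]

estimate = rd_jump(ages, outcomes, threshold=21)
print(f"Estimated jump at the cutoff: {estimate:.2f}")
```

The estimate should land close to the true jump used to generate the data. The specification assumes the same slope on both sides of the cutoff, as the formula above does; fitting a separate slope on each side is a common alternative when that assumption is doubtful.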


How to perform Causal Analysis?

Causal analysis is a technique for understanding why something happens and how to prevent or improve it. In other words, it helps us understand the cause-and-effect relationships between events or variables. It can offer useful insight when doing research, fixing problems, or making decisions.

In this article, we'll break down the concept of causal analysis step by step, for beginners who are new to the field.

Table of Contents

What is Causal Analysis?
How to Perform Causal Analysis?
Steps to Perform Causal Analysis
What are the Benefits of Causal Analysis?
Example Case of Causal Analysis
Example 1: Causal Analysis with a Synthetic Dataset
Example 2: Propensity Score Matching
Example 3: Using CausalPy (Public)
Tips for Performing Causal Analysis
