Example 1: Causal Analysis with a Synthetic Dataset

Objective: Explore the causal relationship between the number of study hours and exam scores using a synthetic dataset.

Step 1: Import Necessary Libraries

Python3

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

Step 2: Create a Synthetic Dataset

Python3

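# Fix the random seed so the dataset is reproducible
np.random.seed(42)
# 100 students; study hours drawn from a normal distribution (mean 30, std 10)
study_hours = np.random.normal(30, 10, 100)
# Scores follow a known linear rule (intercept 50, slope 2) plus Gaussian noise (std 20)
exam_scores = 50 + 2 * study_hours + np.random.normal(0, 20, 100)
data = pd.DataFrame({'Study_Hours': study_hours, 'Exam_Scores': exam_scores})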

Step 3: Visualize the Data

Python3

plt.scatter(data['Study_Hours'], data['Exam_Scores'])
plt.title('Synthetic Dataset: Study Hours vs. Exam Scores')
plt.xlabel('Study Hours')
plt.ylabel('Exam Scores')
plt.show()

Output: [Figure: scatter plot of study hours (x-axis) against exam scores (y-axis)]

Explanation: This scatter plot shows our synthetic dataset, with study hours on the x-axis and exam scores on the y-axis. The points trend upward: students who study more tend to score higher, which suggests a positive correlation between the two variables.
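The positive trend seen in the plot can also be put into a number. The snippet below is a minimal sketch that rebuilds the dataset from Step 2 so it runs on its own, then computes the Pearson correlation with pandas. The variable name `correlation` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Recreate the synthetic dataset from Step 2 so this block is self-contained.
np.random.seed(42)
study_hours = np.random.normal(30, 10, 100)
exam_scores = 50 + 2 * study_hours + np.random.normal(0, 20, 100)
data = pd.DataFrame({'Study_Hours': study_hours, 'Exam_Scores': exam_scores})

# Pearson correlation ranges from -1 to 1; a positive value means
# the two columns tend to increase together.
correlation = data['Study_Hours'].corr(data['Exam_Scores'])
print(f"Pearson correlation: {correlation:.3f}")
```

A correlation well above zero agrees with the scatter plot, but on its own it only describes association. Here we know the direction of the effect because we wrote the rule that generates the scores.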

Step 4: Split the Dataset

Python3

X_train, X_test, y_train, y_test = train_test_split(data[['Study_Hours']], data['Exam_Scores'], test_size=0.2, random_state=42)

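Explanation: Splitting the dataset into training and testing sets lets us fit the model on one subset (80% of the rows here) and evaluate it on data it has not seen (the remaining 20%). This gives a more honest estimate of how well the model generalizes than scoring it on the data it was trained on.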
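To confirm what the split produced, the following sketch rebuilds the dataset, repeats the same `train_test_split` call, and checks the sizes of the two subsets and that no row appears in both. The name `shared_rows` is an illustrative helper variable, not part of the original walkthrough.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Recreate the synthetic dataset from Step 2 so this block is self-contained.
np.random.seed(42)
study_hours = np.random.normal(30, 10, 100)
exam_scores = 50 + 2 * study_hours + np.random.normal(0, 20, 100)
data = pd.DataFrame({'Study_Hours': study_hours, 'Exam_Scores': exam_scores})

# Same split as in Step 4: 20% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    data[['Study_Hours']], data['Exam_Scores'], test_size=0.2, random_state=42
)

print("Training rows:", X_train.shape[0])
print("Testing rows:", X_test.shape[0])

# The two subsets should not share any row of the original DataFrame.
shared_rows = set(X_train.index) & set(X_test.index)
print("Rows in both subsets:", len(shared_rows))
```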

Step 5: Train a Linear Regression Model

Python3

model = LinearRegression()
model.fit(X_train, y_train)

Output:

LinearRegression()

Explanation: Linear regression is chosen to model the relationship between study hours and exam scores. Training the model means finding the best-fit line, that is, the slope and intercept that minimize the sum of squared differences between predicted and actual exam scores.
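As a quick check on the fitted line, the sketch below prints the learned slope and intercept and compares the slope with the closed-form least-squares value cov(x, y) / var(x). It rebuilds the dataset and split from the earlier steps so that it runs on its own; the names `slope`, `intercept`, and `slope_closed_form` are introduced here for illustration. Because the data were generated as 50 + 2 * study_hours plus noise, the estimates should land near 2 and 50, although the noise means they will not match exactly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Recreate the dataset and the train/test split from Steps 2 and 4.
np.random.seed(42)
study_hours = np.random.normal(30, 10, 100)
exam_scores = 50 + 2 * study_hours + np.random.normal(0, 20, 100)
data = pd.DataFrame({'Study_Hours': study_hours, 'Exam_Scores': exam_scores})
X_train, X_test, y_train, y_test = train_test_split(
    data[['Study_Hours']], data['Exam_Scores'], test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Fitted parameters: expected change in score per extra study hour,
# and the predicted score at zero study hours.
slope = model.coef_[0]
intercept = model.intercept_
print(f"Estimated slope: {slope:.2f} (value used to generate the data: 2)")
print(f"Estimated intercept: {intercept:.2f} (value used to generate the data: 50)")

# For a single feature, the least-squares slope equals cov(x, y) / var(x).
x = X_train['Study_Hours'].to_numpy()
y = y_train.to_numpy()
slope_closed_form = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"Closed-form slope: {slope_closed_form:.2f}")
```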

Step 6: Visualize the Regression Line

Python3

plt.scatter(X_test, y_test)
plt.plot(X_test, model.predict(X_test), color='red', linewidth=2)
plt.title('Linear Regression: Study Hours vs. Exam Scores')
plt.xlabel('Study Hours')
plt.ylabel('Exam Scores')
plt.show()

Output: [Figure: test-set points with the fitted regression line in red]

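Explanation: The red line shows the regression model’s predictions on the test set, and it summarizes the relationship between study hours and exam scores. Because this dataset was generated by a known rule with no other variables influencing either quantity, the slope of the line can be read as an estimate of the effect of one extra study hour on the exam score, which the generating code set to 2. With real observational data, confounding variables could make the fitted slope differ from the true causal effect.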
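The plot gives a visual impression of fit; the sketch below adds two standard numeric measures on the held-out test set, using `r2_score` and `mean_squared_error` from `sklearn.metrics`. It rebuilds the earlier steps so it runs on its own, and the names `r2` and `rmse` are illustrative. Since the noise added in Step 2 has a standard deviation of 20, an RMSE in that neighborhood is about as good as any model could do on this data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Recreate the dataset, split, and fitted model from Steps 2, 4, and 5.
np.random.seed(42)
study_hours = np.random.normal(30, 10, 100)
exam_scores = 50 + 2 * study_hours + np.random.normal(0, 20, 100)
data = pd.DataFrame({'Study_Hours': study_hours, 'Exam_Scores': exam_scores})
X_train, X_test, y_train, y_test = train_test_split(
    data[['Study_Hours']], data['Exam_Scores'], test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)

# Score the model on rows it did not see during training.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)                       # share of variance explained
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # typical error, in score points

print(f"Test R^2: {r2:.3f}")
print(f"Test RMSE: {rmse:.2f}")
```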

How to perform Causal Analysis?

Causal analysis is a technique for understanding why something happens and how to prevent or improve it. In other words, it helps us understand the cause-and-effect relationships between events or variables. It can offer useful insight when doing research, fixing problems, or making decisions.

In this article, we break down causal analysis step by step for beginners who are new to the field.

Table of Contents

What is Causal Analysis?
How to Perform Causal Analysis?
Steps to Perform Causal Analysis
What are the Benefits of Causal Analysis?
Example Case of Causal Analysis
Example 1: Causal Analysis with a Synthetic Dataset
Example 2: Propensity Score Matching
Example 3: Using CausalPy (Public)
Tips for Performing Causal Analysis
