Application of Box-Jenkins Methodology

Here we use Apple (AAPL) stock data from yfinance and apply the Box-Jenkins method to analyze it. The step-by-step code with explanations follows:

Importing Libraries:

The code imports the necessary libraries: yfinance for downloading stock price data, pandas for data manipulation, matplotlib.pyplot for plotting, statsmodels for the ADF test, ACF/PACF plots, ARIMA modeling and the Ljung-Box test, and warnings to suppress warnings during execution.

Python3

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
 
import warnings
warnings.filterwarnings('ignore')

                    

Function Definitions:

Next, we define two helper functions: one checks stationarity using the Augmented Dickey-Fuller (ADF) test, and the other plots the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) side by side.

Python3

# Function to check stationarity using Augmented Dickey-Fuller test
def check_stationarity(ts):
    result = adfuller(ts)
    print(f'ADF Statistic: {result[0]}')
    print(f'p-value: {result[1]}')
    print(f'Critical Values: {result[4]}')
 
# Function to plot ACF and PACF
def plot_acf_pacf(ts):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    plot_acf(ts, ax=ax1, lags=20)
    plot_pacf(ts, ax=ax2, lags=20)
    plt.show()
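To make the ACF concrete, here is a minimal NumPy sketch of the sample autocorrelation that plot_acf displays, run on a simulated MA(1) series (hypothetical data, not AAPL). For an MA(1) with coefficient 0.5 the theoretical lag-1 autocorrelation is 0.5 / (1 + 0.5²) = 0.4 and all higher lags are zero, so the ACF should cut off after lag 1; that cut-off pattern is what the identification step looks for. The sketch uses the simple ±1.96/√n white-noise band, whereas statsmodels' plot_acf draws Bartlett-formula bands by default, which widen with lag. The function name sample_acf is illustrative.

```python
import numpy as np

def sample_acf(x, nlags=20):
    """Sample autocorrelations for lags 0..nlags (autocovariances divided by n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()                  # demean the series
    denom = np.dot(xc, xc)             # n times the lag-0 autocovariance
    return np.array([np.dot(xc[: n - k], xc[k:]) / denom for k in range(nlags + 1)])

# Simulated MA(1) series y_t = e_t + 0.5 * e_{t-1} (hypothetical data, not AAPL)
rng = np.random.default_rng(0)
n = 2000
e = rng.standard_normal(n + 1)
y = e[1:] + 0.5 * e[:-1]

acf_values = sample_acf(y, nlags=5)
band = 1.96 / np.sqrt(n)               # approximate 95% band under white noise
for lag, r in enumerate(acf_values):
    marker = "*" if lag > 0 and abs(r) > band else ""   # flag lags outside the band
    print(f"lag {lag}: {r:+.3f} {marker}")
```

Only lag 1 should be flagged here, which is the signature of a low-order MA term.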

                    


Data Loading and Preprocessing:

Stock price data for Apple Inc. (AAPL) is downloaded using yfinance for the period from 1 January 2015 up to 1 January 2023. Log returns, ln(P_t / P_{t-1}), are then calculated: the logarithm stabilizes the variance, and taking day-to-day changes removes the trend in the price level, which makes the series more suitable for ARIMA modeling.

Python3

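import numpy as np  # pd.np is deprecated and unavailable in recent pandas, so use NumPy directly

# Load stock data
stock_symbol = "AAPL"
start_date = "2015-01-01"
end_date = "2023-01-01"
# Recent yfinance versions can return a one-column DataFrame for 'Close';
# squeeze() turns it into a Series in either case
stock_data = yf.download(stock_symbol, start=start_date, end=end_date)['Close'].squeeze()
 
# Log returns ln(P_t / P_{t-1}), identical to log(1 + pct_change)
log_returns = np.log(stock_data / stock_data.shift(1)).dropna()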
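The two usual ways of writing a log return give the same numbers: log(1 + r_t) with r_t the percentage change, and ln(P_t / P_{t-1}) directly. A convenient property is that log returns add up over time, so their sum equals the log of the total price change. This minimal sketch checks both facts on a handful of hypothetical prices (not real AAPL data).

```python
import numpy as np

# Hypothetical closing prices (not real AAPL data)
prices = np.array([100.0, 102.0, 99.0, 105.0, 110.0])

simple_returns = prices[1:] / prices[:-1] - 1        # same as pandas pct_change()
log_returns = np.log(prices[1:] / prices[:-1])        # ln(P_t / P_{t-1})

# log(1 + r_t) gives the same values as the direct formula
print(np.allclose(log_returns, np.log1p(simple_returns)))

# Log returns are additive: their sum is the log of the overall growth factor
total_log_return = log_returns.sum()
print(total_log_return, np.log(prices[-1] / prices[0]))
```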

                    


Stationarity Check and Differencing:

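The stationarity of the log returns is checked with the ADF test. As the output below shows, the test already rejects the unit-root null hypothesis for the log returns (the p-value is far below 0.05 and the ADF statistic is well below the 1% critical value), so the log returns are stationary and d = 0 is appropriate. Differencing them once more is shown only for comparison: an already stationary series does not need it, and over-differencing introduces artificial negative autocorrelation at lag 1. The ACF and PACF plots used to choose p and q are therefore drawn for the log returns themselves.

Python3

# Check stationarity of the log returns
check_stationarity(log_returns)
 
# Difference once more (for comparison only; the log returns are already stationary)
log_returns_diff = log_returns.diff().dropna()
 
# Check stationarity after differencing
check_stationarity(log_returns_diff)
 
# Plot ACF and PACF of the stationary log returns to guide the choice of p and q
plot_acf_pacf(log_returns)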

                    

Output:

ADF Statistic: -13.869148958528394
p-value: 6.51329302121344e-26
Critical Values: {'1%': -3.4336173133865064, '5%': -2.86298332472282, '10%': -2.5675383641200633}
ADF Statistic: -14.058039719328459
p-value: 3.091971442666415e-26
Critical Values: {'1%': -3.433648628001351, '5%': -2.8629971502062155, '10%': -2.5675457254979093}


ACF and PACF Plots
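The ADF output above already rejects a unit root for the log returns, so the second round of differencing over-differences them. A quick way to see why that matters: differencing a white-noise series produces an MA(1) process whose lag-1 autocorrelation is about −0.5, a spurious pattern that ACF/PACF-based identification could mistake for real structure. This minimal NumPy sketch assumes simulated white noise as a stand-in for daily log returns; the helper lag1_autocorr is illustrative.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

rng = np.random.default_rng(42)
# Hypothetical stationary series standing in for log returns (white noise)
returns = rng.normal(0.0, 0.02, size=5000)
returns_diff = np.diff(returns)          # the unnecessary extra difference

print(f"lag-1 autocorrelation of returns:             {lag1_autocorr(returns):+.3f}")
print(f"lag-1 autocorrelation of differenced returns: {lag1_autocorr(returns_diff):+.3f}")
```

The first value should sit near zero and the second near −0.5, even though the underlying series has no autocorrelation at all.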



Model Order Selection with AIC and BIC

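The code loops over p ∈ {0, 1, 2}, d ∈ {0, 1} and q ∈ {0, 1, 2}, fits an ARIMA model for each combination, and keeps an order only when it lowers both the best AIC and the best BIC found so far. This is a conservative heuristic rather than an exact minimization: the result can depend on the loop order, and it does not guarantee the minimum of either criterion. AIC and BIC values from models with different d are also not strictly comparable, because differencing changes the data being modeled; since the ADF test already indicates d = 0, the search over d serves mainly as a check.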

Python3

# Find optimal values for p, d, q based on AIC and BIC
best_aic = float('inf')
best_bic = float('inf')
best_order = None
 
for p in range(3):  # Choose a range for p
    for d in range(2):  # Choose a range for d
        for q in range(3):  # Choose a range for q
            arima_model = ARIMA(log_returns, order=(p, d, q))
            arima_results = arima_model.fit()
             
            # Calculate AIC and BIC
            current_aic = arima_results.aic
            current_bic = arima_results.bic
             
            # Update best values
            if current_aic < best_aic and current_bic < best_bic:
                best_aic = current_aic
                best_bic = current_bic
                best_order = (p, d, q)
 
print(f'Best AIC: {best_aic}, Best BIC: {best_bic}, Best Order: {best_order}')

                    

Output:

Best AIC: -10277.232291010881, Best BIC: -10260.410146733962, Best Order: (0, 0, 1)
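A simpler alternative to requiring both criteria to improve at once is to fix d from the ADF result and pick the order that minimizes a single criterion. The sketch below shows that mechanic with NumPy alone. It swaps in pure AR(p) models fitted by least squares (rather than statsmodels' ARIMA maximum likelihood) on simulated AR(1) data, so the function name and data are illustrative assumptions. Every candidate is scored on the same observations, using AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L.

```python
import numpy as np

def ar_information_criteria(y, p, max_p):
    """Fit AR(p) with an intercept by least squares and return (AIC, BIC).

    The first max_p observations are held back for every p, so all
    candidate models are scored on exactly the same sample.
    """
    y = np.asarray(y, dtype=float)
    target = y[max_p:]
    n = target.size
    # Design matrix: intercept column plus lags 1..p of y
    columns = [np.ones(n)] + [y[max_p - k : y.size - k] for k in range(1, p + 1)]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / n                         # Gaussian MLE of noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 2                                          # p AR terms + intercept + variance
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# Simulated AR(1) series with phi = 0.6 (hypothetical data, not AAPL returns)
rng = np.random.default_rng(1)
n_obs, phi = 2000, 0.6
noise = rng.standard_normal(n_obs)
y = np.zeros(n_obs)
for t in range(1, n_obs):
    y[t] = phi * y[t - 1] + noise[t]

max_p = 3
scores = {p: ar_information_criteria(y, p, max_p) for p in range(max_p + 1)}
best_by_aic = min(scores, key=lambda p: scores[p][0])
best_by_bic = min(scores, key=lambda p: scores[p][1])
for p, (aic, bic) in scores.items():
    print(f"AR({p}): AIC={aic:.1f}  BIC={bic:.1f}")
print("best by AIC:", best_by_aic, "| best by BIC:", best_by_bic)
```

With statsmodels, the same idea amounts to collecting `.aic` (or `.bic`) from each ARIMA(p, 0, q) fit and taking the minimum. Because BIC's per-parameter penalty (ln n) exceeds AIC's (2) once n > e² ≈ 7.4, BIC never picks a model with more parameters than the one AIC picks.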

ARIMA Model Fitting and Diagnostics:

The ARIMA model is refitted with the order selected above, and its residuals are checked. The ADF test confirms that the residuals are stationary, and the Ljung-Box test checks whether autocorrelation remains in them; for a well-specified model the residuals should behave like white noise.

Python3

# Fit ARIMA model with the best order
arima_model = ARIMA(log_returns, order=best_order)
arima_results = arima_model.fit()
 
# Diagnostics
residuals = arima_results.resid
check_stationarity(residuals)
 
# Ljung-Box test for autocorrelation in residuals.
# In recent statsmodels versions acorr_ljungbox returns a DataFrame
# (columns lb_stat and lb_pvalue, one row per lag), so it is not unpacked.
lb_test = acorr_ljungbox(residuals, lags=20)
print(lb_test[['lb_stat', 'lb_pvalue']])


Output:

ADF Statistic: -13.478138873971695
p-value: 3.2812344010002946e-25
Critical Values: {'1%': -3.4336189466940414, '5%': -2.8629840458358933, '10%': -2.5675387480760885}

The Ljung-Box call prints a table with the lb_stat and lb_pvalue columns for lags 1 to 20. P-values above 0.05 suggest that no significant autocorrelation remains in the residuals, while small p-values suggest the model orders should be revisited.
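The Ljung-Box statistic is Q = n(n + 2) Σ_{k=1}^{h} ρ̂_k² / (n − k), compared against a chi-square distribution with h degrees of freedom. Below is a minimal NumPy sketch of that computation. To avoid a SciPy dependency it uses the closed-form chi-square tail that holds only for even degrees of freedom (an assumption that covers the 20 lags used above), and it runs on simulated white noise and AR(1) data rather than the AAPL residuals. The helper names are hypothetical.

```python
import math
import numpy as np

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with an even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i                       # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def ljung_box(x, lags):
    """Ljung-Box Q statistic over lags 1..lags and its chi-square p-value."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    denom = xc @ xc
    ks = np.arange(1, lags + 1)
    rho = np.array([xc[: n - k] @ xc[k:] / denom for k in ks])
    q = n * (n + 2) * np.sum(rho**2 / (n - ks))
    return q, chi2_sf_even_df(q, lags)

# Simulated data (hypothetical): white noise vs. an AR(1) series with phi = 0.5
rng = np.random.default_rng(7)
noise = rng.standard_normal(2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]

for name, series in [("white noise", noise), ("AR(1)", ar1)]:
    q, p = ljung_box(series, lags=20)
    print(f"{name}: Q={q:.2f}, p-value={p:.4g}")
```

The AR(1) series should produce a tiny p-value, while the white-noise p-value is typically large. When testing residuals of a fitted ARMA(p, q) model, the degrees of freedom are commonly reduced by p + q; statsmodels' acorr_ljungbox exposes this through its model_df argument.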

Plotting Results:

Finally, the observed log returns and the fitted values from the ARIMA model are plotted to visualize the model’s performance.

Python3

# Plotting the fitted vs. observed values
plt.figure(figsize=(12, 6))
plt.plot(log_returns, label='Observed')  # the model was fitted on log_returns, not the differenced series
plt.plot(arima_results.fittedvalues, color='red', label='Fitted', alpha=0.7)
plt.legend()
plt.title(f'ARIMA{best_order} Model for {stock_symbol} Stock Returns')
plt.show()

                    

Output:

Observed vs fitted model with best order

The code above walks through the main steps of the Box-Jenkins methodology for stock returns: stationarity checks, differencing, order selection, model fitting, residual diagnostics, and visualization of the in-sample fit. It stops short of out-of-sample forecasting, and the model orders and parameters may need adjusting depending on the diagnostic results.
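The order (0, 0, 1) selected above is an MA(1) model, y_t = μ + ε_t + θ ε_{t−1}, and its point forecasts have a simple structure: one step ahead the last shock still contributes θ ε_T, and from two steps ahead onward the forecast is just the mean μ. The minimal sketch below assumes hypothetical values for μ, θ and the returns (not the fitted AAPL estimates) and recovers the shocks recursively from a zero pre-sample shock. statsmodels estimates the shocks with a Kalman filter instead, so its numbers differ slightly, but the shape is the same; for the fitted model, arima_results.get_forecast(steps=5) gives point forecasts through predicted_mean and intervals through conf_int().

```python
import numpy as np

def ma1_forecast(y, mu, theta, steps):
    """Point forecasts for y_t = mu + e_t + theta * e_{t-1}.

    mu and theta are treated as known; the shocks e_t are recovered
    recursively from the data, starting from a pre-sample shock of 0.
    """
    eps = 0.0
    for value in y:
        eps = value - mu - theta * eps        # shock implied by this observation
    # One step ahead the last shock still matters; after that only mu is left
    return np.array([mu + theta * eps] + [mu] * (steps - 1))

# Hypothetical parameters and returns, chosen only for illustration
forecast = ma1_forecast([0.01, -0.02, 0.005], mu=0.001, theta=-0.05, steps=5)
print(forecast)
```

This is why an MA(1) model of daily returns carries information only about the next day: beyond one step its forecasts revert to the mean.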



Box-Jenkins Methodology for ARIMA Models

Time series data records observations at successive points in time. Analyzing such data is important for recognizing patterns, making predictions, and extracting useful insights. The Box-Jenkins method is a forecasting approach used to model time series data and forecast it over a specific period.

In this article, we take a closer look at the Box-Jenkins method for ARIMA modelling and how it helps us analyze and forecast time series data.

Table of Contents

  • ARIMA Modelling
  • Box-Jenkins Method
  • Application of Box-Jenkins Methodology

Let us first look at an overview of the ARIMA model so that we have a sound understanding of the process.


ARIMA Modelling

ARIMA (Autoregressive Integrated Moving Average) modelling is a time series analysis and forecasting method. The ARIMA model combines autoregression, differencing (the "integrated" part) and moving average components. Let’s break it down and discuss the different components one by one:...
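In backshift notation (B y_t = y_{t−1}), the three components combine into the ARIMA(p, d, q) equation below, written with one common sign convention for the moving-average terms; φ_i are the AR coefficients, θ_j the MA coefficients, c a constant, and ε_t white-noise shocks. For example, the ARIMA(0, 0, 1) order selected in the application section reduces to y_t = c + ε_t + θ_1 ε_{t−1}.

$$
\Big(1 - \sum_{i=1}^{p} \phi_i B^{i}\Big)(1 - B)^{d}\, y_t \;=\; c + \Big(1 + \sum_{j=1}^{q} \theta_j B^{j}\Big)\varepsilon_t
$$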

Box-Jenkins Method

The Box-Jenkins method is a methodology for analyzing and forecasting time series data. It is usually described in three core stages (identification, estimation, and diagnostic checking), which in practice are extended with model refinement and forecasting. The method is iterative: steps 1 to 4, from identification to model refinement, are often repeated until a suitable, well-diagnosed model is obtained. Note that the method assumes the series is generated by a linear process that is stationary or can be made stationary by differencing. The different stages of the Box-Jenkins model could be identified as:...
