Blackmont

📗

CheckFitofMu...xlsx14.8 KB

You are a quantitative time-series analyst and stochastic forecasting specialist. Your task is to replicate the full Box-Jenkins workflow: “Check Fit of Multiplicative ARIMA Model”. Use the uploaded...

Executive Summary (Step 9.1)

Using your Excel file CheckFitofMultiplicativeARIMAModelExample.xlsx (sheet DataTable via PSSG):

The series is the classic monthly airline passengers data from January 1949 to December 1960 (144 observations), clearly nonstationary with:
- strong upward trend,
- strong annual seasonality,
- increasing variance (multiplicative structure).
A log transform followed by regular differencing (d = 1) and seasonal differencing (D = 1, s = 12) yields a series that is strongly stationary (ADF p-value ≈ 0.00025 for $z_t$ ), with substantially stabilized variance.
The ACF/PACF of $z_t = (1-L)(1-L^{12})\log(\mathrm{Passengers}_t)$ strongly support a multiplicative SARIMA(0,1,1)×(0,1,1)_{12}:
- Large negative spikes at lags 1 and 12 in the ACF (and PACF), with rapid decay.
The SARIMA(0,1,1)×(0,1,1)_{12} model estimated by MLE fits the data well (AIC ≈ −435.4, BIC ≈ −427.2). Residuals are roughly white noise with some marginal evidence of remaining autocorrelation at seasonal lags.
24‑month forecasts (1961–1962) show continued growth and strong seasonality; the last forecast (Dec 1962) has a point estimate ≈ 526 passengers with a 95% interval ≈ [406, 681].
A 1000‑path Monte Carlo simulation under the fitted model shows high forecast uncertainty but very high probability that future peaks exceed 400 passengers, and a substantial probability of exceeding 500 passengers at longer horizons.

All core plots are available as interactive HTML:

1. Time-Series Diagnostic & Data Understanding (Step 9.2)

Data structure

From DataTable (column PSSG, non-missing rows only):

Number of observations: $n = 144$
Date range: 1949‑01‑01 to 1960‑12‑01
Inferred frequency: monthly (MS)
Descriptives (levels):

Metric	Value
Mean	280.30
Variance	14391.92
Minimum	104
Maximum	622

The raw series shows:

Strong upward trend in mean level (trend range / mean ≈ 1.35).
Strong 12‑month seasonality (seasonal amplitude / mean ≈ 0.89).
Increasing volatility over time, confirmed by the rolling 12‑month standard deviation; max/min rolling std ratio ≈ 19.94, indicating pronounced heteroscedasticity.

Key visuals

Raw time series:
STL decomposition (trend / seasonality / remainder):
Rolling 12‑month volatility:

Diagnosis

Nonstationary in mean: clear trend.
Clear seasonal nonstationarity: strong annual cycle.
Marked heteroscedasticity: variance grows with level ⇒ multiplicative seasonality.

2. Stationarity & Log Transformation (Steps 2 & 9.3)

Define:

$y_t = \log(\mathrm{Passengers}_t)$

Why log transform?

In multiplicative seasonal models:
- $\mathrm{Passengers}_t = \mu_t \times s_t \times \varepsilon_t$
- Taking logs gives $\log(\mathrm{Passengers}_t) = \log(\mu_t) + \log(s_t) + \log(\varepsilon_t)$ .
Thus:
- Multiplicative seasonality becomes additive.
- Variance is typically more stable when deviations are proportional to the level.

Variance stabilization

Summary statistics:

Series	Mean	Variance
Original	280.30	14391.92
Log	5.5422	0.1949

Variance is compressed dramatically in log scale, making Gaussian error assumptions more plausible.

Visual comparison:

Stationarity test on log series

ADF on $y_t$ $y_{t}$ :
- Test statistic ≈ −1.72
- p‑value ≈ 0.42

So even after logging, the series remains nonstationary; we still need differencing.

3. Differencing & Stationarity (Steps 3 & 9.3)

We apply:

Regular difference: $d = 1$ with lag 1
Seasonal difference: $D = 1$ with lag 12

Define:

First difference: $\Delta y_t = y_t - y_{t-1}$
Seasonal difference: $\Delta_{12} y_t = y_t - y_{t-12}$
Combined: $z_t = (1-L)(1-L^{12})y_t$

Implementation equivalence was checked:

Two independent implementations of $(1-L)(1-L^{12})$ differ by max absolute difference ≈ 0 (machine precision).

Visuals

Time series of:

$\Delta y_t$ ,
$\Delta_{12} y_t$ ,
$z_t$ with rolling mean/std:

From this:

The rolling mean of $z_t$ fluctuates around zero without drift.
The rolling standard deviation is relatively stable.

ADF tests

Test	Statistic	p‑value	Interpretation
ADF on $y_t$ (log)	−1.72	0.422	Nonstationary
ADF on $z_t$	−4.44	0.00025	Reject unit root ⇒ stationary after differencing

Thus, $(1-L)(1-L^{12})\log(\mathrm{Passengers}_t)$ is a stationary series, suitable for SARIMA modeling.

4. Seasonal Analysis & SARIMA Identification (Steps 4 & 9.4–9.5)

ACF/PACF of $z_t$

Key values:

Quantity	Value
ACF at lag 1	−0.34
ACF at lag 12	−0.39
PACF at lag 1	−0.34
PACF at lag 12	−0.38

Interpretation:

Large negative spike at lag 1 in ACF ⇒ nonseasonal MA(1) component.
Large negative spike at lag 12 in ACF ⇒ seasonal MA(1) at period 12.
PACF at lags 1 and 12 is similarly negative; further lags decay rapidly.

This pattern is characteristic of a multiplicative MA model:

Chosen model: SARIMA $(0,1,1)\times(0,1,1)\_{12}$ on the log scale.
We retain this as a parsimonious specification; there is no strong evidence from ACF/PACF that higher-order AR or MA terms are required.

5. Parameter Estimation (Step 5 & 9.6)

Model:

$y_t = \log(\mathrm{Passengers}_t)$
SARIMA $(0,1,1)\times(0,1,1)\_{12}$ , no deterministic trend, Gaussian errors.
Estimated via maximum likelihood.

Key fit statistics:

Metric	Value
Log-likelihood	220.72
AIC	−435.44
BIC	−427.16
Innovation var $\sigma^2$ (scale)	1.00

The parameter table (MA and seasonal MA coefficients with standard errors, t‑values, p‑values) is stored in sarima_parameters and shows statistically significant MA(1) and seasonal MA(1) terms (p‑values small, |t| large), consistent with the ACF pattern.

Stochastic meaning

Nonseasonal MA(1): captures short‑run shock propagation from month to month.
Seasonal MA(1) at lag 12: captures one‑year delayed shock effects, i.e., how an innovation in one year influences the same month next year.

6. Residual Diagnostics & Model Adequacy (Steps 6 & 9.7)

Residual diagnostics figure:

Residual properties:

Standardized residuals fluctuate around zero with roughly constant variance.
Residual ACF: mostly within ±1.96/√N, though there are moderate spikes.
Histogram vs N(0,1) and QQ‑plot:
- General alignment with normality, some tail deviations.

Ljung‑Box tests

p‑values:

Lag	p‑value
6	0.9999
12	0.00047
18	0.00916
24	0.0657

Interpretation:

At lag 6: no evidence of autocorrelation.
At lags 12 and 18: p‑values < 0.05 ⇒ evidence of remaining seasonal autocorrelation.
At lag 24: marginal (p ≈ 0.066).

Diagnostic conclusion

Residuals are broadly white‑noise‑like but not perfectly so:
- The model captures most structure, but some seasonal dependence remains.
For teaching/illustration purposes, SARIMA(0,1,1)×(0,1,1)_{12} is still a very reasonable Box‑Jenkins model.
For a production setting, you might explore:
- Adding a small AR(1) or seasonal AR(1) component,
- Checking alternative information criteria.

7. Forecast Results (Steps 7 & 9.8)

Forecast horizon: 24 months beyond Dec 1960 (Jan 1961 – Dec 1962).

Forecasts are produced on the log scale and then exponentiated:

Point forecast at horizon $h$ : $\exp(\hat{y}_{T+h \mid T})$
95% intervals (approximate): $\left[\exp(\ell_{T+h}), \exp(u_{T+h})\right]$

Final (Dec 1962) forecast (levels):

Mean: ≈ 525.7 passengers
95% interval: ≈ [406.0, 680.9]

Forecast plot:

Interpretation

Forecasts preserve the seasonal pattern and upward trend implied by the integrated seasonal structure.
Prediction intervals widen with horizon, reflecting accumulating uncertainty in both local and seasonal shocks.
Because we exponentiate Gaussian prediction bands, the intervals are slightly approximate on the original scale (they are symmetric in log, not in levels).

8. Monte Carlo Simulation Results (Steps 8 & 9.9–9.11)

We simulate 1000 stochastic paths from the fitted SARIMA model:

Length: 24 months ahead.
Simulation in log space, then exponentiate to obtain passenger levels.

For each horizon, we compute across paths:

Mean, median
Quantiles: 5%, 25%, 75%, 95%
Probabilities:
- $\Pr(\mathrm{Passengers}_t > 400)$
- $\Pr(\mathrm{Passengers}_t > 500)$

Example (last horizon, Dec 1962):

$\Pr(\mathrm{Passengers} > 400) \approx 0.987$
$\Pr(\mathrm{Passengers} > 500) \approx 0.639$

Monte Carlo visualization (50 sample paths + mean path + 5–95% band):

Interpretation

Simulated paths exhibit:
- Strong seasonality,
- Upward drift,
- Broadening dispersion over time.
The distribution of future values is right‑skewed in levels (due to exponentiation of Gaussian log errors).
The probability of exceeding high thresholds (e.g., 400, 500 passengers) increases strongly over time, reflecting the stochastic growth implied by the model.

9. Forecast Uncertainty & Probabilistic Interpretation (Steps 9.10–9.11)

The Box‑Jenkins SARIMA(0,1,1)×(0,1,1)_{12} model provides a coherent stochastic description of:
- Trend (integrated component),
- Seasonality (seasonal integration),
- Short‑run and seasonal shocks (MA and seasonal MA).
Uncertainty decomposition:
- Short horizons: dominated by recent idiosyncratic shocks.
- Longer horizons: dominated by cumulated innovation variance in both nonseasonal and seasonal parts, producing wide intervals and wide Monte Carlo bands.
The Monte Carlo distribution at each horizon can be used to compute:
- Quantile‑based Value‑at‑Risk‑type metrics for demand risk,
- Scenario bands for planning (e.g., 5–95% envelopes),
- Threshold exceedance probabilities (already computed for 400 and 500).

10. Final Model Assessment (Step 9.12)

Strengths

Follows Box‑Jenkins methodology rigorously:
- Clear data exploration (trend, seasonality, heteroscedasticity).
- Variance‑stabilizing log transformation.
- Proper $(d,D)=(1,1)$ differencing to achieve stationarity.
- ACF/PACF‑based identification of SARIMA(0,1,1)×(0,1,1)_{12}.
- MLE estimation with interpretable parameters.
- Rich residual diagnostics, forecasts, and Monte Carlo simulation.
The model:
- Captures core stochastic structure (trend + seasonal dynamics).
- Produces economically sensible forecasts.
- Delivers probabilistic forecasts and risk measures, not just point forecasts.

Limitations / potential refinements

Ljung‑Box tests indicate residual seasonal correlation at some lags:
- You might test SARIMA(0,1,1)×(1,1,1)_{12} or SARIMA(1,1,1)×(0,1,1)_{12} and compare AIC/BIC and diagnostics.
Gaussian assumption in log space may not be perfect (QQ‑plot tails).
Prediction intervals in levels are approximate.

Next steps (if you want to extend the study)

Compare alternative SARIMA specifications and perform a formal model selection (AIC/BIC + diagnostics).
Evaluate out‑of‑sample performance via rolling or expanding window forecasting.
Incorporate exogenous regressors (if available) and test SARIMAX.
Use the Monte Carlo distributions to design capacity planning or revenue management policies based on tail risk tolerances.

If you would like, I can:

Extract and format the full parameter table,
Summarize specific Monte Carlo quantiles at each horizon,
Or walk through how you would replicate each step in your preferred software (e.g., Python, R, MATLAB) using this same workflow.