Proxima Alpha - Powered by Santander

Santander

Últimos 7 días

chat_bubble_outlineYou are a quantitative ti...
chat_bubble_outlineUnderstant first this ins...
chat_bubble_outlineUnderstant first this ins...
chat_bubble_outlineOBJECTIVE Build the COMP...
chat_bubble_outlineObjective: Create a stand...
chat_bubble_outlineObjective: Download and a...
chat_bubble_outlineEx30.- Energy transitionlabel
chat_bubble_outlineYou are an elite AI Strat...
chat_bubble_outlineEx 29.- Commodities Garchlabel
chat_bubble_outlineEx 28.- Top Cheflabel
chat_bubble_outlineEx 27.- top 25 Q-Alabel
settings Ajustes
help_outlineAyuda
chevron_right
📗
CheckFitofMu...xlsx14.8 KB
You are a quantitative time-series analyst and stochastic forecasting specialist. Your task is to replicate the full Box-Jenkins workflow: “Check Fit of Multiplicative ARIMA Model”. Use the uploaded...

Blackmont

Executive Summary (Step 9.1)

Using your Excel file CheckFitofMultiplicativeARIMAModelExample.xlsx (sheet DataTable via PSSG):

  • The series is the classic monthly airline passengers data from January 1949 to December 1960 (144 observations), clearly nonstationary with:

    • strong upward trend,
    • strong annual seasonality,
    • increasing variance (multiplicative structure).
  • A log transform followed by regular differencing (d = 1) and seasonal differencing (D = 1, s = 12) yields a series that is strongly stationary (ADF p-value ≈ 0.00025 for ztz_tzt​), with substantially stabilized variance.

  • The ACF/PACF of zt=(1−L)(1−L12)log⁡(Passengerst)z_t = (1-L)(1-L^{12})\log(\mathrm{Passengers}_t)zt​=(1−L)(1−L12)log(Passengerst​) strongly support a multiplicative SARIMA(0,1,1)×(0,1,1)_{12}:

    • Large negative spikes at lags 1 and 12 in the ACF (and PACF), with rapid decay.
  • The SARIMA(0,1,1)×(0,1,1)_{12} model estimated by MLE fits the data well (AIC ≈ −435.4, BIC ≈ −427.2). Residuals are roughly white noise with some marginal evidence of remaining autocorrelation at seasonal lags.

  • 24‑month forecasts (1961–1962) show continued growth and strong seasonality; the last forecast (Dec 1962) has a point estimate ≈ 526 passengers with a 95% interval ≈ [406, 681].

  • A 1000‑path Monte Carlo simulation under the fitted model shows high forecast uncertainty but very high probability that future peaks exceed 400 passengers, and a substantial probability of exceeding 500 passengers at longer horizons.

All core plots are available as interactive HTML:


1. Time-Series Diagnostic & Data Understanding (Step 9.2)

Data structure

From DataTable (column PSSG, non-missing rows only):

  • Number of observations: n=144n = 144n=144
  • Date range: 1949‑01‑01 to 1960‑12‑01
  • Inferred frequency: monthly (MS)
  • Descriptives (levels):
MetricValue
Mean280.30
Variance14391.92
Minimum104
Maximum622

The raw series shows:

  • Strong upward trend in mean level (trend range / mean ≈ 1.35).
  • Strong 12‑month seasonality (seasonal amplitude / mean ≈ 0.89).
  • Increasing volatility over time, confirmed by the rolling 12‑month standard deviation; max/min rolling std ratio ≈ 19.94, indicating pronounced heteroscedasticity.

Key visuals

  • Raw time series:

  • STL decomposition (trend / seasonality / remainder):

  • Rolling 12‑month volatility:

Diagnosis

  • Nonstationary in mean: clear trend.
  • Clear seasonal nonstationarity: strong annual cycle.
  • Marked heteroscedasticity: variance grows with level ⇒ multiplicative seasonality.

2. Stationarity & Log Transformation (Steps 2 & 9.3)

Define:

  • yt=log⁡(Passengerst)y_t = \log(\mathrm{Passengers}_t)yt​=log(Passengerst​)

Why log transform?

  • In multiplicative seasonal models:
    • Passengerst=μt×st×εt\mathrm{Passengers}_t = \mu_t \times s_t \times \varepsilon_tPassengerst​=μt​×st​×εt​
    • Taking logs gives log⁡(Passengerst)=log⁡(μt)+log⁡(st)+log⁡(εt)\log(\mathrm{Passengers}_t) = \log(\mu_t) + \log(s_t) + \log(\varepsilon_t)log(Passengerst​)=log(μt​)+log(st​)+log(εt​).
  • Thus:
    • Multiplicative seasonality becomes additive.
    • Variance is typically more stable when deviations are proportional to the level.

Variance stabilization

Summary statistics:

SeriesMeanVariance
Original280.3014391.92
Log5.54220.1949

Variance is compressed dramatically in log scale, making Gaussian error assumptions more plausible.

Visual comparison:

Stationarity test on log series

  • ADF on yty_tyt​:
    • Test statistic ≈ −1.72
    • p‑value ≈ 0.42

So even after logging, the series remains nonstationary; we still need differencing.


3. Differencing & Stationarity (Steps 3 & 9.3)

We apply:

  • Regular difference: d=1d = 1d=1 with lag 1
  • Seasonal difference: D=1D = 1D=1 with lag 12

Define:

  • First difference: Δyt=yt−yt−1\Delta y_t = y_t - y_{t-1}Δyt​=yt​−yt−1​
  • Seasonal difference: Δ12yt=yt−yt−12\Delta_{12} y_t = y_t - y_{t-12}Δ12​yt​=yt​−yt−12​
  • Combined: zt=(1−L)(1−L12)ytz_t = (1-L)(1-L^{12})y_tzt​=(1−L)(1−L12)yt​

Implementation equivalence was checked:

  • Two independent implementations of (1−L)(1−L12)(1-L)(1-L^{12})(1−L)(1−L12) differ by max absolute difference ≈ 0 (machine precision).

Visuals

Time series of:

  • Δyt\Delta y_tΔyt​,
  • Δ12yt\Delta_{12} y_tΔ12​yt​,
  • ztz_tzt​ with rolling mean/std:

From this:

  • The rolling mean of ztz_tzt​ fluctuates around zero without drift.
  • The rolling standard deviation is relatively stable.

ADF tests

TestStatisticp‑valueInterpretation
ADF on yty_tyt​ (log)−1.720.422Nonstationary
ADF on ztz_tzt​−4.440.00025Reject unit root ⇒ stationary after differencing

Thus, (1−L)(1−L12)log⁡(Passengerst)(1-L)(1-L^{12})\log(\mathrm{Passengers}_t)(1−L)(1−L12)log(Passengerst​) is a stationary series, suitable for SARIMA modeling.


4. Seasonal Analysis & SARIMA Identification (Steps 4 & 9.4–9.5)

ACF/PACF of ztz_tzt​

Key values:

QuantityValue
ACF at lag 1−0.34
ACF at lag 12−0.39
PACF at lag 1−0.34
PACF at lag 12−0.38

Interpretation:

  • Large negative spike at lag 1 in ACF ⇒ nonseasonal MA(1) component.
  • Large negative spike at lag 12 in ACF ⇒ seasonal MA(1) at period 12.
  • PACF at lags 1 and 12 is similarly negative; further lags decay rapidly.

This pattern is characteristic of a multiplicative MA model:

  • Chosen model: SARIMA (0,1,1)×(0,1,1)_12(0,1,1)\times(0,1,1)\_{12}(0,1,1)×(0,1,1)_12 on the log scale.
  • We retain this as a parsimonious specification; there is no strong evidence from ACF/PACF that higher-order AR or MA terms are required.

5. Parameter Estimation (Step 5 & 9.6)

Model:

  • yt=log⁡(Passengerst)y_t = \log(\mathrm{Passengers}_t)yt​=log(Passengerst​)
  • SARIMA (0,1,1)×(0,1,1)_12(0,1,1)\times(0,1,1)\_{12}(0,1,1)×(0,1,1)_12, no deterministic trend, Gaussian errors.
  • Estimated via maximum likelihood.

Key fit statistics:

MetricValue
Log-likelihood220.72
AIC−435.44
BIC−427.16
Innovation var σ2\sigma^2σ2 (scale)1.00

The parameter table (MA and seasonal MA coefficients with standard errors, t‑values, p‑values) is stored in sarima_parameters and shows statistically significant MA(1) and seasonal MA(1) terms (p‑values small, |t| large), consistent with the ACF pattern.

Stochastic meaning

  • Nonseasonal MA(1): captures short‑run shock propagation from month to month.
  • Seasonal MA(1) at lag 12: captures one‑year delayed shock effects, i.e., how an innovation in one year influences the same month next year.

6. Residual Diagnostics & Model Adequacy (Steps 6 & 9.7)

Residual diagnostics figure:

Residual properties:

  • Standardized residuals fluctuate around zero with roughly constant variance.
  • Residual ACF: mostly within ±1.96/√N, though there are moderate spikes.
  • Histogram vs N(0,1) and QQ‑plot:
    • General alignment with normality, some tail deviations.

Ljung‑Box tests

p‑values:

Lagp‑value
60.9999
120.00047
180.00916
240.0657

Interpretation:

  • At lag 6: no evidence of autocorrelation.
  • At lags 12 and 18: p‑values < 0.05 ⇒ evidence of remaining seasonal autocorrelation.
  • At lag 24: marginal (p ≈ 0.066).

Diagnostic conclusion

  • Residuals are broadly white‑noise‑like but not perfectly so:
    • The model captures most structure, but some seasonal dependence remains.
  • For teaching/illustration purposes, SARIMA(0,1,1)×(0,1,1)_{12} is still a very reasonable Box‑Jenkins model.
  • For a production setting, you might explore:
    • Adding a small AR(1) or seasonal AR(1) component,
    • Checking alternative information criteria.

7. Forecast Results (Steps 7 & 9.8)

Forecast horizon: 24 months beyond Dec 1960 (Jan 1961 – Dec 1962).

Forecasts are produced on the log scale and then exponentiated:

  • Point forecast at horizon hhh: exp⁡(y^T+h∣T)\exp(\hat{y}_{T+h \mid T})exp(y^​T+h∣T​)
  • 95% intervals (approximate): [exp⁡(ℓT+h),exp⁡(uT+h)]\left[\exp(\ell_{T+h}), \exp(u_{T+h})\right][exp(ℓT+h​),exp(uT+h​)]

Final (Dec 1962) forecast (levels):

  • Mean: ≈ 525.7 passengers
  • 95% interval: ≈ [406.0, 680.9]

Forecast plot:

Interpretation

  • Forecasts preserve the seasonal pattern and upward trend implied by the integrated seasonal structure.
  • Prediction intervals widen with horizon, reflecting accumulating uncertainty in both local and seasonal shocks.
  • Because we exponentiate Gaussian prediction bands, the intervals are slightly approximate on the original scale (they are symmetric in log, not in levels).

8. Monte Carlo Simulation Results (Steps 8 & 9.9–9.11)

We simulate 1000 stochastic paths from the fitted SARIMA model:

  • Length: 24 months ahead.
  • Simulation in log space, then exponentiate to obtain passenger levels.

For each horizon, we compute across paths:

  • Mean, median
  • Quantiles: 5%, 25%, 75%, 95%
  • Probabilities:
    • Pr⁡(Passengerst>400)\Pr(\mathrm{Passengers}_t > 400)Pr(Passengerst​>400)
    • Pr⁡(Passengerst>500)\Pr(\mathrm{Passengers}_t > 500)Pr(Passengerst​>500)

Example (last horizon, Dec 1962):

  • Pr⁡(Passengers>400)≈0.987\Pr(\mathrm{Passengers} > 400) \approx 0.987Pr(Passengers>400)≈0.987
  • Pr⁡(Passengers>500)≈0.639\Pr(\mathrm{Passengers} > 500) \approx 0.639Pr(Passengers>500)≈0.639

Monte Carlo visualization (50 sample paths + mean path + 5–95% band):

Interpretation

  • Simulated paths exhibit:
    • Strong seasonality,
    • Upward drift,
    • Broadening dispersion over time.
  • The distribution of future values is right‑skewed in levels (due to exponentiation of Gaussian log errors).
  • The probability of exceeding high thresholds (e.g., 400, 500 passengers) increases strongly over time, reflecting the stochastic growth implied by the model.

9. Forecast Uncertainty & Probabilistic Interpretation (Steps 9.10–9.11)

  • The Box‑Jenkins SARIMA(0,1,1)×(0,1,1)_{12} model provides a coherent stochastic description of:

    • Trend (integrated component),
    • Seasonality (seasonal integration),
    • Short‑run and seasonal shocks (MA and seasonal MA).
  • Uncertainty decomposition:

    • Short horizons: dominated by recent idiosyncratic shocks.
    • Longer horizons: dominated by cumulated innovation variance in both nonseasonal and seasonal parts, producing wide intervals and wide Monte Carlo bands.
  • The Monte Carlo distribution at each horizon can be used to compute:

    • Quantile‑based Value‑at‑Risk‑type metrics for demand risk,
    • Scenario bands for planning (e.g., 5–95% envelopes),
    • Threshold exceedance probabilities (already computed for 400 and 500).

10. Final Model Assessment (Step 9.12)

Strengths

  • Follows Box‑Jenkins methodology rigorously:

    • Clear data exploration (trend, seasonality, heteroscedasticity).
    • Variance‑stabilizing log transformation.
    • Proper (d,D)=(1,1)(d,D)=(1,1)(d,D)=(1,1) differencing to achieve stationarity.
    • ACF/PACF‑based identification of SARIMA(0,1,1)×(0,1,1)_{12}.
    • MLE estimation with interpretable parameters.
    • Rich residual diagnostics, forecasts, and Monte Carlo simulation.
  • The model:

    • Captures core stochastic structure (trend + seasonal dynamics).
    • Produces economically sensible forecasts.
    • Delivers probabilistic forecasts and risk measures, not just point forecasts.

Limitations / potential refinements

  • Ljung‑Box tests indicate residual seasonal correlation at some lags:
    • You might test SARIMA(0,1,1)×(1,1,1)_{12} or SARIMA(1,1,1)×(0,1,1)_{12} and compare AIC/BIC and diagnostics.
  • Gaussian assumption in log space may not be perfect (QQ‑plot tails).
  • Prediction intervals in levels are approximate.

Next steps (if you want to extend the study)

  1. Compare alternative SARIMA specifications and perform a formal model selection (AIC/BIC + diagnostics).
  2. Evaluate out‑of‑sample performance via rolling or expanding window forecasting.
  3. Incorporate exogenous regressors (if available) and test SARIMAX.
  4. Use the Monte Carlo distributions to design capacity planning or revenue management policies based on tail risk tolerances.

If you would like, I can:

  • Extract and format the full parameter table,
  • Summarize specific Monte Carlo quantiles at each horizon,
  • Or walk through how you would replicate each step in your preferred software (e.g., Python, R, MATLAB) using this same workflow.

Nuestro Asistente es una herramienta de inteligencia artificial. La información generada no debe considerarse asesoramiento oficial