Newey-West HAC Estimator: Understanding Standard Error Correction in Time Series Regression with Python
Introduction
Suppose you performed an OLS regression on time series data and found a p-value of 0.001. You might be tempted to conclude, "It's definitely significant," but that p-value is unreliable if you ignore autocorrelation (the phenomenon where values in a time series are correlated with their own past values). In the simulations in this article, I will show that for autocorrelated data, OLS can falsely determine variables as "significant" even when they are actually unrelated (the true coefficient is zero).
In this article, we will understand how the Newey-West HAC (Heteroskedasticity and Autocorrelation Consistent) estimator corrects the standard errors of coefficients by implementing it in Python.
What You Will Learn
- Why standard errors of OLS coefficients break with time series data
- How underestimation of standard errors affects p-values
- What the HAC estimator is actually correcting
- How to implement it in Python
- Verification through Monte Carlo simulations
Premise: When to Use OLS Regression with Time Series Data
For time series data, there are inherently methods that explicitly model time dependency, such as ARIMA, VAR, and State Space Models. However, in practice, there are many situations where we want to know "the magnitude of the impact of explanatory variable X on target variable Y," and we may use OLS, prioritizing the interpretability of the coefficients.
However, when using OLS with time series data, at the very least, the standard errors should be corrected with HAC. Otherwise, statistical inference (significance testing, confidence intervals) will break.
The Problem: Underestimation of Standard Errors Due to Autocorrelation
First, let's review what a "standard error" is. The coefficients estimated by OLS change if the data changes. The standard error is an index representing this variability in the estimated coefficient. It indicates how much the coefficient would fluctuate if you repeated the same analysis many times. The smaller the standard error, the more you can judge that "this coefficient is stable," and the smaller the p-value becomes.
Intuitive Explanation
Consider a situation where you are estimating the average body weight.
Case A: Measuring 5 independent people
- Data: 60kg, 72kg, 55kg, 68kg, 65kg
- Characteristic: Each measured value is distinct
- Information: Independent information for 5 people
Case B: Measuring the same person 5 times
- Data: 71kg, 72kg, 71kg, 72kg, 71kg
- Characteristic: Each measured value is almost the same
- Information: Effectively only information for 1 person
Even in Case B, OLS calculates that "precision is high because there are 5 data points." Autocorrelation in time series data creates a situation similar to Case B. Because data at adjacent time points are similar, the effective amount of information becomes less than the apparent number of data points.
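Case B can be put in numbers with a short simulation (a sketch added here, not code from the article): we draw many runs of strongly autocorrelated "measurements" and compare how much the sample mean actually fluctuates with the naive i.i.d. standard error $s/\sqrt{n}$ that OLS-style reasoning would report.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, n_rep = 50, 0.9, 2000

def ar1_sample(n, rho, rng):
    """One run of n strongly autocorrelated 'measurements' (AR(1))."""
    u = np.zeros(n)
    eps = rng.standard_normal(n)
    u[0] = eps[0]
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u

samples = [ar1_sample(n, rho, rng) for _ in range(n_rep)]
means = np.array([s.mean() for s in samples])
# Naive iid standard error: sample sd / sqrt(n), averaged over runs
naive_se = np.mean([s.std(ddof=1) / np.sqrt(n) for s in samples])

print(f"actual spread of the sample mean:   {means.std():.3f}")
print(f"average naive (iid) standard error: {naive_se:.3f}")
```

The actual spread of the mean comes out several times larger than the naive standard error: the 50 correlated points carry far less information than 50 independent ones, exactly the Case B situation.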
Confirmation with Mathematical Formulas
Let's confirm this intuition using mathematical formulas. We compare the variance of the coefficients calculated by OLS with the true variance when autocorrelation exists. I will use matrix notation here, but the main point is the inequality "variance assumed by OLS < true variance." If the details of the formulas are difficult, you can focus on this inequality and move on.

Variance of coefficients assumed by OLS:

$$\text{Var}_{\text{OLS}}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}$$

True variance (when autocorrelation exists):

$$\text{Var}(\hat{\beta}) = (X^\top X)^{-1} X^\top \Omega X (X^\top X)^{-1}$$

Here, $\Omega = E[u u^\top]$ is the variance-covariance matrix of the error term. OLS assumes $\Omega = \sigma^2 I$ (independent, homoskedastic errors). With positive autocorrelation, the off-diagonal elements of $\Omega$ are positive, and the true variance exceeds the variance OLS assumes.
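The inequality can be checked numerically. The sketch below (added for illustration, not from the article) builds the exact AR(1) error covariance $\Omega_{ij} = \sigma_u^2 \rho^{|i-j|}$ for a toy sample and compares the slope variance OLS assumes with the sandwich variance using the true $\Omega$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 200, 0.7

# Random-walk regressor, as in the simulations later in the article
x = np.cumsum(rng.standard_normal(n))
X = np.column_stack([np.ones(n), x])

# True error covariance of a stationary AR(1): Omega[i, j] = sigma_u^2 * rho^|i-j|
sigma_u2 = 1.0 / (1 - rho ** 2)
idx = np.arange(n)
Omega = sigma_u2 * rho ** np.abs(np.subtract.outer(idx, idx))

XtX_inv = np.linalg.inv(X.T @ X)
var_assumed = sigma_u2 * XtX_inv                # OLS assumes Omega = sigma_u^2 * I
var_true = XtX_inv @ X.T @ Omega @ X @ XtX_inv  # sandwich with the true Omega

print(f"SE of slope assumed by OLS: {np.sqrt(var_assumed[1, 1]):.4f}")
print(f"true SE of slope:           {np.sqrt(var_true[1, 1]):.4f}")
```

With a positively autocorrelated regressor and positively autocorrelated errors, the true standard error comes out clearly larger than the one OLS assumes.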
Why Underestimation of Standard Errors is a Problem
A false positive is incorrectly determining that there is an effect when there actually is none (also called a Type I error in statistics). When standard errors are underestimated, the p-value becomes too small, increasing these false positives.
The mechanism is as follows:

- Autocorrelation exists → standard errors of coefficients are underestimated
- In the t-value formula $t = \hat{\beta} / \text{SE}(\hat{\beta})$, the denominator (standard error) is underestimated → the t-value becomes too large
- Large t-value → the p-value becomes too small
- Variables are falsely judged "significant" even though they are actually unrelated ($\beta = 0$)
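The arithmetic of this mechanism fits in a few lines. With hypothetical numbers (a sketch, not taken from the article's simulations), halving the standard error doubles the t-value and collapses the p-value from "not significant" to "highly significant":

```python
from scipy import stats

beta_hat = 0.30
se_true, se_underestimated = 0.20, 0.10  # hypothetical values for illustration

for se in (se_true, se_underestimated):
    t = beta_hat / se
    p = 2 * (1 - stats.norm.cdf(abs(t)))  # two-sided normal approximation
    print(f"SE={se:.2f} -> t={t:.2f}, p={p:.4f}")
# SE=0.20 -> t=1.50, p=0.1336   (not significant)
# SE=0.10 -> t=3.00, p=0.0027   (looks highly significant)
```

The coefficient estimate never changed; only the (wrong) denominator did.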
Up to this point, we have understood the concept through intuition and formulas. Next, we will verify how serious this problem is in actual time series regression and how the HAC estimator corrects it using Python simulations.
Experiment: Confirmation through Simulation
Preparation
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from scipy import stats

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
np.random.seed(42)
```
Data Generation Function
Autocorrelated Data with AR(1) Process
AR(1) (first-order autoregressive model) is a model where the current value depends on the immediately preceding value at a rate of $\rho$:

$$u_t = \rho u_{t-1} + \varepsilon_t$$

When there is positive autocorrelation ($\rho > 0$), residuals at adjacent time points take similar values. The explanatory variable X is generated as a random walk (a model where random noise is added to the previous value: $x_t = x_{t-1} + \eta_t$) so that it is also autocorrelated.
```python
def generate_autocorrelated_data(n=1000, beta_true=2.0, rho=0.7, sigma=1.0):
    """Generate data whose residuals follow an AR(1) process"""
    # Make the explanatory variable X a random walk so it is also autocorrelated
    x = np.cumsum(np.random.randn(n)) / np.sqrt(n)  # scale to keep the magnitude stable
    # Generate AR(1) residuals
    u = np.zeros(n)
    epsilon = np.random.randn(n) * sigma
    for t in range(1, n):
        u[t] = rho * u[t-1] + epsilon[t]
    y = beta_true * x + u
    return x, y, u
```
Visualization of Residuals
```python
x_ar, y_ar, u_ar = generate_autocorrelated_data(n=1000, rho=0.7)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(u_ar)
axes[0].set_title('Residuals Time Series (AR(1), rho=0.7)')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Residual')
axes[0].axhline(0, color='red', linestyle='--', alpha=0.3)
axes[1].acorr(u_ar, maxlags=20)
axes[1].set_title('Autocorrelation Function (AR(1), rho=0.7)')
axes[1].set_xlabel('Lag')
axes[1].set_ylabel('Autocorrelation')
plt.tight_layout()
plt.show()
```

In the time series plot on the left, you can see the "inertia" caused by positive autocorrelation. Like the body weight example, it's a situation where "the same person is measured repeatedly." Residuals at adjacent time points take similar values.
The autocorrelation function (ACF) on the right shows the autocorrelation at lag k (the correlation coefficient between data points k periods apart). You can see an autocorrelation of about 0.66 at lag 1, which decays as the lag increases.
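Theoretically, the lag-$k$ autocorrelation of an AR(1) process is $\rho^k$, so with $\rho = 0.7$ we expect roughly 0.7, 0.49, 0.17 at lags 1, 2, 5. A quick check (a self-contained sketch that regenerates an AR(1) series rather than reusing `u_ar`):

```python
import numpy as np

rng = np.random.default_rng(42)
rho, n = 0.7, 1000
u = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

# Sample autocorrelation at a few lags; should decay roughly like rho**k
for k in (1, 2, 5):
    r = np.corrcoef(u[k:], u[:-k])[0, 1]
    print(f"lag {k}: sample autocorrelation = {r:.2f}")
```

The slight shortfall versus the theoretical values (e.g. the article's 0.66 at lag 1) is ordinary sampling error.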
OLS vs HAC Comparison
Let's see how OLS standard errors and HAC standard errors differ in a situation where there is "inertia" in the residuals.
```python
X_ar = sm.add_constant(x_ar)  # Add constant term (intercept)

# Standard OLS
model_ar_ols = sm.OLS(y_ar, X_ar).fit()

# Newey-West HAC (details on maxlags later)
maxlags = int(np.floor(len(y_ar)**(1/3)))  # n=1000 -> maxlags=9
model_ar_hac = sm.OLS(y_ar, X_ar).fit(
    cov_type='HAC',
    cov_kwds={'maxlags': maxlags}
)

# Comparison of results
results = pd.DataFrame({
    'Method': ['OLS', 'HAC'],
    'Coefficient': [model_ar_ols.params[1], model_ar_hac.params[1]],
    'Std Error': [model_ar_ols.bse[1], model_ar_hac.bse[1]],
    't-value': [model_ar_ols.tvalues[1], model_ar_hac.tvalues[1]],
    'p-value': [model_ar_ols.pvalues[1], model_ar_hac.pvalues[1]]
})
print(results.to_string(index=False))
```
Execution result (data with $\rho = 0.7$):
| Method | Coefficient | Std Error | t-value | p-value |
|---|---|---|---|---|
| OLS | 2.2812 | 0.1435 | 15.90 | 6.83e-51 |
| HAC | 2.2812 | 0.2578 | 8.85 | 8.79e-19 |
Quantifying the Effect of HAC Correction
A comparison with a single dataset may be purely coincidental. We will statistically verify the difference between OLS and HAC through a Monte Carlo simulation.
Verification of False Positive Rate
We generate data where the true coefficient is $\beta = 0$ (the variables are actually unrelated) and measure how often each method falsely reports significance ($p < 0.05$).
```python
def false_positive_rate(n_sim=1000, n=1000, beta_true=0.0, rho=0.7):
    """
    Calculate the probability of p < 0.05 (false positive rate) under beta = 0.
    Ideally this should be 5%.
    """
    false_pos_ols = []
    false_pos_hac = []
    for _ in range(n_sim):
        x, y, _ = generate_autocorrelated_data(n=n, beta_true=beta_true, rho=rho)
        X = sm.add_constant(x)
        # OLS
        model_ols = sm.OLS(y, X).fit()
        false_pos_ols.append(model_ols.pvalues[1] < 0.05)
        # HAC
        maxlags = int(np.floor(n**(1/3)))
        model_hac = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': maxlags})
        false_pos_hac.append(model_hac.pvalues[1] < 0.05)
    return np.mean(false_pos_ols), np.mean(false_pos_hac)

fpr_ols, fpr_hac = false_positive_rate(n_sim=1000, beta_true=0.0, rho=0.7)
print(f"OLS: {fpr_ols * 100:.1f}%")
print(f"HAC: {fpr_hac * 100:.1f}%")
```
Execution result ($\rho = 0.7$, 1000 simulations):
- OLS: 41.5% -- Falsely determined as "significant" in about 40% of cases despite being unrelated
- HAC: 11.1% -- Does not reach the ideal 5%, but is significantly improved
HAC is an asymptotic correction (valid when the sample size is sufficiently large), and in finite samples it may deviate from the nominal 5% level. Especially with strong autocorrelation such as $\rho = 0.7$, the correction tends to remain insufficient.
Relationship between Autocorrelation Strength and HAC Correction
We will vary the autocorrelation coefficient $\rho$ and see how the false positive rate changes.
```python
rho_values = [0.0, 0.3, 0.5, 0.7, 0.9]
fpr_ols_list = []
fpr_hac_list = []
for rho in rho_values:
    fpr_ols, fpr_hac = false_positive_rate(n_sim=500, n=1000, beta_true=0.0, rho=rho)
    fpr_ols_list.append(fpr_ols)
    fpr_hac_list.append(fpr_hac)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(rho_values, [x * 100 for x in fpr_ols_list], marker='o', linewidth=2, label='OLS')
ax.plot(rho_values, [x * 100 for x in fpr_hac_list], marker='s', linewidth=2, label='HAC')
ax.axhline(5.0, color='red', linestyle='--', alpha=0.5, label='Nominal level (5%)')
ax.set_xlabel('Autocorrelation Coefficient rho')
ax.set_ylabel('False Positive Rate (%)')
ax.set_title('False Positive Rate vs Autocorrelation Strength')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
```

The code above shows a single loop; the figure in the article displays the 95% confidence interval from 50 repeated experiments. At $\rho = 0$ both methods stay near the nominal 5% level; as $\rho$ increases, OLS's false positive rate rises sharply while HAC's stays much closer to 5%.
The standard error correction ratio (HAC SE / OLS SE) also changes with $\rho$.
```python
rho_values = np.linspace(0, 0.9, 10)
se_ratios = []
for rho in rho_values:
    x, y, _ = generate_autocorrelated_data(n=1000, rho=rho)
    X = sm.add_constant(x)
    model_ols = sm.OLS(y, X).fit()
    maxlags = int(np.floor(len(y)**(1/3)))
    model_hac = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': maxlags})
    se_ratios.append(model_hac.bse[1] / model_ols.bse[1])

plt.figure(figsize=(10, 6))
plt.plot(rho_values, se_ratios, marker='o', linewidth=2)
plt.axhline(1.0, color='red', linestyle='--', alpha=0.5, label='Ratio=1 (No correction)')
plt.xlabel('Autocorrelation Coefficient rho')
plt.ylabel('HAC SE / OLS SE')
plt.title('HAC Correction vs Autocorrelation Strength')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()
```

The code above also shows a single loop; the figure in the article displays the 95% confidence interval from 200 repeated experiments. At $\rho = 0$ the ratio is close to 1 (almost no correction is needed), and it grows as $\rho$ increases.
Verification of Coverage Rate
The false positive rate was the probability of incorrectly declaring significance when $\beta = 0$. Next we check the coverage rate: the probability that the 95% confidence interval contains the true value, here $\beta = 2$.
```python
def monte_carlo_coverage(n_sim=1000, n=1000, beta_true=2.0, rho=0.7):
    """Calculate the coverage rate through Monte Carlo simulation"""
    coverage_ols = []
    coverage_hac = []
    for _ in range(n_sim):
        x, y, _ = generate_autocorrelated_data(n=n, beta_true=beta_true, rho=rho)
        X = sm.add_constant(x)
        # OLS
        model_ols = sm.OLS(y, X).fit()
        ci_ols = model_ols.conf_int(alpha=0.05)[1]
        coverage_ols.append(ci_ols[0] <= beta_true <= ci_ols[1])
        # HAC
        maxlags = int(np.floor(n**(1/3)))
        model_hac = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': maxlags})
        ci_hac = model_hac.conf_int(alpha=0.05)[1]
        coverage_hac.append(ci_hac[0] <= beta_true <= ci_hac[1])
    return np.mean(coverage_ols), np.mean(coverage_hac)

coverage_ols, coverage_hac = monte_carlo_coverage(n_sim=1000)
print(f"OLS: {coverage_ols * 100:.1f}%")
print(f"HAC: {coverage_hac * 100:.1f}%")
```
Execution result ($\rho = 0.7$, $\beta = 2.0$):
- OLS: 59.7% -- Despite being a 95% confidence interval, it misses the true value 40% of the time
- HAC: 89.5% -- Closer to the ideal 95%
The OLS confidence interval is a "95% in name only," with only about 60% coverage in reality. HAC captures the true uncertainty more accurately.
Here is the overall picture when changing the strength of autocorrelation.

At $\rho = 0$ both methods achieve coverage close to 95%; as $\rho$ increases, OLS coverage falls sharply while HAC stays much closer to the nominal level.
Manual Implementation of the HAC Estimator
We have confirmed the seriousness of the problem through false positive and coverage rates. But how is the HAC estimator actually calculated internally? Let's open the black box of statsmodels.
How it Works
The idea behind the HAC estimator is simple: aggregate the relationships between residuals from near to distant time points, reducing the weight for more distant ones. Specifically, it consists of the following three elements:

- $\hat{\Gamma}_0$: Captures heteroskedasticity (where the variance of residuals is not constant). On its own, $\hat{\Gamma}_0$ corresponds to the White (1980) HC estimator, which corrects only for heteroskedasticity.
- $\hat{\Gamma}_j$: Captures the autocovariance at lag $j$. This reflects the additional uncertainty due to autocorrelation.
- Bartlett kernel: Assigns smaller weights to further lags, based on the natural assumption that the further in the past, the weaker the relationship with the present. This weighting also prevents irrational results, such as a negative estimated variance (it guarantees positive semi-definiteness).

First, we define the score at time $t$ as the product of the regressors and the residual:

$$m_t = x_t u_t$$

Using this, the lag-$j$ autocovariance of the scores and the Bartlett weights are

$$\hat{\Gamma}_j = \sum_{t=j+1}^{n} m_t m_{t-j}^\top, \qquad w_j = 1 - \frac{j}{L+1},$$

where $L$ is the maximum lag (`maxlags`). We construct the matrix

$$\hat{S} = \hat{\Gamma}_0 + \sum_{j=1}^{L} w_j \left( \hat{\Gamma}_j + \hat{\Gamma}_j^\top \right).$$

The final variance-covariance matrix of the coefficients is obtained using the sandwich formula:

$$\widehat{\text{Var}}(\hat{\beta}) = (X^\top X)^{-1} \, \hat{S} \, (X^\top X)^{-1}$$

This is so named because the way $\hat{S}$ is sandwiched between the two $(X^\top X)^{-1}$ terms resembles a sandwich.
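The Bartlett weights are easy to see concretely. For `maxlags = 9` (the $n^{1/3}$ rule at $n = 1000$), they decay linearly from 1 down to 0.1:

```python
import numpy as np

maxlags = 9  # n^(1/3) rule at n = 1000
j = np.arange(maxlags + 1)
weights = 1 - j / (maxlags + 1)
print(weights)  # decays linearly: 1.0, 0.9, 0.8, ..., 0.1
```

Lag 0 (heteroskedasticity) always gets full weight; each additional lag contributes proportionally less, and lags beyond `maxlags` contribute nothing.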
Implementation Code
```python
def newey_west_manual(X, residuals, maxlags):
    """Manual implementation of the Newey-West HAC estimator"""
    n, k = X.shape
    # Score: m_t = x_t * u_t
    xu = X * residuals[:, None]  # (n, k)
    # Bartlett kernel weights: w_j = 1 - j / (maxlags + 1)
    weights = 1 - np.arange(maxlags + 1) / (maxlags + 1)
    # Gamma_0: sum of outer products of the scores (not divided by n)
    S = weights[0] * (xu.T @ xu)
    # Gamma_j: autocovariance at lag j, symmetrized
    for j in range(1, maxlags + 1):
        Gamma_j = xu[j:].T @ xu[:-j]
        S += weights[j] * (Gamma_j + Gamma_j.T)
    # Variance-covariance matrix via the sandwich formula
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv
    return cov
```
Verification against statsmodels
```python
# Verifying the manual implementation
X_ar = sm.add_constant(x_ar)
model_ols = sm.OLS(y_ar, X_ar).fit()
maxlags = int(np.floor(len(y_ar)**(1/3)))

# Manual calculation
cov_manual = newey_west_manual(X_ar, model_ols.resid, maxlags)
se_manual = np.sqrt(np.diag(cov_manual))[1]

# statsmodels
model_hac = sm.OLS(y_ar, X_ar).fit(cov_type='HAC', cov_kwds={'maxlags': maxlags})
se_statsmodels = model_hac.bse[1]

print(f"Manual implementation: {se_manual:.6f}")
print(f"statsmodels: {se_statsmodels:.6f}")
print(f"Difference: {abs(se_manual - se_statsmodels):.10f}")
print(f"Match: {np.allclose(se_manual, se_statsmodels)}")
```
Execution result:

```
Manual implementation: 0.257776
statsmodels: 0.257776
Difference: 0.0000000000
Match: True
```
The results of the manual implementation and statsmodels matched perfectly. This confirms that the standard errors of the coefficients are correctly calculated from the score covariance matrix $\hat{S}$.
Key Points in Practice
Choosing maxlags (Maximum Number of Lags)
```python
# Rule of thumb
maxlags = int(np.floor(n**(1/3)))  # Or 4 * (n / 100)**(2/9)
```

- If `maxlags` is too small → autocorrelation cannot be sufficiently corrected
- If `maxlags` is too large → noise is introduced, making the standard errors of the coefficients unstable
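The two rules of thumb give the same order of magnitude but not identical values. A quick comparison across sample sizes:

```python
import numpy as np

for n in (100, 500, 1000, 5000):
    rule_cube = int(np.floor(n ** (1 / 3)))              # n^(1/3)
    rule_sw = int(np.floor(4 * (n / 100) ** (2 / 9)))    # Stock-Watson rule
    print(f"n={n:5d}: n^(1/3) -> {rule_cube:2d}, 4*(n/100)^(2/9) -> {rule_sw}")
# At n=1000 the rules give 9 and 6 respectively
```

Since reasonable rules disagree, a sensitivity analysis over a range of `maxlags` values, as done below, is the safer practice.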
Example of sensitivity analysis:
```python
maxlags_candidates = [5, 8, 10, 15, 20]
for ml in maxlags_candidates:
    model = sm.OLS(y_ar, X_ar).fit(cov_type='HAC', cov_kwds={'maxlags': ml})
    print(f"maxlags={ml:2d}: SE={model.bse[1]:.4f}, t={model.tvalues[1]:.4f}, p={model.pvalues[1]:.4f}")
```
Execution result:
| maxlags | HAC SE | t-value | p-value |
|---|---|---|---|
| 5 | 0.2415 | 9.45 | 3.53e-21 |
| 8 | 0.2553 | 8.94 | 4.03e-19 |
| 10 | 0.2599 | 8.78 | 1.70e-18 |
| 15 | 0.2641 | 8.64 | 5.77e-18 |
| 20 | 0.2600 | 8.77 | 1.71e-18 |
Even when changing maxlags, the standard errors remain within the 0.24 - 0.26 range, and the conclusion remains unchanged. This confirms that the results are robust.
Comparison with Other Methods
| Method | Target for Correction | Usage |
|---|---|---|
| Standard OLS SE | None | Only for i.i.d. cases |
| White HC SE | Heteroskedasticity only | Cross-sectional data [1] |
| Newey-West HAC SE | Heteroskedasticity + Autocorrelation | Time series / Panel data [2] |
| Cluster-robust SE [3] | Intra-group correlation | Panel data |
Conclusion
- In time series data, OLS coefficient estimates are unbiased (they show the correct value on average over many repetitions), but the standard errors of the coefficients are underestimated.
- Underestimation of standard errors leads directly to an increase in the false positive rate (at $\rho = 0.7$, OLS: approx. 42% vs HAC: approx. 11%; HAC is a significant improvement, although it doesn't reach the ideal 5%).
- The HAC estimator corrects the standard errors of coefficients, enabling more reliable statistical inference.
- The stronger the autocorrelation, the greater the effect of the HAC correction.
- Coefficient estimates do not change; only the standard errors, t-values, and p-values of the coefficients change.
- In time series regression, it is safer to use HAC unless there is a specific reason not to.
Next Steps: Try adding `cov_type='HAC'` to your own time series regression analysis code to see how much the standard errors change. If the results change significantly, it is worth reconsidering your previous conclusions.
Appendix: Supplementary Technical Explanation
A. Regarding Matrix Notation X
In the formulas above, $X$ denotes the $n \times k$ design matrix whose $t$-th row is $x_t^\top$, the explanatory variables at time $t$ (including the constant term).
B. Intuition Why Off-diagonal Elements of \Omega Become Positive
With positive autocorrelation, if the error at time $t$ is positive, the error at time $t+1$ also tends to be positive. Therefore $E[u_t u_{t+s}] > 0$ for nearby time points, and the off-diagonal elements of $\Omega$ become positive.
C. Reason for Symmetrizing with \hat{\Gamma}_j + \hat{\Gamma}_j^\top
$\hat{\Gamma}_j$ alone is not symmetric because it pairs each score only with a past score. Adding $\hat{\Gamma}_j^\top$ accounts for the corresponding forward pairs and keeps $\hat{S}$, and hence the estimated variance-covariance matrix, symmetric.
D. Practical Validity of Sample Size n=1000
Although $n = 1000$ may look arbitrary, it is a realistic scale for applied time series work (for example, roughly four years of daily trading data), and the simulations above suggest that HAC's asymptotic correction already works reasonably well, though not perfectly, at this size.
References

- Newey, W. K., & West, K. D. (1987). "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica, 55(3), 703-708. DOI: 10.2307/1913610 — Proposal of the HAC covariance matrix estimator with a positive semi-definite guarantee using the Bartlett kernel. The main subject of this article.
- Newey, W. K., & West, K. D. (1994). "Automatic Lag Selection in Covariance Matrix Estimation." Review of Economic Studies, 61(4), 631-653. DOI: 10.2307/2297912 — Proposal of a method for automatically selecting the maximum number of lags from data. The theoretical basis for the $n^{1/3}$ maxlags rule of thumb used in this article.
- White, H. (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica, 48(4), 817-838. DOI: 10.2307/1912934 — Proposal of the consistent HC estimator without assuming the form of heteroskedasticity. Corresponds to the $\hat{\Gamma}_0$ part of HAC.
- Stock, J. H., & Watson, M. W. (2020). Introduction to Econometrics. 4th ed. Pearson. ISBN: 978-0134461991 — A standard econometrics textbook. Source of the $4(n/100)^{2/9}$ maxlags rule of thumb used in this article.

Footnotes

- [1] Cross-sectional data: Data observing multiple subjects at a single point in time (e.g., income data for 47 prefectures in 2024).
- [2] Panel data: Data observing multiple subjects across multiple time points (e.g., tracking income for 47 prefectures over 10 years).
- [3] Cluster-robust SE: Standard errors used when data within the same group (cluster) are correlated. For example, data for employees of the same company tend to be correlated with each other.