Newey-West HAC Estimator: Understanding Standard Error Correction in Time Series Regression with Python


Introduction

Suppose you run an OLS regression on time series data and obtain a p-value of 0.001. You might be tempted to conclude that the result is definitely significant, but that p-value is unreliable if you ignore autocorrelation (the tendency of a time series to be correlated with its own past values). In the simulations in this article, I will show that with autocorrelated data, OLS can flag a variable as "significant" even when it is actually unrelated to the outcome (the true coefficient is zero).

In this article, we will understand how the Newey-West HAC (Heteroskedasticity and Autocorrelation Consistent) estimator corrects the standard errors of coefficients by implementing it in Python.

What You Will Learn

  • Why standard errors of OLS coefficients break with time series data
  • How underestimation of standard errors affects p-values
  • What the HAC estimator is actually correcting
  • How to implement it in Python
  • Verification through Monte Carlo simulations

Premise: When to Use OLS Regression with Time Series Data

For time series data, there are inherently methods that explicitly model time dependency, such as ARIMA, VAR, and State Space Models. However, in practice, there are many situations where we want to know "the magnitude of the impact of explanatory variable X on target variable Y," and we may use OLS, prioritizing the interpretability of the coefficients.

However, when using OLS with time series data, at the very least, the standard errors should be corrected with HAC. Otherwise, statistical inference (significance testing, confidence intervals) will break.

The Problem: Underestimation of Standard Errors Due to Autocorrelation

First, let's review what a "standard error" is. The coefficients estimated by OLS change if the data changes. The standard error is an index representing this variability in the estimated coefficient. It indicates how much the coefficient would fluctuate if you repeated the same analysis many times. The smaller the standard error, the more you can judge that "this coefficient is stable," and the smaller the p-value becomes.
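This definition can be checked directly with a small simulation (an illustration added here, not code from the original analysis): repeat the same estimation on fresh data many times and measure the spread of the resulting estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_rep, sigma = 100, 5000, 2.0

# Repeat the "same analysis" many times, recording the estimate each time
means = np.array([rng.normal(0.0, sigma, n).mean() for _ in range(n_rep)])

empirical_se = means.std(ddof=1)      # spread of the estimate across repetitions
theoretical_se = sigma / np.sqrt(n)   # classical formula: sigma / sqrt(n)

print(f"empirical SE:   {empirical_se:.4f}")
print(f"theoretical SE: {theoretical_se:.4f}")
```

The empirical spread of the repeated estimates matches the textbook formula; this "spread under repetition" is exactly what a standard error is supposed to report.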

Intuitive Explanation

Consider a situation where you are estimating the average body weight.

Case A: Measuring 5 independent people

  • Data: 60kg, 72kg, 55kg, 68kg, 65kg
  • Characteristic: Each measured value is distinct
  • Information: Independent information for 5 people

Case B: Measuring the same person 5 times

  • Data: 71kg, 72kg, 71kg, 72kg, 71kg
  • Characteristic: Each measured value is almost the same
  • Information: Effectively only information for 1 person

Even in Case B, OLS calculates that "precision is high because there are 5 data points." Autocorrelation in time series data creates a situation similar to Case B. Because data at adjacent time points are similar, the effective amount of information becomes less than the apparent number of data points.
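The shrinkage of "effective information" can be quantified with the textbook large-sample result for the mean of an AR(1) series, where the effective sample size is roughly n(1-\rho)/(1+\rho). A small sketch (the formula is the standard approximation; the numbers are illustrative):

```python
import numpy as np

def effective_sample_size(n, rho):
    """Approximate effective sample size of an AR(1) series with coefficient rho.

    Based on Var(mean) ~ (sigma^2 / n) * (1 + rho) / (1 - rho).
    """
    return n * (1 - rho) / (1 + rho)

for rho in [0.0, 0.3, 0.7, 0.9]:
    n_eff = effective_sample_size(1000, rho)
    print(f"rho={rho}: 1000 points carry roughly {n_eff:.0f} points of independent information")
```

At \rho = 0.7, a series of 1000 points carries only about 176 points' worth of independent information; OLS nevertheless behaves as if all 1000 were independent.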

Confirmation with Mathematical Formulas

Let's confirm this intuition using mathematical formulas. We compare the variance of the coefficients calculated by OLS and the true variance when autocorrelation exists. I will use matrix notation here, but the main point is the inequality: "Variance assumed by OLS < True variance." If the details of the formulas are difficult, you can just focus on this inequality and move on.

Variance of coefficients calculated by OLS:

\text{Var}_{\text{OLS}}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}

True variance (when autocorrelation exists):

\text{Var}_{\text{true}}(\hat{\beta}) = (X^\top X)^{-1} X^\top \Omega X (X^\top X)^{-1}

Here \Omega is the variance-covariance matrix of the error terms (\Omega = E[uu^\top]). The diagonal elements of \Omega represent the variance of the error at each time point, and the off-diagonal elements represent the covariance of errors between different time points.

OLS assumes \Omega = \sigma^2 I (a diagonal matrix). This is the assumption that "error terms are uncorrelated with each other." However, with positive autocorrelation, the off-diagonal elements of \Omega become positive, resulting in True variance > OLS calculation.
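To see the inequality numerically, we can build the exact AR(1) covariance matrix \Omega and evaluate both variance formulas. The sketch below uses made-up data (a random-walk regressor and \rho = 0.7, mirroring the simulations later in the article):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 200, 0.7

# A persistent (random-walk) regressor, as in the simulations later on
x = np.cumsum(rng.normal(size=n)) / np.sqrt(n)
X = np.column_stack([np.ones(n), x])

# AR(1) error covariance: Omega[i, j] = Var(u) * rho^|i-j|
marginal_var = 1.0 / (1 - rho**2)   # Var(u_t) for unit-variance innovations
idx = np.arange(n)
Omega = marginal_var * rho ** np.abs(idx[:, None] - idx[None, :])

XtX_inv = np.linalg.inv(X.T @ X)
var_ols = marginal_var * XtX_inv                  # what OLS assumes: Omega = sigma^2 I
var_true = XtX_inv @ X.T @ Omega @ X @ XtX_inv    # the correct sandwich form

print(f"OLS-assumed SE of slope: {np.sqrt(var_ols[1, 1]):.4f}")
print(f"True SE of slope:        {np.sqrt(var_true[1, 1]):.4f}")
```

The positive off-diagonal entries of \Omega push the true variance of the slope well above what the OLS formula reports.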

Why Underestimation of Standard Errors is a Problem

A false positive is incorrectly determining that there is an effect when there actually is none (also called a Type I error in statistics). When standard errors are underestimated, the p-value becomes too small, increasing these false positives.

The mechanism is as follows:

  1. Autocorrelation exists → Standard errors of coefficients are underestimated
  2. Formula for t-value: t = \frac{\hat{\beta}}{\text{SE}(\hat{\beta})}
  3. Denominator (standard error) is underestimated → t-value becomes too large
  4. Large t-value → p-value becomes too small
  5. Falsely judging as "significant" even though they are actually unrelated (\beta = 0)
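The arithmetic in steps 2-4 can be made concrete with purely hypothetical numbers (a coefficient of 0.30 and standard errors of 0.20 vs 0.10; none of these come from the article's data):

```python
from scipy import stats

beta_hat = 0.30
se_true, se_under = 0.20, 0.10   # hypothetical: the reported SE is half the true SE
df = 998                          # e.g. n = 1000 with two estimated parameters

for label, se in [("true SE", se_true), ("underestimated SE", se_under)]:
    t = beta_hat / se
    p = 2 * stats.t.sf(abs(t), df)   # two-sided p-value from the t distribution
    print(f"{label}: t = {t:.2f}, two-sided p = {p:.4f}")
```

Halving the standard error doubles the t-value, and the same coefficient flips from "not significant" to "highly significant."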

Up to this point, we have understood the concept through intuition and formulas. Next, we will verify how serious this problem is in actual time series regression and how the HAC estimator corrects it using Python simulations.

Experiment: Confirmation through Simulation

Preparation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from scipy import stats

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
np.random.seed(42)

Data Generation Function

Autocorrelated Data with AR(1) Process

AR(1) (First-order Autoregressive model) is a model where the current value depends on the immediately preceding value at a rate of \rho. If \rho = 0.7, then 70% of the previous value is carried over.

u_t = \rho \, u_{t-1} + \varepsilon_t

When there is positive autocorrelation (\rho > 0), it creates "inertia" where if the residual is positive, it tends to stay positive, and if it is negative, it tends to stay negative.

The explanatory variable X is generated as a random walk (a model where random noise is added to the previous value: x_t = x_{t-1} + \varepsilon_t). This is a typical model for time series data where today's value is close to yesterday's value, such as stock prices.

def generate_autocorrelated_data(n=1000, beta_true=2.0, rho=0.7, sigma=1.0):
    """Generate autocorrelated residuals using an AR(1) process"""
    # Make explanatory variable X a random walk as well to have autocorrelation
    x = np.cumsum(np.random.randn(n)) / np.sqrt(n)  # Adjust magnitude by scaling

    # Generate AR(1) residuals
    u = np.zeros(n)
    epsilon = np.random.randn(n) * sigma
    for t in range(1, n):
        u[t] = rho * u[t-1] + epsilon[t]

    y = beta_true * x + u
    return x, y, u

Visualization of Residuals

x_ar, y_ar, u_ar = generate_autocorrelated_data(n=1000, rho=0.7)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(u_ar)
axes[0].set_title('Residuals Time Series (AR(1), rho=0.7)')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Residual')
axes[0].axhline(0, color='red', linestyle='--', alpha=0.3)

axes[1].acorr(u_ar, maxlags=20)
axes[1].set_title('Autocorrelation Function (AR(1), rho=0.7)')
axes[1].set_xlabel('Lag')
axes[1].set_ylabel('Autocorrelation')

plt.tight_layout()
plt.show()

Autocorrelation of Residuals

In the time series plot on the left, you can see the "inertia" caused by positive autocorrelation. Like the body weight example, it's a situation where "the same person is measured repeatedly." Residuals at adjacent time points take similar values.

The autocorrelation function (ACF) on the right shows the autocorrelation at lag k (the correlation coefficient between data points k periods apart). You can see an autocorrelation of about 0.66 at lag 1, which decays as the lag increases.
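The decay pattern can be verified directly: for an AR(1) process, the theoretical autocorrelation at lag k is \rho^k. A small check with a long simulated series (illustrative, not the article's dataset):

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho = 50_000, 0.7

# Simulate a long AR(1) series: u_t = rho * u_{t-1} + eps_t
u = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

# Empirical autocorrelation at lag k should be close to rho**k
for k in (1, 2, 3):
    acf_k = np.corrcoef(u[:-k], u[k:])[0, 1]
    print(f"lag {k}: empirical {acf_k:.3f}, theoretical {rho**k:.3f}")
```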

OLS vs HAC Comparison

Let's see how OLS standard errors and HAC standard errors differ in a situation where there is "inertia" in the residuals.

X_ar = sm.add_constant(x_ar)  # Add constant term (intercept)

# Standard OLS
model_ar_ols = sm.OLS(y_ar, X_ar).fit()

# Newey-West HAC (details on maxlags later)
maxlags = int(np.floor(len(y_ar)**(1/3)))  # n=1000 -> maxlags=9
model_ar_hac = sm.OLS(y_ar, X_ar).fit(
    cov_type='HAC',
    cov_kwds={'maxlags': maxlags}
)

# Comparison of results
results = pd.DataFrame({
    'Method': ['OLS', 'HAC'],
    'Coefficient': [model_ar_ols.params[1], model_ar_hac.params[1]],
    'Std Error': [model_ar_ols.bse[1], model_ar_hac.bse[1]],
    't-value': [model_ar_ols.tvalues[1], model_ar_hac.tvalues[1]],
    'p-value': [model_ar_ols.pvalues[1], model_ar_hac.pvalues[1]]
})

print(results.to_string(index=False))

Execution result (data with \rho = 0.7):

Method  Coefficient  Std Error  t-value  p-value
OLS     2.2812       0.1435     15.90    6.83e-51
HAC     2.2812       0.2578      8.85    8.79e-19

The coefficient estimate is identical; only the standard error changes, with the HAC standard error roughly 1.8 times larger, which shrinks the t-value accordingly.

Quantifying the Effect of HAC Correction

A comparison with a single dataset may be purely coincidental. We will statistically verify the difference between OLS and HAC through a Monte Carlo simulation.

Verification of False Positive Rate

We generate data where the true coefficient is \beta = 0 (unrelated) and check what percentage is incorrectly determined as "significant (p < 0.05)" with OLS and HAC, respectively. When the null hypothesis is true, a correctly sized test should only reject with a probability equal to the significance level \alpha, which ideally should be 5%.

def false_positive_rate(n_sim=1000, n=1000, beta_true=0.0, rho=0.7):
    """
    Calculate the probability of p < 0.05 (False Positive Rate) under beta = 0
    Ideally should be 5%
    """
    false_pos_ols = []
    false_pos_hac = []

    for _ in range(n_sim):
        x, y, _ = generate_autocorrelated_data(n=n, beta_true=beta_true, rho=rho)
        X = sm.add_constant(x)

        # OLS
        model_ols = sm.OLS(y, X).fit()
        false_pos_ols.append(model_ols.pvalues[1] < 0.05)

        # HAC
        maxlags = int(np.floor(n**(1/3)))
        model_hac = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': maxlags})
        false_pos_hac.append(model_hac.pvalues[1] < 0.05)

    return np.mean(false_pos_ols), np.mean(false_pos_hac)

fpr_ols, fpr_hac = false_positive_rate(n_sim=1000, beta_true=0.0, rho=0.7)
print(f"OLS: {fpr_ols * 100:.1f}%")
print(f"HAC: {fpr_hac * 100:.1f}%")

Execution result (\beta = 0, \rho = 0.7, Ideal value: 5%):

  • OLS: 41.5% -- Falsely determined as "significant" in about 40% of cases despite being unrelated
  • HAC: 11.1% -- Does not reach the ideal 5%, but is significantly improved

HAC is an asymptotic correction (valid when the sample size is sufficiently large), so in finite samples it can deviate from the nominal 5% level. With strong autocorrelation such as \rho \geq 0.7, the HAC false positive rate stays around 10%, reflecting both the choice of maxlags and the limitations of the kernel. Still, compared with over 40% for OLS this is a massive improvement, and in practice HAC is clearly the better default. If autocorrelation is extremely strong, consider explicit time series models such as ARIMA in addition to HAC.

Relationship between Autocorrelation Strength and HAC Correction

We will change the autocorrelation coefficient \rho and observe how the false positive rate and the correction ratio of standard errors change.

rho_values = [0.0, 0.3, 0.5, 0.7, 0.9]
fpr_ols_list = []
fpr_hac_list = []

for rho in rho_values:
    fpr_ols, fpr_hac = false_positive_rate(n_sim=500, n=1000, beta_true=0.0, rho=rho)
    fpr_ols_list.append(fpr_ols)
    fpr_hac_list.append(fpr_hac)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(rho_values, [x * 100 for x in fpr_ols_list], marker='o', linewidth=2, label='OLS')
ax.plot(rho_values, [x * 100 for x in fpr_hac_list], marker='s', linewidth=2, label='HAC')
ax.axhline(5.0, color='red', linestyle='--', alpha=0.5, label='Nominal level (5%)')
ax.set_xlabel('Autocorrelation Coefficient rho')
ax.set_ylabel('False Positive Rate (%)')
ax.set_title('False Positive Rate vs Autocorrelation Strength')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

False Positive Rate and Autocorrelation

The code above runs each setting of \rho only once; the figure in the article overlays the 95% confidence interval from 50 repetitions of the experiment. At \rho = 0.0 (no autocorrelation), both OLS and HAC sit at about 5%, confirming that OLS standard errors function correctly when no autocorrelation is present. As \rho increases, the OLS false positive rate skyrockets, exceeding 60% at \rho = 0.9. HAC also drifts above the nominal 5% when \rho is large, but remains vastly better than OLS.

The standard error correction ratio (HAC SE / OLS SE) also changes according to \rho.

rho_values = np.linspace(0, 0.9, 10)
se_ratios = []

for rho in rho_values:
    x, y, _ = generate_autocorrelated_data(n=1000, rho=rho)
    X = sm.add_constant(x)

    model_ols = sm.OLS(y, X).fit()
    maxlags = int(np.floor(len(y)**(1/3)))
    model_hac = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': maxlags})

    se_ratios.append(model_hac.bse[1] / model_ols.bse[1])

plt.figure(figsize=(10, 6))
plt.plot(rho_values, se_ratios, marker='o', linewidth=2)
plt.axhline(1.0, color='red', linestyle='--', alpha=0.5, label='Ratio=1 (No correction)')
plt.xlabel('Autocorrelation Coefficient rho')
plt.ylabel('HAC SE / OLS SE')
plt.title('HAC Correction vs Autocorrelation Strength')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

HAC Correction and Autocorrelation

The code above likewise runs each setting of \rho only once; the figure in the article overlays the 95% confidence interval from 200 repetitions. At \rho = 0 (no autocorrelation), HAC SE / OLS SE is approximately 1.0 (no correction needed), but the ratio rises with \rho, reaching a correction factor of about 2.7 at \rho = 0.9. The stronger the autocorrelation, the further OLS standard errors deviate from the truth.

Verification of Coverage Rate

The false positive rate was the "probability of incorrectly determining significance when \beta = 0." The coverage rate is the flip side, being the "probability that the confidence interval contains the true value." Underestimation of standard errors distorts both. We verify the probability that the 95% confidence interval covers the true value (coverage rate). If OLS standard errors are underestimated, the confidence interval is too narrow, causing the coverage rate to fall below 95%.

def monte_carlo_coverage(n_sim=1000, n=1000, beta_true=2.0, rho=0.7):
    """Calculate coverage rate through Monte Carlo simulation"""
    coverage_ols = []
    coverage_hac = []

    for _ in range(n_sim):
        x, y, _ = generate_autocorrelated_data(n=n, beta_true=beta_true, rho=rho)
        X = sm.add_constant(x)

        # OLS
        model_ols = sm.OLS(y, X).fit()
        ci_ols = model_ols.conf_int(alpha=0.05)[1]
        coverage_ols.append(ci_ols[0] <= beta_true <= ci_ols[1])

        # HAC
        maxlags = int(np.floor(n**(1/3)))
        model_hac = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': maxlags})
        ci_hac = model_hac.conf_int(alpha=0.05)[1]
        coverage_hac.append(ci_hac[0] <= beta_true <= ci_hac[1])

    return np.mean(coverage_ols), np.mean(coverage_hac)

coverage_ols, coverage_hac = monte_carlo_coverage(n_sim=1000)
print(f"OLS: {coverage_ols * 100:.1f}%")
print(f"HAC: {coverage_hac * 100:.1f}%")

Execution result (\beta = 2.0, \rho = 0.7, Ideal value: 95%):

  • OLS: 59.7% -- Despite being a 95% confidence interval, it misses the true value 40% of the time
  • HAC: 89.5% -- Closer to the ideal 95%

The OLS confidence interval is a "95% in name only," with only about 60% coverage in reality. HAC captures the true uncertainty more accurately.

Here is the overall picture when changing the strength of autocorrelation.

Coverage Rate and Autocorrelation

At \rho = 0, both OLS and HAC are near the nominal 95% level, but as \rho increases, the OLS coverage rate drops rapidly. HAC improves significantly, but still falls slightly below the nominal 95% level at \rho \geq 0.7.

Manual Implementation of the HAC Estimator

We have confirmed the seriousness of the problem through false positive and coverage rates. But how is the HAC estimator actually calculated internally? Let's open the black box of statsmodels.

How it Works

The idea behind the HAC estimator is simple: "Aggregate the relationships between residuals from near to distant time points, reducing the weight for more distant points." Specifically, it consists of the following three elements:

  1. \hat{\Gamma}_0: Captures heteroskedasticity (where the variance of residuals is not constant). On its own, \hat{\Gamma}_0 corresponds to the White (1980) HC estimator, which corrects only for heteroskedasticity.
  2. \hat{\Gamma}_j: Captures autocovariance at lag j. This reflects the additional uncertainty due to autocorrelation.
  3. Bartlett Kernel: Assigns smaller weights to further lags. This is based on the natural assumption that "the further in the past, the weaker the relationship with the present." Additionally, this weighting prevents irrational results, such as an estimated variance being negative (guaranteeing positive semi-definiteness).

First, we define the score m_t = x_t \hat{u}_t (the product of the explanatory variable and the residual) for each time point.

m_t represents "how much the residual at time t influenced the coefficient estimate." Time points where both the residual and the explanatory variable are large will cause the coefficient estimate to fluctuate significantly. OLS determines coefficients such that "the sum of the products of residuals and explanatory variables is zero" (the normal equation). Each m_t represents the contribution of an individual data point to this equilibrium condition.
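This equilibrium condition is easy to verify numerically (a quick check with synthetic data; for any OLS fit, the scores sum to zero by construction of the normal equations):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Fit OLS via the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# Scores m_t = x_t * u_t; their column sums are (numerically) zero --
# exactly the equilibrium condition described above
scores = X * resid[:, None]
print(scores.sum(axis=0))
```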

Using this, let's look at the three elements in order, where L is the maximum number of lags (maxlags):

\begin{aligned} \hat{\Gamma}_0 &= \sum_{t=1}^{n} m_t m_t^\top = \sum_{t=1}^{n} \hat{u}_t^2 \, x_t x_t^\top \\ \hat{\Gamma}_j &= \sum_{t=j+1}^{n} m_t m_{t-j}^\top = \sum_{t=j+1}^{n} \hat{u}_t \hat{u}_{t-j} \, x_t x_{t-j}^\top \\ w(j) &= 1 - \frac{j}{L + 1} \end{aligned}

We construct the matrix S by summing these together:

S = \hat{\Gamma}_0 + \sum_{j=1}^{L} w(j) \left( \hat{\Gamma}_j + \hat{\Gamma}_j^\top \right)

The final variance-covariance matrix of the coefficients is obtained using the sandwich formula:

\hat{V}(\hat{\beta}) = (X^\top X)^{-1} \, S \, (X^\top X)^{-1}

The name "sandwich" comes from the way (X^\top X)^{-1} flanks S on both sides, like the bread of a sandwich.

Implementation Code

def newey_west_manual(X, residuals, maxlags):
    """Manual implementation of the Newey-West HAC estimator"""
    n, k = X.shape

    # Score: m_t = x_t * u_t
    xu = X * residuals[:, None]  # (n, k)

    # Bartlett kernel weights
    weights = 1 - np.arange(maxlags + 1) / (maxlags + 1)

    # Gamma_0: Sum of outer products of scores (not divided by n)
    S = weights[0] * (xu.T @ xu)

    # Gamma_j: Autocovariance at lag j
    for j in range(1, maxlags + 1):
        Gamma_j = xu[j:].T @ xu[:-j]
        S += weights[j] * (Gamma_j + Gamma_j.T)

    # Calculate variance-covariance matrix using the sandwich formula
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv
    return cov

Verification against statsmodels

# Verifying the manual implementation
X_ar = sm.add_constant(x_ar)
model_ols = sm.OLS(y_ar, X_ar).fit()
maxlags = int(np.floor(len(y_ar)**(1/3)))

# Manual calculation
cov_manual = newey_west_manual(X_ar, model_ols.resid, maxlags)
se_manual = np.sqrt(np.diag(cov_manual))[1]

# statsmodels
model_hac = sm.OLS(y_ar, X_ar).fit(cov_type='HAC', cov_kwds={'maxlags': maxlags})
se_statsmodels = model_hac.bse[1]

print(f"Manual implementation: {se_manual:.6f}")
print(f"statsmodels:           {se_statsmodels:.6f}")
print(f"Difference:            {abs(se_manual - se_statsmodels):.10f}")
print(f"Match: {np.allclose(se_manual, se_statsmodels)}")

Execution result:

Manual implementation: 0.257776
statsmodels:           0.257776
Difference:            0.0000000000
Match: True

The results of the manual implementation and statsmodels matched perfectly. This confirms that the standard errors of the coefficients are correctly calculated from the score covariance matrix S using the sandwich formula (X^\top X)^{-1} S (X^\top X)^{-1}.

Key Points in Practice

Choosing maxlags (Maximum Number of Lags)

# Rule of thumb
maxlags = int(np.floor(n**(1/3)))  # Or 4*(n/100)**(2/9)

n^{1/3} is the growth rate for the lag truncation derived by Newey and West (1994). It balances a tradeoff: too few lags leave the autocorrelation under-corrected, while too many lags add noise to the variance estimate. 4(n/100)^{2/9} is an alternative rule of thumb from Stock and Watson's textbook. In practice, the two formulas usually lead to similar results.

  • If maxlags is too small → Autocorrelation cannot be sufficiently corrected
  • If maxlags is too large → Noise is introduced, making the standard error of the coefficients unstable
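For reference, here is how the two rules compare across sample sizes (a small helper sketch; the formulas are the two rules of thumb quoted above, and the function names are my own):

```python
import numpy as np

def maxlags_nw(n):
    """Newey-West (1994) order: floor(n^(1/3))."""
    return int(np.floor(n ** (1 / 3)))

def maxlags_sw(n):
    """Stock-Watson rule of thumb: floor(4 * (n/100)^(2/9))."""
    return int(np.floor(4 * (n / 100) ** (2 / 9)))

for n in [80, 120, 500, 1000, 5000]:
    print(f"n={n:5d}: n^(1/3) rule -> {maxlags_nw(n):2d}, "
          f"4(n/100)^(2/9) rule -> {maxlags_sw(n):2d}")
```

The two rules diverge somewhat for large n, which is one more reason to run a sensitivity analysis like the one below.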

Example of sensitivity analysis:

maxlags_candidates = [5, 8, 10, 15, 20]

for ml in maxlags_candidates:
    model = sm.OLS(y_ar, X_ar).fit(cov_type='HAC', cov_kwds={'maxlags': ml})
    print(f"maxlags={ml:2d}: SE={model.bse[1]:.4f}, t={model.tvalues[1]:.4f}, p={model.pvalues[1]:.4f}")

Execution result:

maxlags  HAC SE  t-value  p-value
5        0.2415   9.45    3.53e-21
8        0.2553   8.94    4.03e-19
10       0.2599   8.78    1.70e-18
15       0.2641   8.64    5.77e-18
20       0.2600   8.77    1.71e-18

Even when changing maxlags, the standard errors remain within the 0.24 - 0.26 range, and the conclusion remains unchanged. This confirms that the results are robust.

Comparison with Other Methods

Method                 Target for Correction                 Typical Use
Standard OLS SE        None                                  i.i.d. data only
White HC SE            Heteroskedasticity only               Cross-sectional data [1]
Newey-West HAC SE      Heteroskedasticity + autocorrelation  Time series / panel data [2]
Cluster-robust SE [3]  Intra-group correlation               Panel data

Conclusion

  • In time series data, OLS coefficient estimates are unbiased (they show the correct value on average over many repetitions), but the standard errors of the coefficients are underestimated.
  • Underestimation of standard errors leads directly to an increase in the false positive rate (at \rho = 0.7, OLS: approx. 42% vs HAC: approx. 11%. HAC is a significant improvement, although it doesn't reach the ideal 5%).
  • The HAC estimator corrects the standard errors of coefficients, enabling more reliable statistical inference.
  • The stronger the autocorrelation, the greater the effect of the HAC correction.
  • Coefficient estimates do not change; only the standard errors, t-values, and p-values of the coefficients change.
  • In time series regression, it is safer to use HAC unless there is a specific reason not to.

Next Steps: Try adding cov_type='HAC' to your own time series regression analysis code to see how much the standard errors change. If the results change significantly, it is worth reconsidering your previous conclusions.

Appendix: Supplementary Technical Explanation

A. Regarding Matrix Notation X

X used in the mathematical formula section of this article ("Confirmation with Mathematical Formulas") is the n \times k design matrix whose rows hold the explanatory-variable values at each time point. In simple regression without an intercept, (X^\top X)^{-1} reduces to the scalar 1/\sum x_i^2. If the matrix details do not click, focusing on the inequality "Variance assumed by OLS < True variance" is sufficient.
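A one-line check of this correspondence (synthetic data, no-intercept case):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = x[:, None]   # design matrix of a no-intercept simple regression

# (X^T X)^{-1} collapses to the scalar 1 / sum(x_i^2)
lhs = np.linalg.inv(X.T @ X)[0, 0]
rhs = 1.0 / np.sum(x**2)
print(lhs, rhs)
```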

B. Intuition Why Off-diagonal Elements of \Omega Become Positive

With positive autocorrelation, if the error u_t at a certain time is positive, the preceding u_{t-1} also tends to be positive. That is, E[u_t u_{t-1}] > 0. This is why the off-diagonal elements of \Omega become positive. The more positive the off-diagonal elements are, the greater the deviation from the "diagonal matrix" assumed by OLS, and the more serious the underestimation of standard errors becomes.
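This can be confirmed with a quick simulation (illustrative; for an AR(1) process with unit-variance innovations, the theoretical lag-1 autocovariance is \rho/(1-\rho^2)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 50_000, 0.5

# Simulate AR(1) errors: u_t = rho * u_{t-1} + eps_t
u = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]

# Sample analogue of E[u_t u_{t-1}]; AR(1) theory gives rho / (1 - rho^2)
cov_lag1 = np.mean(u[1:] * u[:-1])
print(f"sample E[u_t u_(t-1)] = {cov_lag1:.3f}  (theory: {rho / (1 - rho**2):.3f})")
```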

C. Reason for Symmetrizing with \hat{\Gamma}_j + \hat{\Gamma}_j^\top

\hat{\Gamma}_j only captures the "forward" covariance of lag j (the influence from time t to time t-j). However, a variance-covariance matrix must be symmetric by definition. Therefore, we add the reverse direction \hat{\Gamma}_j^\top to make it symmetric. This also contributes to guaranteeing the positive semi-definiteness of the Bartlett kernel.

D. Practical Validity of Sample Size n=1000

Although n=1000 is used in the simulations in this article, the asymptotic approximation behind HAC becomes less accurate at the sample sizes common in practice, such as monthly data (n ≈ 120) or quarterly data (n ≈ 80). For small samples, sensitivity analysis over maxlags becomes more important, and in some cases it is worth constructing confidence intervals with a block bootstrap, which resamples contiguous segments of the series to preserve its dependence structure.
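As a sketch of the bootstrap idea, here is a minimal moving-block bootstrap for the slope's confidence interval. This is one of several valid time series bootstraps; the block length, replication count, function names, and data-generating setup are all illustrative choices, not prescriptions from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(x, y):
    """Slope of a simple regression with intercept, via the normal equations."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

def block_bootstrap_ci(x, y, block_len=10, n_boot=500, alpha=0.05, rng=rng):
    """Percentile CI for the slope from a moving-block bootstrap.

    (x_t, y_t) pairs are resampled in contiguous blocks so that short-range
    dependence survives resampling. block_len and n_boot are illustrative.
    """
    n = len(y)
    n_blocks = int(np.ceil(n / block_len))
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:n]
        slopes[b] = ols_slope(x[idx], y[idx])
    return np.quantile(slopes, [alpha / 2, 1 - alpha / 2])

# Synthetic example with AR(1) errors (true slope 2.0), mirroring the article's setup
n, rho = 300, 0.7
x = np.cumsum(rng.normal(size=n)) / np.sqrt(n)
u = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]
y = 2.0 * x + u

lo, hi = block_bootstrap_ci(x, y)
print(f"95% block-bootstrap CI for the slope: [{lo:.2f}, {hi:.2f}]")
```

In small samples, comparing such a bootstrap interval against the HAC interval is a useful cross-check.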

References

  • Newey, W. K., & West, K. D. (1987). "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica, 55(3), 703-708. DOI: 10.2307/1913610
    • Proposal of the HAC covariance matrix estimator with a positive semi-definite guarantee using the Bartlett kernel. The main subject of this article.
  • Newey, W. K., & West, K. D. (1994). "Automatic Lag Selection in Covariance Matrix Estimation." Review of Economic Studies, 61(4), 631-653. DOI: 10.2307/2297912
    • Proposal of a method for automatically selecting the maximum number of lags from data. The theoretical basis for the maxlags rule of thumb n^{1/3} in this article.
  • White, H. (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica, 48(4), 817-838. DOI: 10.2307/1912934
    • Proposal of the consistent HC estimator without assuming the form of heteroskedasticity. Corresponds to the \hat{\Gamma}_0 part of HAC.
  • Stock, J. H., & Watson, M. W. (2020). Introduction to Econometrics. 4th ed. Pearson. ISBN: 978-0134461991.
    • A standard textbook on econometrics. Source of the maxlags rule of thumb 4(n/100)^{2/9} in this article.
Footnotes
  1. Cross-sectional data: Data observing multiple subjects at a single point in time (e.g., income data for 47 prefectures in 2024). ↩︎

  2. Panel data: Data observing multiple subjects across multiple time points (e.g., tracking income for 47 prefectures over 10 years). ↩︎

  3. Cluster-robust SE: Standard errors used when data within the same group (cluster) are correlated. For example, data for employees belonging to the same company tend to be correlated with each other. ↩︎
