
Nuances Between Interpretability and Explainability in Machine Learning


Although these two concepts are often used interchangeably, strictly speaking I believe they differ in the following subtle ways. If you have any thoughts or opinions, please let me know in the comments!

Interpretability

The ability to understand the mechanism itself by which a machine learning model (often a black box) produces its predictions.

Explainability

The ability to explain, for a given prediction of a machine learning model, why that particular prediction was returned.

High interpretability does not necessarily mean high explainability

Highly interpretable methods such as linear regression and decision trees let us compute quantities like coefficients or feature importances, which reveal the mechanism by which the model arrives at its predictions. However, when an input involves several variables with large coefficients or importances, these global quantities alone do not explain how a specific prediction was reached, that is, which variable ultimately had the greatest impact on it. Likewise, they cannot answer questions such as how much a particular variable would need to change for the prediction to change.
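The gap above can be illustrated with a minimal sketch (toy data and variable names are hypothetical, not from the article): a linear model's coefficients describe its mechanism globally, but for one concrete input the per-feature contributions w_i * x_i may rank the variables differently.

```python
import numpy as np

# Toy data: y = 3*x0 - 2*x1 + noise (a hypothetical example)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Interpretability: the fitted coefficients expose the model's
# global mechanism (how it maps any input to a prediction).
coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
w, b = coef[:2], coef[2]
print("coefficients:", w)  # close to [3, -2]

# Explainability (local): for one specific input, the per-feature
# contributions w_i * x_i show which variable drove THIS prediction.
x = np.array([0.5, 1.0])
contributions = w * x
print("prediction:", w @ x + b)
print("contributions:", contributions)
```

Here x0 has the larger coefficient in absolute value, yet for this particular input x1 contributes more to the prediction, which is exactly the kind of instance-level question that the global coefficients alone do not answer.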

High explainability does not necessarily mean high interpretability

In recent neural network research, models have been proposed that output the reason for a prediction, for example as text, alongside the prediction itself. Such models offer high explainability, since a reason is produced for any given input. However, the underlying mechanism that determines why that particular reason was output remains a black box, so interpretability stays low.
