Evaluation Methods after Supervised Learning: A Comparison of R² and the Pearson Correlation Coefficient

Published on 2024/11/15

Hello, I'm Dang, an AI and machine learning engineer at Knowledgelabo, Inc. We provide a service called "Manageboard," which helps aggregate, analyze, and manage corporate data scattered across an organization. We plan to enhance Manageboard's AI capabilities in the future. In my articles, I will share the challenges we encountered during our research and development.

Background

After training a model in the process of supervised learning, the next important step is to evaluate its accuracy. Quantifying how accurately the model makes predictions is essential for assessing its reliability when applied in the real world. There are various evaluation metrics and methods available, and it is important to understand which approach is most appropriate for different situations. In this article, we will compare two commonly used metrics in regression analysis: the coefficient of determination (R^2 score) and the Pearson correlation coefficient.

Coefficient of Determination (R² Score)

The coefficient of determination is a widely used metric in regression analysis to indicate the accuracy of a model's predictions. It shows the proportion of variance in the dependent variable that is explained by the model.

  1. Definition
    The coefficient of determination is expressed as the following formula:
    R^2=1-\frac{\sum_i(y_i-\hat{y}_i)^2}{\sum_i(y_i-\overline{y})^2}

    Where y_i is the actual value of the target variable, \hat{y}_i is the predicted value from the model, and \overline{y} is the mean value of the target variable.
  2. Range
    The R^2 score ranges from -∞ to 1:
  • R^2=1 indicates perfect prediction accuracy, meaning the model explains all of the variance in the data.
  • R^2=0 means the model does not improve over simply predicting the mean of the target variable.
  • R^2<0 implies that the model is performing worse than just predicting the mean value.
  3. Sample Size
    The R^2 score is sensitive to the sample it is computed on. When the number of samples is small or the data are biased, a few points can dominate the score and make it unstable. One way to mitigate this is a weighted R^2, which assigns a different importance to each sample in both sums of squares, so that over-represented or less reliable samples contribute less to the score.
  4. Limitations
    Despite its usefulness, the R^2 score has some limitations:
  • Wide range: Since the score is unbounded below (it ranges from -∞ to 1), negative values have no natural scale and can be hard to interpret.
  • Constant target variable: If the target variable is constant (i.e., all samples have the same value), the R^2 score is undefined. This is because the total sum of squares in the denominator, \sum_i(y_i-\overline{y})^2, is zero, which leads to division by zero.
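
The definition and the cases above can be checked with a short NumPy sketch. This is only an illustration under stated assumptions: the function name `r2_score`, the `sample_weight` parameter, and the toy arrays are hypothetical and not taken from Manageboard, and returning NaN for a constant target is one possible convention for the undefined case.

```python
import numpy as np


def r2_score(y_true, y_pred, sample_weight=None):
    """Compute R^2 = 1 - SS_res / SS_tot, optionally with per-sample weights."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        w = np.ones_like(y_true)  # unweighted case: every sample counts equally
    else:
        w = np.asarray(sample_weight, dtype=float)

    y_mean = np.average(y_true, weights=w)        # (weighted) mean of the target
    ss_res = np.sum(w * (y_true - y_pred) ** 2)   # residual sum of squares
    ss_tot = np.sum(w * (y_true - y_mean) ** 2)   # total sum of squares
    if ss_tot == 0.0:
        return float("nan")  # constant target: denominator is zero (assumed convention)
    return float(1.0 - ss_res / ss_tot)


y = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r2_score(y, y)                         # 1.0: predictions match exactly
r2_mean = r2_score(y, np.full_like(y, y.mean()))    # 0.0: always predicting the mean
r2_worse = r2_score(y, y[::-1])                     # -3.0: worse than the mean
r2_constant = r2_score([2.0, 2.0, 2.0], [1.0, 2.0, 3.0])  # nan: undefined

# Weights change the score: the last prediction is far off,
# but it is ignored when its weight is zero.
y_pred = np.array([1.0, 2.0, 3.0, 6.0])
r2_unweighted = r2_score(y, y_pred)                                   # about 0.2
r2_weighted = r2_score(y, y_pred, sample_weight=[1.0, 1.0, 1.0, 0.0])  # 1.0
```

In this sketch the weights enter both the residual and the total sum of squares, and the mean of the target is also weighted, so the two terms stay on the same scale.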

Pearson Correlation Coefficient

When the model is linear, the Pearson correlation coefficient can also serve as an effective evaluation metric. The Pearson correlation measures the linear relationship between two variables, with values ranging from -1 to 1.

  1. Definition
    The Pearson correlation coefficient is calculated as follows:
    r=\frac{\sum_i(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_i(x_i-\overline{x})^2}\sqrt{\sum_i(y_i-\overline{y})^2}}

    Where x_i and y_i are the paired values of the two variables being compared (here, the input and target variables), and \overline{x} and \overline{y} are their respective mean values.
  2. Range
  • r=1 indicates a perfect positive linear relationship.
  • r=-1 indicates a perfect negative linear relationship.
  • r=0 indicates no linear relationship.
  3. Advantages: Ease of interpretation
    The Pearson correlation coefficient is bounded between -1 and 1, so its absolute value lies between 0 and 1, making it more intuitive to interpret than the R^2 score. For example, an absolute value of 0.9 suggests a very strong linear relationship between the variables, with the sign indicating whether the relationship is positive or negative.
  4. Limitations: Assumption of Linearity
    The Pearson correlation coefficient is most useful when there is a linear relationship between the variables. It captures only linear association, so a strong nonlinear relationship can still yield a value near 0, and the model will not be evaluated appropriately with this method.
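
As a rough illustration of the formula and of the linearity limitation, the following NumPy sketch computes r directly from the definition. The function name `pearson_r` and the toy data are hypothetical, chosen only for this example.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    denom = np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))
    if denom == 0.0:
        return float("nan")  # undefined when either variable is constant
    return float(np.sum(dx * dy) / denom)


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

r_linear_pos = pearson_r(x, 3.0 * x + 1.0)  # approximately 1: perfect positive line
r_linear_neg = pearson_r(x, -0.5 * x)       # approximately -1: perfect negative line
r_quadratic = pearson_r(x, x ** 2)          # 0.0: y depends on x, but not linearly
```

The last case is the one to keep in mind: y is fully determined by x, yet r is 0 because the dependence is symmetric around the mean of x and has no linear component.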

Summary

In evaluating supervised learning models, both the R^2 score and Pearson correlation coefficient are useful metrics. Particularly for regression problems, both are powerful tools for assessing model performance, but it's important to choose the appropriate metric based on the context and the model.

The R^2 score is widely used as it indicates how much of the variation in the target variable can be explained by the model. However, because it is unbounded below and negative values are hard to interpret, the Pearson correlation coefficient is often more intuitive for linear regression models. Its bounded range, from -1 to 1 (or from 0 to 1 in absolute value), makes it easier to interpret, providing clearer insights into the model's performance.
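
One practical reason the choice of metric matters is that the two scores answer different questions when applied to predictions. In the small sketch below (toy data, assumed only for illustration), the predictions are a scaled and shifted copy of the true values: the Pearson correlation between actual and predicted values is essentially 1, while the R^2 score is strongly negative because the predictions are far from the actual values.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = 2.0 * y_true + 3.0  # perfectly linear in y_true, but badly biased

# Pearson correlation between actual and predicted values
r = float(np.corrcoef(y_true, y_pred)[0, 1])  # approximately 1.0

# R^2 from its definition
ss_res = np.sum((y_true - y_pred) ** 2)          # 190.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 10.0
r2 = float(1.0 - ss_res / ss_tot)                # -18.0
```

In other words, a high correlation indicates that the predictions move together with the target, whereas a high R^2 additionally requires them to be close in value.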

Both metrics are crucial for understanding model performance and identifying areas for improvement.
