Evaluation Methods after SL: A Comparison of R² and PCC
Hello, I'm Dang, an AI and machine learning engineer at Knowledgelabo, Inc. We provide a service called "Manageboard," which helps aggregate, analyze, and manage corporate data scattered throughout an organization. Manageboard is set to enhance its AI capabilities in the future. In my articles, I will share the challenges we encountered during our research and development.
Background
After training a model with supervised learning, the next important step is to evaluate its accuracy. Quantifying how accurately the model makes predictions is essential for assessing its reliability when applied in the real world. Various evaluation metrics and methods are available, and it is important to understand which is most appropriate for a given situation. In this article, we compare two commonly used metrics in regression analysis: the coefficient of determination ($R^2$) and the Pearson correlation coefficient (PCC).
Coefficient of Determination (R² Score)
The coefficient of determination is a widely used metric in regression analysis to indicate the accuracy of a model's predictions. It shows the proportion of variance in the dependent variable that is explained by the model.
- Definition
  The coefficient of determination is given by the following formula:
  $$R^2=1-\frac{\sum_i(y_i-\hat{y}_i)^2}{\sum_i(y_i-\overline{y})^2}$$
  where $y_i$ is the actual value of the target variable, $\hat{y}_i$ is the predicted value from the model, and $\overline{y}$ is the mean value of the target variable.
- Range
  The $R^2$ score ranges from $-\infty$ to $1$.
  - $R^2=1$ indicates perfect prediction accuracy, meaning the model explains all of the variance in the data.
  - $R^2=0$ means the model does not improve over simply predicting the mean of the target variable.
  - $R^2<0$ implies that the model is performing worse than just predicting the mean value.
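The definition and the three reference points above can be checked with a few lines of plain Python. This is a minimal sketch: the function name `r2_score` and the toy values are assumptions made for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: squared deviations of the targets from their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]  # toy targets (assumed), mean is 2.5

perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])    # predictions equal the targets
mean_only = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
reversed_ = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than predicting the mean

print(perfect, mean_only, reversed_)  # 1.0 0.0 -3.0
```

The last case shows how quickly the score can drop below zero: the residual sum of squares (20) is four times the total sum of squares (5).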
- Sample Size
  The $R^2$ score depends on the sample size. When the number of samples is small or the data is biased, the score may be overly influenced by these factors. To mitigate this issue, a weighted $R^2$ can be used. Weighted $R^2$ assigns a different importance to each sample, which can reduce the dependence on sample size or data distribution and provide a more reliable evaluation.
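One common way to define a weighted $R^2$ is to weight both sums of squares and to use the weighted mean of the targets as the baseline. The sketch below assumes that definition; the function name `weighted_r2_score`, the toy data, and the choice of weights are all illustrative assumptions, not a prescribed recipe.

```python
def weighted_r2_score(y_true, y_pred, weights):
    """Weighted R^2: each sample contributes in proportion to its weight."""
    total_weight = sum(weights)
    # The weighted mean of the targets serves as the baseline prediction.
    mean_w = sum(w * y for w, y in zip(weights, y_true)) / total_weight
    ss_res = sum(w * (y - y_hat) ** 2
                 for w, y, y_hat in zip(weights, y_true, y_pred))
    ss_tot = sum(w * (y - mean_w) ** 2 for w, y in zip(weights, y_true))
    return 1.0 - ss_res / ss_tot


# Toy data (assumed): the last sample is an outlier that the model misses badly.
y_true = [1.0, 2.0, 3.0, 10.0]
y_pred = [1.0, 2.0, 3.0, 4.0]

uniform = weighted_r2_score(y_true, y_pred, [1.0, 1.0, 1.0, 1.0])
downweighted = weighted_r2_score(y_true, y_pred, [1.0, 1.0, 1.0, 0.1])

print(f"uniform weights:       {uniform:.3f}")
print(f"outlier down-weighted: {downweighted:.3f}")
```

With uniform weights the function reduces to the unweighted score. How the weights should be chosen depends on the data and is left open here. For reference, scikit-learn's `r2_score` accepts a `sample_weight` argument for this purpose.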
- Limitations
  Despite its usefulness, the $R^2$ score has some limitations:
  - Wide range: Since the score ranges from $-\infty$ to $1$, small values can be difficult to interpret. Negative values in particular are hard to read, because they have no lower bound.
  - Constant target variable: If the target variable is constant (e.g., all samples have the same value), the $R^2$ score cannot be computed. This is because the total sum of squares in the denominator would be zero, leading to a division by zero.
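When every target value is identical, the total sum of squares $\sum_i(y_i-\overline{y})^2$ is zero and the formula divides by zero. A small guard makes this case explicit. The sketch below returns `None` for an undefined score; that return convention and the name `safe_r2_score` are assumptions for illustration, and other conventions are possible.

```python
def safe_r2_score(y_true, y_pred):
    """Return R^2, or None when the targets are constant and R^2 is undefined."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    # Exact comparison for simplicity; with noisy floating-point targets
    # a small tolerance may be the safer check.
    if ss_tot == 0.0:
        # All targets are equal, so the denominator vanishes.
        return None
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Toy values (assumed) for the two cases.
constant_case = safe_r2_score([5.0, 5.0, 5.0], [4.8, 5.1, 5.0])  # undefined
normal_case = safe_r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # well defined
```

In the constant case an absolute error metric, such as the mean squared error, still gives a usable number.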
Pearson Correlation Coefficient
When the model is linear, the Pearson correlation coefficient can also serve as an effective evaluation metric. The Pearson correlation measures the linear relationship between two variables, with values ranging from $-1$ to $1$.
- Definition
  The Pearson correlation coefficient is calculated as follows:
  $$r=\frac{\sum_i(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_i(x_i-\overline{x})^2}\sqrt{\sum_i(y_i-\overline{y})^2}}$$
  where $x_i$ and $y_i$ are the values of the input and target variables, respectively, and $\overline{x}$ and $\overline{y}$ are their mean values.
- Range
  - $r=1$ indicates a perfect positive linear relationship.
  - $r=-1$ indicates a perfect negative linear relationship.
  - $r=0$ indicates no linear relationship.
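The definition can be written out directly in plain Python. The sketch below is illustrative only: the function name `pearson_r` and the toy data are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of the deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


xs = [1.0, 2.0, 3.0, 4.0]  # toy inputs (assumed)
increasing = pearson_r(xs, [2.0, 4.0, 6.0, 8.0])  # y = 2x, r is 1 up to rounding
decreasing = pearson_r(xs, [8.0, 6.0, 4.0, 2.0])  # y = 10 - 2x, r is -1 up to rounding
```

As with $R^2$, the coefficient is undefined when either variable is constant, because the denominator becomes zero.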
- Advantages: Ease of interpretation
  The absolute value of the Pearson correlation coefficient is between $0$ and $1$, making it more intuitive to interpret than the $R^2$ score. For example, if the absolute value of the correlation coefficient is 0.9, it suggests a very strong linear relationship between the variables, with the sign of $r$ giving its direction.
- Limitations: Assumption of Linearity
  The Pearson correlation coefficient is most useful when there is a linear relationship between the variables. When the relationship is nonlinear, the model will not be evaluated appropriately with this method.
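A small example shows why: for a perfectly deterministic but nonlinear relationship, the coefficient can be zero. The sketch assumes the toy relationship $y = x^2$ on inputs symmetric around zero; the helper name `pearson_r` is likewise an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# y is fully determined by x, but the relationship is quadratic, not linear.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

The positive and negative halves of the parabola cancel in the numerator, so $r$ reports no linear relationship even though $y$ can be predicted exactly from $x$.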
Summary
In evaluating supervised learning models, both the $R^2$ score and the Pearson correlation coefficient are crucial for understanding model performance and identifying areas for improvement.