Why do we need the derivative of the SOAR? We need to take the derivative of our metric if we want to find its minimum.

\(r_4\) is exactly \(r_1 + r_2 + r_3\), so the total error of the first three residuals is exactly the same as the error of the fourth residual. However, the SOSR in the second case would be \(30^2 = 900\). The large residual has a weight three times larger than the three smaller residuals, even though their total error is exactly the same! If we were to take the SOAR instead, both cases would count as the same error.

So it looks like the function \(g\) best fits our data! Now I don't know about you, but I'm a bit unsatisfied. I mean, it's a good metric, but we can't really interpret a SOSR value on its own. If we did not have the SOSR values for \(f\) and \(h\), how could we tell if a SOSR of 42200 is very good, decent, bad, or terrible? If we take the square root of 42200, we get roughly 205.4. This means that our function made a total error of roughly 205,000$. Now that's pretty bad! But this number is a bit tricky to interpret. An error of 205,000$ is really bad if we only predicted the price of 7 houses, but what if we predicted 70,000 houses instead? Then this error suddenly doesn't look that bad anymore.

Let's therefore slightly modify our metric. After we sum the squares of every residual, let's also divide the final result by the number of data points in our dataset.
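To make the comparison between the metrics concrete, here is a minimal Python sketch. The function names `soar`, `sosr`, and `mean_sosr` are made up for this illustration, and the residuals 10, 10, 10 and 30 are assumed values chosen to match the \(30^2 = 900\) example; they are not taken from the actual housing dataset.

```python
import math


def soar(residuals):
    """Sum of absolute residuals."""
    return sum(abs(r) for r in residuals)


def sosr(residuals):
    """Sum of squared residuals."""
    return sum(r ** 2 for r in residuals)


def mean_sosr(residuals):
    """SOSR divided by the number of data points."""
    return sosr(residuals) / len(residuals)


# Assumed residuals: three small errors versus one large error of the same total size.
three_small = [10, 10, 10]
one_large = [30]

print(soar(three_small), soar(one_large))  # 30 30
print(sosr(three_small), sosr(one_large))  # 300 900

# Square root of the SOSR of 42200 mentioned in the text.
print(round(math.sqrt(42200), 1))  # 205.4

# The same total of 42200 means very different things per data point.
print(42200 / 7)      # spread over 7 predictions
print(42200 / 70000)  # spread over 70,000 predictions
```

The SOAR treats both cases identically, while the SOSR penalizes the single large residual three times as heavily. Dividing by the number of data points, as `mean_sosr` does, is what makes the value comparable across datasets of different sizes.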