2021-04-01: Residuals and the QQ plot
Assignment 5 is out! Readings have been assigned (I have chapter 6).
Recall that a residual is a point’s vertical distance from the best-fit line. Rather than showing the residuals in the original Y units, show them standardized: plot each residual as its number of standard deviations from the average residual (Y = std. devs. from average).
This rescales the Y axis, converting the units into standard deviations. If the residuals are roughly normal, about 95% of them should fall within 2 std. devs. of zero.
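The rescaling described above can be sketched in a few lines. This is a simplified version that divides every residual by the overall residual standard error; true studentized residuals also adjust for each point's leverage, which is left out here. The data and the helper names (`fit_line`, `standardized_residuals`) are made up for illustration and are not from the Rocket Fuel data set.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def standardized_residuals(xs, ys):
    """Residuals from the best-fit line, in units of standard deviations."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Residual standard error, using n - 2 degrees of freedom
    # (two parameters, a and b, were estimated from the data).
    s = math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))
    return [r / s for r in residuals]


# Hypothetical data: roughly linear, with one point pushed well above the line.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 18.0, 14.1, 15.9]

z = standardized_residuals(xs, ys)
# Points more than 2 std. devs. from zero are candidates for a closer look.
flagged = [i for i, value in enumerate(z) if abs(value) > 2]
print(flagged)
```

The cutoff of 2 is only the rule of thumb from these notes; a flagged point is something to investigate, not automatically something to delete.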
In the “Rocket Fuel” data set, studentized residuals quickly reveal two points as clear outliers.
You probably don’t want to ignore them completely. If you can leave them in, maybe you should.
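To build the QQ plot, sort the data points (here, the residuals) and plot each one against the value expected at its percentile under a normal distribution. If the data are close to normal, the points fall near a straight line.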
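A minimal sketch of computing the QQ plot coordinates, using only the Python standard library. The percentile assigned to the i-th sorted point is taken as (i - 0.5) / n, which is one common convention and an assumption here; other plotting positions exist. The function name `qq_points` and the sample values are hypothetical.

```python
from statistics import NormalDist


def qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal QQ plot."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Expected normal quantile for each rank, using the (i - 0.5) / n convention
    # so that no percentile is exactly 0 or 1.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical standardized residuals.
sample = [0.3, -1.2, 0.8, -0.1, 1.5]
points = qq_points(sample)
for expected, observed in points:
    print(f"{expected:+.3f}  {observed:+.3f}")
```

Plotting the first column on the X axis and the second on the Y axis gives the QQ plot; outliers show up as points that bend away from the line at either end.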