2021-04-01: Residuals and the QQ plot

Assignment 5 is out! Readings have been assigned (I have chapter 6).

Studentized Residuals

Recall that a residual is a point’s vertical distance from the best-fit line (observed value minus fitted value). Rather than plotting the residuals in the original Y units, rescale each one by an estimate of its standard deviation, so the Y axis reads in standard deviations from the average.

This rescales the Y axis, converting the units into standard deviations. If the residuals are roughly normal, about 95% of them should fall within 2 standard deviations of zero.

In the “Rocket Fuel” data set, studentized residuals quickly reveal two points as clear outliers.
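
As a rough illustration of the idea, here is a minimal sketch that computes studentized residuals for a straight-line fit and flags points beyond 2 standard deviations. The data are made up for the example (they are not the “Rocket Fuel” data), the function name is hypothetical, and the sketch assumes the "internally" studentized form, where each residual is divided by s * sqrt(1 - leverage); the lecture did not specify which variant is meant.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for the least-squares line y = a + b*x.

    Assumption: 'studentized' here means e_i / (s * sqrt(1 - h_i)),
    where h_i is the leverage of point i.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares slope and intercept
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Raw residuals: observed minus fitted
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Residual standard error, with n - 2 degrees of freedom
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    # Leverage of each point in simple linear regression
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Made-up data: roughly y = 2x, with one point (x = 7) pushed well off the line
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 20.0, 16.1, 17.9, 20.2]

r = studentized_residuals(x, y)

# Indices of points more than 2 standard deviations from zero
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2]
print(flagged)
```

The cutoff of 2 is only the rule of thumb from above; a flagged point is a candidate for a closer look, not automatically an error.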

What to do with outliers?

You probably don’t want to ignore them completely. If you can leave them in, maybe you should.

The Q-Q Plot

Sort the data points and plot them against the corresponding expected quantiles of the normal distribution. If the data are roughly normal, the points fall close to a straight line.
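
The recipe above can be sketched in a few lines. This is one possible version, not a definitive one: the function name is hypothetical, the sample values are made up, and the plotting positions (i - 0.5) / n are just one common convention for choosing the expected percentiles.

```python
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical quantile, sorted sample value) pairs for a normal Q-Q plot."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1

    # Expected percentile for the i-th smallest point: (i - 0.5) / n (an assumed convention)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

    return list(zip(theoretical, ordered))

# Made-up sample for illustration
sample = [0.3, -1.2, 0.8, 2.1, -0.4]
points = qq_points(sample)

for q, v in points:
    print(f"{q:+.3f}  {v:+.3f}")
```

Plotting the first column on the X axis and the second on the Y axis gives the Q-Q plot. Applied to studentized residuals, points that peel away from the straight line at either end are the same ones that stand out as outliers.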
