In a previous post, we worked through the mechanics of linear regression. Today we will cover how to interpret our results and their limitations. Last time, we calculated a best fit line of:
y = 1.6x – 2, or equivalently, monthly sales = 1.6 × (median home value) – 2
Remember that both monthly sales and median home value are in units of $100,000. So, on average, we would expect a $100,000 increase in median home values to increase our monthly sales by $160,000. In our example, to keep the number of calculations under control, I used only five pairs of data points. To consider our results reliable, we would need more data points. There is no precise consensus on how many, but 30 or more is a good starting point. I prefer 42 because it was the number worn by Jackie Robinson, and it is, after all, the “Answer to The Ultimate Question of Life, the Universe, and Everything”. The calculations needed to analyze a few thousand data points by hand would be tedious, but we have computers. A harder problem arises when we do not have enough data points. In that case we have to take our results with a grain of salt, or look to other tools for our analysis.
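To make the unit conversion concrete, here is a minimal Python sketch of the interpretation above. The slope and intercept come from the fitted line; the function names and the sample home value of 3 (that is, $300,000) are my own, chosen only for illustration.

```python
# Fitted line from the post: monthly sales = 1.6 * (median home value) - 2,
# with both quantities measured in units of $100,000.
SLOPE = 1.6
INTERCEPT = -2.0


def predict_monthly_sales(median_home_value):
    """Predicted monthly sales (in $100,000s) for a median home value (in $100,000s)."""
    return SLOPE * median_home_value + INTERCEPT


def sales_change_in_dollars(home_value_change_in_dollars):
    """Expected change in monthly sales, in dollars, for a change in median home value, in dollars.

    The intercept cancels when we take a difference, so only the slope matters.
    """
    return SLOPE * home_value_change_in_dollars


if __name__ == "__main__":
    # A $100,000 rise in median home value, expressed in dollars.
    print(f"Expected change in monthly sales: ${sales_change_in_dollars(100_000):,.0f}")

    # Hypothetical input: a median home value of 3 units, i.e. $300,000.
    print(f"Predicted monthly sales (in $100,000s): {predict_monthly_sales(3):.1f}")
```

Note that the slope describes an average relationship across the data; it does not promise that any single month will follow the line exactly.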
Other potential problems we have to look out for are:
So with all of these problems, why do we bother? With modern statistical software we are able to run thousands of regressions in a matter of seconds, and the results are accurate and easy to interpret. Advanced regression techniques have been developed to deal with some of the problems we have covered here, as well as more technical difficulties like autocorrelation, multicollinearity, and heteroscedasticity, but we still have to approach our results with care and at least a little skepticism.
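As a rough illustration of how cheap regressions have become, the sketch below fits a thousand simple least-squares lines using nothing but the Python standard library. The data here is synthetic: I generate it from the line y = 1.6x – 2 plus random noise, so it is not the data from the earlier post, and the helper names, noise level, and range of x values are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_synthetic_data(n, rng, noise_sd=0.1):
    """Synthetic (x, y) pairs scattered around y = 1.6x - 2. Not real sales data."""
    xs = [rng.uniform(1.0, 5.0) for _ in range(n)]
    ys = [1.6 * x - 2.0 + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable

    # One thousand regressions, each on 42 synthetic data points.
    fits = [fit_line(*make_synthetic_data(42, rng)) for _ in range(1000)]

    slopes = [slope for slope, _ in fits]
    print(f"Smallest fitted slope: {min(slopes):.3f}")
    print(f"Largest fitted slope:  {max(slopes):.3f}")
```

The spread between the smallest and largest fitted slope is a small reminder that every estimate carries some uncertainty, even when the underlying relationship is perfectly linear.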