Problem:
You plot weight $(y)$ against height $(x)$ for three of your friends and obtain the points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$. If $x_1 < x_2 < x_3$ and $x_3 - x_2 = x_2 - x_1$, which of the following is necessarily the slope of the line which best fits the data? "Best fits" means that the sum of the squares of the vertical distances from the data points to the line is smaller than for any other line.
Answer Choices:
A. $\dfrac{y_3 - y_1}{x_3 - x_1}$
B. $\dfrac{(y_2 - y_1) - (y_3 - y_2)}{x_3 - x_1}$
C. $\dfrac{2y_3 - y_1 - y_2}{2x_3 - x_1 - x_2}$
D. $\dfrac{y_2 - y_1}{x_2 - x_1} + \dfrac{y_3 - y_2}{x_3 - x_2}$
E. none of these
Solution:
The best fit line goes through some point $(x_2, z)$. Claim: the correct slope for this line makes the directed vertical distances to it from $(x_1, y_1)$ and $(x_3, y_3)$ the same. To see this, imagine any line through $(x_2, z)$ and rotate it until the two directed vertical distances to it are the same. Call this distance $d$. For any other line through $(x_2, z)$ the directed distances to it from $(x_1, y_1)$ and from $(x_3, y_3)$ will be $d + e$ and $d - e$ for some $e \neq 0$. Thus the sum of the squares will increase from
$$2d^2 + (z - y_2)^2$$
to
$$(d + e)^2 + (d - e)^2 + (z - y_2)^2 = 2d^2 + 2e^2 + (z - y_2)^2.$$
This proves the claim. Finally, the slope from $(x_1, y_1 + d)$ to $(x_3, y_3 + d)$ is $(y_3 - y_1)/(x_3 - x_1)$, whatever $d$ and $z$ are! This is choice A.
Note. There are several formulas in statistics which give the slope of the best fit line for arbitrary data points. One can use any of these general formulas to solve this special case, but the algebra is quite involved.
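As a quick numerical check of the result, the sketch below computes the slope from one of the general formulas, $\sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$, and compares it with $(y_3 - y_1)/(x_3 - x_1)$. The function names and the sample heights and weights are made up for illustration; exact rational arithmetic is used so the comparison is not affected by rounding.

```python
from fractions import Fraction


def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def endpoint_slope(xs, ys):
    """Slope of the line joining the first and third points."""
    return Fraction(ys[2] - ys[0], xs[2] - xs[0])


# Hypothetical data: equally spaced heights, arbitrary weights.
xs_equal = [150, 160, 170]
ys = [52, 71, 64]
print(least_squares_slope(xs_equal, ys), endpoint_slope(xs_equal, ys))

# Same weights, but the heights are no longer equally spaced.
xs_unequal = [150, 160, 180]
print(least_squares_slope(xs_unequal, ys), endpoint_slope(xs_unequal, ys))
```

With the equally spaced heights the two slopes agree; with the unequally spaced heights they differ, so the spacing hypothesis cannot simply be dropped.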
Queries. What are $d$ and $z$? Where was the hypothesis $x_3 - x_2 = x_2 - x_1$ used?