Problem:
You plot weight ($y$) against height ($x$) for three of your friends and obtain the points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$. If $x_1 < x_2 < x_3$ and $x_3 - x_2 = x_2 - x_1$, which of the following is necessarily the slope of the line which best fits the data? "Best fits" means that the sum of the squares of the vertical distances from the data points to the line is smaller than for any other line.
Answer Choices:
A. $\frac{y_3 - y_1}{x_3 - x_1}$
B. $\frac{(y_2 - y_1) - (y_3 - y_2)}{x_3 - x_1}$
C. $\frac{2y_3 - y_1 - y_2}{2x_3 - x_1 - x_2}$
D. $\frac{y_2 - y_1}{x_2 - x_1} + \frac{y_3 - y_2}{x_3 - x_2}$
E. none of these
Solution:
The best fit line goes through some point $(x_2, z)$. Claim: the correct slope for this line makes the directed vertical distances to it from $(x_1, y_1)$ and $(x_3, y_3)$ the same. To see this, imagine any line through $(x_2, z)$ and rotate it until the two directed vertical distances to it are the same. Call this distance $d$. For any other line through $(x_2, z)$ the directed distances to it from $(x_1, y_1)$ and from $(x_3, y_3)$ will be $d + e$ and $d - e$ for some $e \neq 0$. Thus the sum of the squares will increase from
$$2d^2 + (z - y_2)^2$$
to
$$(d + e)^2 + (d - e)^2 + (z - y_2)^2 = 2d^2 + 2e^2 + (z - y_2)^2.$$
This proves the claim. Finally, the slope from $(x_1, y_1 + d)$ to $(x_3, y_3 + d)$ is $(y_3 - y_1)/(x_3 - x_1)$, whatever $d$ and $z$ are!
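The conclusion can be sanity-checked numerically. The sketch below is not part of the original solution: it assumes the standard closed-form least-squares slope, $\sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$, and compares it with the endpoint slope from choice A. The function names and the sample heights and weights are hypothetical, chosen only for illustration.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


def endpoint_slope(xs, ys):
    """Slope of the segment joining the first and last points (choice A)."""
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])


# Hypothetical data: equally spaced heights, arbitrary weights.
heights = [150.0, 160.0, 170.0]
weights = [52.0, 71.0, 64.0]

# With equal spacing the two slopes agree (up to floating-point rounding).
print(least_squares_slope(heights, weights))
print(endpoint_slope(heights, weights))

# Hypothetical data with unequal spacing: the two slopes differ.
uneven_x = [0.0, 1.0, 3.0]
uneven_y = [0.0, 1.0, 0.0]
print(least_squares_slope(uneven_x, uneven_y))
print(endpoint_slope(uneven_x, uneven_y))
```

The second data set drops the equal-spacing hypothesis, and the agreement between the two slopes disappears.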
Note. There are several formulas in statistics which give the slope of the best fit line for arbitrary data points. One can use any of these general formulas to solve this special case, but the algebra is quite involved.
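As an illustration, here is a sketch of one such route, assuming the centered form of the least-squares slope formula. The symbols $m$ (the slope), $\bar{x}$, $\bar{y}$ (the means) and $h$ (the common spacing) are introduced here and do not appear in the original solution.

```latex
\begin{align*}
m &= \frac{\sum_{i=1}^{3} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{3} (x_i - \bar{x})^2},
  \qquad \bar{x} = \frac{x_1 + x_2 + x_3}{3}, \quad \bar{y} = \frac{y_1 + y_2 + y_3}{3}. \\
\intertext{With $h = x_2 - x_1 = x_3 - x_2$ we get $\bar{x} = x_2$, so
  $x_1 - \bar{x} = -h$, $x_2 - \bar{x} = 0$, $x_3 - \bar{x} = h$, and}
m &= \frac{-h(y_1 - \bar{y}) + 0 + h(y_3 - \bar{y})}{h^2 + 0 + h^2}
   = \frac{h(y_3 - y_1)}{2h^2}
   = \frac{y_3 - y_1}{2h}
   = \frac{y_3 - y_1}{x_3 - x_1}.
\end{align*}
```

In this form the $\bar{y}$ terms cancel and $y_2$ drops out because its coefficient $x_2 - \bar{x}$ is zero.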
Queries. What are $d$ and $z$? Where was the hypothesis $x_3 - x_2 = x_2 - x_1$ used?