Questions about using matrices for finding best straight line by linear regression

Question

In the examples I have seen so far that show how to find the line which best fits a set of points, the equations are set out in matrix form, and so I am learning about matrices.
To simplify, suppose we have three points. Then the equations are
x1*m + c = y1
x2*m + c = y2
x3*m + c = y3
(m is the slope, c the offset)
In matrix form we can have

$$\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

so say this is
X M = Y
Then to remove the X on the left side we need to multiply by the inverse of X, and to do that we need a square matrix. If X is n x 2 then we can first multiply both sides by a matrix D, say, which is 2 x n, so that DX is 2 x 2.
In the examples I have seen D is chosen to be the transpose of X.

Question 1
Why use the transpose of X? Couldn't we use any matrix, within reason, which is 2 x n?

Question 2
How does the whole thing work anyhow, considering that the measured points will not be exactly on the straight line, so really we should write
x1*m + c = y1 + e1
x2*m + c = y2 + e2
x3*m + c = y3 + e3
where e is the error in y.
Is the assumption that if the errors were included in the calculation then they would average out to zero?

The same questions relate to finding the best circle for a set of measured points.

Linear Algebra Matrices

Martincg

8

Mathe

0

I don't understand the last question, about the circle. Could you expand on it perhaps?
- Martincg
  
  0
  
  If you want to find the best circle fit you can adopt the same approach. Maybe I shouldn't have mentioned the circles! If the measured point is (x, y), the arc centre is (a, b) and the radius is r, then (x - a)^2 + (y - b)^2 = r^2. By expanding you can get the same sort of matrix arrangement but with more columns for X. But again, in the examples I have seen, the transpose of X is used to multiply X, and again the errors in the measured values are ignored.
Kav10

+1

Low bounty!
Martincg

-1

Maybe. The question seems to me like a simple one, and it would take someone less than 5 minutes to read and less than 5 minutes to answer. So that's $60 an hour. If I'm wrong I'll reconsider, but I'm not a mathematician so my judgement is not good.

Answer

The answer is accepted.

Answer 1

I'll try to answer both your questions at once, as they are getting at the same core point. The key observation, which you made in the second question, is that "the measured points will not be exactly on the straight line". That is correct: they will not. In fact, linear regression in general involves an overdetermined system, where there are more equations than unknowns. In your simple example there are $n = 3$ equations (one per observation) but only $p = 2$ unknowns, $m$ and $c$. From basic algebra we know that overdetermined systems almost always have no exact solution. This is exactly the fact you stated when you said the points will not be on the straight line.

With that in mind, we need to ask "what are we actually doing when we construct a least squares line?". The answer is partially in the name. For a candidate coefficient vector $M$, the fitted values are $\hat{Y} = XM$, with entries $\hat{y}_i$. In least squares, we seek the $M$ that minimizes the sum of squared residuals, and we call that minimizer $\hat{M}$. In matrix form, we can write this as:

$$\min_M (Y - XM)^T (Y - XM) = \min_M (Y - \hat{Y})^T (Y - \hat{Y}) = \min_M \sum_{i=1}^n (y_i - \hat{y}_i)^2 $$

So in least squares, we are trying to find a coefficient vector $M$ that minimizes the above expression. Now, to understand why the solution to this problem is given by $\hat{M} = (X^T X)^{-1} X^T Y$, we minimize the above expression using the standard calculus technique of taking the gradient and setting it equal to zero. I won't go into those details here, but they are commonly found, e.g. in this Wikipedia article.
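For readers who want the gist of that calculus step, here is a brief sketch. The name $S(M)$ for the sum of squared residuals is my own shorthand, and the last step assumes $X^T X$ is invertible:

$$S(M) = (Y - XM)^T (Y - XM) = Y^T Y - 2 M^T X^T Y + M^T X^T X M$$

$$\nabla_M S = -2 X^T Y + 2 X^T X M = 0 \implies X^T X \hat{M} = X^T Y \implies \hat{M} = (X^T X)^{-1} X^T Y$$

The middle equation, $X^T X \hat{M} = X^T Y$, is usually called the normal equations. It is the same equation you get by multiplying $XM = Y$ on the left by $X^T$.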

The process of taking $Y = XM$ and solving for $M$ by multiplying both sides by $X^T$ and rearranging is just something to be considered for intuition. The more rigorous way to obtain the solution is via the minimization described above. One "reason" to multiply by $X^T$ and not some other matrix is that $X^T X$ is always at least positive semi-definite, and it is positive definite whenever the columns of $X$ are linearly independent (for a straight line, whenever the $x_i$ are not all equal). If $X^T X$ is positive definite, then it is guaranteed to have an inverse, which is what allows us to rearrange the expression using $(X^T X)^{-1}$. Another point to be made is that we can write:

$$Y = XM \implies Y - XM = 0 \implies X^T(Y - XM) = 0$$

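The meaning of the last equation on the right is that the residual $Y - XM$ is orthogonal to the column space of $X$. Unlike $Y = XM$, this last equation can be satisfied even when the points do not lie on a line. In some sense, it means that whatever part of $Y$ the fit fails to explain has no component along any column of $X$, so no further adjustment of $M$ can reduce it.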
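A small numerical check may help here. The sketch below uses NumPy and three made-up points (the data and variable names are my own, not from the question). It solves $X^T X M = X^T Y$ and then confirms that the residual is orthogonal to both columns of $X$.

```python
import numpy as np

# Hypothetical data: three points that do not lie exactly on one line.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 4.0])

# Design matrix with rows [x_i, 1], as in the question.
X = np.column_stack([x, np.ones_like(x)])

# Solve (X^T X) M = X^T Y; np.linalg.solve avoids forming the inverse explicitly.
M_hat = np.linalg.solve(X.T @ X, X.T @ y)
m, c = M_hat

# Residual Y - X M_hat, and its inner products with the columns of X.
residual = y - X @ M_hat
orthogonality = X.T @ residual

# Cross-check against NumPy's built-in least squares routine.
M_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("slope m =", m, "offset c =", c)
print("residual =", residual)
print("X^T residual =", orthogonality)
```

The individual residuals are not zero, but `X.T @ residual` is zero up to rounding. The second component of that product is just the sum of the residuals, so with a column of ones in $X$ the errors of the fitted line do sum to zero, which is close to what Question 2 was asking about.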

These sorts of intuitive points are not exactly straightforward to make precise. In summary, when you are finding the best straight line for some data, you are actually solving a minimization problem. It happens that the "intuitive" approach of multiplying by the transpose and rearranging gives a solution that corresponds exactly to the solution of the minimization problem. There are deep reasons for this correspondence, but they go beyond the scope of this response.
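To make Question 1 concrete, here is a rough sketch in NumPy comparing $D = X^T$ with another 2 x n matrix. The data, the particular $D$, and the helper `sum_sq` are all my own illustrative choices. This $D$ simply keeps the first two equations and discards the third.

```python
import numpy as np

# Hypothetical data: three points that do not lie exactly on one line.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 4.0])
X = np.column_stack([x, np.ones_like(x)])


def sum_sq(M):
    """Sum of squared residuals of the line with coefficients M = [m, c]."""
    r = y - X @ M
    return float(r @ r)


# Choice 1: D = X^T, which gives the least squares line.
M_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Choice 2: a different 2 x 3 matrix D that picks out the first two points only.
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
M_alt = np.linalg.solve(D @ X, D @ y)

print("D = X^T :", M_ls, "sum of squares =", sum_sq(M_ls))
print("other D :", M_alt, "sum of squares =", sum_sq(M_alt))
```

Both choices give a 2 x 2 system that can be solved, so in that sense any reasonable $D$ "works". The difference is in what the answer means: this $D$ yields the line through the first two points and ignores the third, while $D = X^T$ yields the line with the smallest sum of squared residuals over all three points.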

Amas

171

Martincg

+1

Thank you Amas that's a very comprehensive answer in greater depth than I was expecting.
