
## Solving a least-squares problem with QR decomposition


This post is motivated by a question asked by @vs140580 ("The program is making intercept zero even though there is an intercept in regression Fit"; a toy code showing the error is attached to that question).

The problem @vs140580 ran into comes from the large magnitudes of the (two) regressors and the failure of Fit/LinearFit to find the correct solution unless an unreasonably large value of Digits is used.
This problem was answered by @dharr after scaling the data (which is always, when possible, a good practice) and by
myself by using explicitly the method of "Normal Equations" (see https://en.wikipedia.org/wiki/Least_squares).

The method of "Normal Equations" relies upon the inversion of a symmetric square matrix H whose dimension is equal to the number of coefficients of the model to fit.
It is well known that this method can lead to an extremely ill-conditioned matrix H, and thus face severe numerical problems (the most common situation being the fit of a high-degree polynomial).

• In English: http://www.math.kent.edu/~reichel/courses/intr.num.comp.1/fall09/lecture4/lecture4.pdf
• In French: https://moodle.utc.fr/pluginfile.php/24407/mod_resource/content/5/MT09-ch3_ecran.pdf
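To see why the Normal Equations are so fragile, note that cond(Z^T . Z) = cond(Z)^2: forming H squares the condition number of the design matrix. A minimal NumPy sketch makes this visible (the data below is made up for illustration, it is not the Maple worksheet's data):

```python
import numpy as np

# Illustrative design matrix with large-magnitude regressors,
# in the spirit of @vs140580's data (values are made up).
x = np.linspace(1.0e3, 1.0e4, 30)
Z = np.column_stack([np.ones_like(x), x, x**2])  # intercept, x, x^2

H = Z.T @ Z                                      # Normal Equations matrix
# cond(H) = cond(Z)^2: the Normal Equations square the conditioning,
# which is what pushes a Digits=15 computation past double precision
# on raw, unscaled data.
print(f"cond(Z) = {np.linalg.cond(Z):.3e}")
print(f"cond(H) = {np.linalg.cond(H):.3e}")
```

Here cond(Z) is already large because of the column scales, and cond(H) is its square, far beyond what 15 significant digits can resolve.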

The attached file illustrates how the QR decomposition method works.
The test case is @vs140580's.

Maybe the development team could enhance Fit/LinearFit in future versions by adding an option to specify which method is to be used?

 > restart:
 > with(Statistics):
 > interface(version)
 (1)
 > Data := Matrix([[4.74593554708566, 11385427.62, 2735660038000], [4.58252830679671, 25469809.77, 12833885700000], [4.29311160501838, 1079325200, 11411813200000000], [4.24176959154225, 1428647556, 18918585950000000], [5.17263072694618, 1428647556, 18918585950000000], [4.39351114955735, 1877950416, 30746202150000000], [4.39599006758777, 1428647556, 18918585950000000], [5.79317412396815, 2448320309, 49065217290000000], [4.48293612651735, 2448320309, 49065217290000000], [4.19990181982522, 2448320309, 49065217290000000], [5.73518217699046, 1856333905, 30648714900000000], [4.67943831980476, 3071210420, 75995866910000000], [4.215240105336, 2390089264, 48670072110000000], [4.41566877563247, 3049877383, 75854074610000000], [4.77780395369828, 2910469403, 74061327950000000], [4.96617430604669, 1416936352, 18891734280000000], [4.36131111330988, 1416936352, 18891734280000000], [5.17783192063198, 1079325200, 11411813200000000], [4.998266287191, 1067513353, 11402362980000000], [4.23366152474871, 2389517120, 48661380410000000], [4.58252830679671, 758079709.3, 5636151969000000], [6.82390874094432, 1304393838, 14240754750000000], [4.24176959154225, 912963601.2, 8621914602000000], [4.52432881167557, 573965555.4, 3535351888000000], [4.84133601918601, 573965555.4, 3535351888000000], [6.88605664769316, 732571773.2, 5558875538000000], [4.35575841415627, 1203944381, 13430693320000000], [4.42527441640593, 955277678, 8795128298000000], [6.82390874094432, 997591754.9, 8968341995000000], [4.35144484433733, 143039477.1, 305355143300000]]):
 > # Direct use of LinearFit.
   #
   # As far as I know LinearFit is based on the resolution of the "Normal Equations"
   # (see further down), a system of equations that is known to be ill-conditioned
   # when regressors have large values (in particular when polynomial regression
   # is used).
   X := Data[.., [2, 3]]:
   Y := Data[.., 1]:
   LinearFit(C1+C2*v+C3*w, X, Y, [v, w]);
 (2)
 > # Because of roundoff issues, the 3-by-3 matrix involved in the "Normal Equations" (NE)
   # appears to be of rank < 3.
   # The rank of this matrix is equal to 1+rank(X), and one can easily verify that
   # the 2 columns of X are linearly independent:
   LinearAlgebra:-LinearSolve(X, Vector(numelems(Y), 0));
   LinearAlgebra:-Rank(X);
 (3)
 > # Solve the least-squares problem by using the NE explicitly.
   #
   # To account for an intercept we augment X with a column vector of "1"s,
   # traditionally put in column one.
   Z := `<|>`(Vector(numelems(Y), 1), X):
   A := (Z^+ . Z)^(-1) . Z^+ . Y;          # Normal Equations
 (4)
 > # What is the rank of Z?
   # Due to the scale of the regressors compared to "1", Rank fails to return the
   # correct value of rank(Z), which is obviously equal to rank(X)+1.
   LinearAlgebra:-LinearSolve(Z, Vector(numelems(Y), 0));
   LinearAlgebra:-Rank(Z);
 (5)

A WORKAROUND: SCALING THE DATA

 > model := unapply( LinearFit(C1+C2*v+C3*w, Scale(X), Scale(Y), [v, w]), [v, w] );
 (6)
 > mX, sX := (Mean, StandardDeviation)(X); mY, sY := (Mean, StandardDeviation)(Y);
 (7)
 > MODEL := model((x1-mX[1])/sX[1], (x2-mX[2])/sX[2]) * sY + mY
 (8)
 > # Check that the vector of regression coefficients is almost equal to A found above
   # (relative error less than 10^(-14)).
   A_from_scaling       := < coeffs(MODEL) >:
   Relative_Discrepancy := (A_from_scaling - A) /~ A
 (9)
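The back-transformation performed in MODEL above is just algebra: if the fit on standardized data gives ys = b0 + b1*xs with xs = (x-mX)/sX and ys = (y-mY)/sY, then y = mY + sY*b0 + (sY*b1/sX)*(x - mX). A NumPy sketch of the same round trip, with made-up single-regressor data and illustrative names:

```python
import numpy as np

# Made-up raw-scale data with a large-magnitude regressor.
x = np.array([1.1e9, 2.5e9, 1.4e9, 3.0e9, 0.7e9, 2.4e9])
y = np.array([4.7, 4.6, 5.2, 4.7, 4.8, 4.2])

mx, sx = x.mean(), x.std(ddof=1)
my, sy = y.mean(), y.std(ddof=1)
xs, ys = (x - mx) / sx, (y - my) / sy       # analogue of Scale(X), Scale(Y)

# Fit ys ~ b0 + b1*xs on the well-conditioned scaled data.
b, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(xs), xs]), ys, rcond=None)

# Undo the scaling: y = my + sy*b0 + (sy*b1/sx)*(x - mx).
a1 = sy * b[1] / sx
a0 = my + sy * b[0] - a1 * mx

# Compare with a direct fit on the raw data.
a_raw, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)
print(a0, a1, a_raw)
```

Since standardization is an affine change of variables, the minimizer of the scaled problem maps exactly onto the minimizer of the raw problem; only roundoff separates the two results.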

THE QR DECOMPOSITION  (applied on raw data)

The QR decomposition, as well as the Givens rotation method, are two alternatives to the NE method
for finding the vector of regression coefficients.
Both of them are known to be less sensitive to the magnitudes of the regressors and do not always require
a scaling of the data (which can be quite complex with polynomial regression or when some
transformation is used to linearize the statistical model, for instance Y=a*exp(b*X) --> log(Y)=log(a)+b*X).
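In NumPy terms, the steps carried out below on Z look as follows (a sketch with made-up data; note that `numpy.linalg.qr` returns the "thin" factors by default, unlike the `fullspan` option used in the Maple worksheet):

```python
import numpy as np

# Made-up raw-scale data: y ~ a1 + a2*x with a large-magnitude regressor.
x = np.array([1.1e9, 2.5e9, 1.4e9, 3.0e9, 0.7e9, 2.4e9])
y = np.array([4.7, 4.6, 5.2, 4.7, 4.8, 4.2])
Z = np.column_stack([np.ones_like(x), x])   # intercept column first, as above

# QR route: Z = Q.R, then solve the triangular system R.a = Q^T.y.
Q, R = np.linalg.qr(Z)                      # thin QR: Q is 6x2, R is 2x2
a_qr = np.linalg.solve(R, Q.T @ y)

# Cross-check against NumPy's SVD-based least-squares driver.
a_ref, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(a_qr, a_ref)
```

No matrix of the form Z^T . Z is ever formed, so the conditioning of the solve is governed by cond(Z), not cond(Z)^2.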

 > N := numelems(Y); P := numelems(Z[1]);
 (10)
 > # Perform the QR decomposition of Z.
   Q, R := LinearAlgebra:-QRDecomposition(Z, fullspan);
 (11)
 > # Let C be the column vector of length P defined by:
   C := (Q^+ . Y)[1..P];
 (12)
 > # Then the vector of regression coefficients is given by:
   A_QR                 := (R[1..P, 1..P])^(-1) . C;
   Relative_Discrepancy := (A_QR - A) /~ A
 (13)
 > # The matrix H = Z^+ . Z can be written as:
   H                    := Z^+ . Z:
   H_QR                 := R^+ . Q^+ . Q . R:
   Relative_Discrepancy := (H_QR - H) /~ H
 (14)
 > # The H_QR expression is required to obtain the covariance matrix of the regression coefficients.
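This works because Q^T . Q = I implies Z^T . Z = R^T . R, so the covariance matrix sigma^2 * (Z^T . Z)^(-1) of the coefficients can be formed from the small triangular factor R alone. A NumPy sketch of that identity, on an arbitrary full-rank design matrix:

```python
import numpy as np

# Any full-rank design matrix works to illustrate the identity.
x = np.linspace(0.0, 1.0, 20)
Z = np.column_stack([np.ones_like(x), x, x**2])
Q, R = np.linalg.qr(Z)                  # thin QR, so Q^T.Q is the 3x3 identity

# (Z^T.Z)^(-1) = (R^T.R)^(-1) = R^(-1).R^(-T): the covariance matrix of
# the regression coefficients, up to the residual-variance factor sigma^2.
unscaled_cov_ne = np.linalg.inv(Z.T @ Z)
Rinv = np.linalg.inv(R)
unscaled_cov_qr = Rinv @ Rinv.T
print(np.allclose(unscaled_cov_ne, unscaled_cov_qr))
```

Inverting the P-by-P triangular R is both cheaper and numerically safer than inverting the explicitly formed Z^T . Z.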