*********************************************************************** Fortran 77 Program of the PARTIAL TOTAL LEAST SQUARES ALGORITHM. ------------------------------------------ Sabine VAN HUFFEL ESAT Laboratory, KU Leuven. Kardinaal Mercierlaan 94, 3030 Heverlee, Belgium *********************************************************************** SUBROUTINE : PTLS 1 PURPOSE: The subroutine PTLS solves the Total Least Squares (TLS) problem by using a Partial Singular Value Decomposition (PSVD), hereby improving considerably the computational efficiency with respect to the classical TLS algorithm. The TLS problem assumes an overdetermined set of linear equations AX = B, where both the data matrix A as well as the observation matrix B are inaccurate. If the perturbations D on the data [A;B] have zero mean and if their covariance matrix E(D'D) with E the expected value operator, equals the identity matrix up to an unknown scaling factor (e.g. when all the errors are independent and equally sized), then the routine computes a strongly consistent estimate of the true solution of the corresponding unperturbed set. The routine also solves determined and underdetermined sets of equations by computing the minimum norm solution. It is assumed that all preprocessing measures (scaling, coordinate transformations, whitening, ... ) of the data have been performed in advance. 2 SPECIFICATION: SUBROUTINE PTLS(C, LDC, M, N, L, RANK, THETA, X, LDX, Q, INUL, WRK, IWRK, LWRK, TOL1, TOL2, IERR, IWARN) INTEGER LDC, M, N, L, RANK, LDX, IERR, IWARN, IWRK(N+L) DOUBLE PRECISION THETA, TOL1, TOL2 DOUBLE PRECISION C(LDC,N+L), X(LDX,L), Q(min(M,N+L)+min(M+1,N+L)), WRK(*) LOGICAL INUL(N+L), LWRK(N+L) 3 ARGUMENT LIST: 3.1 ARGUMENTS IN C - DOUBLE PRECISION array of DIMENSION (LDC,N+L). The leading M x (N + L) part of this array contains A and B as follows: The first N columns contain the data matrix A, and the last L columns the observation matrix B (right-hand sides). NOTE that this array is overwritten. LDC - INTEGER. The declared leading dimension of the array C. LDC >= max(M,(N+L)). M - INTEGER. The number of rows in the data matrix A and the observation matrix B. N - INTEGER. The number of columns in the data matrix A. L - INTEGER. The number of columns in the observation matrix B. RANK - INTEGER. The rank of the TLS approximation [A+DA;B+DB]. On entry there are two possibilities: i) RANK < 0: the rank is to be computed by the routine. ii) RANK >= 0: specifies the given rank. RANK <= min(M,N). NOTE that RANK may be overwritten, if C has multiple singular values or if the upper triangular matrix F is singular (see 6 METHOD DESCRIPTION). THETA - DOUBLE PRECISION. On entry, there are two possibilities (depending on RANK): i) RANK < 0: the rank of the TLS approximation [A+DA;B+DB] is computed from THETA and equals min(M,N+L) - d, where d equals the number of singular values of [A;B] <= THETA. THETA >= 0. ii) RANK >= 0: THETA is an initial estimate for computing a lower bound of the RANK largest singular values of [A;B]. If not available, assign a negative value (< 0) to THETA. NOTE that THETA is overwritten. 3.2 ARGUMENTS OUT C - DOUBLE PRECISION array of DIMENSION (LDC,N+L). The first N + L components of the columns of C whose index i corresponds with an INUL(i) = .TRUE., are the N + L - RANK base vectors of the right singular subspace corresponding to the singular values of C which are <= THETA. The TLS solution is computed from these vectors. RANK - INTEGER. RANK specifies the rank of [A+DA;B+DB]. If not specified by the user, then RANK is computed by the routine. If specified by the user, then the specified RANK is changed by the routine if the RANK-th and the (RANK+1)-th singular value of [A;B] coincide up to the precision given by TOL1, or if the upper triangular matrix F (see 6 METHOD DESCRIP- TION) is singular. THETA - DOUBLE PRECISION. If RANK >= 0, then THETA specifies the computed bound such that precisely RANK singular values of [A;B] are greater than THETA + TOL1. X - DOUBLE PRECISION array of DIMENSION (LDX,L). The leading N x L part of this array contains the solutions to the TLS problems specified by A and B. LDX - INTEGER. The declared leading dimension of the array X . LDX >= N. Q - DOUBLE PRECISION array of DIMENSION (min(M,N+L)+min(M+1,N+L)) Returns the partially diagonalized bidiagonal computed from C, at the moment that the desired singular subspace has been found. The first p = min(M,N+L) entries of Q contain the diagonal elements q(1),...,q(p) and the entries Q(p+2),...,Q(p+s) (with s = min(M+1,N+L)) contain the superdiagonal elements e(2),...,e(s). Q(p+1) = 0. INUL - LOGICAL array of DIMENSION (N+L). The indices of the elements of INUL with value .TRUE. indicate the columns in C containing the base vectors of the right singular subspace of C from which the TLS solution has been computed. 3.3 WORK SPACE WRK - DOUBLE PRECISION array of DIMENSION (t). The dimension t = M + N + L + (N+L)*(N+L-1)/2 if M >= N + L; M + N + L + M*(N+L-(M+1)/2) if M < N + L. IWRK - INTEGER array of DIMENSION (N+L). LWRK - LOGICAL array of DIMENSION (N+L). 3.4 TOLERANCES TOL1 - DOUBLE PRECISION. This parameter defines the multiplicity of singular values by considering all singular values within an interval of length TOL1 as coinciding. TOL1 is used in checking how many singu- lar values are <= THETA. Also in computing an appropriate upper bound THETA by a bisection method, TOL1 is used as stop criterion defining the minimal subinterval length. According to the numerical properties of the SVD, TOL1 must be >= !!C!! x EPS where !!C!! denotes the L2-norm and EPS is the machine precision. TOL2 - DOUBLE PRECISION. Working precision for the computations in PTLS. This parame- ter specifies that elements of matrices used in the computa- tion, which are <= TOL2 in absolute value, are considered to be zero. 3.6 ERROR INDICATORS IERR - INTEGER. On return, IERR contains 0 unless the subroutine has failed. IWARN - INTEGER. On return, IWARN contains 0 unless RANK has been lowered by the routine. 4 ERROR INDICATORS and WARNINGS: Errors detected by the routine: IERR = 0: successful completion. 1: number M of rows of array C = [A;B] smaller than 1. 2: number N of columns of matrix A smaller than 1. 3: number L of columns of matrix B smaller than 1. 4: leading dimension LDC of array C is smaller than max(M,N+L). 5: leading dimension LDX of array X is smaller than N. 6: rank of the TLS approximation [A+DA;B+DB] is larger than min(M,N). 7: the parameters RANK and THETA are both negative (< 0). 8: tolerance TOL1 is negative. 9: tolerance TOL2 is negative. 10: maximum number of QR/QL iteration steps (50) exceeded. Warnings given by the routine: IWARN = 0: no warnings. 1: the rank of matrix C has been lowered because a singular value of multiplicity > 1 has been found. 2: the rank of matrix C has been lowered because a singular upper triangular matrix F has been found. 5 EXTERNAL SUBROUTINES and FUNCTIONS: DAXPY, DDOT, DCOPY from BLAS [6]; DQRDC from LINPACK [7]; BIDIAG, CANCEL, HOUSH, INIT, QRQL. 6 METHOD DESCRIPTION: Let C = [A;B] denote the matrix formed by adjoining the columns of B to the columns of A on the right. Total Least Squares (TLS) definition: Given matrices A and B, find a matrix X satisfying (A + DA) X = B + DB, where A and DA are M x N matrices, B and DB are M x L matrices, X is an N x L matrix. The solution X must be such that the Frobenius norm of [DA;DB] is a minimum and each column of B + DB is in the range of A + DA. Whenever the solution is not unique, PTLS singles out the mini- mum norm solution X. Let V denote the right singular subspace of C. With the routine PTLS the TLS solution can be computed from any orthogonal basis of the subspace of V corresponding to the smallest singular values of C. Therefore, Partial Singular Value Decomposition (PSVD) can be used instead of classical SVD. The dimension of this subspace of V may be determined by the rank of C or by an upper bound for those smallest singular values. The PTLS algorithm proceeds as follows (see [2 - 5]): Step 1: Bidiagonalization phase ----------------------- 1.a): If M >= 5 x (N + L)/3, transform C into upper triangular form R by Householder transformations. 1.b): Transform C (or R) into bidiagonal form (p = min(M,N+L)): !q(1) e(2) 0 ... 0 ! (0) ! 0 q(2) e(3) . ! J = ! . . ! ! . e(p)! ! 0 ... q(p)! if M >= N + L, or !q(1) e(2) 0 ... 0 0 ! (0) ! 0 q(2) e(3) . . ! J = ! . . . ! ! . e(p) . ! ! 0 ... q(p) e(p+1)! if M < N + L, using Householder transformations. 1.c): Initialize the right singular base matrix with the identity matrix. 1.d): If M < N + L, then cancel e(p+1) by Givens rotations and reduce the bidiagonal to M x M . Accumulate the Givens rotations in V. Step 2: Partial diagonalization phase ----------------------------- If the upper bound THETA is not given, then compute THETA such that precisely p - RANK singular values (p=min(M,N+L)) of the bidiagonal are <= THETA, using a bisection method [5]. Diagonalize the given bidiagonal J partially, using either QL itera- tions (if the upper left diagonal element of the considered subbi- diagonal > the lower right diagonal element) or QR iterations, such such that J is splitted into unreduced subbidiagonals whose singular values are either all larger than THETA or are all less than or equal to THETA. Accumulate the Givens rotations in V. Step 3: Back transformation phase ------------------------- Apply the Householder transformations of step 1.b) onto the base vectors of V associated with the subbidiagonals with all singular values <= THETA. Step 4: Computation of F and Y ---------------------- Let V2 be the matrix of the columns of V corresponding to the (N + L - RANK) smallest singular values of C. Compute with Householder transformations the matrices F and Y such that: !VH Y! V2 x Q = ! ! !0 F! with Q being an orthogonal matrix, VH an N x (N - RANK), Y an N x L and F an L x L upper triangular matrix. If F is singular, then lower the rank RANK by one and repeat the steps 2, 3 and 4. Step 5: Computation of the TLS solution ------------------------------- If F is not singular then the solutions X are obtained by solving the following equations by forward elimination: X F = -Y. Notes: - In case the rank must be lowered in Step 4, some additional base vectors must be computed in Step 2. The additional computations are kept to a minimum. - If the rank RANK is lowered in Step 4 but the multiplicity of the RANK-th singular value is > 1, the rank is further lowered with its multiplicity defined by the parameter TOL1. This is done in the be- ginning of Step 2 by the routine which estimates THETA using a bi- section method. 7 REFERENCES : [1] G.H. Golub and C.F. Van Loan, An Analysis of the Total Least Squares Problem. SIAM J. Numer. Anal., 17 (1980), 883-893. [2] S. Van Huffel, J. Vandewalle and A. Haegemans, An efficient and reliable algorithm for computing the singular subspace of a matrix associated with its smallest singular values. J. Comput. and Applied Math., 19 (1987), 313 - 330. [3] S. Van Huffel, Analysis of the total least squares problem and its use in parameter estimation. Doctoral dissertation, Dept. of Electr. Eng., K.U.Leuven, June 1987. [4] T.F.Chan, An improved algorithm for computing the singular value decomposition. ACM Trans. Math. Software, 8 (1982), 72-83. [5] S. Van Huffel and J. Vandewalle, The Partial Total Least Squares Algorithm. J. Comput. and Applied Math., 21 (1988),to appear. [6] C.L. Lawson, R.J. Hanson, F.T. Krogh and O.R. Kincaid, Basic Lin- ear Algebra Subprograms for FORTRAN Usage. ACM Trans. Math. Soft- ware, 5 (1979), 308-323. [7] J.J. Dongarra, J.R. Bunch, C.B. Moler and G.W. Stewart, LINPACK User's Guide. SIAM, Philadelphia (1979). 8 NUMERICAL ASPECTS: The computational efficiency of the PTLS algorithm compared with the classical TLS algorithm (see [2 - 5]) is obtained by making use of PSVD (see [1]) instead of performing the entire SVD. Depending on the gap between the RANK-th and the (RANK+1)-th singular value of C, the number N + L - RANK of base vectors to be computed w.r.t. the column dimension N + L of C and the desired accuracy TOL2, the PTLS algorithm is approximately two times faster than the classi- cal TLS algorithm. However PTLS requires more memory space, namely: (N + L) x (N + L - 1)/2 if M >= N + L, M x (N + L - (M + 1)/2) if M < N + L, extra storage locations. This is because the Householder transforma- tions performed onto the rows of C in the bidiagonalization phase (see step 1) must be kept until the end (step 5). 9 EXAMPLE: 9.1 PROGRAM TEXT PARAMETER (IN = 5, IOUT = 6) PARAMETER (LDC = 30, LDX = 10) INTEGER M, N, L, NL, RANK, IERR, IWARN, IWRK(11) LOGICAL INUL(11), LWRK(11) DOUBLE PRECISION C(LDC,11), X(LDX,10), Q(22), * WRK(44), * THETA, TOL1, TOL2 INTEGER I, J, K INTRINSIC MIN READ(IN,12) M, N, L, RANK, THETA WRITE(IOUT,13) M, N, L, RANK, THETA NL = N + L WRITE(IOUT,9) DO 1 I = 1, M READ(IN,5) (C(I,J), J=1,NL) 1 CONTINUE WRITE(IOUT,6) DO 2 I = 1, M WRITE(IOUT,7) (C(I,J), J=1,NL) 2 CONTINUE TOL1 = 1.0D-08 TOL2 = 1.0D-10 CALL PTLS(C, LDC, M, N, L, RANK, THETA, X, LDX, Q, INUL, WRK, * IWRK, LWRK, TOL1, TOL2, IERR, IWARN) WRITE(IOUT,8) IERR, IWARN WRITE(IOUT,18) RANK, THETA K = MIN(M,NL) WRITE(IOUT,10) (Q(I), I = 1, K) WRITE(IOUT,11) (Q(K+I), I = 2, K) WRITE(IOUT,14) K = 0 DO 3 I = 1, NL IF (INUL(I)) THEN K = K + 1 WRITE(IOUT,15) K, I WRITE(IOUT,7) (C(J,I), J = 1, NL) END IF 3 CONTINUE WRITE(IOUT,16) DO 4 I = 1, L WRITE(IOUT,17) I, (X(J,I), J = 1, N) 4 CONTINUE STOP 5 FORMAT(4D15.0) 6 FORMAT(/' ',5X,'C(*,I)',8X,'C(*,I+1)',7X,'C(*,I+2)',7X, * 'C(*,I+3)') 7 FORMAT(' ',4D15.6) 8 FORMAT(/' IERR = ',I3,' IWARN = ',I3) 9 FORMAT(/' PARTIAL TOTAL LEAST SQUARES TEST PROGRAM'/1X,40('-')/) 10 FORMAT(/' DIAGONAL OF THE PARTIALLY DIAGONALIZED BIDIAGONAL=' * /(1X,4D15.6)) 11 FORMAT(/' SUPERDIAGONAL OF THE PARTIALLY DIAGONALIZED ', * 'BIDIAGONAL='/(1X,4D15.6)) 12 FORMAT(4I3,D12.5) 13 FORMAT(/' M =',I3,' N =',I3,' L =',I3, ' RANK =',I3, * ' THETA =',D12.5) 14 FORMAT(/' BASIS OF THE COMPUTED RIGHT SINGULAR SUBSPACE :'/1X, * 45('-')) 15 FORMAT(' THE ',I2,'-TH BASE VECTOR V(*,',I2,') =') 16 FORMAT(/' TLS SOLUTION :'/1X,12('*')) 17 FORMAT(/' X(*,',I2,') = '/(' ',4D15.6)) 18 FORMAT(/' COMPUTED RANK =',I3,4X,' COMPUTED BOUND THETA = ', * D12.5) END 9.2 PROGRAM DATA 6 3 1 -1 1.00000D-03 0.80010002D+00 0.39985167D+00 0.60005390D+00 0.89999446D+00 0.29996484D+00 0.69990689D+00 0.39997269D+00 0.82997570D+00 0.49994235D+00 0.60003167D+00 0.20012361D+00 0.79011189D+00 0.90013643D+00 0.20016919D+00 0.79995025D+00 0.85002662D+00 0.39998539D+00 0.80006338D+00 0.49985474D+00 0.99016399D+00 0.20002274D+00 0.90007114D+00 0.70009777D+00 0.10299439D+01 9.3 PROGRAM RESULTS M = 6 N = 3 L = 1 RANK = -1 THETA = 0.10000D-02 PARTIAL TOTAL LEAST SQUARES TEST PROGRAM ---------------------------------------- C(*,I) C(*,I+1) C(*,I+2) C(*,I+3) 0.800100D+00 0.399852D+00 0.600054D+00 0.899994D+00 0.299965D+00 0.699907D+00 0.399973D+00 0.829976D+00 0.499942D+00 0.600032D+00 0.200124D+00 0.790112D+00 0.900136D+00 0.200169D+00 0.799950D+00 0.850027D+00 0.399985D+00 0.800063D+00 0.499855D+00 0.990164D+00 0.200023D+00 0.900071D+00 0.700098D+00 0.102994D+01 IERR = 0 IWARN = 0 COMPUTED RANK = 3 COMPUTED BOUND THETA = 0.10000D-02 DIAGONAL OF THE PARTIALLY DIAGONALIZED BIDIAGONAL= 0.322802D+01 0.871401D+00 0.369809D+00 0.128626D-03 SUPERDIAGONAL OF THE PARTIALLY DIAGONALIZED BIDIAGONAL= 0.286516D-01 0.167536D-01 -0.739580D-22 BASIS OF THE COMPUTED RIGHT SINGULAR SUBSPACE : --------------------------------------------- THE 1-TH BASE VECTOR V(*, 4) = -0.355483D+00 -0.568663D+00 -0.212821D+00 0.710606D+00 TLS SOLUTION : ************ X(*, 1) = 0.500254D+00 0.800251D+00 0.299492D+00 *********************************************************************** 1988, February 15.