CSIRO

Software from Alan J. Miller

Index
Applied statistics algorithms
Quadruple precision code (F)
Quadruple precision
Random
Toms

[Subset Selection | Random Number Generation | Quad Precision | Applied Statistics Algorithms | Logistic Regression | TOMS Algorithms | Naval Surface Warfare Center Code | Miscellaneous | 10-Byte Reals for NAS FortranPlus | F code | Linear Least Squares | Links ]

N.B. Most of this software is compatible with Lahey's ELF90 compiler, and hence should be compatible with any full Fortran 90 or 95 compiler.


Subset Selection in Regression

The 2nd edition of my book on this subject was published by CRC Press (Chapman & Hall) in April 2002 (ISBN 1-58488-171-2). Dr. David Smith from the Medical College of Georgia, USA, has notified me of certain missing references. Here they are in postscript and pdf formats.

lsq.f90 is the latest version of my uncontrained weighted least-squares module. It is an upgraded version of Applied Statistics algorithm AS 274. It uses planar rotations to produce an upper-triangular factorization. The routine INCLUD is called once for each case in the data set. It is suitable for situations in which the least-squares calculations have to be updated each time more observations become available. It has routines for automatically setting up tolerances and testing for singularities. It uses a version of rank-revealing QR decomposition for this. This code is now compatible with ELF90.

If results for a subset of predictor variables are required, those variables are moved to the first positions, and variables to be excluded are ordered after them. Routine VMOVE moves one variable; routine REORDR re-orders the variables so that those listed are in the first positions, though not necessarily in the order specified.

For further details on how to use the module and on methods of least-squares calculation refer to the document

There is a DEMO program which uses a simple data set fuelcons.dat, and a nasty test program test1.f90.
If you want to see more tests then download the zip file tests.zip and the set of data files used by these tests in testdata.

  • find_sub.f90 is a module for finding subsets of variables using a variety of different algorithms.
  • subset.f90 is a driver program which uses the above two modules. I have also made available the data set pollute.dat of mortality rates against socio-economic, meteorological and pollution variables for 60 statistical areas in the USA.

For subset selection using the L1-norm, that is minimizing the sum of absolute residuals, here is toms615.f90 which is a translation of TOMS algorithm 615 to make it ELF90 compatible. There is also a driver program test615.f90, some test data test615.dat, and the expected output test615.out.

For linear regression, but when the regression coefficients must be positive or zero, there is the Lawson & Hanson non-negative least-squares routine nnls.f90 N.B. Two call arguments have been removed from the Fortran 77 version. This routine is ELF90-compatible.
I have also added my own nonnegls.f90 routine which is called after a QR-factorization has been formed using module LSQ. The file t_nnls.f90 uses both of these routines.


Random Numbers

For code for random number generation from the uniform distribution, and others, including normal, exponential, Poisson, binomial, gamma and others, Click here


Quadruple precision

For code for quadruple precision arithmetic, using pairs of double precision numbers, Click here


Some Applied Statistics Algorithms

For many years, the Royal Statistical Society published algorithms in its journal `Applied Statistics'. I have translated a few of these to F90, Click here


Logistic Regression


I have received several requests for Fortran code to perform logistic regression, that is to fit:

p = F/(1 + F)
where
p = the probability that a case is in one of two categories
F = exp(b0 + b1.X1 + b2.X2 + ... + bk.Xk)
X1, X2, ..., Xk is a set of k predictors, and
b0, b1, b2, ..., bk is a set of coefficients to be fitted.

  • logistic.f90 Module for performing the weighted least squares calculations. It requires my least squares module lsq.f90
  • t_lgstc1.f90 Driver program which uses the data set birthwt.dat from Appendix 1 of Hosmer & Lemeshow's book `Applied Logistic Regression'.
  • t_lgstc2.f90 Driver program which uses the data set surgical.dat. This illustrates the use of logistic regression with grouped data. There is an error in the published data (deliberately NOT corrected here) which has caused grief with several well-known statistical packages.
  • t_lgstc3.f90 Driver program which uses the data set clearcut.dat. This illustrates the case in which there is a linear boundary such that all cases on one side are in one category, and all cases on the other side are in the other category.
  • se_lgstc.f90 The standard errors reported by logistic.f90 are often larger than those reported by other packages for logistic regression. This simple simulation program shows that those reported here are about right. It also illustrates the bias in the slope parameters (they are always biased towards being too large).

Miscellaneous TOMS (and CACM) algorithms

N.B. I have been asked to provide a link to the copyright policy of the ACM. Loosely paraphrased, this allows use, and modification, of the TOMS algorithms for most non-commercial purposes. It also emphasizes that the ACM accepts no responsibility for the accuracy of the code.


I have updated some of the Transactions on Mathematical Software (TOMS) algorithms to Fortran 90. Click here


Code converted from the

Naval Surface Warfare Center Math. Library


  • cgamma.f90 Complex gamma function.
  • erf.f90 The error function.
  • dcerf.f90 Complex error function & its complement.
  • cexpli.f90 Exponential integral for complex argument.
  • cbsslj.f90 Complex Bessel function J_{\nu}(z) where both the argument, z, and the order, \nu, are complex.
  • dple.f90 Solution of systems of linear equations using the Henderson-Wassyng partial pivot algorithm. Includes a test program to solve 100 simultaneous equations. The user must provide a subroutine to supply any requested row of the matrix.
  • dspslv.f90 Solution of sparse systems of linear equations using Gaussian elimination.
  • dsvdc.f90 Calculates the singular-value decomposition (SVD) of a real matrix. There is also a test program: --
  • t_svd.f90
  • fft.f90 Fast Fourier Transform (FFT) for any length of series which has no prime factor greater than 23. Also the inverse and multivariate FFT.
  • hbrd.f90 Solve sets of non-linear equations using Powell's Hybrid algorithm.
  • specfunc.zip A zip file containing all of the special functions from the NSWC library.
  • polyarea.f90 Calculates the area of a polygon.
  • p_intcpt.f90 Finds the crossing points of a finite line and a polygon.
  • bnd_solv.f90 Solve banded linear equations using compact storage of the banded matrix. There is a complex version of this -- cbnd_slv.f90
  • big_solv.f90 Solves a large set of n general linear equations using out-of-core methods, requiring storage for about n^2/4 values.
  • qagi.f90 Adaptive one-dimensional integration over infinite or semi-infinite ranges (adapted from QUADPACK).
  • qxgs.f90 Adaptive one-dimensional integration over finite ranges (adapted from TOMS algorithm 691).
  • inc_gam.f90 The incomplete gamma function and its inverse. Test program: t_incgam.f90 based on TOMS algorithm 654.
  • constant.f90 This is a module of constants used by some of the functions in the NSWC collection of routines.
  • qsortd.f90 A subroutine which implements a quicksort algorithm without changing the input array. It returns an integer array containing the order.
  • smplx.f90 Linear programming using the simplex algorithm. This is a translation of the Fortran 66 program from the NSWC (Naval Surface Warfare Center) library written by Alfred Morris. There is also a simple test program t_smplx.f90. Needs the module constant.f90 which defines the precision and returns certain machine constants.
  • dmexp.f90 Calculates the exponential of a matrix.
  • fprob.f90 Cumulative F distribution. Requires bratio.f90 for the incomplete gamma function, and constant.f90. N.B. bratio.f90 contains code for a number of special functions including the error function, the logarithm of the gamma function, the logarithm of the beta function, and the digamma function. bratio was translated from the NSWC library.
  • dceigv.f90 Calculate the eigenvalues and vectors of a general complex matrix. Another routine from the NSWC library. Needs constant.f90. Based upon EISPACK routines. There is a test program: t_dceigv.f90
  • locpt.f90 Is a point inside a polygon?
  • qtcrt.f90 Solve quadratic, cubic and quartic equations. Includes a short driver program, and hence includes the interfaces needed for your program. Adapted from the NSWC library.
  • toeplitz.f90 Solution of Toeplitz systems of linear equations.
  • zeroin.f90 Finds a zero of a user-supplied function in a specified range (a, b).

Miscellaneous code


  • qsort.f90 A version of the quicksort algorithm adapted from Walt Brainerd's code.
  • cobyla.f90 Mike Powell's routine for minimization of non-linear functions with smooth non-linear constraints, using local linear approximations to the constraints.
  • tron.zip Newton's method for large bound-constrained optimization problems by Chih-Jen Lin & Jorge More', a MINPACK-2 project. Use PKUNZIP or WINZIP to unpack the file.
  • lm.zip Levenberg-Marquardt algorithm for non-linear least squares (unconstrained). This is a translation of the MINPACK routines, LMDER & LMDIF. Use LMDER for functions which can be differentiated, and LMDIF when it is necessary to use differences. The ZIPped file includes the MINPACK test programs, and a simple example fitting a 4-parameter logistic.
  • conmin.zip The classic CONMIN package for constrained minimization updated to Fortran 90. Test examples and the manual are included in the ZIP file. N.B. CONMIN is included as just one of the algorithms in TOMS algorithm 734 which can be downloaded from netlib (http://www.netlib.org).
  • minim.f90 The Nelder-Mead simplex algorithm for unconstrained minimization. It does NOT require or use derivatives. N.B. This is NOT for linear programming! t_minim.f90 is a very simple test program for minim.f90 which may help users. This is now ELF90-compatible.
  • primefac.f90 A program to find prime factors.
  • uobyqa.f90 Mike Powell's package for unconstrained minimization when derivatives are not available. There is a test/driver program:
    t_uobyqa.f90 and a file of results from the test program:
    test.out The documentation, which is in gzipped postscript, can be downloaded from the Optimization Decision Tree at:
  • tn.zip Stephen Nash's truncated-Newton code for the minimization of continuous functions. It can use differences instead of derivatives, and bounds may be imposed on the parameters.
  • xdlegf.f90 Legendre functions and polynomials, from the CMLIB library.
  • datesub.f90 Some date manipulation routines collected together by H.D. Knoble.
  • global.f90 At Arnold Neumaier's web site, this is recommended as the most successful of the global optimization packages. There is a sample program fit.f90 and the original documentation global.txt for the f77 version. I have included testfunc.f90 which will eventually contain all of Neumaier's 30 test functions. N.B. Users of local optimization packages usually obtain satisfactory convergence after 10s or sometimes 100s of function evaluations. Global optimization routines usually require many 1000s of function evaluations.
  • tensolve.zip A package for solving sets of non-linear equations using Robert Schnabel's tensor method. This is a translation of TOMS algorithm 768. The file was compressed using PKZIP. It is now compatible with version 4.0 of ELF90, but misbehaves on example 2.
  • KS2.f90 Calculates 1 and 2-tail probabilities for the single-sample Kolmogorov-Smirnov statistic. For 2-tail probabilities, it uses a combination of the first algorithm from CACM 487, double the single-tail probability, and the asymptotic distribution. See also TOMS algorithm 519.
  • nCr.f90 Calculates number of combinations of r out of n.
  • update.f90 Three very short subroutines to update the sample mean and sum of squares of deviations about the mean (and hence update variances or std. deviations), when one observation is added, dropped or replaced with another. Designed for fast, repeated use with no checks.
  • mvnpack.f90 Alan Genz's package of 4 methods of evaluating multivariate normal integrals.
  • bivnorm.f90 A function for calculating bivariate normal probabilities, extracted from Alan Genz's package for multivariate normal integration.
  • genz2d3d.f90 This is code for bivariate and trivariate normal integrals which I discovered on Alan Genz's web site. I have made it compatible with ELF90. It is more recent (January 2001) than bivnorm above.
  • dcuhre.f90 Alan Genz's program for general multivariate integration, not just of one function but simultaneously for a vector of functions over the same multivariate region. There is also a test program dtest1.f90 and a text file dcuhre.txt. which contains the results from the test program (run in single precision) N.B. While the Fortran 77 version of this code is still at Alan Genz's web site, he is referring users to the CUBPACK project at: CUBPACK
  • tfunc.f90 A module for calculating bivariate normal probabilities using code by Baughman and Patefield.
  • t_bivnor.f90 Comparison of 3 functions for the bivariate normal - those of Donnelly (CACM 462), Genz, and Baughman/Patefield.
  • hash.f90 and hashord.f90 Hashing routines by Richard Brent and Donald Knuth, taken from Herman Noble's web site (http://ftp.cac.psu.edu/pub/ger/fortran/).
  • elsunc.f90 Per Lindstroem's package for non-linear least squares with upper & lower bounds on parameter values. d_elsunc.f90 demonstrates this package.
  • solvopt.zip The SolvOpt package minimizes or maximizes nonlinear functions, which may be nonsmooth and may have constraints, using the so-called method of exact penalization. This is a zip file containing several driver programs.
  • ga.zip A package for global minimization using genetic algorithms. Please note the copyright conditions if you want to use this for commercial purposes. This is a ZIP file.
  • pikaia.zip Another genetic algorithm for global optimization without derivatives. This one comes from the High Altitude Observatory of NCAR. This is a ZIP file, containing a number of interesting examples. There is an excellent manual, but you will have to download it from their web site. N.B. The manual is 120 pages long, and in postscript. Where does the name Pikaia come from? Pikaia gracilens is a flattened worm-like beast about 5cm long which crawled in the mud on the sea floor about 530 million years ago!
  • cdfcor.zip A package for rational (Pade) approximation in one and two dimensions. Includes a driver for 9 test problems, input data and output. This is a ZIP file.
  • sym_band.f90 Find the eigenvalues/vectors of a symmetric banded matrix stored in compact form. Based upon EISPACK code.
  • strassen.f90 The Strassen fast matrix multiply algorithm for large matrices. Code downloaded from the comp.lang.fortran newsgroup.
  • varpro.f90 The VARPRO package for separable nonlinear least squares is for fitting models of the kind
    Y = a1.F1(x, b) + a2.F2(x, b) + ...
    in which the a's are linear parameters, the vector b is a vector of nonlinear parameters, and the F's are nonlinear functions. An important application is fitting sums of exponential decay terms. A driver program twoexp.f90 is provided, as well as the data.
  • bvls.f90 Fits a linear model using least squares but with upper and lower bounds as constraints on each regression coefficient.
  • foldat73.f90 There is a compiler which accepts Fortran code in fixed format extending beyond column 72, other than sequence numbers which occupy these columns in fixed form. This simple program takes such code as input data and breaks long lines after column 72 starting with a continuation character in column 6.
  • DIEHARD A version of George Marsaglia's random number tests in standard Fortran. You will also need the files tests.txt and operm5d.ata, and you will need to generate a binary file containing just over 11 million random 32-bit integers using the random number generator which you want to test. I have provided a short note diehard.txt and an example t_taus88.f90 showing how to generate the binary file using Pierre L'Ecuyer's TAUS88 random number generator.
  • sortchar.f90 Code for sorting character strings. This was made available on the comp.lang.fortran newsgroup. Author unknown.
  • total_ls.f90 Van Huffel's Total Least Squares. It can be used for least-squares regressions in which there are errors in both the X and Y variables, but the user must have first scaled the variables so that the errors in all variables are all the same. Can be used for ODR (orthogonal distance regression) after the same scaling, otherwise use ODRPACK (a much larger package) from netlib, or Applied Statistics algorithm AS 286. test_tls.f90 A test program for total_ls requiring test_tls.dat. ptls-doc.txt is Van Huffel's doc-file.
  • K_smooth.zip Eva Hermann's software for kernel smoothing, both local (lokern) and global (glkern). You will need PKUNZIP or WinZip to extract the individual files.
  • ewma.f90 Code for updating exponentially-weighted moving averages, including updating a residual sum of squares.
  • Adventur.zip This is a Fortran 90 version of the classic Adventure game.
  • zhangjin.zip The source code from `Computation of Special Functions' by Zhang & Jin, published by Wiley, 1996. As well as the `usual' functions such as gamma, error function, Bessel and Airy functions, there are the confluent hypergeometric, parabolic cylinder, Mathieu, spheroidal wave, and various exponential integrals, etc.
  • dli.f90 Code from the Slatec library for the logarithmic integral, Li(x), and the exponential integrals, Ei(x) and E1(x).
  • dcosint.f90 & dsinint.f90 are for evaluating the cosine, Ci(x), and sine, Si(x), integrals respectively. They are translations of Numerical Algorithm NA 12. There are corresponding test programs dcitest.f90 & dsitest.f90
  • r_zeta.f90 Riemann's zeta function for real argument. Adapted from DRIZET in the MATHLIB library from CERN.
  • cincgam.f90 The incomplete gamma function for complex arguments.
  • assndx.f90 The Munkres algorithm for solution of the assignment problem. Adapted from a routine in the MATHLIB library from CERN.
  • Easter.f90 When is Easter in year XXXX?
  • nnes.zip Code for the solution of simultaneous non-linear equations using a variety of algorithms. No documentation, but 3 example programs, one of which (MDR) contains 10 different problems. Warning: This code contains the statement `Copyright R.S.Bain (1991)', but attempts to contact Rod Bain have been unsuccesful. This too is a zipped file.
  • lanczos.f90 A simple algorithm for the logarithm of the gamma function.
  • to_f90.f90 This program takes Fortran 77 code and converts it to make it look more like Fortran 90.
  • kaiser.f90 A simple routine to calculate the eigenvalues and eigenvectors of a symmetric positive definite, e.g. a covariance matrix.
  • singlton.f90 The classic Singleton multi-dimensional FFT algorithm for series whose length is not necessarily a power of 2.
  • fft.f90 A simple Fast Fourier routine for the case in which the series length is a power of 2.
  • chirp.f90 The Chirp-Z algorithm for the FFT of a series of any length.
  • fft235.f90 Fast Fourier Transform for the case in which the series length is a multiple of some or all of the integers 2, 3 and 5 (e.g. length = 60 or 240).
  • hartly2d.f90 Hartley 2D Fast Fourier Transform.
  • pzeros.f90 Solve polynomial equations using Aberth's method. Translated from a Fortran 77 algorithm by Dario Bini published in Numerical Algorithms, vol.13 (1996). A short test program poly20.f90 is also available. Dario Bini has a package called MPSolve which does it in multiple precision. It can be downloaded by ftp from: http://www.dm.unipi.it/~bini/.
  • envelope.f90 A simple but efficient routine for finding 2D convex hulls, i.e. in finding the minimum polygon to enclose a set of points.
  • median.f90 Finds the median of a set of numbers using a truncated quicksort algorithm.
  • hermite.f90 Hermite integration, that is integration of f(x).p(x) from minus infinity to plus infinity, where f(x) is the user's function and p(x) = exp(-x^2).
  • hh.f90 Half-Hermite integration, that is from zero to plus infinity. hh_test.f90 is a test program which also serves as an illustration of how to use hh. See below for a program to calculate in quadruple precision the weights and abscissae.
  • chi_sq.f90 Chi-squared distribution function. Requires as239.f90 and lanczos.f90.
  • hyperg.f90 Calculate hypergeometric probabilities.
  • rhohat.f90 Maximum likelihood estimation of the shape parameter of the gamma (Erlang) distribution, including calculation of the derivatives of the log-gamma function.
  • fnprod.f90 Computes the distribution function for the product of two correlated normal variates using the algorithm of Meeker & Escobar. Needs constant.f90 and qxgs.f90.
  • twodqd.f90 A translation of Kahaner & Rechard's 1984 program for bivariate integration over triangular regions, including a test program.
  • simann.f90 is a module and test program for simulated annealing based upon the algorithm of Corana.

  • ives.f90 is a routine for generating all combinations of n objects.
  • mace.f90 is an F90 version of Jerry Friedman's program to estimate multiple optimal transformations for regression and correlation by alternating conditional expectation estimates.

NAS FortranPlus

Software for NAS FortranPlus This exploits the 10-byte REAL data type which is supported by this compiler. At the moment, it only contains a special version of my quadruple precision package which gives about 38 decimal digit representation of quadruple precision numbers. N.B. It is extremely unlikely that this will give correct answers with any other compiler (e.g. Salford) which supports 10-byte REALs.


Code for Imagine1's free F compiler

Windows version ONLY

Click here


Some linear least-squares examples and tests


fit_poly.f90 Fit a polynomial to a set of (x,y) data.

quadsurf.f90 Fit a quadratic surface to a set of (x,y,z) data, i.e. fit:
Z = b0 + b10.X + b01.Y + b20.X^2 + b11.X.Y + b02.Y^2
There is a module ridge.f90 for ridge regression / regularization. It uses the output from the module lsq to form the singular value decomposition (SVD) and offers a choice of 4 different methods of regularization, or the user can input their own vector to add to the diagonal of the X'X-matrix.

wtd_quin.f90 This is a program to fit a quintic polynomial with exponential weighting of past values. It was developed for someone who had reason to believe that his process was well approximated locally by a quintic, but wanted to give progressively less weight to old observations. The slope and 2nd derivative (acceleration) of the smoothing polynomial are output after each new case is read. This is suitable for real time applications. It demonstrates the use of the least-squares package's updating algorithm.

spline5.f90 This program fits quintic splines with user-chosen but evenly-spaced knots. The slope and 2nd derivative (acceleration) of the smoothing polynomial are output, but only after all of the data have been read. The smoothed fit and derivatives at any point are based upon the data from both before and after the point.


Some other useful web sites:

  • Try the Fortran Market for general information on Fortran compilers, tutorials, books and access to some sources of Fortran code, Gary Scott's Fortran Library web site.
  • The Fortran90 FAQ (frequently asked questions) can be obtained from: F90FAQ.
  • For an extensive set of routines for sorting and ranking real numbers see OrderPack.
  • NAG F90 Software Repository is a source of useful Fortran 90 code.
  • The ACM collection of TOMS algorithms is a source of refereed code, mainly in Fortran, for a wide range of numerical calculations.
  • A collection of functions and subroutines covering a wide area of mathematical John Monahan's site contains the software from his book `Numerical Methods of Statistics'. It covers a very similar are to this web site, and is at: Monahan's web site. operations can be found at Jean-Pierre Moreau's web site.
  • For a guide to available mathematical software, refer to GAMS, and for the numerical analysis FAQ numafaq.
  • If you are interested in object-oriented programming in Fortran, you should see oof90.html.
  • For statistical software in a variety of languages try statlib. For distribution functions, random number generation and other statistical programs, try cdflib/ranlib.
  • For optimization, particularly constrained optimization, see the Optimization Decision Tree, or Arnold Neumaier's Global Optimization web site. The latter also contains many general links to mathematical and statistical software.
  • For multivariate normal integrals, and for multiple integration in general, look at Alan Genz's home page.
  • Michel Olagnon's ORDERPACK is for sorting and ranking. It can be found at: ORDERPACK.
  • Australian Fortran users will find the web page of Computer Transition Systems useful. (This company distributes Lahey, Salford, Edinburgh Portable Compilers, Digital Visual Fortran and other compilers in Australia.)
  • To download Lahey's cheap Fortran 90 compiler click on ELF90.
  • There is an interpreter for a subset of Fortran 90 available from: Georg Petrich's interpreter. I have not tried it myself, so comments would be welcome. It can be downloaded freely, but I understand it is a shareware product.
  • For a wide range of code for the FFT, in several computer languages, see: The FFT Home Page. and: Another FFT Home Page.