| |
[Subset Selection | Random Number
Generation | Quad Precision | Applied
Statistics Algorithms | Logistic Regression |
TOMS Algorithms | Naval Surface Warfare
Center Code | Miscellaneous | 10-Byte
Reals for NAS FortranPlus | F code |
Linear Least Squares | Links ]
N.B. Most of this software is compatible with Lahey's ELF90 compiler, and
hence should be compatible with any full Fortran 90 or 95 compiler.
Subset Selection in Regression
The 2nd edition of my book on this subject was published by CRC Press
(Chapman & Hall) in April 2002 (ISBN 1-58488-171-2). Dr. David Smith from
the Medical College of Georgia, USA, has notified me of certain missing
references. Here they are in postscript and
pdf formats.
lsq.f90
is the latest version of my uncontrained weighted least-squares module.
It is an upgraded version of Applied
Statistics algorithm AS 274. It uses planar rotations to produce an
upper-triangular factorization. The routine INCLUD is called once for each case
in the data set. It is suitable for situations in which the least-squares
calculations have to be updated each time more observations become available. It
has routines for automatically setting up tolerances and testing for
singularities. It uses a version of rank-revealing QR decomposition for this.
This code is now compatible with ELF90.
If results for a subset of predictor variables are required, those variables
are moved to the first positions, and variables to be excluded are ordered after
them. Routine VMOVE moves one variable; routine REORDR re-orders the variables
so that those listed are in the first positions, though not necessarily in the
order specified.
For further details on how to use the module and on methods of least-squares
calculation refer to the document
There is a DEMO program
which uses a simple data set fuelcons.dat,
and a nasty test program test1.f90.
If you want to see more tests then download the zip file
tests.zip and
the set of data files used by these tests in testdata.
- find_sub.f90
is a module for finding subsets of variables using a variety of different
algorithms.
- subset.f90
is a driver program which uses the above two modules.
I have also made available the data set pollute.dat
of mortality rates against socio-economic, meteorological and pollution
variables for 60 statistical areas in the USA.
For subset selection using the L1-norm, that is minimizing the sum of
absolute residuals, here is toms615.f90
which is a translation of TOMS algorithm 615 to make it ELF90 compatible.
There is also a driver program test615.f90,
some test data test615.dat,
and the expected output test615.out.
For linear regression, but when the regression coefficients must be positive
or zero, there is the Lawson & Hanson non-negative least-squares routine
nnls.f90 N.B.
Two call arguments have been removed from the Fortran 77 version. This routine
is ELF90-compatible. I have also added my own nonnegls.f90
routine which is called after a QR-factorization has been formed using module
LSQ. The file t_nnls.f90
uses both of these routines.
Random Numbers
For code for random number generation from the uniform distribution, and
others, including normal, exponential, Poisson, binomial, gamma and others,
Click here
Quadruple precision
For code for quadruple precision arithmetic, using pairs of double precision
numbers,
Click here
Some Applied Statistics Algorithms
For many years, the Royal Statistical Society published algorithms in its
journal `Applied Statistics'. I have translated a few of these to F90,
Click here
Logistic Regression
I have received several requests for Fortran code to perform logistic
regression, that is to fit:
p = F/(1 + F) where p = the probability that a case is in one of
two categories F = exp(b0 + b1.X1 + b2.X2 + ... + bk.Xk) X1, X2, ..., Xk
is a set of k predictors, and b0, b1, b2, ..., bk is a set of coefficients to
be fitted.
- logistic.f90
Module for performing the weighted least squares calculations. It requires my
least squares module lsq.f90
- t_lgstc1.f90
Driver program which uses the data set birthwt.dat
from Appendix 1 of Hosmer & Lemeshow's book `Applied Logistic Regression'.
- t_lgstc2.f90
Driver program which uses the data set surgical.dat.
This illustrates the use of logistic regression with grouped data. There is an
error in the published data (deliberately NOT corrected here) which has caused
grief with several well-known statistical packages.
- t_lgstc3.f90
Driver program which uses the data set clearcut.dat.
This illustrates the case in which there is a linear boundary such that all
cases on one side are in one category, and all cases on the other side are in
the other category.
- se_lgstc.f90
The standard errors reported by logistic.f90 are often larger than those
reported by other packages for logistic regression. This simple simulation
program shows that those reported here are about right. It also illustrates
the bias in the slope parameters (they are always biased towards being too
large).
Miscellaneous TOMS (and CACM) algorithms
N.B. I have been asked to provide a link to the
copyright
policy of the ACM. Loosely paraphrased, this allows use, and modification,
of the TOMS algorithms for most non-commercial purposes. It also emphasizes that
the ACM accepts no responsibility for the accuracy of the code.
I have updated some of the Transactions on Mathematical Software (TOMS)
algorithms to Fortran 90.
Click here
Code converted from the
Naval Surface Warfare Center Math. Library
- cgamma.f90
Complex gamma function.
- erf.f90 The
error function.
- dcerf.f90
Complex error function & its complement.
- cexpli.f90
Exponential integral for complex argument.
- cbsslj.f90
Complex Bessel function J_{\nu}(z) where both the argument, z, and the order,
\nu, are complex.
- dple.f90
Solution of systems of linear equations using the Henderson-Wassyng partial
pivot algorithm. Includes a test program to solve 100 simultaneous equations.
The user must provide a subroutine to supply any requested row of the matrix.
- dspslv.f90
Solution of sparse systems of linear equations using Gaussian elimination.
- dsvdc.f90
Calculates the singular-value decomposition (SVD) of a real matrix.
There is also a test program: --
- t_svd.f90
- fft.f90
Fast Fourier Transform (FFT) for any length of series which has no prime factor
greater than 23. Also the inverse and multivariate FFT.
- hbrd.f90
Solve sets of non-linear equations using Powell's Hybrid algorithm.
- specfunc.zip
A zip file containing all of the special functions from the NSWC library.
- polyarea.f90
Calculates the area of a polygon.
- p_intcpt.f90
Finds the crossing points of a finite line and a polygon.
- bnd_solv.f90
Solve banded linear equations using compact storage of the banded matrix.
There is a complex version of this -- cbnd_slv.f90
- big_solv.f90
Solves a large set of n general linear equations using out-of-core methods,
requiring storage for about n^2/4 values.
- qagi.f90
Adaptive one-dimensional integration over infinite or semi-infinite ranges
(adapted from QUADPACK).
- qxgs.f90
Adaptive one-dimensional integration over finite ranges (adapted from TOMS
algorithm 691).
- inc_gam.f90
The incomplete gamma function and its inverse. Test program:
t_incgam.f90
based on TOMS algorithm 654.
- constant.f90
This is a module of constants used by some of the functions in the NSWC
collection of routines.
- qsortd.f90
A subroutine which implements a quicksort algorithm without changing the
input array. It returns an integer array containing the order.
- smplx.f90
Linear programming using the simplex algorithm. This is a translation of the
Fortran 66 program from the NSWC (Naval Surface Warfare Center) library
written by Alfred Morris. There is also a simple test program
t_smplx.f90.
Needs the module constant.f90
which defines the precision and returns certain machine constants.
- dmexp.f90
Calculates the exponential of a matrix.
- fprob.f90
Cumulative F distribution. Requires bratio.f90
for the incomplete gamma function, and constant.f90.
N.B. bratio.f90 contains code for a number of special functions including the
error function, the logarithm of the gamma function, the logarithm of the beta
function, and the digamma function. bratio was translated from the NSWC
library.
- dceigv.f90
Calculate the eigenvalues and vectors of a general complex matrix. Another
routine from the NSWC library. Needs constant.f90.
Based upon EISPACK routines. There is a test program:
t_dceigv.f90
- locpt.f90
Is a point inside a polygon?
- qtcrt.f90
Solve quadratic, cubic and quartic equations. Includes a short driver program,
and hence includes the interfaces needed for your program. Adapted from the
NSWC library.
- toeplitz.f90
Solution of Toeplitz systems of linear equations.
- zeroin.f90
Finds a zero of a user-supplied function in a specified range (a, b).
Miscellaneous code
- qsort.f90
A version of the quicksort algorithm adapted from Walt Brainerd's code.
- cobyla.f90
Mike Powell's routine for minimization of non-linear functions with smooth
non-linear constraints, using local linear approximations to the constraints.
- tron.zip
Newton's method for large bound-constrained optimization problems by Chih-Jen
Lin & Jorge More', a MINPACK-2 project. Use PKUNZIP or WINZIP to unpack
the file.
- lm.zip
Levenberg-Marquardt algorithm for non-linear least squares (unconstrained).
This is a translation of the MINPACK routines, LMDER & LMDIF. Use LMDER
for functions which can be differentiated, and LMDIF when it is necessary to
use differences. The ZIPped file includes the MINPACK test programs, and a
simple example fitting a 4-parameter logistic.
- conmin.zip
The classic CONMIN package for constrained minimization updated to Fortran 90.
Test examples and the manual are included in the ZIP file. N.B. CONMIN is
included as just one of the algorithms in TOMS algorithm 734 which can be
downloaded from netlib (http://www.netlib.org).
- minim.f90
The Nelder-Mead simplex algorithm for unconstrained minimization. It does NOT
require or use derivatives. N.B. This is NOT for linear programming!
t_minim.f90
is a very simple test program for minim.f90 which may help users. This is now
ELF90-compatible.
- primefac.f90
A program to find prime factors.
- uobyqa.f90
Mike Powell's package for unconstrained minimization when derivatives are not
available. There is a test/driver program:
t_uobyqa.f90
and a file of results from the test program: test.out
The documentation, which is in gzipped postscript, can be downloaded from the
Optimization Decision Tree at:
- tn.zip Stephen
Nash's truncated-Newton code for the minimization of continuous functions. It
can use differences instead of derivatives, and bounds may be imposed on the
parameters.
- xdlegf.f90
Legendre functions and polynomials, from the CMLIB library.
- datesub.f90
Some date manipulation routines collected together by H.D. Knoble.
- global.f90 At
Arnold Neumaier's web site, this is recommended as the most successful of the
global optimization packages. There is a sample program fit.f90 and the
original documentation global.txt for the
f77 version. I have included testfunc.f90 which
will eventually contain all of Neumaier's 30 test functions. N.B. Users of
local optimization packages usually obtain satisfactory convergence after 10s
or sometimes 100s of function evaluations. Global optimization routines
usually require many 1000s of function evaluations.
- tensolve.zip
A package for solving sets of non-linear equations using Robert Schnabel's
tensor method. This is a translation of TOMS algorithm 768. The file was
compressed using PKZIP. It is now compatible with version 4.0 of ELF90, but
misbehaves on example 2.
- KS2.f90
Calculates 1 and 2-tail probabilities for the single-sample Kolmogorov-Smirnov
statistic. For 2-tail probabilities, it uses a combination of the first
algorithm from CACM 487, double the single-tail probability, and the
asymptotic distribution. See also TOMS algorithm 519.
- nCr.f90
Calculates number of combinations of r out of n.
- update.f90
Three very short subroutines to update the sample mean and sum of squares of
deviations about the mean (and hence update variances or std. deviations),
when one observation is added, dropped or replaced with another. Designed for
fast, repeated use with no checks.
- mvnpack.f90
Alan Genz's package of 4 methods of evaluating multivariate normal integrals.
- bivnorm.f90
A function for calculating bivariate normal probabilities, extracted from Alan
Genz's package for multivariate normal integration.
- genz2d3d.f90 This
is code for bivariate and trivariate normal integrals which I discovered on
Alan Genz's web site. I have made it compatible with ELF90. It is more recent
(January 2001) than bivnorm above.
- dcuhre.f90
Alan Genz's program for general multivariate integration, not just of one
function but simultaneously for a vector of functions over the same
multivariate region. There is also a test program dtest1.f90
and a text file dcuhre.txt.
which contains the results from the test program (run in single precision)
N.B. While the Fortran 77 version of this code is still at Alan Genz's web site,
he is referring users to the CUBPACK project at:
CUBPACK
- tfunc.f90 A
module for calculating bivariate normal probabilities using code by Baughman
and Patefield.
- t_bivnor.f90
Comparison of 3 functions for the bivariate normal - those of Donnelly (CACM
462), Genz, and Baughman/Patefield.
- hash.f90 and hashord.f90 Hashing
routines by Richard Brent and Donald Knuth, taken from Herman Noble's web site
(http://ftp.cac.psu.edu/pub/ger/fortran/).
- elsunc.f90
Per Lindstroem's package for non-linear least squares with upper & lower
bounds on parameter values. d_elsunc.f90
demonstrates this package.
- solvopt.zip
The SolvOpt package minimizes or maximizes nonlinear functions, which may be
nonsmooth and may have constraints, using the so-called method of exact
penalization. This is a zip file containing several driver programs.
- ga.zip A package
for global minimization using genetic algorithms. Please note the copyright
conditions if you want to use this for commercial purposes. This is a ZIP
file.
- pikaia.zip
Another genetic algorithm for global optimization without derivatives. This
one comes from the High Altitude Observatory of NCAR. This is a ZIP file,
containing a number of interesting examples. There is an excellent manual, but
you will have to download it from their web
site. N.B. The manual is 120 pages long, and in postscript. Where does the
name Pikaia come from? Pikaia gracilens is a flattened worm-like beast about
5cm long which crawled in the mud on the sea floor about 530 million years
ago!
- cdfcor.zip A
package for rational (Pade) approximation in one and two dimensions. Includes
a driver for 9 test problems, input data and output. This is a ZIP file.
- sym_band.f90 Find
the eigenvalues/vectors of a symmetric banded matrix stored in compact form.
Based upon EISPACK code.
- strassen.f90 The
Strassen fast matrix multiply algorithm for large matrices. Code downloaded
from the comp.lang.fortran newsgroup.
- varpro.f90
The VARPRO package for separable nonlinear least squares is for
fitting models of the kind
Y = a1.F1(x, b) + a2.F2(x, b) + ... in which
the a's are linear parameters, the vector b is a vector of nonlinear
parameters, and the F's are nonlinear functions. An important application is
fitting sums of exponential decay terms. A driver program
twoexp.f90 is
provided, as well as the data.
- bvls.f90 Fits a
linear model using least squares but with upper and lower bounds as
constraints on each regression coefficient.
- foldat73.f90 There
is a compiler which accepts Fortran code in fixed format extending beyond
column 72, other than sequence numbers which occupy these columns in fixed
form. This simple program takes such code as input data and breaks long lines
after column 72 starting with a continuation character in column 6.
- DIEHARD A
version of George Marsaglia's random number tests in standard Fortran. You
will also need the files tests.txt and
operm5d.ata,
and you will need to generate a binary file containing just over 11 million
random 32-bit integers using the random number generator which you want to
test. I have provided a short note diehard.txt
and an example t_taus88.f90
showing how to generate the binary file using Pierre L'Ecuyer's TAUS88 random
number generator.
- sortchar.f90 Code
for sorting character strings. This was made available on the
comp.lang.fortran newsgroup. Author unknown.
- total_ls.f90
Van Huffel's Total Least Squares. It can be used for least-squares regressions
in which there are errors in both the X and Y variables, but the user must
have first scaled the variables so that the errors in all variables are all
the same. Can be used for ODR (orthogonal distance regression) after the same
scaling, otherwise use ODRPACK (a much larger package) from netlib, or Applied
Statistics algorithm AS 286. test_tls.f90
A test program for total_ls requiring test_tls.dat.
ptls-doc.txt
is Van Huffel's doc-file.
- K_smooth.zip Eva
Hermann's software for kernel smoothing, both local (lokern) and global
(glkern). You will need PKUNZIP or WinZip to extract the individual files.
- ewma.f90 Code
for updating exponentially-weighted moving averages, including updating a
residual sum of squares.
- Adventur.zip This
is a Fortran 90 version of the classic Adventure game.
- zhangjin.zip The
source code from `Computation of Special Functions' by Zhang & Jin,
published by Wiley, 1996. As well as the `usual' functions such as gamma,
error function, Bessel and Airy functions, there are the confluent
hypergeometric, parabolic cylinder, Mathieu, spheroidal wave, and various
exponential integrals, etc.
- dli.f90 Code
from the Slatec library for the logarithmic integral, Li(x), and the
exponential integrals, Ei(x) and E1(x).
- dcosint.f90
& dsinint.f90
are for evaluating the cosine, Ci(x), and sine, Si(x), integrals respectively.
They are translations of Numerical Algorithm NA 12. There are corresponding
test programs dcitest.f90
& dsitest.f90
- r_zeta.f90
Riemann's zeta function for real argument. Adapted from DRIZET in the MATHLIB
library from CERN.
- cincgam.f90
The incomplete gamma function for complex arguments.
- assndx.f90
The Munkres algorithm for solution of the assignment problem. Adapted from a
routine in the MATHLIB library from CERN.
- Easter.f90 When
is Easter in year XXXX?
- nnes.zip Code for
the solution of simultaneous non-linear equations using a variety of algorithms.
No documentation, but 3 example programs, one of which (MDR) contains 10
different problems. Warning: This code contains the statement `Copyright
R.S.Bain (1991)', but attempts to contact Rod Bain have been unsuccesful. This
too is a zipped file.
- lanczos.f90
A simple algorithm for the logarithm of the gamma function.
- to_f90.f90
This program takes Fortran 77 code and converts it to make it look more like
Fortran 90.
- kaiser.f90
A simple routine to calculate the eigenvalues and eigenvectors of a symmetric
positive definite, e.g. a covariance matrix.
- singlton.f90
The classic Singleton multi-dimensional FFT algorithm for series whose length
is not necessarily a power of 2.
- fft.f90
A simple Fast Fourier routine for the case in which the series length is a
power of 2.
- chirp.f90
The Chirp-Z algorithm for the FFT of a series of any length.
- fft235.f90
Fast Fourier Transform for the case in which the series length is a multiple
of some or all of the integers 2, 3 and 5 (e.g. length = 60 or 240).
- hartly2d.f90
Hartley 2D Fast Fourier Transform.
- pzeros.f90
Solve polynomial equations using Aberth's method. Translated from a Fortran 77
algorithm by Dario Bini published in Numerical Algorithms, vol.13 (1996). A
short test program poly20.f90
is also available. Dario Bini has a package called MPSolve which does it in
multiple precision. It can be downloaded by ftp from:
http://www.dm.unipi.it/~bini/.
- envelope.f90
A simple but efficient routine for finding 2D convex hulls, i.e. in finding
the minimum polygon to enclose a set of points.
- median.f90
Finds the median of a set of numbers using a truncated quicksort algorithm.
- hermite.f90
Hermite integration, that is integration of f(x).p(x) from minus infinity to
plus infinity, where f(x) is the user's function and p(x) = exp(-x^2).
- hh.f90
Half-Hermite integration, that is from zero to plus infinity.
hh_test.f90
is a test program which also serves as an illustration of how to use hh. See
below for a program to calculate in quadruple precision the weights and
abscissae.
- chi_sq.f90
Chi-squared distribution function. Requires as239.f90 and lanczos.f90.
- hyperg.f90
Calculate hypergeometric probabilities.
- rhohat.f90
Maximum likelihood estimation of the shape parameter of the gamma (Erlang)
distribution, including calculation of the derivatives of the log-gamma
function.
- fnprod.f90
Computes the distribution function for the product of two correlated normal
variates using the algorithm of Meeker & Escobar. Needs
constant.f90
and qxgs.f90.
- twodqd.f90
A translation of Kahaner & Rechard's 1984 program for bivariate
integration over triangular regions, including a test program.
- simann.f90
is a module and test program for simulated annealing based upon the algorithm
of Corana.
- ives.f90 is
a routine for generating all combinations of n objects.
- mace.f90 is
an F90 version of Jerry Friedman's program to estimate multiple optimal
transformations for regression and correlation by alternating conditional
expectation estimates.
NAS FortranPlus
Software for NAS
FortranPlus This exploits the 10-byte REAL data type which is supported by
this compiler. At the moment, it only contains a special version of my quadruple
precision package which gives about 38 decimal digit representation of quadruple
precision numbers. N.B. It is extremely unlikely that this will give correct
answers with any other compiler (e.g. Salford) which supports 10-byte REALs.
Code for Imagine1's free F compiler
Windows version ONLY
Click
here
Some linear least-squares examples and tests
fit_poly.f90
Fit a polynomial to a set of (x,y) data.
quadsurf.f90
Fit a quadratic surface to a set of (x,y,z) data, i.e. fit: Z = b0 + b10.X
+ b01.Y + b20.X^2 + b11.X.Y + b02.Y^2
There is a module ridge.f90 for
ridge regression / regularization. It uses the output from the module lsq to
form the singular value decomposition (SVD) and offers a choice of 4 different
methods of regularization, or the user can input their own vector to add to the
diagonal of the X'X-matrix.
wtd_quin.f90
This is a program to fit a quintic polynomial with exponential weighting of
past values. It was developed for someone who had reason to believe that his
process was well approximated locally by a quintic, but wanted to give
progressively less weight to old observations. The slope and 2nd derivative
(acceleration) of the smoothing polynomial are output after each new case is
read. This is suitable for real time applications. It demonstrates the use of
the least-squares package's updating algorithm.
spline5.f90
This program fits quintic splines with user-chosen but evenly-spaced knots.
The slope and 2nd derivative (acceleration) of the smoothing polynomial are
output, but only after all of the data have been read. The smoothed fit and
derivatives at any point are based upon the data from both before and after
the point.
Some other useful web sites:
- Try the Fortran Market for
general information on Fortran compilers, tutorials, books and access to some
sources of Fortran code, Gary Scott's Fortran Library web site.
- The Fortran90 FAQ (frequently asked questions) can be obtained from:
F90FAQ.
- For an extensive set of routines for sorting and ranking real
numbers see OrderPack.
- NAG F90 Software
Repository is a source of useful Fortran 90 code.
- The ACM collection of TOMS algorithms is a
source of refereed code, mainly in Fortran, for a wide range of numerical
calculations.
- A collection of functions and subroutines covering a wide area of mathematical
John Monahan's site contains the software from his book `Numerical Methods of Statistics'.
It covers a very similar are to this web site, and is at:
Monahan's web site.
operations can be found at
Jean-Pierre
Moreau's web site.
- For a guide to available mathematical software, refer to
GAMS, and for the numerical analysis FAQ
numafaq.
- If you are interested in object-oriented programming in Fortran, you
should see oof90.html.
- For statistical software in a variety of languages try
statlib. For distribution functions,
random number generation and other statistical programs, try
cdflib/ranlib.
- For optimization, particularly constrained optimization, see
the Optimization Decision Tree,
or Arnold Neumaier's Global Optimization
web site. The latter also contains many general links to mathematical and
statistical software.
- For multivariate normal integrals, and for multiple integration in general, look at
Alan Genz's home
page.
- Michel Olagnon's ORDERPACK is for sorting and ranking. It can be found at:
ORDERPACK.
- Australian Fortran users will find the web page of
Computer Transition Systems
useful. (This company distributes Lahey, Salford, Edinburgh Portable
Compilers, Digital Visual Fortran and other compilers in Australia.)
- To download Lahey's cheap Fortran 90 compiler click on
ELF90.
- There is an interpreter for a subset of Fortran 90 available from:
Georg Petrich's interpreter.
I have not tried it myself, so comments would be welcome.
It can be downloaded freely, but I understand it is a shareware product.
- For a wide range of code for the FFT, in several computer languages, see:
The FFT Home Page.
and:
Another FFT Home Page.
|