Further Reading: Chapter 2

The theory of linear discriminants dates back to the 1930s, when Fisher proposed a procedure for classification. In the field of artificial intelligence, attention was drawn to the problem by the work of Frank Rosenblatt, who, starting in 1956, introduced the perceptron learning rule. Minsky and Papert's famous book Perceptrons analysed the computational limitations of linear learning machines. The classic book by Duda and Hart provides a complete survey of the state of the art up to 1973. For more recent results, see also Bishop's book on neural networks, which includes a description of a class of generalised learning machines.
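
Since the perceptron learning rule is central to this chapter, a minimal sketch may help fix ideas. The function name, the learning rate eta, and the epoch cap below are illustrative assumptions, not taken from any of the works cited; labels are assumed to be -1 or +1.

    import numpy as np

    def perceptron(X, y, eta=1.0, max_epochs=100):
        """Rosenblatt's perceptron rule: returns (w, b) with decision
        function sign(<w, x> + b). Illustrative sketch."""
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(max_epochs):
            mistakes = 0
            for i in range(n):
                # Update only on a mistake, i.e. y_i (<w, x_i> + b) <= 0.
                if y[i] * (X[i] @ w + b) <= 0:
                    w += eta * y[i] * X[i]  # move the hyperplane towards x_i
                    b += eta * y[i]
                    mistakes += 1
            if mistakes == 0:  # a full pass with no mistakes: sample separated
                break
        return w, b

Novikoff's theorem (cited below) guarantees that on a linearly separable sample the number of updates is bounded, so the loop terminates.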

The extension of Novikoff's theorem to the non-separable case is due to Freund and Schapire; the pocket algorithm was proposed by Gallant; a simple description of Fisher's discriminant can be found in Duda and Hart. For discussions of the computational complexity of learning linear separations in the non-separable case, see Hoeffgen et al. and Arora et al. The idea of a maximal margin hyperplane has been rediscovered several times: it is discussed by Vapnik and Lerner and by Duda and Hart, and an iterative algorithm for learning maximal margin hyperplanes, known as the Adatron, was investigated in the statistical mechanics literature by Anlauf and Biehl; it will be discussed further in Chapter 7.

The problem of linear regression is much older than the classification one. Least squares linear interpolation was first used by Gauss in the 18th century for astronomical problems. The ridge regression algorithm was published by Hoerl and Kennard, and was subsequently found to be a special case of Tikhonov's regularisation theory for the solution of ill-posed problems. The dual form of ridge regression, including the derivations of Subsection 2.2.2, was studied by Saunders et al. and Smola et al. An equivalent heuristic was widely used in the neural networks literature under the name of weight decay. The Widrow-Hoff algorithm is described in the original paper by Widrow and Hoff.
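
As a companion to the discussion of the dual form, here is a minimal sketch of ridge regression in dual variables, assuming the standard formulation in which alpha = (G + lambda I)^{-1} y with Gram matrix G = XX'; the names dual_ridge, lam, and predict are illustrative, not from the works cited.

    import numpy as np

    def dual_ridge(X, y, lam=1.0):
        """Solve (G + lam I) alpha = y, where G[i, j] = <x_i, x_j>."""
        G = X @ X.T  # Gram matrix: only inner products of training points
        return np.linalg.solve(G + lam * np.eye(len(y)), y)

    def predict(X_train, alpha, X_new):
        """Dual-form prediction f(x) = sum_i alpha_i <x_i, x>."""
        return (X_new @ X_train.T) @ alpha

The primal weight vector can be recovered as w = X' alpha; penalising the norm of w is exactly the weight decay heuristic mentioned above.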

Finally, note that the representation of linear machines in dual form, in terms of the training data, is intimately related to the optimisation technique of Lagrange multipliers, and will be discussed further in Chapter 5. Guyon and Stork compare the primal and dual representations of linear learning machines in a way analogous to that adopted in this chapter.
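
To make the dual representation concrete, here is a minimal sketch of the perceptron in dual form, where the weight vector is held implicitly as w = sum_i alpha_i y_i x_i and alpha_i counts the mistakes made on example i, so only inner products between training points are needed. Names are again illustrative.

    import numpy as np

    def dual_perceptron(X, y, max_epochs=100):
        """Dual-form perceptron: returns mistake counts alpha and bias b."""
        n = len(y)
        alpha, b = np.zeros(n), 0.0
        G = X @ X.T  # precomputed Gram matrix of inner products
        for _ in range(max_epochs):
            mistakes = 0
            for i in range(n):
                # f(x_i) = sum_j alpha_j y_j <x_j, x_i> + b
                if y[i] * (G[i] @ (alpha * y) + b) <= 0:
                    alpha[i] += 1  # one more mistake on example i
                    b += y[i]
                    mistakes += 1
            if mistakes == 0:
                break
        return alpha, b

Because the data enter only through the Gram matrix G, the same code runs unchanged if G is replaced by any other matrix of pairwise inner products, which is what makes the dual representation attractive.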

References not on-line

  • V. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24(6), 1963.
  • R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
  • R. A. Fisher. Contributions to Mathematical Statistics. Wiley, 1952.
  • F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.
  • M. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, 1969.
  • C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, 1995.
  • A. B. Novikoff. On convergence proofs on perceptrons. In Symposium on the Mathematical Theory of Automata, volume 12, pages 615–622. Polytechnic Institute of Brooklyn, 1962.
  • S. I. Gallant. Perceptron-based learning algorithms. IEEE Transactions on Neural Networks, 1:179–191, 1990.
  • S. Arora, L. Babai, J. Stern, and Z. Sweedyk. Hardness of approximate optima in lattices, codes and linear systems. Journal of Computer and System Sciences, 54(2):317–331, 1997.
  • J. K. Anlauf and M. Biehl. The AdaTron: an adaptive perceptron algorithm. Europhysics Letters, 10:687–692, 1989.
  • A. E. Hoerl and R. W. Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
  • A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, 1977.
  • B. Widrow and M. Hoff. Adaptive switching circuits. IRE WESCON Convention Record, 4:96–104, 1960.
  • I. Guyon and D. Stork. Linear discriminant and support vector classifiers. In A. J. Smola, P. Bartlett, B. Schoelkopf, and C. Schuurmans, editors, Advances in Large Margin Classifiers. MIT Press, 1999.