Further Reading: Chapter 8

The applications described in this final chapter are just some examples of the many and diverse uses that have been made of Support Vector Machines in recent years. We have selected particular examples for inclusion in this book on the grounds that they are easily accessible or that they illustrate some important aspect of learning systems design. Many other important applications could not be described due to lack of space, or because of their complexity: these, together with more recent examples, can be accessed via Isabelle Guyon’s “SVM applications” web site.

One crucial design choice is deciding on a kernel. Creating good kernels often requires lateral thinking: many measures of similarity between inputs have been developed in different contexts, and understanding which of them can provide good kernels depends on insight into the application domain. Some applications described here use rather simple kernels, motivated by established information retrieval techniques, like the ones for computer vision (Schoelkopf, Blanz et al., Pontil and Verri, Chapelle et al.) or the ones for text categorisation (Dumais et al. and Joachims). Still, even with such simple kernels, and even when the number of features is much larger than the number of training examples, SVMs deliver a very accurate hypothesis, often outperforming other more standard algorithms. Other applications use very sophisticated kernels, like the ones for biosequences discussed in Jaakkola and Haussler, where the kernel is provided by an entire Hidden Markov Model system. Both Watkins and Haussler have recently proposed elegant kernels for symbol sequences (discussed in Chapter 3), and their application to problems like biosequence analysis and text categorisation could provide very interesting results.
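To make the idea of a "simple kernel" for text concrete, the sketch below computes an inner product between bag-of-words term-count vectors, together with a cosine-normalised variant. It is only an illustration: the whitespace tokenizer, the use of raw counts rather than any weighting scheme, and the function names are all assumptions, and it is not claimed to be the kernel used in the text categorisation work cited above.

```python
import math
from collections import Counter


def term_counts(text):
    """Lower-case, whitespace-split bag of words (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


def bow_kernel(doc_a, doc_b):
    """Inner product of the two term-count vectors."""
    a, b = term_counts(doc_a), term_counts(doc_b)
    # Counter returns 0 for words missing from b, so only shared words contribute.
    return sum(count * b[word] for word, count in a.items())


def normalised_bow_kernel(doc_a, doc_b):
    """Cosine-normalised kernel, so that document length does not dominate."""
    denom = math.sqrt(bow_kernel(doc_a, doc_a) * bow_kernel(doc_b, doc_b))
    return bow_kernel(doc_a, doc_b) / denom if denom else 0.0
```

Because the kernel is an explicit inner product in the space of term counts, it is symmetric and positive semi-definite by construction, even though that feature space can have far more dimensions than there are training documents.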

Biological data mining is one of the most promising uses of Support Vector Machines, particularly because of the high dimensionality of the data. For example, gene expression data obtained by DNA microarrays are becoming increasingly common. The kernel used in the gene expression problem described in this chapter was a simple Gaussian kernel; the difficulty lay instead in the imbalance of the dataset, a problem that was solved with a technique similar to the 2-norm soft margin. That work was published in the Proceedings of the U.S. National Academy of Sciences. The same technique was also used in another medical application, the automatic diagnosis of tuberculosis (Veropoulos, Campbell and Cristianini), where it made it possible to control the specificity and the sensitivity of the hypothesis separately. The gene expression work also used another important feature of SVMs: the possibility of using them for data cleaning, as discussed in Guyon et al., which exploits the fact that potential outliers can be sought among the support vectors. The data, paper and experiment of the gene expression work are all available online, as is an updated list of bioinformatics applications of SVMs.
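The 2-norm soft margin can be implemented by adding a constant to the diagonal of the kernel matrix, so one way to handle an imbalanced dataset in the same spirit is to make that constant depend on the class of each example. The sketch below builds a Gaussian kernel matrix and adds such a class-dependent diagonal term. The particular weighting (each class receives a term proportional to its own share of the data, scaled by a hypothetical parameter `lam`) is an assumption made for illustration, not a statement of the exact scheme used in the published work.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix_with_class_diagonal(X, y, sigma, lam=0.1):
    """Gaussian kernel matrix with a class-dependent term added to the diagonal.

    X is a list of feature vectors, y a list of labels in {+1, -1}.
    `lam` is a hypothetical scale parameter, not taken from the source.
    """
    n = len(X)
    n_pos = sum(1 for label in y if label == 1)
    n_neg = n - n_pos
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    for i in range(n):
        # ASSUMPTION: the diagonal term is proportional to the class's share of
        # the data, so the rarer class gets the smaller term and is allowed
        # less slack than the more common class.
        share = n_pos / n if y[i] == 1 else n_neg / n
        K[i][i] += lam * share
    return K
```

With one positive and three negative examples and `lam=0.1`, the positive example's diagonal entry grows by 0.025 and each negative example's by 0.075; the off-diagonal entries are left unchanged.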

Other computer vision applications include the work on face detection and pedestrian detection by Tomaso Poggio, Federico Girosi and their co-workers; tuberculosis diagnosis; and the object recognition experiments discussed in Schoelkopf. More applications of SVMs can be found in the two edited books (proceedings of two workshops on SVMs, in 1997 and 1998), and on the Kernel-Machines website and Isabelle Guyon’s website. Applications of Gaussian processes can be found on this site.

Tuning the parameters of the system is another important design issue. The heuristic mentioned as an example in the introduction of this chapter for the choice of $\sigma$ in Gaussian kernels is due to Jaakkola; in general, one can think of schemes that automatically adapt the parameters to the data, based on some learning principle. One such simple scheme is provided by Cristianini et al., but more sophisticated approaches can be devised, based for example on the new bounds that exploit the eigenvalues of the kernel matrix. In a sense, the use of HMMs can also be regarded as learning the kernel from data.
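As an illustration of a data-driven choice of the Gaussian kernel width, the sketch below sets it to the median Euclidean distance from each positive example to its nearest negative example. The passage above does not spell out the heuristic, so this specific form is an assumption, as are the function names and the labelling convention (+1 for positive, anything else for negative).

```python
import math
from statistics import median


def euclidean(x, z):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def sigma_from_nearest_opposite(X, y):
    """Median distance from each positive example to its nearest negative one.

    ASSUMPTION: this is one plausible form of a width heuristic, not a formula
    quoted from the source text.
    """
    positives = [x for x, label in zip(X, y) if label == 1]
    negatives = [x for x, label in zip(X, y) if label != 1]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    nearest = [min(euclidean(p, q) for q in negatives) for p in positives]
    return median(nearest)
```

A width chosen this way scales with the typical gap between the two classes, which makes it a reasonable starting point before any further tuning, for instance by cross-validation.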

Useful Applications Links: