Support Vector Machines in Computational Biology

Support Vector Machines are a natural match for the characteristics of many bioinformatics datasets. They deliver state-of-the-art performance in several applications and, for microarray gene expression data, are becoming the method of choice. Here is a list of publications…

Gene Function from microarray expression data

Knowledge-based analysis of microarray gene expression data by using support vector machines, Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Walsh Sugnet, Terrence S. Furey, Manuel Ares, Jr., David Haussler, Proc. Natl. Acad. Sci. USA, vol. 97, pages 262-267, 2000
pdf
http://www.pnas.org/cgi/reprint/97/1/262.pdf

Support Vector Machine Classification of Microarray Gene Expression Data, Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Sugnet, Manuel Ares, Jr., David Haussler
ps.gz
http://www.cse.ucsc.edu/research/compbio/genex/genex.ps

Gene functional classification from heterogeneous data, Paul Pavlidis, Jason Weston, Jinsong Cai and William Noble Grundy, Proceedings of RECOMB 2001
pdf
http://www.cs.columbia.edu/compbio/exp-phylo/exp-phylo.pdf

Cancer Tissue classification from microarray expression data, and gene selection

Support vector machine classification of microarray data, S. Mukherjee, P. Tamayo, J.P. Mesirov, D. Slonim, A. Verri, and T. Poggio, Technical Report 182, AI Memo 1676, CBCL, 1999.

Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data, Terrence S. Furey, Nigel Duffy, Nello Cristianini, David Bednarski, Michel Schummer, and David Haussler, Bioinformatics. 2000, 16(10):906-914.
pdf
http://bioinformatics.oupjournals.org/cgi/reprint/16/10/906.pdf

Gene Selection for Cancer Classification using Support Vector Machines, I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Machine Learning 46(1/3): 389-422, January 2002
pdf
http://homepages.nyu.edu/~jaw281/genesel.pdf

Molecular classification of multiple tumor types, C. Yeang, S. Ramaswamy, P. Tamayo, S. Mukherjee, R. Rifkin, M. Angelo, M. Reich, E. Lander, J. Mesirov, and T. Golub, Intelligent Systems in Molecular Biology

Combining HMM and SVM: the Fisher Kernel

Exploiting generative models in discriminative classifiers, T. Jaakkola and D. Haussler, Preprint, Dept. of Computer Science, Univ. of California, 1998
ps.gz
http://www.cse.ucsc.edu/research/ml/papers/Jaakola.ps

A discriminative framework for detecting remote protein homologies, T. Jaakkola, M. Diekhans, and D. Haussler, Journal of Computational Biology, Vol. 7, No. 1-2, pp. 95-114, 2000

Classifying G-Protein Coupled Receptors with Support Vector Machines, Rachel Karchin, Master’s Thesis, June 2000

The Fisher Kernel for classification of genes

Promoter region-based classification of genes, Paul Pavlidis, Terrence S. Furey, Muriel Liberto, David Haussler and William Noble Grundy, Proceedings of the Pacific Symposium on Biocomputing, January 3-7, 2001. pp. 151-163.
pdf
http://www.cs.columbia.edu/~bgrundy/papers/prom-svm.pdf

String Matching Kernels

David Haussler: “Convolution kernels on discrete structures”

Chris Watkins: “Dynamic alignment kernels”

J.-P. Vert: “Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings”

Translation initiation site recognition in DNA

Engineering support vector machine kernels that recognize translation initiation sites, A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, and K.-R. Müller, Bioinformatics, 16(9):799-807, 2000.
pdf.gz
http://bioinformatics.oupjournals.org/cgi/reprint/16/9/799.pdf

Protein fold recognition

Multi-class protein fold recognition using support vector machines and neural networks, Chris Ding and Inna Dubchak, Bioinformatics, 17:349-358, 2001
ps.gz
http://www.kernel-machines.org/papers/upload_4192_bioinfo.ps

Support Vector Machines for predicting protein structural class, Yu-Dong Cai, Xiao-Jun Liu, Xue-biao Xu and Guo-Ping Zhou, BMC Bioinformatics (2001) 2:3
http://www.biomedcentral.com/content/pdf/1471-2105-2-3.pdf

The spectrum kernel: A string kernel for SVM protein classification, Christina Leslie, Eleazar Eskin and William Stafford Noble, Proceedings of the Pacific Symposium on Biocomputing, 2002
http://www.cs.columbia.edu/~bgrundy/papers/spectrum.html

Protein-protein interactions

Predicting protein-protein interactions from primary structure, Joel R. Bock and David A. Gough, Bioinformatics 2001 17: 455-460
pdf
http://bioinformatics.oupjournals.org/cgi/reprint/17/5/455.pdf

Protein secondary structure prediction

A Novel Method of Protein Secondary Structure Prediction with High Segment Overlap Measure: Support Vector Machine Approach, Sujun Hua and Zhirong Sun, Journal of Molecular Biology, vol. 308, no. 2, pages 397-407, April 2001.

Protein Localization

Support vector machine approach for protein subcellular localization prediction, Sujun Hua and Zhirong Sun, Bioinformatics 2001 17: 721-728

Various

Rapid discrimination among individual DNA hairpin molecules at single-nucleotide resolution using an ion channel, Wenonah Vercoutere, Stephen Winters-Hilt, Hugh Olsen, David Deamer, David Haussler, Mark Akeson, Nature Biotechnology 19, 248-252 (01 Mar 2001)

Making the most of microarray data, Terry Gaasterland, Stefan Bekiranov, Nature Genetics 24, 204-206 (01 Mar 2000)