My MS thesis was on protein function prediction. As part of this work, I measured how much secondary structure information contributes to protein function prediction performance, and I applied the Oommen-Kashyap syntactic transition probability calculation to the peptide classification problem. As far as I know, this is the first published work that combines the Oommen-Kashyap method with biological sequence analysis.
Improvement of Protein Function Prediction Using Structural Information and Peptide Classification Using Syntactic Transition Probabilities
Biological sequence analysis deals with nucleotide and amino acid sequences, aiming to expose their evolutionary, structural and functional properties. This study has five aims: to review well-known pairwise alignment methods, to introduce the syntactic transition probability of Oommen and Kashyap as a biological sequence similarity metric, to demonstrate how structural information improves protein function prediction, to compare the syntactic transition probability of Oommen and Kashyap with standard sequence similarity metrics on two peptide classification problems, and to implement the necessary sequence analysis tools as software. In the first part of the experiments, the results clearly indicate that using secondary structure sequences along with amino acid sequence alignments improves molecular function prediction performance, while using predicted secondary structures does not. In the second part, syntactic transition probabilities are compared with standard global alignment scores as features fed into a machine learning classifier. The classification performance measurements show that syntactic transition probabilities are much better features than global alignment scores for peptides.
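To make the baseline in the second part concrete, here is a minimal sketch of how a global alignment score can be computed and turned into classifier features. It is the standard Needleman-Wunsch recurrence with a linear gap penalty, not the thesis software. The scoring values (match, mismatch, gap), the function names and the example peptides are assumptions chosen for illustration; a real experiment would more likely use a substitution matrix. The syntactic transition probability of Oommen and Kashyap is not implemented here.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not the thesis settings.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned to a gap
                curr[j - 1] + gap,  # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


def alignment_features(query, references, **scoring):
    """Represent a sequence by its alignment scores against reference sequences."""
    return [global_alignment_score(query, ref, **scoring) for ref in references]


if __name__ == "__main__":
    # Hypothetical peptides, used only to show the shape of the feature vector.
    references = ["ACDEFG", "ACDFG", "GGHKL"]
    print(alignment_features("ACDEFG", references))
```

Each sequence is thus described by a vector of scores against a fixed set of reference sequences, and that vector is what a classifier would consume. Under this reading, replacing `global_alignment_score` with another similarity measure changes the features while leaving the rest of the pipeline untouched.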
Thesis (in Turkish): Aygun2009.pdf
- Aygün, E.; Oommen, B.J. & Cataltepe, Z. Peptide Classification Using Optimal and Information Theoretic Syntactic Modelling. Pattern Recognition, 2010, 43, 3891
- Aygün, E.; Oommen, B.J. & Cataltepe, Z. On Utilizing Optimal and Information Theoretic Syntactic Modelling for Peptide Classification. Pattern Recognition in Bioinformatics, 2009 (Presentation)
- Aygün, E.; Komurlu, C.; Aydin, Z. & Cataltepe, Z. Protein Function Prediction with Amino Acid Sequence and Secondary Structure Alignment Scores. International Symposium on Health Informatics and Bioinformatics, 2008
- Filiz, A.; Aygün, E.; Keskin, O. & Cataltepe, Z. Importance of Secondary Structure Elements for Prediction of GO Annotations. International Symposium on Health Informatics and Bioinformatics, 2008
- Aygün, E. & Cataltepe, Z. Gene Ontology (GO) Molecular Function Prediction Based on Alignment Scores. International Symposium on Health Informatics and Bioinformatics, 2007
- Cataltepe, Z.; Ayan, U. & Aygün, E. Protein Function Prediction Using Motifs, Sequence Features, Alignment Scores. Research in Computational Molecular Biology, 2007
- Cataltepe, Z.; Aygün, E.; Filiz, A.; Keskin, O.; Komurlu, C. & Altunbasak, Y. Dimensionality Reduction for Protein Function Prediction. Automated Function Prediction – Biosapiens Joint Special Interest Group Meeting at ISMB/ECCB, 2007