A few mathematics from real genome data

日期:2009-11-25 阅读:584


Biology now generates huge amount of data. In this talk we will touch on some mathematical problems inspired by the study of real genomic data that require the use of combinatorics, graph theory and formal language thoery. All examples are taken from our own bioinformatics work in recent years. The first problem concerns the number of longer missing strings (of length K+i, i>=1) taken away by the absence of one or more K-strings. The exact solution of the problem may be obtained by using the Golden-Jackson cluster method in combinatorics and by making use of a special kind of formal languages, namely, the factorizable language. The second problem consists in explaining the fine structure observed in one-dimensional K-string histograms of some randomized genomes. The third problem is the uniqueness of reconstructing a protein sequence from its constituent K-peptides. The latter problem has a natural connection with the number of Eulerian loops in a graph. To tell whether a protein sequence has a unique reconstruction at a given K the factorizable language again comes to our help.



研究领域: 基于实际基因组数据的生物信息学和计算生物学研究。


© 上海交通大学物理与天文学院 版权所有