Type:
Journal
Description:
In this article we show how dichotomic classes, binary variables naturally derived from a new mathematical model of the genetic code, can be used to characterize different parts of the genome. In particular, we analyze and compare different parts of the whole of chromosome 1 of Arabidopsis thaliana: genes, exons, introns, coding sequences (CDS), intergenic regions, untranslated regions (UTR) and regulatory sequences. To accomplish this, we encode each sequence in the three possible reading frames according to the definitions of the dichotomic classes (parity, Rumer and hidden). We then perform a statistical analysis of the resulting binary sequences. Interestingly, the results show that coding and non-coding sequences have different patterns and proportions of dichotomic classes. This suggests that the frame is important only for coding sequences and that dichotomic classes can be useful for recognizing them. Moreover, these patterns appear to be more pronounced in CDS than in exons. We also derive an independence test to assess whether the observed percentages could be regarded as the expression of independent random processes. The results confirm that only genes, exons and CDS seem to possess a dependence structure that distinguishes them from i.i.d. sequences. This informational content is independent of the global proportion of nucleotides in a sequence. The present work confirms that the recent mathematical model of the genetic code is a new paradigm for understanding the management and organization of genetic information and is an innovative tool for investigating informational aspects of error detection …
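The encoding step described above (each sequence read in its three reading frames and mapped codon by codon to a 0/1 class value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The parity and hidden classes of the model are not reproduced here; instead the classifier is passed in as a function, and the example uses a hypothetical stand-in, `rumer_class`, based on Rumer's bisection (codons whose first two bases form one of the eight fourfold-degenerate families of the standard code are labelled 1, all others 0; the 0/1 labelling is an assumed convention). The comparison against i.i.d. sequences is likewise swapped for a plain binomial z-score against the class probability expected from the sequence's own base composition; it is not the independence test derived in the article. All names (`encode_frame`, `iid_expected`, `z_score`, `frame_report`, `FOURFOLD_PREFIXES`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

# ASSUMPTION (illustrative stand-in, not the paper's parity/hidden classes):
# Rumer's bisection, i.e. the eight dinucleotide prefixes whose codon
# families are fourfold degenerate in the standard genetic code.
# Labelling these codons 1 and all others 0 is a hypothetical convention.
FOURFOLD_PREFIXES = {"AC", "CC", "CG", "CT", "GC", "GG", "GT", "TC"}


def rumer_class(codon):
    """Hypothetical 0/1 dichotomic class of a codon (Rumer-style bisection)."""
    return 1 if codon[:2] in FOURFOLD_PREFIXES else 0


def encode_frame(seq, frame, classify):
    """Map the consecutive codons of reading frame 0, 1 or 2 to a 0/1 sequence.

    Codons containing symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    bits = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set(BASES):
            bits.append(classify(codon))
    return bits


def iid_expected(seq, classify):
    """Probability of class 1 if bases were i.i.d. with the sequence's composition."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    freq = {b: counts[b] / total for b in BASES}
    return sum(
        freq[a] * freq[b] * freq[c]
        for a, b, c in product(BASES, repeat=3)
        if classify(a + b + c)
    )


def z_score(bits, p0):
    """Binomial z-score of the observed proportion of 1s against p0."""
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie strictly between 0 and 1")
    n = len(bits)
    p_hat = sum(bits) / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def frame_report(seq, classify):
    """Observed proportion of class 1 and its z-score in each reading frame.

    Assumes the sequence is long enough to contain at least one codon per frame.
    """
    p0 = iid_expected(seq, classify)
    report = {}
    for frame in range(3):
        bits = encode_frame(seq, frame, classify)
        report[frame] = (sum(bits) / len(bits), z_score(bits, p0))
    return report


if __name__ == "__main__":
    print(frame_report("ATGGCTGCCAAAGGTTAA", rumer_class))
```

Because codons within one frame do not overlap, an i.i.d. nucleotide model makes the class values in that frame independent Bernoulli trials, which is what justifies the binomial z-score here; a frame whose proportion departs strongly from `p0` while the other frames do not is the kind of frame-specific signal the abstract attributes to coding sequences.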
Publisher:
American Institute of Mathematical Sciences
Publication date:
1 Jan 2013
Biblio References:
Volume: 10 Issue: 1 Pages: 199
Origin:
Mathematical Biosciences & Engineering