Gene prediction and counting
Gene prediction is an important problem in computational biology, and a variety of algorithms predict genes using known genes as a training set. Because most of the knowledge behind these predictions comes from experimentally identified genes, the predictions are limited by what has already been found experimentally. Even if we know where the genes are in the genome, it is not entirely clear how to count them. Because of overlapping genes and splice variants, it is difficult to decide which parts of the DNA should be regarded as one gene and which as several different genes. Nevertheless, for practical purposes (allowing for some 'experimental error') we can count the number of genes in an organism. Some of the results of counting predicted genes have turned out to be quite surprising (Table 1).
Table 1. Genome size and predicted gene counts for several organisms.
One of the surprises is the relatively small number of genes in the human genome (20,000-25,000 genes) in comparison to the worm (19,000 genes). In fact, some experts still think that there must be at least 40,000-50,000 genes in the human genome, and that the lower figure just reflects the unreliability of in silico (i.e., computational) gene prediction. Still, it seems that there is no simple correlation between the intuitive complexity of an organism and the number of genes in its genome.
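The counting ambiguity described earlier, where overlapping genes and splice variants blur what should count as one gene, can be illustrated with a small sketch. The annotations below are hypothetical (made-up names and coordinates, not real genes), and the merging rule, which treats any annotations with overlapping coordinates as a single locus, is only one possible convention assumed here for illustration; it is not how any particular annotation pipeline defines a gene.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    strand: str  # "+" or "-"
    start: int   # inclusive coordinate
    end: int     # inclusive coordinate


def count_loci(transcripts, strand_aware=True):
    """Count loci by merging annotations whose coordinates overlap.

    With strand_aware=True, annotations on opposite strands are never merged.
    """
    groups = {}
    for t in transcripts:
        key = t.strand if strand_aware else "any"
        groups.setdefault(key, []).append((t.start, t.end))

    loci = 0
    for intervals in groups.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                loci += 1              # no overlap: a new locus begins
                current_end = end
            else:
                current_end = max(current_end, end)  # overlap: extend the locus
    return loci


# Hypothetical annotations, invented purely for illustration.
annotations = [
    Transcript("geneA_variant1", "+", 100, 900),
    Transcript("geneA_variant2", "+", 100, 700),   # splice variant of geneA
    Transcript("geneB", "-", 800, 1500),           # overlaps geneA, opposite strand
    Transcript("geneC", "+", 2000, 2600),
]

n_transcripts = len(annotations)
n_loci_stranded = count_loci(annotations, strand_aware=True)
n_loci_unstranded = count_loci(annotations, strand_aware=False)

print(n_transcripts, n_loci_stranded, n_loci_unstranded)  # prints: 4 3 2
```

The same four annotations yield a count of 4, 3, or 2 depending on whether splice variants are counted separately and whether overlapping genes on opposite strands are merged, which is why published gene counts carry some 'experimental error' even when the gene locations are known.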