Genome Sequencing Projects
There are several reasons for completely sequencing a genome.
• First, it provides a means of discovering all the genes and thus yields a complete gene inventory.
• Second, the sequence shows the relationships between genes.
• Third, it provides a set of tools for future experimentation.
• Fourth, the sequence serves as an index for drawing together and organizing all genetic information about the organism.
• Fifth, and increasingly important over time, the whole genome sequence is an archive for the future, containing all the genetic information required to make the organism.
There are several methods for small-scale sequencing, although most of them do not scale well to entire genomes. The two main methodologies used for genome sequencing are discussed here. Both were also briefly discussed in the introduction.
Directed sequencing of Bacterial Artificial Chromosome (BAC) contigs
As you have already learnt, Bacterial Artificial Chromosome (BAC) vectors are capable of stably propagating large, complex DNA inserts in Escherichia coli. These vectors are used to make genomic libraries in which the insert size is 80-100 kb. The library is then screened for clones that share common restriction fragments, and the BAC clones are mapped in this way into overlapping arrays of contiguous clones, called contigs. The mapped contigs are sequenced by breaking the large DNA inserts into small pieces. In this directed sequencing strategy, therefore, pieces of DNA from adjacent stretches of a chromosome are sequenced.
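The idea of grouping BAC clones into contigs by shared restriction fragments can be sketched in a few lines of Python. This is a minimal illustration, not a real fingerprinting pipeline: the clone names, the fragment sizes, the function name build_contigs and the threshold min_shared are all hypothetical, and fragments are treated as identical only when their sizes match exactly (real mapping software allows for measurement error and also orders the clones within a contig).

```python
def build_contigs(fingerprints, min_shared=2):
    """Group clones into contigs.

    fingerprints: dict mapping a clone name to the set of its restriction
    fragment sizes (in bp). Two clones are assumed to overlap when they
    share at least `min_shared` fragment sizes (an illustrative threshold).
    """
    names = list(fingerprints)
    parent = {n: n for n in names}  # union-find forest, one tree per contig

    def find(n):
        # Follow parent links to the representative, shortening the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Compare every pair of clones and join those that appear to overlap.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(fingerprints[a] & fingerprints[b]) >= min_shared:
                parent[find(a)] = find(b)

    # Collect the clones belonging to each contig.
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical fingerprints: fragment sizes from one restriction digest.
example = {
    "BAC1": {1200, 3400, 5600, 800},
    "BAC2": {5600, 800, 2100, 4300},
    "BAC3": {2100, 4300, 950},
    "BAC4": {7000, 150, 620},
}
contigs = build_contigs(example)
print(contigs)
```

In this made-up example BAC1 and BAC3 share no fragments directly, yet they end up in the same contig because each overlaps BAC2. BAC4 shares nothing with the others and remains a contig of its own.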
Random shotgun sequencing
Random shotgun sequencing is the other approach to sequencing genomic DNA. Genomic DNA molecules are very long and contain the many genes and other sequences required to build the whole organism. Even with the best sequencing techniques, a single run of an experiment yields at most 700 bases of sequence information. We therefore need a strategy for sequencing the whole DNA. The random shotgun approach follows a well-known theme: "divide a big problem into small tasks, solve these small tasks individually, and finally combine the solutions to get the full solution".
Big genomic DNA molecules are broken down into small fragments, which are cloned as small (2.0 kb) and medium (10 kb) inserts in plasmid vectors. Plasmids have specific sites where these fragments can be inserted through enzymatic procedures. In this way a library is constructed. Clones are then picked at random and each is sequenced from both ends. By picking and sequencing many clones, we obtain a large amount of sequence.
Some of these sequences turn out to be identical, some resemble each other over part of their length (the overlapping parts), and a few are unique. When all these data are fed into a computer program, the sequences are joined wherever overlapping parts are found, giving longer pieces of DNA sequence. This assembly process continues until all overlapping parts are exhausted. In the end, we obtain a large portion of the genomic DNA sequence.
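The joining of sequences by their overlapping parts can be illustrated with a small greedy assembler. This is only a sketch under simplifying assumptions: the reads are error-free, all come from the same strand, and the function names overlap and greedy_assemble, the minimum overlap min_len and the toy reads are hypothetical. Real assembly programs must also cope with sequencing errors, reverse-complement reads and repeated sequences.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    # Identical reads add no information, and neither do reads that lie
    # entirely inside another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            # All overlapping parts are exhausted; what remains are contigs.
            return reads
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]


# Toy reads taken from the made-up sequence ATGCGTACGTTAGCCTAGGA.
toy_reads = ["ATGCGTACG", "GTACGTTAGC", "TTAGCCTAGGA", "ATGCGTACG"]
assembled = greedy_assemble(toy_reads)
print(assembled)
```

With these toy reads the duplicate is discarded and the three remaining reads merge into a single sequence. Reads that overlap nothing are returned as separate pieces, which corresponds to the situation where only part of the genomic sequence is recovered.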
In theory the entire genomic DNA sequence can be obtained in this way, but in practice this is not so. Some gaps in the genomic DNA sequence do arise, and these gaps need to be closed by specific cloning of those regions and additional sequencing.
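A rough calculation suggests why some gaps are expected even when many clones are sequenced. The sketch below assumes an idealized model in which reads start at uniformly random positions, so that the chance of any given base being missed by every read is approximately exp(-c), where c is the average coverage (total bases sequenced divided by genome length). This Poisson approximation is commonly associated with the Lander-Waterman model. The function names and the numbers are illustrative only; real libraries are not perfectly random, and regions that clone poorly produce more gaps than this estimate predicts.

```python
import math


def coverage(num_reads, read_len, genome_len):
    """Average number of times each base is sequenced."""
    return num_reads * read_len / genome_len


def expected_uncovered_fraction(cov):
    """Approximate fraction of bases missed by every read (Poisson model)."""
    return math.exp(-cov)


# Hypothetical project: 10,000 reads of 700 bases from a 1,000,000-base genome.
c = coverage(10_000, 700, 1_000_000)
missed = expected_uncovered_fraction(c)
print(f"coverage = {c:.1f}x")
print(f"expected uncovered bases = {missed * 1_000_000:.0f}")
```

Under these assumptions even sevenfold coverage is expected to leave several hundred bases unsequenced, scattered over a number of gaps, which is why targeted cloning and additional sequencing are used to finish the genome.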