Wednesday, June 13, 2012

Single Molecule Real-Time DNA Sequencing

Pacific Biosciences:
DNA sequencing has undergone several evolutions in the past decade. The most widely used DNA sequencing platform, the “1st generation” Sanger technology used in the Human Genome Project, has been supplemented by “2nd generation” systems promising higher throughput at reduced costs.

While 2nd generation technologies provided enormous improvements in throughput and dramatically lowered the cost per sequenced base, they are approaching the limit of further performance improvements.

Second generation technologies can be thought of as ‘brute-force’ systems that deliver high throughput at the expense of readlength and speed. In addition, several other inherent aspects of 2nd generation systems limit their utility. For example, because they are designed for high volume runs, they require customers to wait until enough samples have been collected to run at capacity, and often require complicated molecular barcoding methods to allow for maximum productivity. This significantly increases costs for smaller projects and severely reduces the flexibility of these systems.

Further, because of their short reads, 2nd generation technologies are not capable of addressing all of the relevant types of variation in the genome. Rare genetic variants, complex structural rearrangements, and other sources of variation (such as differentially methylated DNA sites) have recently been proven to play a larger role in explaining disease risk and progression, and may be more medically important than SNPs. Ultimately, it is necessary to look comprehensively across all types of variation to understand the fundamental complexity of the genome.

Currently, organizations are using a combination of 1st and 2nd generation approaches depending on the application. For example, 2nd generation systems are used primarily for resequencing and counting or tagging applications. Sanger sequencing continues to be the platform of choice for de novo sequencing, validation studies, and projects requiring a fast time to result (such as infectious disease monitoring and molecular diagnostics).

What is required is a breakthrough technology capable of offering a new performance envelope with improvements across applications and the ability to ultimately drive down the cost and time required for human genome sequencing to make it feasible for personalized medicine.

At a cost of more than $250 million, PacBio has developed that breakthrough technology.

Third Generation Sequencing
Third generation technologies are differentiated by single molecule resolution, very long reads, fast time to results, and lower overall cost, including the flexibility to cost-effectively perform both small and large projects.

PacBio’s Single Molecule Real Time (SMRT) System is a 3rd generation DNA sequencing technology that enables a much wider range of applications when compared to 2nd generation technologies. Enzyme processivity enables much longer readlength while the speed of synthesis drives fast time to results. In addition, by monitoring the enzyme in real time, SMRT sequencing provides richer data, including kinetic information.

Together, these capabilities open new opportunities for disease research, including infectious disease studies, detection of rare variants, understanding the genomic complexity of cancer, and conducting epigenetic studies. Real-time detection is also critical to quickly and efficiently identifying and subtyping pathogens.

SMRT technology eliminates the current bottlenecks inherent in 2nd generation technologies by using DNA polymerase as a real-time sequencing engine. By observing the natural process of DNA synthesis in real-time without interruption, the system harnesses the power of the DNA polymerase, thereby capitalizing on the performance increases derived from millions of years of natural evolution. In order to enable “eavesdropping” on DNA synthesis as it occurs, PacBio developed three key innovations that overcame the challenges faced in previous attempts to conduct real-time single molecule sequencing:

1) The SMRT Cell, which enables single molecule, real-time observation of individual fluorophores against a dense background of labeled nucleotides while maintaining a high signal-to-noise ratio,

2) Phospholinked nucleotides, which enable long readlengths by producing a completely natural DNA strand through fast, accurate, and processive DNA synthesis, and

3) A novel detection platform that enables single molecule, real-time detection as well as flexibility in run configurations and applications.

SMRT Zero-Mode Waveguide (ZMW)
A ZMW is a hole, tens of nanometers in diameter, fabricated in a 100 nm metal film deposited on a silicon dioxide substrate. Each ZMW becomes a nanophotonic visualization chamber providing a detection volume of just 20 zeptoliters (20 × 10^-21 liters). At this volume, the technology detects the activity of a single molecule among a background of thousands of labeled nucleotides. DNA polymerase molecules are attached to the bottom surface such that they permanently reside within the detection volume. Phospholinked nucleotides, each type labeled with a different colored fluorophore, are then introduced into the reaction solution at high concentrations that promote enzyme speed, accuracy, and processivity.
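To get a feel for the scale of a 20-zeptoliter detection volume, the average number of freely diffusing labeled nucleotides inside it can be estimated as concentration × volume × Avogadro's number. The sketch below is illustrative only: the 1 micromolar concentration is an assumed value chosen for the example, not a figure from PacBio, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def mean_occupancy(concentration_molar, volume_liters):
    """Average number of molecules found in a volume at a given concentration."""
    return concentration_molar * volume_liters * AVOGADRO


ZMW_VOLUME_L = 20e-21   # 20 zeptoliters, as stated in the text
ASSUMED_CONC_M = 1e-6   # 1 micromolar: an assumption for illustration only

occupancy = mean_occupancy(ASSUMED_CONC_M, ZMW_VOLUME_L)
print(f"{occupancy:.3f} molecules on average")  # 0.012 molecules on average
```

Under that assumed concentration the detection volume holds far less than one free nucleotide on average, which suggests why a nucleotide held in place by the polymerase can stand out against the surrounding solution.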

Over time, directed attachment strategies can increase the number of ZMWs containing a single active polymerase, delivering progressively higher yields. When DNA polymerase incorporates complementary nucleotides, the enzyme holds each nucleotide within the detection volume for tens of milliseconds, orders of magnitude longer than the time it takes a nucleotide to diffuse in and out of the detection volume.

During this time, the engaged fluorophore emits fluorescent light whose color corresponds to the base identity. Then, as part of the natural incorporation cycle, the polymerase cleaves the bond that previously held the fluorophore in place and the dye diffuses out of the detection volume. Following incorporation, the signal immediately returns to baseline and the process repeats. Unhampered and uninterrupted, the DNA polymerase continues incorporating multiple bases per second. In this way, the SMRT approach produces a completely natural long chain of DNA in minutes. Simultaneous and continuous excitation and detection occur across all of the thousands of ZMWs in the SMRT Cell in real time.
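The detection principle described above, in which an incorporation shows up as a pulse of one color lasting tens of milliseconds while freely diffusing nucleotides produce only much briefer blips, can be sketched as a simple duration filter. This is a toy illustration and not PacBio's base-calling method: the color-to-base mapping, the 1 ms frame time, the 10 ms minimum pulse length, the intensity threshold, and all function names are assumptions made for the example.

```python
from itertools import groupby

CHANNELS = "ACGT"  # assumed mapping of the four dye colors to bases


def call_bases(frames, frame_ms=1.0, min_pulse_ms=10.0, threshold=0.5):
    """Call one base per sufficiently long pulse in a four-channel trace.

    frames: sequence of 4-tuples, one intensity per channel per frame.
    """
    # Label each frame with its brightest channel, or None when at baseline.
    labels = []
    for intensities in frames:
        peak = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        labels.append(CHANNELS[peak] if intensities[peak] >= threshold else None)

    # Collapse consecutive identical labels into runs; keep only long pulses.
    called = []
    for label, run in groupby(labels):
        duration_ms = sum(1 for _ in run) * frame_ms
        if label is not None and duration_ms >= min_pulse_ms:
            called.append(label)
    return "".join(called)


def pulse(base, n_frames):
    """Synthetic pulse: full intensity in one channel for n_frames frames."""
    return [tuple(1.0 if c == base else 0.0 for c in CHANNELS)] * n_frames


baseline = [(0.0, 0.0, 0.0, 0.0)]
trace = (baseline * 5 + pulse("G", 30) + baseline * 20
         + pulse("T", 2)  # brief blip, too short to count as an incorporation
         + baseline * 20 + pulse("A", 25) + baseline * 5)
print(call_bases(trace))  # GA
```

Because the signal returns to baseline between incorporations, two consecutive pulses of the same color are counted as two separate bases in this sketch.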

More information:
Proof of principle in Science, January 2009
Pacific Biosciences YouTube
SMRT sequencing has been demonstrated for de novo genome sequencing in publications analyzing the E. coli outbreak in Germany in 2011 and the cholera outbreak in Haiti in 2010, both published in the New England Journal of Medicine. Scientists are also using single molecule real time sequencing in hybrid assemblies for de novo genomes, combining short-read sequence data with long-read sequence data.
