PacBio Sequencing

Overview

PacBio (Pacific Biosciences) uses Single Molecule, Real-Time (SMRT) sequencing to produce long reads, often 10,000 to 25,000+ base pairs.


How It Works

The Setup: ZMW (Zero-Mode Waveguide)

PacBio uses tiny wells called ZMWs - holes narrower than the wavelength of the laser light, so only the very bottom of each well is illuminated.

At the bottom of each well:

  • A single DNA polymerase is fixed in place
  • A single DNA template is threaded through it

The Chemistry: Real-Time Detection

  1. Fluorescently labeled nucleotides (A, T, G, C - each with a different color) float in solution
  2. When the polymerase grabs the correct nucleotide, it holds it in the detection zone
  3. The laser excites the label and a detector records its color - we see which base is being added
  4. The polymerase incorporates the nucleotide and releases the fluorescent tag
  5. Repeat - we watch DNA synthesis in real time
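
As a rough illustration of the steps above, here is a toy Python sketch. The channel names (ch1 to ch4) and the function names are made up for this example, and strand orientation is ignored. It only shows that each detected color maps to one incorporated base, which is the complement of the template base.

```python
# Toy model of real-time detection: one fluorescent pulse per incorporated base.
# The channel labels are hypothetical, not the dyes PacBio actually uses.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
CHANNEL_OF_BASE = {"A": "ch1", "C": "ch2", "G": "ch3", "T": "ch4"}
BASE_OF_CHANNEL = {channel: base for base, channel in CHANNEL_OF_BASE.items()}


def synthesize(template):
    """Yield the color channel of each pulse as the polymerase copies the template."""
    for template_base in template:
        incoming = COMPLEMENT[template_base]  # the correct nucleotide is held in place
        yield CHANNEL_OF_BASE[incoming]       # its label is detected, then released


def call_bases(pulses):
    """Translate a series of detected color pulses back into bases."""
    return "".join(BASE_OF_CHANNEL[pulse] for pulse in pulses)


template = "ATGCCCAAA"
read = call_bases(synthesize(template))
print(read)  # TACGGGTTT
```

The read is the newly synthesized strand, so it is complementary to the template rather than identical to it.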

Key difference from Illumina: we watch a single polymerase molecule working continuously, rather than many copies of a molecule being extended in synchronized cycles.


Why Long Reads?

The circular template trick:

PacBio uses SMRTbell templates - double-stranded DNA with a hairpin adapter on each end, so the whole molecule forms a closed circle.

    ╭──────────────╮
    │              │
────┤   Template   ├────
    │              │
    ╰──────────────╯

The polymerase goes around and around, reading the same template multiple times.
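
To make the "around and around" idea concrete, here is a small Python sketch. The adapter placeholder "NNNN" and all function names are assumptions made for this example (the insert must not contain the placeholder). One lap around the circle covers the insert, an adapter, the other strand of the insert (its reverse complement), and the second adapter.

```python
ADAPTER = "NNNN"  # hypothetical placeholder for the hairpin adapter sequence


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence made of A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def polymerase_read(insert, laps):
    """Concatenate what is read over `laps` full trips around the circle."""
    one_lap = insert + ADAPTER + reverse_complement(insert) + ADAPTER
    return one_lap * laps


def split_subreads(raw_read):
    """Cut the long raw read at the adapters; each piece is one pass."""
    return [piece for piece in raw_read.split(ADAPTER) if piece]


raw = polymerase_read("ATGCCCAAA", laps=2)
subreads = split_subreads(raw)
print(subreads)  # ['ATGCCCAAA', 'TTTGGGCAT', 'ATGCCCAAA', 'TTTGGGCAT']

# Every second pass comes from the opposite strand, so flip it back.
oriented = [s if i % 2 == 0 else reverse_complement(s) for i, s in enumerate(subreads)]
print(oriented)  # four copies of 'ATGCCCAAA'
```

Two laps already give four passes over the same insert, which is the raw material for the consensus step.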


Error Correction: Why High Accuracy?

Raw reads have a ~10-15% error rate (mostly insertions and deletions).

But because the polymerase circles the template multiple times, we get multiple reads (passes) of the same sequence.
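
A back-of-the-envelope Python sketch shows why extra passes help so much. It assumes errors are independent between passes, that a position is miscalled only when more than half of the passes are wrong there, and that the number of passes is odd. This is a simplification for intuition, not how PacBio's consensus software actually models errors; the function names are made up for this example.

```python
from math import comb, log10


def majority_error(p, n_passes):
    """Probability that more than half of n_passes are wrong at one position."""
    need = n_passes // 2 + 1
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range(need, n_passes + 1)
    )


def phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * log10(p)


for n in (1, 3, 5, 9):
    err = majority_error(0.10, n)
    print(f"{n} passes: error ~ {err:.5f} (Q{phred(err):.0f})")
```

With a 10% per-pass error rate, three passes bring the per-position error down to 2.8% and five passes to under 1% in this toy model.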

CCS (Circular Consensus Sequencing):

  • Align all passes of the same template
  • Errors are mostly random, so they rarely hit the same position twice and get outvoted across passes
  • Result: HiFi reads with roughly 99.9% accuracy, given enough passes

Pass 1:     ATGC-CCAAA
Pass 2:     ATGCCC-AAA
Pass 3:     ATGCCCAAAA
Pass 4:     ATGCCC-AAA
            ──────────
Consensus:  ATGCCCAAA  ✓
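
The vote in the example above can be sketched in a few lines of Python. This assumes the passes are already aligned (gaps written as "-") and uses a plain per-column majority vote; real CCS software also has to do the alignment and uses a more sophisticated statistical model. The function name is made up for this example.

```python
from collections import Counter


def consensus(aligned_passes):
    """Majority vote per alignment column; columns won by a gap are dropped."""
    called = []
    for column in zip(*aligned_passes):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            called.append(symbol)
    return "".join(called)


passes = [
    "ATGC-CCAAA",
    "ATGCCC-AAA",
    "ATGCCCAAAA",
    "ATGCCC-AAA",
]
print(consensus(passes))  # ATGCCCAAA
```

In column 7 the gap wins two votes to one each for C and A, so the extra base seen in passes 1 and 3 is treated as an insertion error and left out.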

When to Use PacBio

Ideal for:

  • De novo genome assembly
  • Resolving repetitive regions
  • Detecting structural variants
  • Full-length transcript sequencing
  • Phasing haplotypes

Not ideal for:

  • Large-scale population studies (cost)
  • When short reads are sufficient