Important Oral Questions (Core Exam Questions)

A focused collection of high-priority oral exam questions covering the most frequently tested topics. Master these before your exam!

⭐ High-Priority Topics

These questions cover concepts that are essential for oral exams. Pay special attention to the reasoning behind experimental choices and to comparing techniques.

1. Experimental Design & Model Selection

⭐

Core Question Experimental Design

Hard

When reading a proteomics paper: Why did the researchers choose a particular cell line or model system? What factors influence this choice?

✓ Model Answer

Model/cell line selection depends on several factors:

Biological Relevance:

Does the model accurately represent the disease/condition being studied?
Does it express the proteins of interest?
Is it from the relevant tissue type?

Technical Considerations:

For SILAC: Cells must be able to grow in culture and incorporate labeled amino acids
Protein yield: Sufficient protein for analysis
Reproducibility: Well-characterized, stable cell lines preferred
Availability: Commercially available vs. primary cells

Common choices:

HeLa cells: Easy to culture, well-characterized
HEK293: High transfection efficiency
Primary cells: More physiologically relevant but harder to work with
Patient-derived cells: Most relevant for translational studies

💡 Key insight: Always be prepared to justify WHY a specific model was chosen — this shows critical thinking about experimental design.

⭐

Core Question Sample Strategy

Hard

What is the difference between pooled samples and single/individual samples in proteomics? When would you use each approach?

✓ Model Answer

Pooled Samples:

Multiple individual samples combined into one
Represents an "average" of the group
Advantages:
- Reduces individual biological variation
- Increases protein amount for analysis
- Reduces number of MS runs needed
- Cost-effective for initial screening
Disadvantages:
- Loses individual variation information
- Cannot identify outliers
- Cannot perform statistical analysis on individuals

Single/Individual Samples:

Each sample analyzed separately
Advantages:
- Captures biological variability
- Enables proper statistical analysis
- Can identify individual responders/non-responders
- Required for biomarker validation
Disadvantages:
- More expensive (more MS runs)
- More time-consuming
- May have limited sample amount per individual

💡 Best practice: Use pooled samples for discovery phase, then validate with individual samples. For clinical studies, individual samples are essential.
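
To make the trade-off concrete, here is a minimal sketch with made-up intensities for one protein in five individuals. It assumes, as an idealisation, that a pool of equal amounts gives a signal close to the arithmetic mean of the individual signals; the 1.5 × SD cut-off is an arbitrary choice for illustration.

```python
from statistics import mean, stdev

# Hypothetical intensities of one protein in five individuals (illustrative only).
individual = [10.0, 12.0, 11.0, 30.0, 12.0]

# Idealised pool: equal amounts mixed, so the signal approximates the mean.
pooled = mean(individual)

# Individual runs keep the spread, so a possible outlier can be flagged.
spread = stdev(individual)
outliers = [x for x in individual if abs(x - pooled) > 1.5 * spread]

print(f"pooled signal: {pooled:.1f}")
print(f"individual SD: {spread:.2f}, possible outliers: {outliers}")
```

The pooled run yields a single number, so the high value in one individual is invisible; only the individual runs expose it.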

2. ESI (Electrospray Ionization)

⭐

Core Question ESI Mechanism

Hard

Explain ESI (Electrospray Ionization) and how it works. What types of ions are usually formed?

✓ Model Answer

ESI Mechanism (step-by-step):

Spray Formation: Sample solution is pumped through a capillary needle at high voltage (2-5 kV)
Taylor Cone: Electric field causes liquid to form a cone shape at the needle tip
Droplet Formation: Fine charged droplets are sprayed from the cone tip
Desolvation: Warm nitrogen gas assists solvent evaporation; droplets shrink
Coulombic Explosion: As droplets shrink, charge density increases until Rayleigh limit is reached → droplets explode into smaller droplets
Ion Release: Process repeats until fully desolvated, multiply charged ions are released

Types of ions formed:

MULTIPLY CHARGED ions — this is the key characteristic!
Positive mode: [M+nH]ⁿ⁺ (e.g., [M+2H]²⁺, [M+3H]³⁺)
Negative mode: [M-nH]ⁿ⁻
Creates a charge envelope (a roughly bell-shaped distribution of charge states)

Why multiple charges matter:

m/z = mass / charge
Multiple charges reduce m/z values
Allows large proteins (>100 kDa) to be analyzed within typical mass analyzer range
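
The effect of multiple charging can be sketched numerically. The example below assumes positive-mode [M+nH]ⁿ⁺ ions and a hypothetical 100 kDa protein; the function names are illustrative. It also shows how two neighbouring peaks of the charge envelope are enough to recover the charge and the neutral mass.

```python
PROTON = 1.007276  # proton mass in Da

def mz(neutral_mass, charge):
    """m/z of an [M+nH]n+ ion in positive mode."""
    return (neutral_mass + charge * PROTON) / charge

def charge_from_adjacent(mz_high, mz_low):
    """Charge of the peak at mz_high, assuming mz_low is the same
    molecule carrying exactly one more proton."""
    return round((mz_low - PROTON) / (mz_high - mz_low))

def neutral_mass(mz_value, charge):
    """Recover the neutral mass M from one peak of known charge."""
    return charge * (mz_value - PROTON)

protein_mass = 100_000.0  # hypothetical 100 kDa protein
for n in (40, 50, 60):
    print(f"{n}+ : m/z = {mz(protein_mass, n):.2f}")

# Deconvolution from two neighbouring peaks of the envelope
high, low = mz(protein_mass, 50), mz(protein_mass, 51)
n = charge_from_adjacent(high, low)
print(n, round(neutral_mass(high, n), 1))
```

With 40 to 60 charges the ions appear at roughly m/z 1,700 to 2,500, well inside the range of common analyzers.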

Advantages of ESI:

Soft ionization (minimal fragmentation)
Directly compatible with LC (on-line coupling)
Very high sensitivity (attomole range)

Disadvantages:

Sensitive to salts and detergents (ion suppression)
Requires clean samples
More complex spectra due to multiple charge states

3. MALDI (Matrix-Assisted Laser Desorption/Ionization)

⭐

Core Question MALDI Complete

Hard

How does MALDI work? What types of ions are involved? What are the pros and cons? Are ions singly or multiply charged? Which analyzers are typically used?

✓ Model Answer

How MALDI works:

Sample Preparation: Analyte mixed with organic matrix (e.g., α-CHCA, DHB, sinapinic acid)
Crystallization: Mixture spotted on metal plate; solvent evaporates forming co-crystals
Laser Irradiation: UV laser (337 nm nitrogen or 355 nm Nd:YAG) hits the crystals
Matrix Absorption: Matrix absorbs photon energy, becomes electronically excited
Desorption: Matrix undergoes "micro-explosion," ejecting analyte into gas phase
Ionization: Proton transfer from matrix to analyte creates ions

Types of ions:

SINGLY CHARGED ions — key difference from ESI!
Positive mode: [M+H]⁺ (most common for peptides)
Negative mode: [M-H]⁻
Also: [M+Na]⁺, [M+K]⁺ (adducts)

Pros:

Simple spectra (singly charged = easy interpretation)
More tolerant to salts and contaminants than ESI
Very robust, high-throughput (~10⁴ samples/day)
Wide mass range (up to 500 kDa)
Easy to use

Cons:

Lower sensitivity than ESI (femtomole vs. attomole)
Not easily coupled to LC (off-line)
Matrix interference in low mass region
Shot-to-shot variability

Typical analyzers used with MALDI:

TOF (Time-of-Flight) — most common combination (MALDI-TOF)
TOF/TOF — for MS/MS analysis
Can also be coupled with: FT-ICR, Orbitrap

💡 Remember: MALDI = Singly charged, ESI = Multiply charged. This is a classic exam comparison!

4. SELDI vs MALDI

⭐

Core Question SELDI

Hard

What is SELDI? How does it compare to MALDI?

✓ Model Answer

SELDI (Surface-Enhanced Laser Desorption/Ionization):

A variation of MALDI where the target surface is chemically modified to selectively bind certain proteins.

Key difference from MALDI:

Feature	MALDI	SELDI
Surface	Inert metal plate	Chemically modified (active) surface
Sample prep	Simple spotting	Surface captures specific proteins
Selectivity	None (all proteins)	Surface-dependent selectivity
Complexity	Full sample complexity	Reduced (only bound proteins)
Washing	Not typical	Unbound proteins washed away

SELDI Surface Types:

Chemical surfaces:
- CM10: Weak cation exchange
- Q10: Strong anion exchange
- H50: Hydrophobic/reverse phase
- IMAC30: Metal affinity (binds His, phosphoproteins)
Biological surfaces:
- Antibody-coated
- Receptor-coated
- DNA/RNA-coated

SELDI Workflow:

Spot sample on modified surface
Specific proteins bind based on surface chemistry
Wash away unbound proteins
Apply matrix
Analyze by laser desorption (same as MALDI)

SELDI Advantages:

Reduces sample complexity (acts as "on-chip purification")
Good for biomarker discovery/profiling
Requires minimal sample preparation

SELDI Limitations:

Lower resolution than standard MALDI
Limited protein identification (profiling only)
Reproducibility issues
Largely replaced by LC-MS approaches

5. Peptide Mass Fingerprinting (PMF)

⭐

Core Question PMF Complete

Hard

What is Peptide Mass Fingerprinting (PMF)? How are the proteins digested into fragments? What is the specificity of the enzyme used?

✓ Model Answer

PMF (Peptide Mass Fingerprinting):

A protein identification method where a protein is enzymatically digested into peptides, and the resulting peptide masses are compared to theoretical masses from database proteins.

PMF Workflow:

Protein isolation: Usually from 2D gel spot
Destaining: Remove Coomassie/silver stain
Reduction & Alkylation: Break and block disulfide bonds
Enzymatic digestion: Typically with trypsin
Peptide extraction: From gel pieces
MALDI-TOF analysis: Measure peptide masses
Database search: Compare experimental masses to theoretical
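
The database search step can be sketched as simple mass matching. This is a deliberately simplified illustration: real PMF search engines use probabilistic scoring rather than a raw match count, and the protein names, masses, and 50 ppm tolerance below are made up.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def count_matches(observed_masses, theoretical_masses, tol_ppm=50.0):
    """Count observed peaks within tol_ppm of any theoretical peptide mass."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tol_ppm for theo in theoretical_masses)
        for obs in observed_masses
    )

def rank_candidates(observed_masses, database, tol_ppm=50.0):
    """Rank database proteins by number of matched peptide masses."""
    scores = {name: count_matches(observed_masses, masses, tol_ppm)
              for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical theoretical [M+H]+ peptide masses for two made-up entries.
database = {
    "protein_A": [842.51, 1045.56, 1479.80, 2211.10],
    "protein_B": [900.42, 1045.56, 1638.93],
}
observed = [842.50, 1045.57, 1479.82, 1800.00]
print(rank_candidates(observed, database))
```

The candidate whose theoretical digest explains the most observed peaks ranks first; unmatched peaks may come from contaminants, modifications, or missed cleavages.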

Digestion enzyme — TRYPSIN:

Specificity:

Cleaves at the C-terminal side of:
Lysine (K) and Arginine (R)
EXCEPT when followed by Proline (P)
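
The cleavage rule is easy to express as an in silico digest. The sketch below assumes complete digestion with no missed cleavages; the input sequence is a made-up example.

```python
def trypsin_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides

# Hypothetical sequence: the R in "RP" is not cleaved.
print(trypsin_digest("MKWVTFISLLRPAKGR"))
```

Note that the "RP" site stays intact, so the second peptide carries an internal arginine.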

Why trypsin is the gold standard:

High specificity: Predictable cleavage sites
Optimal peptide size: 6-20 amino acids (ideal for MS)
Basic residues at C-terminus: Promotes ionization in positive mode
Robust: Works well across pH 7-9
Reproducible: Produces consistent results
Self-digestion peaks: Can be used for internal calibration

Other enzymes sometimes used:

Chymotrypsin: Cleaves after Phe, Tyr, Trp
Glu-C: Cleaves after Glu (and Asp at high pH)
Lys-C: Cleaves after Lys only
Asp-N: Cleaves before Asp

💡 Exam tip: "Trypsin cleaves C-terminal to K and R, except before P" — memorize this!

6. Bottom-Up vs Shotgun Proteomics

⭐

Core Question Approaches

Hard

What is the difference between Bottom-Up and Shotgun proteomics?

✓ Model Answer

Important clarification: Shotgun proteomics IS a type of bottom-up approach. The distinction is in the workflow:

Feature	Classical Bottom-Up (PMF)	Shotgun (Bottom-Up)
Protein separation	FIRST (2D-PAGE, then cut spots)	None or minimal
Digestion	Single isolated protein	Entire protein mixture
Peptide separation	Usually none	LC (often multi-dimensional)
MS analysis	MALDI-TOF (PMF)	LC-MS/MS
Identification	Mass matching	MS/MS sequencing
Throughput	One protein at a time	Thousands of proteins

Classical Bottom-Up Workflow:

Separate proteins by 2D-PAGE
Cut out individual spots
Digest each spot separately
Analyze by MALDI-TOF
PMF database search

Shotgun Workflow:

Lyse cells, extract all proteins
Digest entire mixture into peptides
Separate peptides by LC (MudPIT uses 2D-LC)
Analyze by MS/MS
Database search with MS/MS spectra

Why "Shotgun"?

Like a shotgun blast — analyzes everything at once
No pre-selection of proteins
Relies on computational deconvolution

💡 Key distinction: Classical bottom-up separates proteins first, shotgun separates peptides (after digesting the whole mixture).

7. Gel Electrophoresis Limitations

⭐

Core Question 2D-PAGE Limitations

Hard

What are the limitations of 2D gel electrophoresis?

✓ Model Answer

Sample-Related Limitations:

Hydrophobic proteins: Membrane proteins poorly soluble in IEF buffers → underrepresented
Extreme pI proteins: Very acidic (<3) or basic (>10) proteins difficult to focus
Extreme MW proteins:
- Large proteins (>200 kDa) don't enter gel well
- Small proteins (<10-15 kDa) may run off the gel
Low-abundance proteins: Masked by high-abundance proteins; below detection limit
Dynamic range: Limited (~10⁴), much less than proteome range (~10⁶-10⁷)

Technical Limitations:

Poor reproducibility: Gel-to-gel variation requires running in triplicate
Labor-intensive: Manual, time-consuming, hard to automate
Low throughput: Cannot be easily scaled up
Co-migration: Proteins with similar pI/MW appear in same spot
Quantification limited: Staining is semi-quantitative at best

Analytical Limitations:

Proteome coverage gap: Yeast example: 6,000 genes → 4,000 expressed proteins → only ~1,000 detected by 2DE
Requires MS for ID: 2DE is only separation; identification needs additional steps
PTM detection: May see multiple spots but hard to characterize modifications

Practical Issues:

Streaking/smearing from degradation
Background interference from staining
Keratin contamination common

💡 These limitations drove development of gel-free approaches like shotgun proteomics and MudPIT.

8. Hybrid Mass Spectrometry Systems

⭐

Core Question Hybrid MS

Hard

What is used in hybrid mass spectrometry systems? What are the limitations?

✓ Model Answer

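Hybrid MS: Instruments combining two or more mass analyzers, usually of different types, to leverage their complementary strengths. Triple quadrupole and TOF-TOF instruments chain analyzers of the same type, so strictly they are tandem rather than hybrid instruments, but they are commonly listed alongside hybrids.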

Common Hybrid Configurations:

Hybrid Type	Components	Strengths
Q-TOF	Quadrupole + TOF	High resolution, accurate mass, good for ID
Triple Quad (QqQ)	Q1 + Collision cell + Q3	Excellent for quantification (SRM/MRM)
Q-Orbitrap	Quadrupole + Orbitrap	Very high resolution + sensitivity
LTQ-Orbitrap	Linear ion trap + Orbitrap	High speed + high resolution
TOF-TOF	TOF + Collision + TOF	High-energy fragmentation with MALDI
Q-Trap	Quadrupole + Ion trap	Versatile, MRM + scanning modes

How they work (Q-TOF example):

Q1 (Quadrupole): Selects precursor ion of interest
Collision cell: Fragments the selected ion (CID)
TOF: Analyzes all fragments with high resolution and mass accuracy

Limitations of Hybrid Systems:

Cost: Very expensive instruments ($500K - $1M+)
Complexity: Requires expert operators
Maintenance: More components = more potential failures
Data complexity: Generates massive datasets
Duty cycle trade-offs: Can't optimize all parameters simultaneously
Ion transmission losses: Each analyzer stage loses some ions

Specific limitations by type:

Q-TOF: Lower sensitivity in MS/MS mode
Ion trap hybrids: Space charge effects limit dynamic range
Orbitrap hybrids: Slower scan speed than TOF

9. TUNEL Analysis

⭐

Core Question TUNEL Assay

Medium

What is TUNEL analysis? What does it detect and how does it work?

✓ Model Answer

TUNEL = Terminal deoxynucleotidyl transferase dUTP Nick End Labeling

Purpose: Detects apoptosis (programmed cell death) by identifying DNA fragmentation.

Principle:

During apoptosis, endonucleases cleave DNA between nucleosomes
This creates many DNA fragments with exposed 3'-OH ends ("nicks")
TUNEL labels these free 3'-OH ends

How it works:

TdT enzyme (terminal deoxynucleotidyl transferase) is added
TdT adds labeled dUTP nucleotides to 3'-OH ends of DNA breaks
Labels can be: fluorescent (FITC), biotin (detected with streptavidin), or other markers
Visualized by fluorescence microscopy or flow cytometry

Applications:

Detecting apoptosis in tissue sections
Studying cell death in disease models
Drug toxicity testing
Cancer research

Limitations:

Can also label necrotic cells (not specific to apoptosis)
False positives from mechanical DNA damage during sample prep
Should be combined with other apoptosis markers

Follow-up study suggestions:

Caspase activity assays (more specific for apoptosis)
Annexin V staining (early apoptosis marker)
Western blot for cleaved caspase-3 or PARP

10. Phage Display

⭐

Core Question Phage Display

Hard

What is Phage Display? What is its main limitation?

✓ Model Answer

Phage Display: A molecular biology technique where peptides or proteins are expressed ("displayed") on the surface of bacteriophage particles.

How it works:

Library Creation: DNA encoding peptides/proteins is inserted into phage coat protein gene
Expression: Phage expresses the foreign peptide fused to its coat protein (usually pIII or pVIII)
Panning: Library exposed to target molecule (bait) immobilized on surface
Selection: Non-binding phages washed away; binding phages retained
Amplification: Bound phages eluted and amplified in bacteria
Iteration: Process repeated 3-4 times to enrich for strong binders
Identification: DNA sequencing reveals the binding peptide sequence

Applications:

Antibody discovery and engineering
Finding protein-protein interaction partners
Epitope mapping
Drug target identification
Peptide ligand discovery

MAIN LIMITATIONS:

Bacterial expression system:
- No post-translational modifications (no glycosylation, phosphorylation)
- May not fold mammalian proteins correctly
- Codon bias issues
Size constraints: Large proteins difficult to display
Selection bias: Some peptides toxic to bacteria → lost from library
False positives: Selection for phage propagation, not just binding
Context-dependent: Displayed peptide may behave differently than free peptide
Limited to protein/peptide interactions: Cannot study interactions requiring membrane context

💡 Key limitation to mention: Prokaryotic expression = no eukaryotic PTMs and potential protein misfolding.

11. Energy Transfer Methods (FRET/BRET)

⭐

Core Question Energy Transfer

Hard

Explain energy transfer-based methods for studying protein interactions. What are donor and acceptor? What types of signals are obtained?

✓ Model Answer

Energy Transfer Methods: Techniques that detect protein-protein interactions based on the transfer of energy between two labeled molecules when they come into close proximity.

FRET (Förster Resonance Energy Transfer):

Donor: Fluorescent molecule that absorbs excitation light (e.g., CFP, GFP)
Acceptor: Fluorescent molecule that receives energy from donor (e.g., YFP, RFP)
Mechanism: Non-radiative energy transfer through dipole-dipole coupling
Distance requirement: 1-10 nm (typically <10 nm for efficient transfer)

BRET (Bioluminescence Resonance Energy Transfer):

Donor: Bioluminescent enzyme (e.g., Renilla luciferase)
Acceptor: Fluorescent protein (e.g., GFP, YFP)
Advantage: No external excitation needed → lower background

Signals Obtained:

When proteins are FAR apart:
- Only donor emission observed
- No energy transfer
When proteins INTERACT (close proximity):
- Donor emission decreases (quenching)
- Acceptor emission increases (sensitized emission)
- FRET efficiency can be calculated

Types of signals measured:

Sensitized emission: Acceptor fluorescence upon donor excitation
Donor quenching: Decrease in donor fluorescence intensity
Donor lifetime: Decrease in fluorescence lifetime (FLIM-FRET)
Acceptor photobleaching: Donor recovery after acceptor is bleached

Applications:

Detecting protein-protein interactions in living cells
Monitoring conformational changes
Studying signaling pathway activation
Biosensor development

Common FRET pairs:

CFP (cyan) → YFP (yellow)
BFP (blue) → GFP (green)
GFP → RFP/mCherry

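💡 Key concept: FRET is a "molecular ruler": efficiency falls off with the sixth power of distance, E = 1/(1 + (r/R₀)⁶), so it is only measurable when donor and acceptor are very close (<10 nm). A signal is therefore strong evidence of direct interaction or very close association.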
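
To get a feel for how steeply transfer drops with distance, here is a minimal sketch of the Förster relation E = 1/(1 + (r/R₀)⁶). The Förster radius R₀ = 5 nm is an assumed value of typical order for fluorescent-protein pairs, not one taken from this guide.

```python
def fret_efficiency(r_nm, r0_nm=5.0):
    """Foerster relation: E = 1 / (1 + (r / R0)^6).

    r0_nm is the Foerster radius, the distance at which E = 0.5
    (5 nm is an assumed, illustrative value).
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r):.3f}")
```

Doubling the distance from R₀ to 2 × R₀ cuts the efficiency from 50% to under 2%, which is why FRET reports only on very close proximity.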

12. Quick Review - Core Concepts

Test yourself on these essential concepts:

Trypsin cleaves on the C-terminal side of ❓ which residues? K (Lysine) and R (Arginine), except before P (Proline)

MALDI produces mainly ❓ charged ions Singly charged [M+H]⁺

ESI produces mainly ❓ charged ions Multiply charged [M+nH]ⁿ⁺

The main difference between SELDI and MALDI is ❓ SELDI uses chemically modified surfaces for selective binding

Shotgun proteomics separates ❓ proteins or peptides first? Peptides (digests whole mixture first)

Classical bottom-up (PMF) separates ❓ proteins or peptides first? Proteins (2D-PAGE, then digests individual spots)

TUNEL detects ❓ DNA fragmentation / Apoptosis

The main limitation of phage display is ❓ Prokaryotic expression (no PTMs, potential misfolding)

FRET requires donor and acceptor to be within ❓ nm <10 nm (typically 1-10 nm)

Q-TOF is a hybrid combining ❓ Quadrupole + Time-of-Flight

In pooled samples you lose ❓ Individual variation / ability to do statistics on individuals

The "proteomic gap" in 2DE refers to ❓ Proteins expressed but not detected by 2D electrophoresis

13. CID (Collision-Induced Dissociation)

⭐

Core Question Fragmentation

Hard

Describe the process of Collision-Induced Dissociation (CID) and its significance in tandem mass spectrometry.

✓ Model Answer

CID (Collision-Induced Dissociation): A fragmentation method where precursor ions are fragmented by colliding them with an inert gas.

How CID works:

Ion selection: Precursor ion selected in first mass analyzer (MS1)
Collision cell: Selected ion enters a chamber filled with inert gas (Argon, Nitrogen, or Xenon)
Collision: Ion collides with gas molecules, converting kinetic energy to internal energy
Fragmentation: Internal energy causes bonds to break, producing fragment ions
Analysis: Fragment ions analyzed in second mass analyzer (MS2)

Significance in MS/MS:

Generates b-ions and y-ions for peptide sequencing
Provides structural information about the parent ion
Enables amino acid sequence determination
Allows protein identification via database searching
Can reveal PTM locations

Other fragmentation methods:

HCD: Higher-energy Collisional Dissociation (used in Orbitrap)
ETD: Electron Transfer Dissociation (better for PTMs, larger peptides)
ECD: Electron Capture Dissociation (preserves labile modifications)

💡 Key point: CID is the most common fragmentation method and primarily breaks peptide bonds, generating predictable b- and y-ion series.

14. b-ions and y-ions

⭐

Core Question Fragment Ions

Hard

What are b-ions and y-ions? How are they used for peptide sequencing?

✓ Model Answer

Fragment ions from peptide backbone cleavage:

b-ions:

Contain the N-terminal portion of the peptide
Charge retained on the N-terminal fragment
Named b₁, b₂, b₃... (number = amino acids from N-terminus)

y-ions:

Contain the C-terminal portion of the peptide
Charge retained on the C-terminal fragment
Named y₁, y₂, y₃... (number = amino acids from C-terminus)

Visual representation:

        N-terminus ← → C-terminus
        H₂N-[AA₁]-[AA₂]-[AA₃]-[AA₄]-COOH
             ↓     ↓     ↓
            b₁    b₂    b₃    (N-terminal fragments)
                  y₃    y₂    y₁  (C-terminal fragments)

How sequencing works:

Mass differences between consecutive b-ions (or y-ions) = amino acid masses
b₂ - b₁ = mass of 2nd amino acid
y₃ - y₂ = mass of amino acid at position (n-2)
Complete series allows full sequence determination

Why both series are useful:

Complementary information confirms sequence
Gaps in one series may be filled by the other
Complementary pairs: for singly charged ions, m/z(bᵢ) + m/z(yₙ₋ᵢ) = neutral peptide mass + 2 × 1.007 Da (i.e., [M+H]⁺ + 1.007)

💡 Remember: b-ions = N-terminal, y-ions = C-terminal. The mass difference between consecutive ions reveals the amino acid identity.
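
A short sketch of how the two series are calculated for singly charged ions, using standard monoisotopic residue masses. The peptide "PEPTIDE" is just a convenient test string; the final check confirms that each complementary pair sums to the neutral peptide mass plus two protons.

```python
PROTON = 1.007276   # Da
WATER = 18.010565   # Da

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def b_ions(peptide):
    """Singly charged b-ion m/z values, b1 .. b(n-1)."""
    total, ions = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE[aa]
        ions.append(total + PROTON)
    return ions

def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 .. y(n-1)."""
    total, ions = 0.0, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE[aa]
        ions.append(total + WATER + PROTON)
    return ions

peptide = "PEPTIDE"
b, y = b_ions(peptide), y_ions(peptide)
neutral = sum(RESIDUE[aa] for aa in peptide) + WATER

for i, (bi, yi) in enumerate(zip(b, reversed(y)), start=1):
    # b_i pairs with y_(n-i); their sum is constant
    print(f"b{i} = {bi:9.4f}   y{len(peptide) - i} = {yi:9.4f}   sum = {bi + yi:.4f}")
```

The difference b₂ − b₁ equals the residue mass of the second amino acid (here E), which is the basis of ladder reading.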

15. Monoisotopic vs Average Mass

⭐

Core Question Mass Definitions

Medium

Define monoisotopic mass and average mass. How are they used in peptide mass fingerprinting?

✓ Model Answer

Monoisotopic Mass:

Mass calculated using the most abundant isotope of each element
For organic molecules: ¹²C, ¹H, ¹⁴N, ¹⁶O, ³²S
Corresponds to the first peak in the isotope distribution (M+0)
More precise, used for accurate mass measurements

Average Mass:

Weighted average of all naturally occurring isotopes
Takes into account natural isotope abundance
Corresponds to the centroid of the isotope envelope
Used when resolution is insufficient to resolve isotopes

Example (for Carbon):

Monoisotopic: ¹²C = 12.0000 Da
Average: (98.9% × 12.0000) + (1.1% × 13.0034) = 12.011 Da
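
The same idea extends from one atom to a whole molecule. The sketch below sums monoisotopic and average atomic masses over an elemental formula; glycine (C₂H₅NO₂) is used only as a small worked example, and the atomic masses are standard values rounded for brevity.

```python
# Atomic masses in Da (standard values, rounded).
MONO = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915, "S": 31.972071}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

def mass(formula, table):
    """Sum atomic masses over a formula given as {element: count}."""
    return sum(table[element] * count for element, count in formula.items())

glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
mono = mass(glycine, MONO)
avg = mass(glycine, AVERAGE)
print(f"monoisotopic: {mono:.4f} Da, average: {avg:.3f} Da, gap: {avg - mono:.3f} Da")
```

The gap between the two values grows with the number of atoms, which is why the choice of mass type matters more for large peptides and proteins.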

Use in PMF:

Situation	Mass Type	Reason
High-resolution MS (e.g., reflectron MALDI-TOF)	Monoisotopic	Can resolve isotope peaks
Low-resolution MS	Average	Cannot resolve isotopes
Small peptides (<2000 Da)	Monoisotopic	First peak is tallest
Large proteins (>10 kDa)	Average	Monoisotopic peak too small to detect

💡 Key point: For PMF with MALDI-TOF, use monoisotopic masses of peptides for database matching — this gives the highest accuracy.

16. Mass Analyzers Comparison

⭐

Core Question Mass Analyzers

Hard

Compare the TOF, Quadrupole, and Orbitrap mass analyzers. How does each separate ions? Compare their resolution, mass accuracy, and sensitivity.

✓ Model Answer

How each analyzer separates ions:

TOF (Time-of-Flight):

Ions accelerated through the same voltage gain the same kinetic energy per charge
KE = zeV = ½mv² → ions with lower m/z travel faster
Measures flight time through drift tube
Shorter time = lower m/z
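
The relation between flight time and m/z follows from zeV = ½mv², which gives t = L·√(m / (2zeV)). The sketch below assumes an idealised linear TOF; the 20 kV acceleration voltage and 1 m drift tube are illustrative values, and effects such as the reflectron or delayed extraction are ignored.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # atomic mass unit in kg

def flight_time_us(mz, voltage=20_000.0, length_m=1.0):
    """Flight time in microseconds for an ion of given m/z (idealised linear TOF)."""
    mass_per_charge = mz * DALTON / E_CHARGE          # kg per coulomb
    return length_m * math.sqrt(mass_per_charge / (2.0 * voltage)) * 1e6

for value in (500.0, 1000.0, 4000.0):
    print(f"m/z {value:6.0f}: {flight_time_us(value):.2f} µs")
```

Flight time scales with the square root of m/z, so a fourfold increase in m/z only doubles the flight time.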

Quadrupole:

Four parallel rods with oscillating RF/DC voltages
Creates oscillating electric field
Only ions with specific m/z have stable trajectories
Others collide with rods and are lost
Acts as a mass filter (scanning or SIM mode)

Orbitrap:

Ions trapped orbiting around central spindle electrode
Oscillate axially with frequency dependent on m/z
Measures oscillation frequency (image current)
Fourier transform converts frequency → m/z

Comparison table:

Parameter	TOF	Quadrupole	Orbitrap
Resolution	10,000-60,000	1,000-4,000 (low)	100,000-500,000+
Mass Accuracy	5-20 ppm	100-1000 ppm	<2-5 ppm
Sensitivity	High (femtomole)	High	High (attomole)
Mass Range	Unlimited (in principle)	Up to ~4000 m/z	Up to ~6000 m/z
Scan Speed	Very fast	Fast	Slower
Cost	Moderate	Low	High
Best for	MALDI, fast scanning	Quantification (SRM)	High accuracy ID

💡 Summary: Orbitrap = highest resolution/accuracy; Quadrupole = best for quantification; TOF = fastest, good all-rounder.
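
The resolution and accuracy figures in the table can be turned into practical numbers. This is a rough sketch: it treats resolving power as m/Δm and asks whether neighbouring isotope peaks (spaced about 1.00335/z apart) could be separated, ignoring peak shape and the exact resolution definition. The function names are illustrative.

```python
ISOTOPE_SPACING = 1.00335  # Da, mass difference between 13C and 12C

def ppm_to_da(mz, ppm):
    """Absolute mass window (in m/z units) for a given ppm tolerance."""
    return mz * ppm / 1e6

def resolving_power_needed(mz, delta_mz):
    """Resolving power m / delta_m needed to separate peaks delta_mz apart."""
    return mz / delta_mz

def isotopes_resolved(mz, charge, resolving_power):
    """True if neighbouring isotope peaks of an ion with this charge are separable."""
    return resolving_power >= resolving_power_needed(mz, ISOTOPE_SPACING / charge)

print(ppm_to_da(1000.0, 5.0))               # window at 5 ppm
print(isotopes_resolved(1000.0, 2, 4000))    # doubly charged peptide, R = 4,000
print(isotopes_resolved(1000.0, 20, 4000))   # 20+ protein ion, R = 4,000
print(isotopes_resolved(1000.0, 20, 100000)) # same ion, R = 100,000
```

Highly charged ions pack their isotope peaks closer together, so resolving them calls for the higher resolving powers listed for TOF and Orbitrap instruments.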

17. De Novo Sequencing

⭐

Core Question Sequencing

Hard

What is de novo sequencing? When would you use it instead of database searching?

✓ Model Answer

De Novo Sequencing: Determining the amino acid sequence of a peptide directly from its MS/MS spectrum, without relying on a sequence database.

How it works:

Acquire high-quality MS/MS spectrum
Identify b-ion and y-ion series
Calculate mass differences between consecutive peaks
Match mass differences to amino acid residue masses
Build sequence from N- to C-terminus (or reverse)
Validate with complementary ion series
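
The ladder-reading steps above can be sketched as follows. The code assumes a clean, complete, singly charged series from one ion type, which real spectra rarely provide; the 0.02 Da tolerance and the peak list (b₁ to b₆ of the test peptide "PEPTIDE") are illustrative.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def read_ladder(peaks, tol=0.02):
    """Translate mass differences between consecutive peaks into residues.

    Ambiguous differences list every candidate (e.g. "L/I");
    unexplained differences are reported as "?".
    """
    ordered = sorted(peaks)
    sequence = []
    for low, high in zip(ordered, ordered[1:]):
        diff = high - low
        hits = [aa for aa, m in RESIDUE.items() if abs(m - diff) <= tol]
        sequence.append("/".join(hits) if hits else "?")
    return sequence

# b1..b6 of the test peptide PEPTIDE, rounded to 4 decimals.
b_series = [98.0600, 227.1026, 324.1554, 425.2031, 538.2871, 653.3141]
print(read_ladder(b_series))
```

The differences recover residues 2 to 6; the first and last residues need the precursor mass or the complementary y-series, and the 113.084 Da step cannot be assigned to Leu or Ile by mass alone.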

When to use de novo sequencing:

Protein NOT in database:
- Novel organisms without sequenced genomes
- Uncharacterized proteins
- Organisms with incomplete proteome databases
Unexpected modifications: PTMs not predicted by database
Mutations/variants: Sequence differs from database entry
Antibody sequencing: Highly variable regions
Ancient proteins: Paleoproteomics
Validation: Confirming database search results

Challenges:

Requires high-quality spectra with complete ion series
Isomeric amino acids Leu/Ile (identical residue mass, 113.084 Da) cannot be distinguished by mass alone
Labor-intensive and time-consuming
May have gaps in sequence coverage

Software tools: PEAKS, Novor, PepNovo, DeNovoGUI

💡 Key scenario: "What do you do if the protein is not in the database?" → Use de novo sequencing to determine sequence directly from MS/MS data.

18. Inteins

⭐

Core Question Protein Engineering

Medium

What are inteins? Explain their significance in protein engineering and purification.

✓ Model Answer

Inteins: Self-splicing protein segments that can excise themselves from a precursor protein, leaving behind the flanking exteins joined together.

Terminology:

Intein: INternal proTEIN (gets removed)
Extein: EXternal proTEIN (flanking sequences that remain)
N-extein — [INTEIN] — C-extein → N-extein—C-extein + free intein

Mechanism (protein splicing):

N-S or N-O acyl shift at N-terminus of intein
Transesterification
Asparagine cyclization releases intein
S-N or O-N acyl shift joins exteins with native peptide bond

Applications in protein engineering:

Self-cleaving affinity tags:
- Protein fused to intein + affinity tag (e.g., chitin-binding domain)
- Bind to affinity column
- Induce intein cleavage (pH, temperature, or thiol)
- Pure protein released, tag remains on column
- Advantage: No protease needed, no extra residues left
Protein ligation (Expressed Protein Ligation):
- Join two protein fragments with native peptide bond
- Useful for incorporating unnatural amino acids
- Creating segmentally labeled proteins for NMR
Protein cyclization: Create cyclic proteins
Conditional protein splicing: Control protein activity

💡 Main advantage: Inteins enable tag-free protein purification — the protein is released without any extra amino acids from the tag.

19. Interactomics Methods

⭐

Core Question Interactomics

Hard

What is interactomics? Describe the main experimental techniques used to study protein-protein interactions: Yeast Two-Hybrid, Co-IP, and AP-MS.

✓ Model Answer

Interactomics: The study of protein-protein interactions (PPIs) and the networks they form within biological systems.

1. Yeast Two-Hybrid (Y2H):

Principle: Reconstitution of transcription factor activity
Method:
- Bait protein fused to DNA-binding domain
- Prey protein fused to activation domain
- If bait and prey interact → transcription factor reconstituted → reporter gene expressed
Pros: High-throughput, detects direct binary interactions
Cons: Runs in vivo, but in yeast rather than the native cellular environment; high false-positive rate; classical Y2H only detects interactions that can take place in the nucleus

2. Co-Immunoprecipitation (Co-IP):

Principle: Antibody pulldown of protein complexes
Method:
- Lyse cells, add antibody against bait protein
- Antibody-protein complex captured on beads
- Wash away non-specific proteins
- Elute and analyze interacting proteins (Western blot or MS)
Pros: Detects endogenous interactions, physiological conditions
Cons: Requires good antibody, may miss transient interactions, cannot distinguish direct from indirect interactions

3. Affinity Purification-Mass Spectrometry (AP-MS):

Principle: Tagged bait protein pulls down interaction partners
Method:
- Express tagged bait protein (FLAG, HA, TAP tag)
- Lyse cells, capture bait + interactors on affinity resin
- Wash stringently
- Elute and identify interactors by MS
Pros: Unbiased identification, can detect entire complexes
Cons: Tag may affect interactions, overexpression artifacts, false positives from sticky proteins

Method	Throughput	Direct/Indirect	Environment
Y2H	High	Direct only	Yeast nucleus
Co-IP	Low	Both	Native
AP-MS	Medium	Both	Native (with tag)

20. What If Protein Is Not In Database?

⭐

Core Question Database Issues

Hard

What do you do if the protein is not in the database? How can you still identify an unknown protein?

✓ Model Answer

Strategies when protein is not in database:

1. De Novo Sequencing:

Determine peptide sequence directly from MS/MS spectrum
Calculate mass differences between fragment ions
Match to amino acid masses
Build sequence without database reference

2. Homology/Sequence Tag Searching:

Use short sequence tags from de novo to search related organisms
BLAST search against broader databases (NCBI nr)
MS-BLAST: Search with imperfect sequences
May find homologous protein in related species

3. Error-Tolerant Database Searching:

Allow for mutations, modifications, or sequence variants
Search with wider mass tolerance
Consider unexpected PTMs or SNPs

4. EST/Transcriptome Database Search:

Use expressed sequence tags (EST) databases
Search against RNA-seq data from same organism
May contain unannotated protein sequences

5. Spectral Library Searching:

Compare experimental spectrum to library of acquired spectra
May match even without sequence information

6. Genomic Six-Frame Translation:

If genome is available but not annotated
Translate genome in all 6 reading frames
Search MS data against translated sequences
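
A minimal sketch of six-frame translation using the standard genetic code. It translates each frame end to end and marks stop codons with "*"; a real pipeline would additionally split at stops and filter short open reading frames. The input DNA is a made-up example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

for name, protein in six_frame("ATGAAACGTTAA").items():
    print(name, protein)
```

The translated frames then serve as the sequence database for a normal MS/MS search.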

Practical workflow:

First: Try error-tolerant search or related species database
Second: Perform de novo sequencing on best spectra
Third: BLAST de novo sequences against NCBI
Fourth: If genome available, try 6-frame translation

💡 Key answer: Use de novo sequencing to get peptide sequences directly from MS/MS data, then use these sequences to search broader databases or identify homologs.

21. 2D-PAGE Workflow

⭐

Core Question 2D Electrophoresis

Medium

Describe the 2D-PAGE workflow. What is separated in each dimension?

✓ Model Answer

2D-PAGE = Two-Dimensional Polyacrylamide Gel Electrophoresis

Principle: Separates proteins by TWO independent properties for maximum resolution.

First Dimension: Isoelectric Focusing (IEF)

Separates proteins by isoelectric point (pI)
Uses immobilized pH gradient (IPG) strips
Proteins migrate until net charge = 0
High voltage (up to 8000 V), long focusing time
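
The focusing point can be estimated from sequence: the pI is the pH at which the summed charges of all ionisable groups cancel. The sketch below uses the Henderson-Hasselbalch equation with one approximate set of side-chain pKa values; published pKa sets differ, so treat the numbers as assumptions and the result as an estimate.

```python
# Approximate pKa values (assumed; several published sets exist).
POSITIVE_PKA = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"C_term": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}

def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    groups = ["N_term", "C_term"] + list(sequence)
    charge = 0.0
    for group in groups:
        if group in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[group]))
        elif group in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[group] - ph))
    return charge

def isoelectric_point(sequence, tol=1e-4):
    """Find the pH where net charge is zero by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid       # still positive: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0

for peptide in ("DDDD", "ACDEFGHIK", "KKKK"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

Acidic sequences focus at the low-pH end of the strip and basic sequences at the high-pH end.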

Second Dimension: SDS-PAGE

Separates proteins by molecular weight (MW)
IPG strip equilibrated with SDS, placed on gel
SDS denatures proteins and gives them a uniform negative charge-to-mass ratio
Smaller proteins migrate faster

Complete workflow:

Sample preparation: Lysis, solubilization in urea/thiourea/CHAPS
Rehydration: Load sample onto IPG strip
IEF: Focus proteins by pI (12-24 hours)
Equilibration: Reduce (DTT) and alkylate (IAA) proteins in SDS buffer
SDS-PAGE: Separate by MW (4-6 hours)
Staining: Coomassie, silver, or fluorescent (SYPRO Ruby)
Image analysis: Detect spots, compare gels
Spot picking: Excise spots of interest
MS analysis: In-gel digestion → MALDI-TOF (PMF) or LC-MS/MS

Dimension	Property	Method	Direction
1st	pI (charge)	IEF	Horizontal
2nd	MW (size)	SDS-PAGE	Vertical

💡 Remember: 1st dimension = pI (IEF), 2nd dimension = MW (SDS-PAGE). This gives "orthogonal" separation for maximum resolution.

22. Quick Review - Additional Concepts

Test yourself on these additional essential concepts:

CID stands for ❓ Collision-Induced Dissociation

b-ions contain the ❓ terminus N-terminus

y-ions contain the ❓ terminus C-terminus

Monoisotopic mass uses the ❓ isotope Most abundant isotope of each element

Which mass analyzer has the highest resolution? ❓ Orbitrap (100,000-500,000+)

Which mass analyzer is best for quantification (SRM)? ❓ Quadrupole (Triple Quad)

De novo sequencing is used when ❓ Protein is not in the database

Inteins are useful for ❓ Tag-free protein purification / protein ligation

Y2H detects ❓ interactions only Direct binary interactions

In 2D-PAGE, the 1st dimension separates by ❓ pI (isoelectric point) using IEF

Leucine and Isoleucine cannot be distinguished because ❓ They are isomers with identical residue mass (113.084 Da)

TOF separates ions by their ❓ Flight time through drift tube
