Lecture 1 — Introduction: Foundational Genetics & Genomics Concepts
1. Classical Genetics (Transmission/Formal Genetics): Focuses on how traits are passed from parents to offspring using breeding experiments and phenotypic analysis to infer genetic laws (e.g., Mendel's laws).
2. Molecular Genetics: Investigates the structure and function of genes at the molecular level (DNA, RNA, proteins), including gene expression, mutation, and gene regulation.
3. Population Genetics: Studies the distribution and change of allele frequencies within populations, linking genetics with evolutionary biology.
4. Quantitative Genetics: Analyzes traits controlled by multiple genes using statistical models to estimate genetic contribution to phenotypic variation (e.g., height, milk production).
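Step 1: Calculate allele frequencies (N = 90 + 40 + 70 = 200 individuals, so 400 alleles)
p = f(A) = (2 × 90 + 40) / 400 = 0.55; q = f(a) = (2 × 70 + 40) / 400 = 0.45
Step 2: Expected genotype frequencies under HWE
p² = 0.3025, 2pq = 0.495, q² = 0.2025; multiplying by N = 200 gives 60.5 AA, 99 Aa, 40.5 aa
Step 3: Comparison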
Observed: 90 AA, 40 Aa, 70 aa
Expected: 60.5 AA, 99 Aa, 40.5 aa
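There is a large excess of homozygotes (AA and aa) and a deficit of heterozygotes (40 observed vs. 99 expected). A χ² goodness-of-fit test gives χ² ≈ 71.0 with 1 degree of freedom (3 genotype classes − 1 − 1 estimated allele frequency), far above the 5% critical value of 3.84, so the population is NOT in HWE. A heterozygote deficit like this is consistent with inbreeding, population structure, selection, or non-random (e.g., assortative) mating.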
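The three steps can be checked with a short Python sketch. The function name `hwe_test` is hypothetical, and the χ² goodness-of-fit test (1 df) is one standard way to formalize the comparison in Step 3, not necessarily the method used in the lecture.

```python
def hwe_test(n_AA: int, n_Aa: int, n_aa: int):
    """Return allele frequencies, HWE-expected counts and the chi-square statistic."""
    n = n_AA + n_Aa + n_aa                 # number of individuals
    p = (2 * n_AA + n_Aa) / (2 * n)        # Step 1: frequency of allele A
    q = 1 - p                              # frequency of allele a
    # Step 2: expected genotype counts under HWE (p^2, 2pq, q^2) times N
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    # Step 3: chi-square goodness of fit, sum of (O - E)^2 / E
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, q, expected, chi2


p, q, expected, chi2 = hwe_test(90, 40, 70)
print(f"p = {p:.2f}, q = {q:.2f}")
print("expected AA, Aa, aa:", [round(e, 1) for e in expected])
# 1 df = 3 genotype classes - 1 - 1 estimated allele frequency
print(f"chi-square = {chi2:.1f} (5% critical value for 1 df = 3.84)")
```

With these counts the statistic comes out near 71, matching the hand calculation.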
Definition: Linkage disequilibrium (LD) is the non-random association of alleles at two or more loci in a population. Certain allele combinations occur together more (or less) frequently than expected under independence.
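For two biallelic loci, "more or less frequently than expected under independence" is usually quantified as D = p(AB) − p(A)·p(B), often rescaled as D′ (D divided by its maximum possible value) or as r² (squared allelic correlation). A minimal sketch, assuming haplotype frequencies are already known; the function name and the example frequencies are hypothetical.

```python
def ld_stats(p_AB: float, p_Ab: float, p_aB: float, p_ab: float):
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A at locus 1
    p_B = p_AB + p_aB          # frequency of allele B at locus 2
    p_a, p_b = 1 - p_A, 1 - p_B
    # D: observed AB haplotype frequency minus the frequency expected under independence
    D = p_AB - p_A * p_B
    # D' (signed here): D divided by its maximum magnitude given the allele frequencies
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = D / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between alleles at the two loci
    denom = p_A * p_a * p_B * p_b
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Hypothetical haplotype frequencies in which AB and ab are over-represented
D, d_prime, r2 = ld_stats(0.4, 0.1, 0.1, 0.4)
print(f"D = {D:.2f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")   # D = 0.15, D' = 0.60, r^2 = 0.36
```

Under independence (all four haplotypes at 0.25) D is 0, so both D′ and r² are 0 as well.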
LD vs. Physical Linkage: LD ≠ physical linkage. Physical linkage means the loci lie on the same chromosome. Physical linkage contributes to LD because loci that are close together recombine less often, so their allele combinations are inherited together for many generations. LD can also arise from other forces: genetic drift in small populations, selection, population admixture, or new mutations. Conversely, physically linked loci can show low LD if enough recombination has occurred over time.
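The last point can be made concrete with the classic decay rule D_t = D_0 · (1 − c)^t, where c is the recombination fraction between the two loci. This sketch assumes an idealized, infinitely large, randomly mating population with no selection, drift or mutation; the function names and the starting D_0 = 0.2 are hypothetical.

```python
import math


def ld_after(d0: float, c: float, generations: int) -> float:
    """D after t generations of random mating: D_t = D_0 * (1 - c)^t."""
    return d0 * (1 - c) ** generations


def half_life(c: float) -> float:
    """Generations needed for D to fall to half its starting value."""
    if not 0 < c <= 0.5:
        raise ValueError("recombination fraction must be in (0, 0.5]")
    return math.log(0.5) / math.log(1 - c)


for c in (0.001, 0.01, 0.5):   # from tightly linked to unlinked loci
    print(f"c = {c}: D after 50 generations = {ld_after(0.2, c, 50):.4f}, "
          f"half-life = {half_life(c):.1f} generations")
```

Unlinked loci (c = 0.5) lose half their LD every generation, whereas tightly linked loci keep most of it for hundreds of generations.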
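Real-world example: Lactose tolerance in humans. A regulatory variant near the LCT gene (which encodes lactase) keeps lactase expressed into adulthood, so carriers can digest lactose after weaning. Strong positive selection raised this variant to high frequency faster than recombination could break up the surrounding chromosome segment, so it remains in high LD with nearby SNPs across a long haplotype block.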
The .ped file (PLINK text pedigree format) is a whitespace-delimited text file with no header line; each line corresponds to one individual. The columns are:
Column 1 — Family ID: Identifier for the family, used to group related individuals.
Column 2 — Individual ID: Unique identifier for each individual.
Column 3 — Paternal ID: Father's ID (0 if unknown).
Column 4 — Maternal ID: Mother's ID (0 if unknown).
Column 5 — Sex: 1 = Male, 2 = Female, 0 = Unknown.
Column 6 — Phenotype: for case/control traits, 1 = control, 2 = case, and -9 or 0 = missing; a quantitative trait stores the measured value instead.
Column 7+ — Allele data: two columns per locus, one for each allele (e.g., A A or G T), with 0 marking a missing allele. A file with M loci therefore has 6 + 2M columns.
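A minimal Python sketch of reading one .ped line into the six fixed columns plus allele pairs. The `PedRecord` class and `parse_ped_line` function are hypothetical helpers, not part of PLINK; the phenotype is kept as text because it may be a case/control code, a missing code, or a quantitative value.

```python
from dataclasses import dataclass


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    paternal_id: str      # "0" if unknown
    maternal_id: str      # "0" if unknown
    sex: int              # 1 = male, 2 = female, 0 = unknown
    phenotype: str        # "1"/"2", "-9" or "0", or a quantitative value
    genotypes: list       # one (allele1, allele2) tuple per locus


def parse_ped_line(line: str) -> PedRecord:
    """Split one whitespace-delimited .ped line into its six fixed columns and allele pairs."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("a .ped line needs the six fixed columns")
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("allele columns must come in pairs (two per locus)")
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(*fields[:4], int(fields[4]), fields[5], genotypes)


record = parse_ped_line("FAM1 IND1 0 0 1 2 A A G T 0 0")   # hypothetical individual
print(record.individual_id, record.genotypes)
```

The third locus of this example individual ("0 0") is a missing genotype.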
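Genomic positions for each locus are given in the accompanying .map file, one line per locus in the same order as the allele pairs in the .ped file: chromosome, SNP ID, genetic distance, and base-pair position. The .bim file plays the same role for the binary .bed format and adds two allele columns.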
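Because the i-th .map line describes the i-th allele pair of every .ped line, annotating genotypes with positions is a simple positional join. A sketch assuming a 4-column .map file; the helper names, SNP IDs and positions are hypothetical.

```python
def read_map(lines):
    """Parse 4-column .map lines into (chromosome, snp_id, genetic_distance, bp_position)."""
    variants = []
    for line in lines:
        if not line.strip():
            continue                       # skip blank lines
        chrom, snp_id, distance, bp = line.split()
        variants.append((chrom, snp_id, float(distance), int(bp)))
    return variants


def annotate_genotypes(variants, genotypes):
    """Pair the i-th .map line with the i-th allele pair of one .ped individual."""
    if len(variants) != len(genotypes):
        raise ValueError(".map and .ped disagree on the number of loci")
    return [
        (snp_id, chrom, bp, alleles)
        for (chrom, snp_id, _distance, bp), alleles in zip(variants, genotypes)
    ]


map_lines = ["1 rs0001 0 752566", "1 rs0002 0 776546"]   # hypothetical variants
annotated = annotate_genotypes(read_map(map_lines), [("A", "A"), ("G", "T")])
print(annotated)
```

The length check catches the most common mismatch, a .map file that does not list exactly one line per allele pair.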
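The basic model: Phenotype = Genotype effect + Environmental effect (P = G + E). If G and E are independent (no genotype-environment covariance or interaction), the variances also add: Var(P) = Var(G) + Var(E).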
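Three components of the genotype effect, G = A + D + I (additive, dominance, and epistatic/interaction effects), so that Var(G) = V_A + V_D + V_I: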
1. Additive genetic effect: The sum of individual allele effects across all loci contributing to the trait.
2. Dominance effect (intragenic): Interaction between the two alleles at the same locus (e.g., how Tt differs from the average of TT and tt).
3. Epistatic effect (intergenic): Interaction between alleles at different loci.
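The additive and dominance parts can be computed exactly for a single locus using the standard textbook parametrization: genotypic values AA = +a, Aa = d, aa = −a, where d is how far the heterozygote deviates from the midpoint of the homozygotes. This sketch assumes Hardy-Weinberg genotype frequencies; with only one locus there is no epistasis, so Var(G) should equal V_A + V_D. The function name and the example values (p = 0.6, a = 1, d = 0.5) are hypothetical.

```python
def single_locus_variances(p: float, a: float, d: float):
    """Single-locus model with genotypic values AA = +a, Aa = d, aa = -a."""
    q = 1 - p
    freqs = (p * p, 2 * p * q, q * q)        # HWE genotype frequencies
    values = (a, d, -a)                      # genotypic values G
    mean = sum(f * g for f, g in zip(freqs, values))
    v_g = sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))   # total genetic variance
    alpha = a + d * (q - p)                  # average effect of an allele substitution
    v_a = 2 * p * q * alpha ** 2             # additive variance
    v_d = (2 * p * q * d) ** 2               # dominance variance
    return v_g, v_a, v_d


v_g, v_a, v_d = single_locus_variances(p=0.6, a=1.0, d=0.5)
print(f"Var(G) = {v_g:.4f}, V_A = {v_a:.4f}, V_D = {v_d:.4f}")
```

Setting d = 0 (no dominance) makes V_D vanish, so all genetic variance at the locus is additive.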
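Heritability: Measures the proportion of phenotypic variation in a population that is due to genetic differences among individuals. Broad-sense heritability, H² = Var(G) / Var(P), includes all genetic components; narrow-sense heritability, h² = V_A / Var(P), includes only the additive component. It is estimated from the phenotypic resemblance between relatives (e.g., parents and offspring, sibs, twins), because relatives share a known expected proportion of their alleles, whereas unrelated individuals have no such expected genetic covariance. Ranges: Low (<0.1), Medium (0.1–0.4), High (>0.4).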
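One classic way to estimate heritability from relatives is to regress offspring phenotype on the midparent (average of both parents') phenotype: under a purely additive model with random mating and no shared family environment, the slope equals V_A / Var(P) = h². The simulation below is a sketch under exactly those assumptions; the function names and the chosen V_A = 0.4, V_E = 0.6 are hypothetical.

```python
import random


def simulate_families(n: int, v_a: float, v_e: float, seed: int = 1):
    """Simulate midparent and offspring phenotypes under a purely additive model."""
    rng = random.Random(seed)
    sd_a, sd_e = v_a ** 0.5, v_e ** 0.5
    midparent, offspring = [], []
    for _ in range(n):
        a_father, a_mother = rng.gauss(0, sd_a), rng.gauss(0, sd_a)  # additive genetic values
        p_father = a_father + rng.gauss(0, sd_e)                     # P = G + E
        p_mother = a_mother + rng.gauss(0, sd_e)
        # Child: parental average plus Mendelian sampling noise (variance V_A / 2)
        a_child = (a_father + a_mother) / 2 + rng.gauss(0, (v_a / 2) ** 0.5)
        offspring.append(a_child + rng.gauss(0, sd_e))
        midparent.append((p_father + p_mother) / 2)
    return midparent, offspring


def regression_slope(x, y):
    """Least-squares slope of y on x: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var


mp, off = simulate_families(n=20_000, v_a=0.4, v_e=0.6)
h2_hat = regression_slope(mp, off)
print(f"estimated h^2 = {h2_hat:.3f} (true V_A / Var(P) = 0.4)")
```

Real data add complications this sketch ignores, such as shared family environment, which inflates the parent-offspring resemblance.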