Lecture 14 – Software for Population Genomic Analysis (PLINK)
--file is used to load text-format PED + MAP files (e.g., --file Altamurana looks for Altamurana.ped and Altamurana.map). --bfile is used for binary files (FAM, BIM, BED). This is an important and frequently tested distinction.

A PED file contains the line 1 3 2 1 1 1 A A T C. What can we conclude about individual 3? Individual 3 belongs to family 1; its father is individual 2 and its mother is individual 1; it is male (sex = 1) with phenotype code 1 (unaffected/control under PLINK's case/control coding); it is homozygous A/A at the first SNP and heterozygous T/C at the second.

--make-bed converts the input to binary format (.bed/.bim/.fam). --recode does the opposite: it outputs text PED/MAP format. --bfile is an input flag (to load binary files), not a conversion flag. Example: ./plink --file Altamurana --make-bed --out Altamurana_binary.

What does --mind 0.1 do during quality control? --mind filters individuals (samples), not SNPs. It removes samples whose missing genotype rate exceeds the specified threshold (here 10%). The equivalent filter for SNPs is --geno. This is a classic exam trap: confusing --mind (individuals) with --geno (SNPs).

--geno 0.1 keeps only SNPs with ≤10% missing data (i.e., a genotyping rate of ≥90%). Option A confuses --geno (SNPs) with --mind (individuals). --maf 0.05 keeps SNPs with MAF ≥ 0.05 (it does not remove them). --hwe 0.01 keeps SNPs with an HWE p-value ≥ 0.01 (removing those that deviate significantly from HWE).

What threshold is applied when --maf is used without a specified value? The default is 0.01.

--freq generates a .frq file with allele frequencies (columns CHR, SNP, A1, A2, MAF, NCHROBS). --hardy tests for Hardy-Weinberg equilibrium. --maf is a QC filter, not a statistics command. --assoc performs association testing.

An MDS analysis in PLINK takes two steps: (1) compute pairwise IBS distances with --genome, which produces a .genome file; (2) load that file with --read-genome and run --cluster --mds-plot N, where N is the number of dimensions. The lecture shows: step 1: --genome --out Al_Ap-Ba_genome, step 2: --read-genome ... --cluster --mds-plot 2.

The lecture's ROH settings are: --homozyg-kb 1000 (minimum ROH length of 1000 kb = 1 Mb), --homozyg-window-het 0 (no heterozygous SNPs allowed in the scanning window), --homozyg-window-missing 5 (at most 5 missing calls per window), --homozyg-snp 15 (minimum of 15 SNPs per ROH), --homozyg-density 100 (on average at least 1 SNP per 100 kb). Option C is the main trap: it allows 1 heterozygous SNP, but the lecture explicitly sets this to 0.

What does the .hom.indiv output file contain? The .hom.indiv file provides a per-individual summary with columns FID, IID, PHE, NSEG (number of segments), KB (total ROH length in kb), and KBAVG (average ROH length in kb). The .hom file (not .hom.indiv) contains one row per ROH segment. These two are often confused.

Correct statement: the --assoc flag with quantitative phenotypes uses linear regression. --assoc performs association testing; for quantitative traits it produces a .qassoc file. The lecture example: ./plink --file Cattle --assoc --out GWAS_stature_cattle_no_covariates. By contrast, --genome computes IBS distances, --homozyg detects ROH, and --freq calculates allele frequencies.

In the .hom output file, which columns describe the boundaries of an identified ROH? CHR gives the chromosome, SNP1 and SNP2 name the first and last SNPs of the run, and POS1 and POS2 give their physical positions.

You run wc -l Altamurana.ped and get 24. You also run wc -l Altamurana.map and get 54241. What do these numbers tell you? PED and MAP files have no header row, so wc -l counts records directly: the PED file holds 24 individuals (one line per individual) and the MAP file describes 54,241 SNPs (one line per SNP).

The formula for the genomic inbreeding coefficient is:

$$F_{ROH} = \frac{L_{ROH}}{L_{AUTO}}$$

where L_ROH is the total length of an individual's runs of homozygosity and L_AUTO is the length of the autosomal genome covered by SNPs.
For example, an FROH of about 0.16 means that about 16% of the individual's autosomal genome is covered by runs of homozygosity, indicating a moderate level of genomic inbreeding. The population mean FROH is the average FROH across all individuals.
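The FROH calculation can be sketched in a few lines. This is a minimal illustration, assuming ROH segments are available as (IID, length in kb) pairs, as in the IID and KB columns of a .hom file, and that L_AUTO is supplied by the user in kb. The function names and the toy numbers are hypothetical, not lecture data.

```python
from collections import defaultdict


def froh_per_individual(segments, autosome_kb):
    """F_ROH = L_ROH / L_AUTO for each individual.

    segments: iterable of (IID, length_kb) pairs, e.g. the IID and KB columns
    of a PLINK .hom file (one row per ROH segment).
    autosome_kb: autosomal length covered by SNPs, in kb (user-supplied).
    """
    l_roh = defaultdict(float)
    for iid, kb in segments:
        l_roh[iid] += kb  # L_ROH: total ROH length for this individual
    return {iid: total / autosome_kb for iid, total in l_roh.items()}


def mean_froh(froh_by_iid):
    """Population mean F_ROH: average of the per-individual values."""
    return sum(froh_by_iid.values()) / len(froh_by_iid)


# Toy numbers for illustration only (not lecture data).
segments = [("ind1", 150_000.0), ("ind1", 250_000.0), ("ind2", 100_000.0)]
froh = froh_per_individual(segments, autosome_kb=2_500_000.0)
print(froh)             # {'ind1': 0.16, 'ind2': 0.04}
print(mean_froh(froh))  # 0.1
```

Individuals with no ROH do not appear in a segment list, so they would need to be added with F_ROH = 0 before averaging; using the per-individual KB column of .hom.indiv instead avoids summing segments by hand.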
Describe the four PLINK QC filters (--mind, --geno, --maf, --hwe). For each, state what it filters (individuals or SNPs) and what criterion is applied.
--mind [threshold]: Filters individuals. Excludes samples whose proportion of missing genotypes exceeds the threshold. Example: --mind 0.1 removes individuals with >10% missing data.
--geno [threshold]: Filters SNPs. Excludes markers with a proportion of missing genotypes exceeding the threshold. Example: --geno 0.1 removes SNPs with >10% missing data (i.e., keeps SNPs with ≥90% call rate).
--maf [threshold]: Filters SNPs. Excludes markers with a minor allele frequency below the threshold. Example: --maf 0.05 removes SNPs with MAF < 0.05 (removes very rare variants or monomorphic markers). Default is 0.01.
--hwe [threshold]: Filters SNPs. Excludes markers whose Hardy-Weinberg equilibrium test p-value falls below the threshold. Example: --hwe 0.01 removes SNPs with HWE p < 0.01 (those significantly deviating from HWE, which may indicate genotyping errors).
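The logic of the four filters can be illustrated on a toy genotype matrix. This sketch assumes genotypes are coded as 0/1/2 copies of one allele with None for a missing call, and it applies the filters in the order listed above, which may differ from PLINK's internal order. It also swaps in a 1-df chi-square HWE test for simplicity, whereas PLINK uses an exact test. All function names are hypothetical.

```python
import math

MISSING = None  # hypothetical coding: None marks a missing genotype call


def missing_rate(calls):
    """Proportion of missing calls (one individual's row or one SNP's column)."""
    return sum(c is MISSING for c in calls) / len(calls)


def minor_allele_freq(calls):
    """MAF from allele-count calls (0, 1, 2 copies of the counted allele)."""
    obs = [c for c in calls if c is not MISSING]
    if not obs:
        return 0.0
    p = sum(obs) / (2 * len(obs))
    return min(p, 1 - p)


def hwe_p_chisq(calls):
    """HWE p-value from a 1-df chi-square test (PLINK itself uses an exact test)."""
    obs = [c for c in calls if c is not MISSING]
    n = len(obs)
    if n == 0:
        return 1.0
    counts = [obs.count(g) for g in (0, 1, 2)]
    p = (2 * counts[2] + counts[1]) / (2 * n)
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: test undefined, treat as no deviation
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def run_qc(matrix, mind=0.1, geno=0.1, maf=0.05, hwe=0.01):
    """matrix[i][j] = call for individual i at SNP j. Returns kept indices."""
    # --mind: drop individuals whose missing rate exceeds the threshold
    keep_ind = [i for i, row in enumerate(matrix) if missing_rate(row) <= mind]
    keep_snp = []
    for j in range(len(matrix[0])):
        col = [matrix[i][j] for i in keep_ind]
        if missing_rate(col) > geno:       # --geno: too much missing data
            continue
        if minor_allele_freq(col) < maf:   # --maf: rare or monomorphic
            continue
        if hwe_p_chisq(col) < hwe:         # --hwe: significant deviation
            continue
        keep_snp.append(j)
    return keep_ind, keep_snp


# Toy panel: 10 individuals x 3 SNPs (illustrative values only).
toy = [
    [0, None, 0], [0, None, 0], [0, 0, 0], [0, 1, 0], [0, 1, 1],
    [0, 2, 1], [0, 0, 1], [0, 1, 1], [0, 0, 2], [None, None, None],
]
# mind=0.5 only because this toy panel has just 3 SNPs
print(run_qc(toy, mind=0.5))  # ([0, 1, 2, 3, 4, 5, 6, 7, 8], [2])
```

Here the last individual fails --mind, SNP 0 is monomorphic and fails --maf, SNP 1 has 2 of 9 calls missing and fails --geno, and SNP 2 is in exact HWE proportions and survives.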
Short ROH (e.g., 1–4 Mb): Originate from remote common ancestors. Over many generations, recombination breaks long ancestral haplotypes into smaller fragments. A breed with predominantly short ROH likely experienced background relatedness long ago but has maintained a relatively large effective population size recently.
Long ROH (e.g., >8 Mb or >16 Mb): Indicate recent inbreeding, because few meiotic recombination events have occurred since the common ancestor. A high frequency of long ROH suggests recent bottlenecks, small population sizes, or close mating.
Demographic reconstruction: By plotting the frequency distribution of ROH across length classes, researchers can infer the timing and severity of inbreeding events. A breed with many long ROH has experienced recent, intense inbreeding. A breed with mainly short ROH carries ancient background inbreeding but little recent inbreeding. Additionally, plotting the total ROH length per individual (SROH) against the number of ROH segments (NROH) helps distinguish populations: many short segments indicate ancient inbreeding; fewer but longer segments indicate recent inbreeding.
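Binning ROH into length classes and computing NROH and SROH per individual can be sketched as follows. The class boundaries (1-2, 2-4, 4-8, 8-16, >16 Mb) are an assumption chosen to match the ranges mentioned above, and the 1 Mb lower limit mirrors --homozyg-kb 1000; the helper names and toy lengths are hypothetical.

```python
from collections import Counter

# Hypothetical length classes in Mb (lower bound inclusive, upper exclusive).
CLASSES = [(1, 2, "1-2 Mb"), (2, 4, "2-4 Mb"), (4, 8, "4-8 Mb"),
           (8, 16, "8-16 Mb"), (16, float("inf"), ">16 Mb")]


def roh_class(length_mb):
    """Return the label of the length class containing length_mb."""
    for lo, hi, label in CLASSES:
        if lo <= length_mb < hi:
            return label
    return None  # below 1 Mb: shorter than the minimum ROH length used here


def class_distribution(lengths_mb):
    """Number of ROH segments falling in each length class."""
    return Counter(c for c in map(roh_class, lengths_mb) if c is not None)


def nroh_sroh(lengths_mb):
    """NROH (segment count) and SROH (total length in Mb) for one individual."""
    kept = [x for x in lengths_mb if x >= 1]
    return len(kept), sum(kept)


# Toy individuals (illustrative values, not lecture data).
many_short = [1.5, 2.5, 3.0, 1.0]
few_long = [18.0, 10.0]
print(nroh_sroh(many_short), class_distribution(many_short))
print(nroh_sroh(few_long), class_distribution(few_long))
```

The first toy individual has more segments but a smaller SROH, the pattern associated with ancient inbreeding; the second has fewer, longer segments and a larger SROH, the pattern associated with recent inbreeding.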
Remember: PED columns are Family ID, Individual ID, Paternal ID, Maternal ID, Sex (1 = male, 2 = female), Phenotype, then 2 columns per locus. No header row!
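A PED line can be decoded mechanically from this column layout. The sketch below parses the example line 1 3 2 1 1 1 A A T C; the PedRecord class and parse_ped_line function are hypothetical helpers, not part of PLINK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PedRecord:
    fid: str
    iid: str
    father: str       # "0" = unknown / founder
    mother: str       # "0" = unknown / founder
    sex: int          # 1 = male, 2 = female (PLINK also accepts 0 = unknown)
    phenotype: str    # kept as text; its meaning depends on the trait coding
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per locus


def parse_ped_line(line: str) -> PedRecord:
    fields = line.split()  # whitespace-separated columns, no header row
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 ID columns plus 2 allele columns per locus")
    alleles = fields[6:]
    pairs = [(alleles[k], alleles[k + 1]) for k in range(0, len(alleles), 2)]
    return PedRecord(fields[0], fields[1], fields[2], fields[3],
                     int(fields[4]), fields[5], pairs)


rec = parse_ped_line("1 3 2 1 1 1 A A T C")
print(rec.father, rec.mother, rec.sex)  # 2 1 1
print(rec.genotypes)                    # [('A', 'A'), ('T', 'C')]
```

The length check encodes the rule that every locus occupies exactly two columns, so a line with an odd number of allele columns is rejected rather than silently misread.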
Key details: (1) No header; (2) Unknown parents = 0; (3) Sex: A is female → 2, B and C are male → 1; (4) Each genotype takes 2 columns (one per allele); (5) Individual C has father=B and mother=A (paternal before maternal). Total columns = 6 + 2×5 = 16.
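The same rules can be applied in reverse to write the three-individual pedigree. This is a sketch under stated assumptions: the family ID FAM1 is hypothetical, the phenotype is set to PLINK's missing code -9, and because the exercise's genotypes are not reproduced here, each of the 5 loci is written as missing (0 0).

```python
def ped_line(fid, iid, father, mother, sex, phenotype, genotypes):
    """One PED row: 6 ID columns, then 2 allele columns per locus."""
    cols = [fid, iid, father, mother, str(sex), str(phenotype)]
    for a1, a2 in genotypes:
        cols += [a1, a2]
    return " ".join(cols)


N_LOCI = 5
PLACEHOLDER = [("0", "0")] * N_LOCI  # exercise genotypes not reproduced: all missing
FID = "FAM1"                         # hypothetical family ID
PHENO = -9                           # PLINK's missing-phenotype code

rows = [
    ped_line(FID, "A", "0", "0", 2, PHENO, PLACEHOLDER),  # female founder
    ped_line(FID, "B", "0", "0", 1, PHENO, PLACEHOLDER),  # male founder
    ped_line(FID, "C", "B", "A", 1, PHENO, PLACEHOLDER),  # male; father B, mother A
]
for r in rows:
    assert len(r.split()) == 6 + 2 * N_LOCI  # 16 columns per row
print("\n".join(rows))  # no header line is written
```

Note that the paternal ID (B) comes before the maternal ID (A) in individual C's row, which is the ordering detail the exercise tests.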