String Metrics and Alignments

1. String Metrics and Alignments

What is a String Metric?

A string metric measures the distance or similarity between two text strings. We use these in bioinformatics to compare DNA, RNA, or protein sequences.

Edit Distance

Edit distances measure how many operations you need to transform one string into another. The operations can be:

Substitution: Replace one character with another

CAT → CUT
 |      
 A → U

Insertion/Deletion (Indel): Add or remove a character

MONKEY → MON-EY (deletion of K, or viewing it differently, insertion of gap)

Transposition: Swap adjacent characters

FORM → FROM

Each operation has a cost. The total cost defines the distance.

Distance vs Similarity

Distance metric: Range is [0, +∞)

Score of 0 means identical strings
Higher scores mean more different strings

Similarity metric: Range is (-∞, +∞)

High positive score means similar strings
Negative score means distant strings

In bioinformatics, we typically work with similarity because we want to find sequences that are alike.

2. Hamming Distance

The simplest edit distance. It only counts substitutions and requires strings of equal length.

Example

ACCCTCGCTAGATAAATAGATCTGATAG
||x||||||||||x|||||x||||x|||
ACTCTCGCTAGATGAATAGGTCTGTTAG

Count the mismatches (marked with x): positions 3, 14, 19, 25

Hamming Distance = 4

Converting to Similarity

For the example above:

Total positions: 28
Matches: 24
Mismatches: 4

Similarity = +24 - 4 = 20

(We add +1 for each match, subtract 1 for each mismatch)

Implementation: AlnSeq Class (Basic)

We start by creating a class to hold two sequences and compute Hamming metrics.

Step 1: Define the class and constructor

class AlnSeq:
    def __init__(self, seq1, seq2):
        if len(seq1) != len(seq2):
            raise ValueError("Sequences must have equal length")
        
        self.seq1 = seq1
        self.seq2 = seq2

Step 2: Add Hamming distance method

    def compute_hamming_distance(self):
        self.distance = 0
        for i in range(len(self.seq1)):
            if self.seq1[i] != self.seq2[i]:
                self.distance += 1

        return self.distance

Step 3: Add Hamming similarity method

    def compute_hamming_similarity(self):
        if getattr(self, 'distance', None) == None:
            self.compute_hamming_distance()
            # +1 * numbers of matchs + (-1 * numbers of mismatchs)
            # It's possible simplify this math experssion, but left for readabilty
        return (len(self.seq1) - self.distance) - self.distance

Usage

aln = AlnSeq("ACCCTCGCTAG", "ACTCTCGCTAG")
print(aln.compute_hamming_distance())    # 1
print(aln.compute_hamming_similarity())  # 9

3. Biologically Relevant Substitution Matrices

Not all substitutions are equal in biology. Some amino acids or nucleotides are chemically similar, so swapping between them is "less bad" than swapping very different ones.

Transition/Transversion Matrix (for DNA)

Transitions are mutations between chemically similar bases:

A ↔ G (both purines)
T ↔ C (both pyrimidines)

Transversions are mutations between different types:

A ↔ T, A ↔ C, G ↔ T, G ↔ C

      A    T    C    G
  A   2   -1   -1    0
  T  -1    2    0   -1
  C  -1    0    2   -1
  G   0   -1   -1    2

Example calculation:

AAAA
|XX/
ATCG

Score = Score(A,A) + Score(A,T) + Score(A,C) + Score(A,G) = 2 + (-1) + (-1) + 0 = 0

PAM Matrices (for Proteins)

PAM = Point Accepted Mutation

PAMn[i,j] gives the likelihood of residue i being replaced by residue j through evolutionary mutations over a time when n mutations occur per 100 residues.

Most commonly used: PAM250

BLOSUM Matrices (for Proteins)

BLOSUMn[i,j] is based on observed substitution frequencies in aligned protein sequences that share n% sequence identity.

Most commonly used: BLOSUM62

Implementation: SubstitutionMatrix Class

A class to store and access substitution matrix values.

Step 1: Define the class with matrix data

class SubstitutionMatrix:
    def __init__(self, matrix_dict):
        # matrix_dict is a nested dictionary
        # matrix_dict['A']['T'] gives score for A->T
        self.matrix = matrix_dict

Step 2: Implement getitem for easy indexing

This lets us use matrix['A', 'T'] syntax.

    def __getitem__(self, key):
        # key is a tuple like ('A', 'T')
        res1, res2 = key
        return self.matrix[res1][res2]

Step 3: Create the Transition/Transversion matrix

# Define the matrix as nested dictionaries
tt_matrix_data = {
    'A': {'A': 2, 'T': -1, 'C': -1, 'G': 0},
    'T': {'A': -1, 'T': 2, 'C': 0, 'G': -1},
    'C': {'A': -1, 'T': 0, 'C': 2, 'G': -1},
    'G': {'A': 0, 'T': -1, 'C': -1, 'G': 2}
}

tt_matrix = SubstitutionMatrix(tt_matrix_data)

Usage

print(tt_matrix['A', 'A'])   # 2  (match)
print(tt_matrix['A', 'G'])   # 0  (transition)
print(tt_matrix['A', 'T'])   # -1 (transversion)

Implementation: Extend AlnSeq with Matrix-Based Similarity

Add a method to compute similarity using any substitution matrix.

class AlnSeq:
    # ... previous methods ...
    
    def hamming_similarity_matrix(self, sub_matrix):
        if len(self.seq1) != len(self.seq2):
            raise ValueError("Sequences must have equal length")
        
        score = 0
        for i in range(len(self.seq1)):
            # Use the substitution matrix for scoring
            score += sub_matrix[self.seq1[i], self.seq2[i]]
        return score

Usage

aln = AlnSeq("AAAA", "ATCG")
print(aln.hamming_similarity_matrix(tt_matrix))  # 2 + (-1) + (-1) + 0 = 0

4. Levenshtein Distance

Unlike Hamming distance, Levenshtein allows insertions and deletions (indels), not just substitutions.

Example

KITTEN → SITTING

Step 1: K → S     (substitution)
Step 2: E → I     (substitution)  
Step 3: insert G  (insertion)

Levenshtein Distance = 3

Levenshtein vs Hamming

For strings of the same length: Hamming Distance ≥ Levenshtein Distance

FLAW    vs    LAWN

Hamming (no gaps allowed):
FLAW
XXXX
LAWN
Hamming Distance = 4

Levenshtein (gaps allowed):
-FLAW
 |||^
-LAWN
Levenshtein Distance = 2 (delete F, insert N)

This shows why we need sequence alignment.

5. Sequence Alignment

Each character of one sequence is matched with:

A character in the other sequence, OR
A gap (-)

After alignment, both gapped strings have the same length.

How Many Possible Alignments?

Simple case (no internal gaps): Just slide sequences past each other

--TCA    -TCA    TCA    TCA    TCA-    TCA--
TA---    TA--    TA-    -TA    --TA    ---TA

Number of alignments = len1 + len2 + 1

Complex case (internal gaps allowed): The number explodes

                    | 1           if len1 = 0
N(len1, len2) =     | 1           if len2 = 0
                    | N(len1-1, len2) + N(len1, len2-1) + N(len1-1, len2-1)  otherwise

Examples:

N(3,3) = 63
N(5,5) = 1,683
N(9,9) = 1,462,563

Brute force is not feasible. We need a smarter approach.

Implementation: Counting Possible Alignments

A recursive function to calculate the number of possible alignments.

def count_alignments(len1, len2):
    # Base cases: if one sequence is empty,
    # only one alignment is possible (all gaps)
    if len1 == 0:
        return 1
    if len2 == 0:
        return 1
    
    # Recursive case: three choices at each position
    return (count_alignments(len1 - 1, len2) +      # gap in seq2
            count_alignments(len1, len2 - 1) +      # gap in seq1
            count_alignments(len1 - 1, len2 - 1))   # match/mismatch

Usage

print(count_alignments(3, 3))   # 63
print(count_alignments(5, 5))   # 1683
print(count_alignments(9, 9))   # 1462563

This function is slow because it recalculates the same subproblems many times. That is exactly why we need dynamic programming.

6. Dynamic Programming

An algorithmic paradigm that works when a problem has:

Optimal Substructure: The optimal solution depends on solutions to smaller sub-problems
Overlapping Subproblems: The same sub-problems are solved multiple times

Key idea: Store solutions to sub-problems and build up to the full solution.

7. Needleman-Wunsch Algorithm (Global Alignment)

Finds the best alignment across the entire length of both sequences.

Setup

Given:

Sequence 1: TCA
Sequence 2: TA
Gap penalty: -2
Substitution scores: Transition/Transversion matrix

Step 1: Initialize the Matrix

Create a matrix with sequence 1 across the top and sequence 2 down the side. Add a gap column/row at the start.

        -     T     C     A
   -    0    -2    -4    -6
   T   -2
   A   -4

First row: Each cell = previous + gap penalty (0, -2, -4, -6) First column: Same logic (0, -2, -4, -6)

Step 2: Fill the Matrix

For each cell (i,j), take the maximum of three options:

Cell[i,j] = max(
    Cell[i-1, j] + gap,           // come from above (gap in seq1)
    Cell[i, j-1] + gap,           // come from left (gap in seq2)  
    Cell[i-1, j-1] + match(i,j)   // come from diagonal (match/mismatch)
)

Filling cell (T,T):

max(-2 + (-2), 0 + match(T,T), -2 + (-2))
= max(-4, 0 + 2, -4)
= max(-4, 2, -4)
= 2

Complete filled matrix:

        -     T     C     A
   -    0    -2    -4    -6
   T   -2     2     0    -2
   A   -4     0     1     2

The bottom-right cell (2) is the optimal alignment score.

Step 3: Backtracking

Start from bottom-right. At each cell, go back the way you came:

Diagonal: match characters
Up: gap in sequence 1
Left: gap in sequence 2

        -     T     C     A
   -    0 ←  -2 ←  -4 ←  -6
        ↑    ↖↑    ↖↑    ↖↑
   T   -2 ←   2  ←  0  ← -2
        ↑    ↖     ↖     ↖
   A   -4 ←   0  ←  1  ← [2]

Path from [2]: diagonal → diagonal → diagonal

Final Alignment:

TCA
|X|
T-A

Score = 2

8. Smith-Waterman Algorithm (Local Alignment)

Finds the best alignment between substrings of the sequences.

Differences from Needleman-Wunsch

Initialization: First row and column are all zeros (no penalty for starting/ending gaps)
Filling: Add a fourth option, zero:

Cell[i,j] = max(
    Cell[i-1, j] + gap,
    Cell[i, j-1] + gap,
    Cell[i-1, j-1] + match(i,j),
    0                              // NEW: discard negative paths
)

Backtracking: Start from the highest value in the matrix (not bottom-right). Stop when you hit zero.

Example

Sequences: TGA and GA, Gap penalty: -2

Initialization:

        -     T     G     A
   -    0     0     0     0
   G    0
   A    0

Filling cell (G,T):

max(0 + (-2), 0 + match(T,G), 0 + (-2), 0)
= max(-2, -1, -2, 0)
= 0

Complete matrix:

        -     T     G     A
   -    0     0     0     0
   G    0     0     2     0
   A    0     0     0    [4]

Backtracking from highest value [4]:

(A,A): came from diagonal, match A-A
(G,G): came from diagonal, match G-G
(G,T): value is 0, STOP

Final Local Alignment:

GA
||
GA

Score = 4

The algorithm found the best matching substring, ignoring the T at the beginning.

Implementation: SeqPair Class (Needleman-Wunsch)

Step 1: Define the class

class SeqPair:
    def __init__(self, seq1, seq2, sub_matrix, gap_penalty=-2):
        self.seq1 = seq1
        self.seq2 = seq2
        self.sub_matrix = sub_matrix
        self.gap = gap_penalty

Step 2: Initialize the scoring matrix

    def _init_matrix(self, rows, cols):
        # Create matrix filled with zeros
        matrix = [[0] * cols for _ in range(rows)]
        return matrix

Step 3: Needleman-Wunsch matrix filling

    def needleman_wunsch(self):
        rows = len(self.seq2) + 1
        cols = len(self.seq1) + 1
        
        # Initialize score matrix
        score = self._init_matrix(rows, cols)
        
        # Initialize traceback matrix
        # 0 = diagonal, 1 = up, 2 = left
        trace = self._init_matrix(rows, cols)
        
        # Fill first row (gaps in seq2)
        for j in range(1, cols):
            score[0][j] = score[0][j-1] + self.gap
            trace[0][j] = 2  # came from left
        
        # Fill first column (gaps in seq1)
        for i in range(1, rows):
            score[i][0] = score[i-1][0] + self.gap
            trace[i][0] = 1  # came from up
        
        # Fill rest of matrix
        for i in range(1, rows):
            for j in range(1, cols):
                # Three options
                diag = score[i-1][j-1] + self.sub_matrix[self.seq1[j-1], self.seq2[i-1]]
                up = score[i-1][j] + self.gap
                left = score[i][j-1] + self.gap
                
                # Take maximum
                score[i][j] = max(diag, up, left)
                
                # Record which direction we came from
                if score[i][j] == diag:
                    trace[i][j] = 0
                elif score[i][j] == up:
                    trace[i][j] = 1
                else:
                    trace[i][j] = 2
        
        self.nw_score = score
        self.nw_trace = trace
        return score[rows-1][cols-1]  # final score

Step 4: Backtracking to get alignment

    def nw_traceback(self):
        aligned1 = ""
        aligned2 = ""
        
        # Start from bottom-right
        i = len(self.seq2)
        j = len(self.seq1)
        
        while i > 0 or j > 0:
            if i > 0 and j > 0 and self.nw_trace[i][j] == 0:
                # Diagonal: match/mismatch
                aligned1 = self.seq1[j-1] + aligned1
                aligned2 = self.seq2[i-1] + aligned2
                i -= 1
                j -= 1
            elif i > 0 and self.nw_trace[i][j] == 1:
                # Up: gap in seq1
                aligned1 = "-" + aligned1
                aligned2 = self.seq2[i-1] + aligned2
                i -= 1
            else:
                # Left: gap in seq2
                aligned1 = self.seq1[j-1] + aligned1
                aligned2 = "-" + aligned2
                j -= 1
        
        return aligned1, aligned2

Usage

seq_pair = SeqPair("TCA", "TA", tt_matrix, gap_penalty=-2)
score = seq_pair.needleman_wunsch()
print(f"Score: {score}")  # Score: 2

aln1, aln2 = seq_pair.nw_traceback()
print(aln1)  # TCA
print(aln2)  # T-A

Implementation: Smith-Waterman (Extend SeqPair)

Add local alignment method

    def smith_waterman(self):
        rows = len(self.seq2) + 1
        cols = len(self.seq1) + 1
        
        score = self._init_matrix(rows, cols)
        trace = self._init_matrix(rows, cols)
        
        # First row and column stay 0 (no initialization needed)
        
        # Track position of maximum score
        max_score = 0
        max_pos = (0, 0)
        
        # Fill matrix
        for i in range(1, rows):
            for j in range(1, cols):
                diag = score[i-1][j-1] + self.sub_matrix[self.seq1[j-1], self.seq2[i-1]]
                up = score[i-1][j] + self.gap
                left = score[i][j-1] + self.gap
                
                # Take maximum, but floor at 0
                score[i][j] = max(diag, up, left, 0)
                
                # Record direction (only if not 0)
                if score[i][j] == 0:
                    trace[i][j] = -1  # stop here
                elif score[i][j] == diag:
                    trace[i][j] = 0
                elif score[i][j] == up:
                    trace[i][j] = 1
                else:
                    trace[i][j] = 2
                
                # Update max position
                if score[i][j] > max_score:
                    max_score = score[i][j]
                    max_pos = (i, j)
        
        self.sw_score = score
        self.sw_trace = trace
        self.sw_max_pos = max_pos
        return max_score

Add local alignment backtracking

    def sw_traceback(self):
        aligned1 = ""
        aligned2 = ""
        
        # Start from maximum position
        i, j = self.sw_max_pos
        
        # Stop when we hit 0
        while i > 0 and j > 0 and self.sw_trace[i][j] != -1:
            if self.sw_trace[i][j] == 0:
                aligned1 = self.seq1[j-1] + aligned1
                aligned2 = self.seq2[i-1] + aligned2
                i -= 1
                j -= 1
            elif self.sw_trace[i][j] == 1:
                aligned1 = "-" + aligned1
                aligned2 = self.seq2[i-1] + aligned2
                i -= 1
            else:
                aligned1 = self.seq1[j-1] + aligned1
                aligned2 = "-" + aligned2
                j -= 1
        
        return aligned1, aligned2

Usage

seq_pair = SeqPair("TGA", "GA", tt_matrix, gap_penalty=-2)
score = seq_pair.smith_waterman()
print(f"Score: {score}")  # Score: 4

aln1, aln2 = seq_pair.sw_traceback()
print(aln1)  # GA
print(aln2)  # GA

9. Global vs Local: When to Use Which

Needleman-Wunsch (Global):

Use when comparing sequences of similar length
Use when you expect the sequences to be related across their entire length
Example: comparing two versions of the same gene

Smith-Waterman (Local):

Use when looking for conserved regions within longer sequences
Use when sequences have very different lengths
Example: finding a motif within a larger protein

10. Implementation Classes

AlnSeq Class

Holds two sequences and computes:

Hamming distance
Hamming similarity
Levenshtein similarity (for pre-aligned sequences)
Custom str for visual output:
```
ACTG
|X|^
AGTG-
```
Where: | = match, X = substitution, ^ = insertion, v = deletion

SubstitutionMatrix Class

Stores the scoring matrix values
Implements getitem for easy access: matrix['A', 'T'] returns the score

SeqPair Class

Computes alignment matrices (Needleman-Wunsch and Smith-Waterman)
Reconstructs alignments through backtracking

Implementation: Extend AlnSeq with Levenshtein and str

Levenshtein similarity for aligned sequences

This assumes sequences are already aligned (may contain gaps).

class AlnSeq:
    # ... previous methods ...
    
    def levenshtein_similarity(self, sub_matrix, gap_penalty=-2):
        # For aligned sequences (with gaps already inserted)
        if len(self.seq1) != len(self.seq2):
            raise ValueError("Aligned sequences must have equal length")
        
        score = 0
        for i in range(len(self.seq1)):
            c1 = self.seq1[i]
            c2 = self.seq2[i]
            
            if c1 == '-' or c2 == '-':
                # Gap penalty
                score += gap_penalty
            else:
                # Use substitution matrix
                score += sub_matrix[c1, c2]
        
        return score

Custom str method for visualization

    def __str__(self):
        if len(self.seq1) != len(self.seq2):
            return f"{self.seq1}\n{self.seq2}"
        
        # Build match string
        match_str = ""
        for i in range(len(self.seq1)):
            c1 = self.seq1[i]
            c2 = self.seq2[i]
            
            if c1 == c2:
                match_str += "|"      # identity
            elif c1 == '-':
                match_str += "^"      # insertion (gap in seq1)
            elif c2 == '-':
                match_str += "v"      # deletion (gap in seq2)
            else:
                match_str += "X"      # substitution
        
        return f"{self.seq1}\n{match_str}\n{self.seq2}"

Usage

aln = AlnSeq("TCA", "T-A")
print(aln)
# Output:
# TCA
# |v|
# T-A

print(aln.levenshtein_similarity(tt_matrix, gap_penalty=-2))  # 2 + (-2) + 2 = 2

Complete AlnSeq Class

class AlnSeq:
    def __init__(self, seq1, seq2):
        self.seq1 = seq1
        self.seq2 = seq2
    
    def hamming_distance(self):
        if len(self.seq1) != len(self.seq2):
            raise ValueError("Sequences must have equal length")
        distance = 0
        for i in range(len(self.seq1)):
            if self.seq1[i] != self.seq2[i]:
                distance += 1
        return distance
    
    def hamming_similarity(self):
        if len(self.seq1) != len(self.seq2):
            raise ValueError("Sequences must have equal length")
        score = 0
        for i in range(len(self.seq1)):
            if self.seq1[i] == self.seq2[i]:
                score += 1
            else:
                score -= 1
        return score
    
    def hamming_similarity_matrix(self, sub_matrix):
        if len(self.seq1) != len(self.seq2):
            raise ValueError("Sequences must have equal length")
        score = 0
        for i in range(len(self.seq1)):
            score += sub_matrix[self.seq1[i], self.seq2[i]]
        return score
    
    def levenshtein_similarity(self, sub_matrix, gap_penalty=-2):
        if len(self.seq1) != len(self.seq2):
            raise ValueError("Aligned sequences must have equal length")
        score = 0
        for i in range(len(self.seq1)):
            c1 = self.seq1[i]
            c2 = self.seq2[i]
            if c1 == '-' or c2 == '-':
                score += gap_penalty
            else:
                score += sub_matrix[c1, c2]
        return score
    
    def __str__(self):
        if len(self.seq1) != len(self.seq2):
            return f"{self.seq1}\n{self.seq2}"
        match_str = ""
        for i in range(len(self.seq1)):
            c1 = self.seq1[i]
            c2 = self.seq2[i]
            if c1 == c2:
                match_str += "|"
            elif c1 == '-':
                match_str += "^"
            elif c2 == '-':
                match_str += "v"
            else:
                match_str += "X"
        return f"{self.seq1}\n{match_str}\n{self.seq2}"

Complete SubstitutionMatrix Class

class SubstitutionMatrix:
    def __init__(self, matrix_dict):
        self.matrix = matrix_dict
    
    def __getitem__(self, key):
        res1, res2 = key
        return self.matrix[res1][res2]

# Transition/Transversion Matrix
tt_matrix = SubstitutionMatrix({
    'A': {'A': 2, 'T': -1, 'C': -1, 'G': 0},
    'T': {'A': -1, 'T': 2, 'C': 0, 'G': -1},
    'C': {'A': -1, 'T': 0, 'C': 2, 'G': -1},
    'G': {'A': 0, 'T': -1, 'C': -1, 'G': 2}
})

Complete SeqPair Class

class SeqPair:
    def __init__(self, seq1, seq2, sub_matrix, gap_penalty=-2):
        self.seq1 = seq1
        self.seq2 = seq2
        self.sub_matrix = sub_matrix
        self.gap = gap_penalty
    
    def _init_matrix(self, rows, cols):
        return [[0] * cols for _ in range(rows)]
    
    # --- Needleman-Wunsch (Global) ---
    def needleman_wunsch(self):
        rows = len(self.seq2) + 1
        cols = len(self.seq1) + 1
        
        score = self._init_matrix(rows, cols)
        trace = self._init_matrix(rows, cols)
        
        for j in range(1, cols):
            score[0][j] = score[0][j-1] + self.gap
            trace[0][j] = 2
        
        for i in range(1, rows):
            score[i][0] = score[i-1][0] + self.gap
            trace[i][0] = 1
        
        for i in range(1, rows):
            for j in range(1, cols):
                diag = score[i-1][j-1] + self.sub_matrix[self.seq1[j-1], self.seq2[i-1]]
                up = score[i-1][j] + self.gap
                left = score[i][j-1] + self.gap
                
                score[i][j] = max(diag, up, left)
                
                if score[i][j] == diag:
                    trace[i][j] = 0
                elif score[i][j] == up:
                    trace[i][j] = 1
                else:
                    trace[i][j] = 2
        
        self.nw_score = score
        self.nw_trace = trace
        return score[rows-1][cols-1]
    
    def nw_traceback(self):
        aligned1 = ""
        aligned2 = ""
        i = len(self.seq2)
        j = len(self.seq1)
        
        while i > 0 or j > 0:
            if i > 0 and j > 0 and self.nw_trace[i][j] == 0:
                aligned1 = self.seq1[j-1] + aligned1
                aligned2 = self.seq2[i-1] + aligned2
                i -= 1
                j -= 1
            elif i > 0 and self.nw_trace[i][j] == 1:
                aligned1 = "-" + aligned1
                aligned2 = self.seq2[i-1] + aligned2
                i -= 1
            else:
                aligned1 = self.seq1[j-1] + aligned1
                aligned2 = "-" + aligned2
                j -= 1
        
        return aligned1, aligned2
    
    # --- Smith-Waterman (Local) ---
    def smith_waterman(self):
        rows = len(self.seq2) + 1
        cols = len(self.seq1) + 1
        
        score = self._init_matrix(rows, cols)
        trace = self._init_matrix(rows, cols)
        
        max_score = 0
        max_pos = (0, 0)
        
        for i in range(1, rows):
            for j in range(1, cols):
                diag = score[i-1][j-1] + self.sub_matrix[self.seq1[j-1], self.seq2[i-1]]
                up = score[i-1][j] + self.gap
                left = score[i][j-1] + self.gap
                
                score[i][j] = max(diag, up, left, 0)
                
                if score[i][j] == 0:
                    trace[i][j] = -1
                elif score[i][j] == diag:
                    trace[i][j] = 0
                elif score[i][j] == up:
                    trace[i][j] = 1
                else:
                    trace[i][j] = 2
                
                if score[i][j] > max_score:
                    max_score = score[i][j]
                    max_pos = (i, j)
        
        self.sw_score = score
        self.sw_trace = trace
        self.sw_max_pos = max_pos
        return max_score
    
    def sw_traceback(self):
        aligned1 = ""
        aligned2 = ""
        i, j = self.sw_max_pos
        
        while i > 0 and j > 0 and self.sw_trace[i][j] != -1:
            if self.sw_trace[i][j] == 0:
                aligned1 = self.seq1[j-1] + aligned1
                aligned2 = self.seq2[i-1] + aligned2
                i -= 1
                j -= 1
            elif self.sw_trace[i][j] == 1:
                aligned1 = "-" + aligned1
                aligned2 = self.seq2[i-1] + aligned2
                i -= 1
            else:
                aligned1 = self.seq1[j-1] + aligned1
                aligned2 = "-" + aligned2
                j -= 1
        
        return aligned1, aligned2

Bioinformatics Forever