Real-World Applications of Edit Distance

  • Spell Checking and Auto-Correction
  • DNA Sequence Alignment
  • Plagiarism Detection
  • Natural Language Processing
  • Version Control Systems
  • String Matching

https://youtu.be/Thv3TfsZVpw
 



Edit Distance

Given two strings str1 and str2 of lengths M and N respectively, and the operations below that can be performed on str1, find the minimum number of edits (operations) needed to convert ‘str1’ into ‘str2’.

  • Operation 1 (INSERT): Insert any character before or after any index of str1
  • Operation 2 (REMOVE): Remove a character of str1
  • Operation 3 (REPLACE): Replace a character at any index of str1 with some other character.

Note: All of the above operations are of equal cost. 

Examples: 

Input:   str1 = “geek”, str2 = “gesek”
Output:  1
Explanation: We can convert str1 into str2 by inserting an ‘s’ between the two consecutive ‘e’ characters in str1.

Input:   str1 = “cat”, str2 = “cut”
Output:  1
Explanation: We can convert str1 into str2 by replacing ‘a’ with ‘u’.

Input:   str1 = “sunday”, str2 = “saturday”
Output:  3
Explanation: The last three characters and the first character are the same, so we basically need to convert “un” to “atur”. This can be done using three operations: replace ‘n’ with ‘r’, insert ‘t’, insert ‘a’.
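
The three operations in the last example can be traced step by step. The following is a minimal sketch; the helper names insert_char, remove_char, replace_char and trace_sunday_example are illustrative and not part of the original article.

```python
def insert_char(s: str, i: int, ch: str) -> str:
    """INSERT: put ch in front of index i of s."""
    return s[:i] + ch + s[i:]


def remove_char(s: str, i: int) -> str:
    """REMOVE: drop the character at index i of s."""
    return s[:i] + s[i + 1:]


def replace_char(s: str, i: int, ch: str) -> str:
    """REPLACE: overwrite the character at index i of s with ch."""
    return s[:i] + ch + s[i + 1:]


def trace_sunday_example() -> list:
    """Apply the three edits from the example, recording each intermediate string."""
    steps = ["sunday"]
    steps.append(replace_char(steps[-1], 2, "r"))  # replace 'n' with 'r'
    steps.append(insert_char(steps[-1], 1, "t"))   # insert 't' before 'u'
    steps.append(insert_char(steps[-1], 1, "a"))   # insert 'a' before 't'
    return steps


print(trace_sunday_example())
```

Each step costs one edit, so the trace uses exactly three operations to reach “saturday”.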

Illustration of Edit Distance:

Let’s suppose we have str1 = “GEEXSFRGEEKKS” and str2 = “GEEKSFORGEEKS”.
To convert str1 into str2 we need a minimum of 3 operations:
Operation 1: Replace ‘X’ with ‘K’
Operation 2: Insert ‘O’ between ‘F’ and ‘R’
Operation 3: Remove the second-last character, i.e. ‘K’

Refer to the image below for a better understanding.


Similar Reads

Edit Distance using Recursion:

Subproblems in Edit Distance:...
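
As a rough sketch of the recursive idea, the function below compares the last characters of the two prefixes and recurses on shorter prefixes. The name edit_distance_recursive and the inner helper solve are assumptions made for illustration; the original article's code is not shown here.

```python
def edit_distance_recursive(str1: str, str2: str) -> int:
    """Naive recursion over prefix lengths (exponential time in the worst case)."""

    def solve(m: int, n: int) -> int:
        # If one prefix is empty, insert or remove all remaining characters.
        if m == 0:
            return n
        if n == 0:
            return m
        # Matching last characters cost nothing.
        if str1[m - 1] == str2[n - 1]:
            return solve(m - 1, n - 1)
        # Otherwise pay one edit and take the cheapest option.
        return 1 + min(
            solve(m, n - 1),      # insert
            solve(m - 1, n),      # remove
            solve(m - 1, n - 1),  # replace
        )

    return solve(len(str1), len(str2))


print(edit_distance_recursive("sunday", "saturday"))
```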

Edit Distance Using Dynamic Programming (Memoization):

In the above recursive approach, there are several overlapping subproblems: Edit_Distance(M-1, N-1) is called three times, Edit_Distance(M-1, N-2) is called two times, Edit_Distance(M-2, N-1) is called two times, and so on. So we can use memoization to store the result of each subproblem and avoid recalculating it again and again. Below is an illustration of the overlapping subproblems during the recursion....
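
One way to memoize the recursion in Python is to cache results keyed by the pair of prefix lengths. This is a sketch under that assumption, using functools.lru_cache from the standard library; the function name edit_distance_memo is illustrative.

```python
from functools import lru_cache


def edit_distance_memo(str1: str, str2: str) -> int:
    """Top-down recursion where each (m, n) subproblem is solved only once."""

    @lru_cache(maxsize=None)
    def solve(m: int, n: int) -> int:
        if m == 0:
            return n
        if n == 0:
            return m
        if str1[m - 1] == str2[n - 1]:
            return solve(m - 1, n - 1)
        return 1 + min(
            solve(m, n - 1),      # insert
            solve(m - 1, n),      # remove
            solve(m - 1, n - 1),  # replace
        )

    return solve(len(str1), len(str2))


print(edit_distance_memo("sunday", "saturday"))
```

With at most (M + 1) x (N + 1) distinct subproblems, the cached version runs in O(M x N) time. It still recurses, so very long inputs may hit Python's recursion limit.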

Edit Distance Using Dynamic Programming (Bottom-Up Approach):

Use a table to store the solutions of subproblems and avoid recalculating the same subproblems multiple times. If the same subproblem comes up again, we retrieve its solution from the table itself....
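
A bottom-up table can be sketched as follows, where dp[i][j] holds the edit distance between the first i characters of str1 and the first j characters of str2. The names edit_distance_table and dp are illustrative choices, not taken from the original.

```python
def edit_distance_table(str1: str, str2: str) -> int:
    """Fill an (M + 1) x (N + 1) table row by row."""
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Converting a prefix to or from the empty string takes one edit per character.
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i][j - 1],      # insert
                    dp[i - 1][j],      # remove
                    dp[i - 1][j - 1],  # replace
                )
    return dp[m][n]


print(edit_distance_table("sunday", "saturday"))
```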

Edit Distance Using Dynamic Programming (Optimization in Space Complexity):

Optimized Space Complexity Solution: The bottom-up approach above requires O(M x N) space. To reduce it, observe that filling a row of the DP array needs only one other row, the one directly above it. For example, when filling the row where i = 10, we need only the values of the 9th row. So we simply create a DP array of size 2 x (N + 1), where N is the length of str2. This approach reduces the space complexity from O(N*M) to O(2*N), i.e. O(N)....
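
The two-row idea might look like the sketch below, assuming rows are indexed by str1 and columns by str2, so each row has one entry per prefix of str2. The names edit_distance_two_rows, prev and curr are illustrative.

```python
def edit_distance_two_rows(str1: str, str2: str) -> int:
    """Keep only the previous row and the row currently being filled."""
    m, n = len(str1), len(str2)
    prev = list(range(n + 1))  # row for the empty prefix of str1

    for i in range(1, m + 1):
        curr = [i] + [0] * n   # first column: remove all i characters
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(
                    curr[j - 1],  # insert
                    prev[j],      # remove
                    prev[j - 1],  # replace
                )
        prev = curr            # the finished row becomes the "upper" row
    return prev[n]


print(edit_distance_two_rows("sunday", "saturday"))
```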

Edit Distance Using Dynamic Programming (Further Optimization in Space Complexity):

As discussed, the above approach uses two 1-D arrays. Can we achieve the same result using only a single 1-D array? The answer is yes, and it requires a simple observation. In the previous approach, the curr[] array is updated using only 3 values:

  • Value 1: prev[j-1], which becomes curr[j] directly when str1[i-1] is equal to str2[j-1]
  • Value 2: prev[j], a candidate for curr[j] when str1[i-1] is not equal to str2[j-1]
  • Value 3: curr[j-1], a candidate for curr[j] when str1[i-1] is not equal to str2[j-1]

By keeping track of these three values, we can achieve our task using only a single 1-D array...
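
A single-array version can be sketched by saving the diagonal value (the old prev[j-1]) in a scalar before it is overwritten. The names edit_distance_one_row, diag and above are assumptions for illustration only.

```python
def edit_distance_one_row(str1: str, str2: str) -> int:
    """Single 1-D array; diag remembers the value that prev[j-1] would have held."""
    m, n = len(str1), len(str2)
    curr = list(range(n + 1))  # row for the empty prefix of str1

    for i in range(1, m + 1):
        diag = curr[0]         # value from the previous row, column 0
        curr[0] = i
        for j in range(1, n + 1):
            above = curr[j]    # not yet overwritten, so still the previous row
            if str1[i - 1] == str2[j - 1]:
                curr[j] = diag
            else:
                curr[j] = 1 + min(
                    curr[j - 1],  # insert (current row, already updated)
                    above,        # remove (previous row)
                    diag,         # replace (previous row, previous column)
                )
            diag = above       # becomes the diagonal for column j + 1
    return curr[n]


print(edit_distance_one_row("sunday", "saturday"))
```

The array has N + 1 entries regardless of M, so the extra space is O(N).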

